Assessment of an Experiment in Teaching Geography Online

  Christine M. Rodrigue

Professor and Chair
Department of Geography
California State University
Long Beach, CA 90840-1101
1(626) 985-4895
rodrigue@csulb.edu
https://home.csulb.edu/~rodrigue/

Presentation to the
California Geographical Society
Lone Pine, CA, 3-5 May 2002

~~~~~~~~~~

Abstract

In Fall of 2000, I volunteered to teach the first completely online geography course at CSULB, an introductory physical geography section. In Spring 2001, I offered one section of the class online and another in the traditional lecture/lab format and decided to do a pre-test/post-test study of the two sections to assess the differences in outcomes. The pre-test consisted of 30 questions drawn from the four exams given in the class, sprung on the students in their first meetings. The results were predictably terrible, but, more importantly, there was no statistically significant difference between the two groups of students. At the end of the semester, I evaluated the overall class means and standard deviations. The online class scored lower than the lecture/lab section, but the difference was not statistically significant.

~~~~~~~~~~

Introduction

Institutional politics over space, scheduling, and faculty resources have put pressure on many departments, at many institutions, to explore online "content delivery." The Department of Geography at California State University, Long Beach, found itself responding to similar stresses and decided to experiment with an online-format course in the Fall of 2000. Because I've used the Internet in my classes since Spring of 1994 and am proficient in HTML, I volunteered to be the one to conduct the experiment.

There are many pluses and minuses to online course delivery, however. The pluses include more efficient use of physical space on an impacted campus and greater convenience for students. Balancing these pluses is a raft of minuses.

The first of these is faculty workload. There is, unfortunately, a world of difference between putting some of your materials online and doing an entire class in the online format, a difference that led to a staggering amount of not fully anticipated work on my part. For example, putting my lectures online meant a great deal of work simply typing in my handwritten notes. Something I hadn't anticipated, however, was the obsessive concern I developed with verifying the accuracy and currency of every single factoid that went up there -- I felt compelled to recheck every lapse rate, Dobson unit, and formula against the latest NASA, NOAA, or USGS data.

Another huge problem was the graphic material. Turning my chalkboard drawings into decent GIFs or JPEGs proved far too burdensome, so I decided to trawl the Internet for pre-existing drawings or photographs to illustrate key concepts, which is time-consuming in its own right and then raises copyright issues. You can't just grab images you like, save them on your own machine, and then upload them to your web account.

One way of dealing with this is to write each source for permission. Unfortunately, most of them don't own the copyright to the images they've grabbed from somewhere else, either, and it becomes impossible to trace the images back to their original sources. What I did instead was link my lectures to images on other people's servers, with a note identifying the immediate source appearing when the reader's cursor passes over the image. This way, students are viewing materials on other authors' or artists' own web accounts rather than on mine.

This, of course, leads to the need to monitor the lectures constantly, looking for dead links as the sources fiddle with their own web pages and delete or move images from the URLs my pages point to. Whenever possible, I prefer images that are in the public domain and posted on a government web site (these tend to be a bit more stable, too).
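
For what it's worth, this kind of link checking is easy to automate. The short Python sketch below is my own illustration rather than anything used for the actual course pages, and the image URLs in it are made-up placeholders; it simply walks a list of image addresses and reports any that no longer respond:

# A minimal dead-link checker: an illustration only, not part of the
# original course materials. The URLs below are placeholders.
import urllib.error
import urllib.request

IMAGE_URLS = [
    "https://www.nasa.gov/placeholder-image.jpg",
    "https://www.usgs.gov/placeholder-figure.gif",
]

def check_links(urls, timeout=10):
    """Report any URL that no longer responds or returns an HTTP error."""
    for url in urls:
        try:
            request = urllib.request.Request(url, method="HEAD")
            urllib.request.urlopen(request, timeout=timeout)
        except (urllib.error.URLError, OSError) as err:
            print(f"DEAD LINK: {url} ({err})")

if __name__ == "__main__":
    check_links(IMAGE_URLS)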

So, between transcribing my notes, verifying each fact and concept, finding or making stable illustrations, and trying to comply with copyright, I found myself spending anywhere from 25 to 45 hours to develop each lecture, and there were eventually 37 of them! Fall 2000 was just a painful blur. Luckily, maintaining the built course is easier than building it in the first place, but it still entails a surprising amount of work: fixing dead links and adding or improving lecture and lab materials, not to mention the enormous amount of e-mail with students.

E-mail somehow doesn't enter the minds of administrative enthusiasts for online learning. My experimental classes have been small, somewhere between 12 and 25 students, yet the e-mail generated by one to two dozen students is very time-consuming, easily several hours a week. Students also expect instant feedback from e-mail, and some become upset whether the reply takes two hours or two days. A class of 50 or more students could become physically impossible to manage because of the e-mail traffic it would generate.

The foregoing have been mainly logistical and organizational problems. There is some debate about whether there are serious pædagogical shortcomings as well. Ebeling, for example, did a quasi-experimental pre-test/post-test study of statistics courses with and without computer-aided instruction and found that the students with access to online content and exercises actually did significantly worse than those taught with traditional lectures and pencil-and-calculator lab exercises (Ebeling 1998). Schutte came to exactly the opposite conclusion using a fully experimental pre-test/post-test methodology, again in a statistics course: His online students scored 20 percent higher at the end of the class than their traditional peers (Schutte 1997). Schulman and Sims (1999), using a quasi-experimental methodology, examined five pairs of classes from a variety of disciplines, each pair taught by the same instructor, and found significant differences between the students signing up for the online and the traditional versions. The students volunteering for the online versions did significantly better than those in the corresponding traditional classes on the pre-tests. At the end of the term, however, the two groups were statistically indistinguishable, which suggests that better students are drawn to online classes but that their improvement during the classes does not match that of their peers in the traditional classes.

~~~~~~~~~~

Hypotheses

With these contradictory results about the effectiveness of online education, I could not justify a directional hypothesis. I could only hypothesize that there would be no meaningful difference in performance between students in my online section and those in my regular lecture/discussion/lab section.

~~~~~~~~~~

Data and Methods

The research utilized a quasi-experimental pre-test and post-test design: The students were not randomly assigned to sections but picked the sections in which they enrolled. I arranged to be scheduled for two sections of introductory physical geography in the Spring of 2001, one completely online except for exams and the other my usual lecture/discussion/online lab mix. The regular class was offered on a Tuesday/Thursday schedule, and the online class was shown in the schedule as a three-hour Friday section.

To establish that the two different schedules did not draw from different populations of students, I conducted an anonymous pre-test on the first day of the regular class and at the orientation meeting for the online class, after explaining to the students the purpose of this unexpected and voluntary trauma. All students present on the first day in both sections graciously participated in the pre-test. There were 31 student respondents in the regular class and 16 in the online class. Because of the anonymity of the pre-test, the research design could not compare individuals longitudinally as matched pre- and post-test pairs.

The post-test consisted of a comparison of the medians, means, and standard deviations of the two classes' final scores, rather than a comparison of individual student improvement. The pre-test and post-test groups in each class did not perfectly overlap: The post-test included the grades of 33 students in the regular class and 14 in the online class, meaning that I had picked up 2 students in the regular class and lost 2 in the online class, and it's anyone's guess which of the original 31 and 16 respondents were present at the end of the classes, given the liberal add and drop policies of CSULB.

In both the pre-test and the post-test, I evaluated the standard deviations through analysis of variance and then the means through a t-test. The particular form of the t-test for the difference of means depends on the F ratio calculated by ANOVA (basically, whether the separate variances can be pooled or not). To test the difference in medians, I used the Wilcoxon Rank Sum W test. For all tests, I used the common 95 percent confidence level to judge whether results were significant: Any probability less than 0.05, thus, was deemed significant. Because of the non-directionality of the hypothesis, I used two-tailed tests to evaluate the differences between the classes in both the pre-test and the post-test.
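
To make the decision sequence concrete, here is a minimal sketch in Python with SciPy -- my own latter-day reconstruction for illustration, not the software actually used for the 2001 analysis. It folds the larger sample variance over the smaller for the F test, chooses the pooled or separate-variance form of the t-test from that result, and finishes with the Wilcoxon rank-sum comparison (SciPy's ranksums):

# A sketch of the test sequence described above, written in Python with
# SciPy purely as an illustration; it is a reconstruction, not the
# software actually used for the 2001 analysis.
import numpy as np
from scipy import stats

def compare_sections(regular, online, alpha=0.05):
    """Two-tailed comparison of two lists of scores at the 95 percent level."""
    regular = np.asarray(regular, dtype=float)
    online = np.asarray(online, dtype=float)

    # Variance check (the ANOVA step): fold the larger sample variance
    # over the smaller to get the F ratio.
    v_r, v_o = regular.var(ddof=1), online.var(ddof=1)
    if v_r >= v_o:
        f_ratio, dfn, dfd = v_r / v_o, regular.size - 1, online.size - 1
    else:
        f_ratio, dfn, dfd = v_o / v_r, online.size - 1, regular.size - 1
    p_f = stats.f.sf(f_ratio, dfn, dfd)   # upper-tail prob-value of the folded F

    # If the variances are not significantly different, pool them in the
    # difference-of-means t-test; otherwise use separate variances.
    pooled = p_f >= alpha
    t_stat, p_t = stats.ttest_ind(regular, online, equal_var=pooled)

    # Wilcoxon rank-sum comparison of the two score distributions.
    w_stat, p_w = stats.ranksums(regular, online)

    return {"F": f_ratio, "p_F": p_f, "pooled variances": pooled,
            "t": t_stat, "p_t": p_t, "W": w_stat, "p_W": p_w}

Called with the two sections' score lists, it returns the F ratio, the appropriate t statistic, and the rank-sum W, each with its prob-value.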

~~~~~~~~~~

Results

The pre-test consisted of 30 questions drawn fairly evenly from the four exams normally given in the class (Figure 1). The median score of the 31 respondents in the regular section was 15.17, while the mean was 15.81, with a standard deviation of 2.96. The 16 respondents in the online section yielded a median score of 15.25, a mean score of 15.00, and a standard deviation of 2.42.

ANOVA resulted in an F ratio of 1.49, with a prob-value of 0.21 -- nowhere near significant at the 0.05 level -- indicating that the variances in the two groups were effectively the same. I could, thus, pool the variances in the t-test used to establish whether the two classes were drawn from the same student population.
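
For readers who want to check the arithmetic against Figure 1, the F ratio here is simply the larger sample variance over the smaller:

\[
F = \frac{s^2_{\text{traditional}}}{s^2_{\text{online}}} = \frac{8.76}{5.87} \approx 1.49,
\qquad df = (30,\, 15)
\]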

The resulting t-score was 0.94, which gives a prob-value of 0.35, far above the 0.05 standard. The Wilcoxon W gave a prob-value of 0.28, reinforcing the result of the difference-of-means t-test. The two classes, thus, were statistically indistinguishable at the outset, so that any differences between them at the end of the semester can reasonably be attributed at least in part to the two different "content delivery" systems used.

At the end of the semester, the online class did slightly worse than the regular class, measured on a 100-point scale (Figure 2). The median score of the 33 students completing the regular lecture/discussion/lab course was 66.22, while the median of the 14 students in the online section was 61.95. The mean of the regular section was 65.75, with a standard deviation of 9.04; the mean for the online section was 61.35, with a standard deviation of 13.46. The question is whether these differences -- roughly 4 points out of 100 on each measure -- are significant.

To decide which form of the t-test to use, I performed ANOVA and got an F ratio of 2.22, which yielded a prob-value of 0.03. This meant the variances were significantly different and should not be pooled in the t-test of the difference of means. The t-test calculated with separate variances produced a t-score of 1.22 and a prob-value of 0.28. The Wilcoxon Rank Sum W produced a prob-value of 0.19, again reinforcing the impression that there was no significant difference in the central tendencies of the two score distributions.
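
As with the pre-test, the F ratio can be checked directly against the variances in Figure 2, this time with the online section's larger variance in the numerator:

\[
F = \frac{s^2_{\text{online}}}{s^2_{\text{traditional}}} = \frac{181.07}{81.70} \approx 2.22,
\qquad df = (13,\, 32)
\]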

~~~~~~~~~~

Discussion

This quasi-experimental research design, then, established through the pre-test that the two student groups were clearly drawn from the same population, being virtually identical in their medians, means, and standard deviations. Concurrent online and traditional delivery of the same class content by the same instructor to two classes of the same kinds of students did not produce significantly different results: While the differences in mean and median were larger than in the pre-test, they did not prove statistically significant.

Interestingly enough, however, the standard deviations of the scores in the two course sections did diverge. The standard deviations at the outset were indistinguishable, according to the ANOVA performed at that time. By the end of the semester, there proved to be greater differentiation of students within the online class than within the regular section. It would seem that good students did better in the more autonomous and self-paced milieu of the online class, but the poorer students did more poorly in this environment than they might have in the face-to-face course. Class outcomes, thus, were more polarized in the online section.

~~~~~~~~~~

Conclusions

This study, with its findings of no significant difference in mean and median student performance between the online and traditional versions of the same class but a significant difference in the variability of student performance, simply adds to the confusing and contradictory results being reported in the assessment literature at this early stage in the adoption of online learning. Many more such empirical evaluations of the outcomes of online learning are needed before clear patterns can emerge to guide our teaching.

In the meantime, I recommend that faculty move cautiously into this new frontier because of lingering questions about the ultimate impact of the online format on various kinds of students and because of the staggering amount of work needed to prepare materials for this format. Copyright remains a murky area, both in terms of the materials used in your classes and in terms of your own intellectual property in your web work. Security of exam materials is another grey area.

Furthermore, I worry about how this environment interacts with students' different learning styles. In a classroom, students' learning is multi-modal -- they hear my lecture, they can figure out what's important from my body language and voice, they can analyze my comments closely enough to write down the highlights during class, they can read their notes and remember what they saw and heard, and they can make friends with other people in class, some of whom form study groups. I construct my visuals on a chalkboard step by step, so they can see how it's done and actively try to replicate the process hands-on, rather than just passively scanning already perfectly realized visuals on the web. The best I've been able to come up with is a long orientation session at the beginning, where I discuss the many benefits of online learning and warn students that their best learning modes may not be possible in a completely visually dominated medium and that they should compensate accordingly.

~~~~~~~~~~

References

Ebeling, Jon S. 1998.
Final results of an experiment in the use of computer assisted instruction to teach social science statistics at CSU, Chico. Presented to the Western Political Science Association, Los Angeles.
Schulman, A. H. and Sims, R. L. 1999.
Learning in an online format versus an in-class format: An experimental study. T.H.E. Journal 26, 11(June): 54-56.
Schutte, Jerald G. 1997.
Virtual teaching in higher education: The new intellectual superhighway or just another traffic jam? This widely cited but unpublished study is available at: http://www.csun.edu/sociology/virexp.htm.

~~~~~~~~~~

Figure 1 -- Pre-Test

Characteristics of traditional and online sections of introductory
physical geography, beginning of Spring 2001 semester, CSULB,
C.M. Rodrigue
     
                 traditional    online
possible                  30        30
minima                     6        11
maxima                    20        20
medians                15.17     15.25
means                  15.81     15.00
st dev                  2.96      2.42
variances               8.76      5.87
n                         31        16
df                        30        15

calculated F     1.49     ANOVA
prob-value       0.21     ANOVA

calculated t     0.94     difference-of-means t-test
prob-value       0.35     difference-of-means t-test

W                1.09     Wilcoxon Rank Sum W
prob-value       0.28     Wilcoxon Rank Sum W

No significant difference in variances, means, or medians

~~~~~~~~~~

Figure 2 -- Post-Test

Characteristics of traditional and online sections of introductory
physical geography, end of Spring 2001 semester, CSULB,
C.M. Rodrigue
     
                 traditional    online
possible              100.00    100.00
minima                 39.61     32.20
maxima                 80.57     82.50
medians                66.22     61.95
means                  65.75     61.35
st dev                  9.04     13.46
variances              81.70    181.07
n                         33        14
df                        32        13

calculated F     2.22     ANOVA
prob-value    ** 0.03     ANOVA

calculated t     1.22     difference-of-means t-test
prob-value       0.28     difference-of-means t-test

W                1.33     Wilcoxon Rank Sum W
prob-value       0.19     Wilcoxon Rank Sum W

No significant difference in means or medians
** Significant difference in variances

~~~~~~~~~~
