This paper describes two studies of class size effects on achievement (on a standardized exam) and dropout rates in calculus at the university level. The first, or preliminary, study was conducted in 1995 at a large land-grant institution in the southern United States, and involved one teacher in both a large and a small section of calculus and a total of 293 students. The second, main study was conducted at a large private university in the western United States in 1997 and 1998, and included four teachers, each in both large and small sections of calculus, and a total of 2,118 students. After accounting for other significant factors, we found that class size itself was not a significant factor in student achievement in calculus at the university level, nor were students more likely to drop from large sections than they were from small sections. However, individual teachers did vary in their effectiveness in different class sizes--some were more effective in large classes than in small ones, while others were less effective in large classes than in small ones. More importantly, the most effective teachers in large sections were more effective than almost all of the remaining teachers in small sections. Implications for university administrators are discussed.
Tyler J. Jarvis
March 17, 2000
Class size and its effect on students have been researched repeatedly in recent years, but most of these studies have focused on elementary and secondary schools rather than on university-level teaching, and very few have dealt with mathematics. Yet [Smith and Glass, 1980] gives evidence that class size effects vary with students' age, and other studies indicate that the class size effect varies with subject matter--even within a discipline [McConnell and Sosin, 1984; Raimondo et al., 1990]. This indicates a need for a specific study on the effects of class size in university-level mathematics courses. This paper begins to address this need by studying class size effects on achievement and dropout rates in calculus classes at two large universities.
In a meta-analysis of class-size studies, Glass and Smith [Glass and Smith, 1979; Smith and Glass, 1980] showed for elementary school children that the benefit of small classes is a logarithmic function of size, with the marginal benefit of reducing class size being most significant for classes of size 20 and fewer. Moreover, the marginal benefit is very small when classes are larger than 25 or 30 students; that is, there is little, if any, benefit to reducing class size if the small class has more than 25-30 students. Since most universities cannot afford to reduce class size in introductory mathematics courses to much below 30 students, Glass and Smith's results, if applicable to university-level instruction, would suggest that little or no benefit would be derived from reducing class sizes from relatively large to small classes of approximately 30. This study confirms that hypothesis.
Studies of university-level economics and accounting instruction have repeatedly shown little or no significant effect on student achievement from reduced class size [Bellante, 1972; Hill, 1998; Kennedy and Siegfried, 1997]. But again, since class size effects vary with subject and discipline, it was important to study the effect of class size on student achievement in introductory-level mathematics. Our results agree with those obtained in other disciplines.
One concern with many available class-size studies is the fact that although one would expect the effect of class size to vary substantially with the teacher, few studies account for this teacher effect. Some studies include just one teacher (e.g., [Thompson, 1991]). These have the problem that they only show the significance of a size effect for the one teacher involved. But for many reasons, one teacher might be much less effective in a large class than another is. Other studies (e.g., [Williams et al., 1985]) include many teachers, some in large and some in small classes, but without accounting for the teacher in the study. However, one would expect, and our study confirms, that the effect on student achievement due to variation among teachers is much larger than any effect due to class size; thus any model that does not correct for the effect of different teachers will be unable to accurately identify a class-size effect. Indeed, the studies that fail to account for teacher effects generally have a very poor fit between their model and their data.
This study shows that class size in university calculus classes matters only in relation to teacher. In particular, averaging over all teachers in the study, class size had no significant effect on students' achievement in calculus. However, some teachers were substantially less effective in a large class than other teachers were, and one teacher was more effective in a large class than in a small one. More significant is the conclusion of a second analysis comparing the relative effectiveness of different teachers. We found that three teachers of large sections were more effective than all but four of 24 teachers of small sections. Consequently, a student would have been better served in the large section with the better teachers, than in all but four of the small sections.
We should note here that in the main study both large and small sections had associated review sections which met twice weekly. These review sections were all small--about 30 students each. In the preliminary study neither the large nor small sections had associated review sections. It is possible that, although the size of the main lecture is not a significant factor, the size of the review section might have a statistically significant effect on students' achievement. Indeed, Kennedy and Siegfried [Kennedy and Siegfried, 1997] cite work of Attiyeh and Lumsden from 1972, which shows some evidence that this is the case in introductory economics classes. Nevertheless, with or without review sessions, we found no effect on student achievement due to class (main lecture) size.
Of course, student achievement is not the only measure of teaching success. Students' attitudes may also be important, even when they are achieving more in the large sections. Studies of students' attitudes in large and small courses give conflicting results. For example, Wood et al. [Wood et al., 1974] conclude that student ratings of instructors declined as enrollment increased to 240, but beyond that point they began to improve. But others [Marsh et al., 1979] found little correlation between class size and students' attitudes about the course. And Sweeney et al. [Sweeney et al., 1983] found that large economics courses were actually preferred over small ones.
Because student evaluations and attitude surveys are relatively unreliable measures and appear to vary widely from class to class, and from day to day within a class, we chose to use student dropout rates as a more objective measure of student attitude.
In an analysis of student dropout rates for large and small classes (with identical instructors) we found that class size was not a significant factor. Again (aside from students' preparation for the course and natural aptitude) the teacher appears to be the chief factor affecting dropout rates.
Since, both in terms of student achievement and in terms of students' likelihood to complete the course, the teacher was a significant factor, and class size was not, we conclude that even if a university can afford to offer calculus in small sections with first-rate teachers, it may still not be the best strategy. One of those teachers may well be more effective in a large class than most or all of the others in a small class; and therefore, students would be better served in the large section than in most or all of the small ones. At most universities, however, small classes are achieved only by hiring adjuncts, graduate students, and less-qualified faculty. In such a case, reducing class size may be doing students a disservice while simultaneously increasing instruction costs.
The object of the preliminary study, which was performed at a large land-grant institution in the southern United States (University A), was to study the effect of class size (S) on the performance of Calculus I students, while adjusting for other important effects such as students' general mathematical ability, as measured by their ACT score (A), students' sex (G), students' ethnic group (E), teacher (T), and semester (M). Performance was measured principally by a standardized departmental final exam, but we were also interested to see if students' success (as measured by their grade) in Calculus II might be differently affected by class size. The following statistical models were used for analysis of the data.
Let $y_{l(i,j,k)}$ be the performance on a standardized departmental exam of student $l$ in class $k$ taught by teacher $j$ in a class of type $i$ (large if $i=1$, small if $i=0$).
There are several factors that might affect $y_{l(i,j,k)}$: class size (S), teacher (T), semester (M), ethnic group (E), and sex (G).
There are also potential effects due to interaction between different factors, for example between ethnic group and teacher (ET) or between sex and class size (SG).
Also, there is an adjustment for initial aptitude and preparation, as measured by the ACT score $A_l$ and scaled by a linear factor.
The basic model expresses $y_{l(i,j,k)}$ as the sum of the overall mean, the effects and interactions above, and the scaled ACT adjustment.
The model also includes two random error terms: one associated with each class and one associated with each student.
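As a rough sketch only, a model of this kind could be written in symbols as follows. The symbols $\mu$ (overall mean), $\beta$ (linear factor for the ACT score), $\delta$ (class error), and $\varepsilon$ (student error) are assumptions introduced here for illustration. The subscripts on the factor effects are suppressed, and only the interactions listed in Table 1 are shown; the exact form used in the analysis may differ.

```latex
\begin{aligned}
y_{l(i,j,k)} ={}& \mu + S + T + M + E + G \\
  &+ ET + EM + GT + GM + SE + SG \\
  &+ \beta A_l + AE + AG \\
  &+ \delta_{k(i,j)} + \varepsilon_{l(i,j,k)}
\end{aligned}
```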
The initial factors of sex and ethnic group were decided on because other studies have found them to be significant (see, for example [McConnell and Sosin, 1984]). It also seemed likely that the teacher would have a significant effect on achievement, and so teacher was also included as an initial factor.
This statistical model is similar to a standard split-plot model, but differs in that we have an adjustment for the covariate (ACT score). The use of the covariate is important to compensate for the fact that various factors beyond our control may affect the types of students enrolled in the different-sized classes. For example, the small classes fill up more quickly, leaving those who register late in the large classes (this may also partially account for the common perception that large classes are worse). The covariate helps to account for these differences.
The preceding model describes the effects of class size on student mastery of the material in Calculus I as measured by the departmental final exam. But it also seemed possible that class size in Calculus I might have a different effect on students' performance in the follow-up course, Calculus II, than it did on performance on the Calculus I final exam. The model we used to test this hypothesis was similar to the preceding model, but it used the restricted data set of only those students that continued on to Calculus II. Success in Calculus II was measured using students' final grade in Calculus II, after adjusting for the additional effect of different teachers for Calculus II and adjusting for the higher order interactions that might be associated with this teacher effect.
Although it would be interesting and important to know the effects of large class size on subsequent enrollment in mathematics, it is probably impossible to draw conclusions about such effects. The difficulty arises because student enrollment in subsequent mathematics courses, especially in Calculus II, is primarily dependent upon program requirements rather than student preference.
In this study, one instructor taught both a large (ca. 90 students) and a small (ca. 35 students) section of introductory calculus in the course of one academic year (1995-6). Eight other instructors taught small sections, and their data are used to standardize the final exam, in particular to account for differences between semesters. Their data also helped identify some changes that needed to be made for the main study at University B. This set contains data for 293 students.
It was impossible to measure the interaction of teacher and class size because only one teacher taught both large and small sections. However, for that one teacher, class size was not statistically significant. The interaction effects of class size with ethnic group and with sex were also not significant. See Tables 1 and 2 for more details.
Table 1. University A: Initial model.
| Source | Pr > F |
| A = ACTMATH | 0.0001 |
| T = TEACHER | 0.0002 |
| M = SEMESTER | 0.0287 |
| E = ETHN | 0.3037 |
| G = SEX | 0.6897 |
| S = SIZE | 0.5010 |
| ET = ETHN*TEACHER | 0.0908 |
| EM = ETHN*SEMESTER | 0.3148 |
| AE = ACTMATH*ETHN | 0.9240 |
| GT = SEX*TEACHER | 0.0289 |
| GM = SEX*SEMESTER | 0.3812 |
| AG = ACTMATH*SEX | 0.9755 |
| SE = ETHN*SIZE | 0.2297 |
| SG = SEX*SIZE | 0.6945 |
Table 2. University A: Final model (columns: Source, Pr > F).
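As a quick reading aid for Table 1, the short sketch below filters the reported Pr > F values at a 0.05 threshold. The threshold is an assumption (the text does not state the level used), and the dictionary simply transcribes the initial-model values from Table 1.

```python
# Pr > F values transcribed from Table 1 (University A, initial model).
TABLE_1 = {
    "ACTMATH": 0.0001,
    "TEACHER": 0.0002,
    "SEMESTER": 0.0287,
    "ETHN": 0.3037,
    "SEX": 0.6897,
    "SIZE": 0.5010,
    "ETHN*TEACHER": 0.0908,
    "ETHN*SEMESTER": 0.3148,
    "ACTMATH*ETHN": 0.9240,
    "SEX*TEACHER": 0.0289,
    "SEX*SEMESTER": 0.3812,
    "ACTMATH*SEX": 0.9755,
    "ETHN*SIZE": 0.2297,
    "SEX*SIZE": 0.6945,
}

ALPHA = 0.05  # assumed significance level; not stated in the text


def significant_terms(p_values, alpha=ALPHA):
    """Return the terms whose p-value is below alpha, smallest p first."""
    return sorted((t for t, p in p_values.items() if p < alpha),
                  key=p_values.get)


print(significant_terms(TABLE_1))
```

Under that assumed threshold, only the ACT math score, teacher, semester, and the sex-teacher interaction pass in the initial model; size and both of its interactions do not.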
By far the most significant influences on student performance were initial preparation and aptitude, as measured by the ACT math score, and teacher (although only one teacher taught both large and small sections, eight others taught small sections). The fact that the teacher would have an effect is not surprising. But the magnitude of this effect was large compared to all others (except the ACT scores). This seemed to indicate that class size effects cannot be effectively measured without carefully adjusting for teacher. Moreover, the widely varying nature of results of other studies that did not adjust for teacher effect (or which only include one teacher) indicates the potential for a large interaction between size and teacher.
Another large effect was associated with the semester in which the course was taught. This is probably due to the fact that students who are well-prepared for Calculus I by their high school program are likely to take Calculus I in the Fall semester of their freshman year, whereas the remaining students are more likely to take Calculus I after first taking a semester of prerequisites.
The time of day the courses were offered varied through the regular school day (8 am to 4 pm), but despite some expectations to the contrary, time was not significant. It appears that the covariate accounted for essentially all variation associated to differences in time of day.
The model measuring the performance of students in the subsequent class, Calculus II, showed no additional information given by using students' grade in Calculus II to measure performance in Calculus I. In fact, after systematic removal of insignificant factors, the best model for student performance in Calculus II appeared to be one that depends only on teacher (of the Calculus II section) and the students' score on the Calculus I final. Because tracking students to Calculus II gave no information about student mastery of Calculus I beyond that given by the final, that aspect was dropped in the main study.
Finally, although the $R^2$ value of .49 for this model was stronger than many that have been published on class size (ranging from $R^2 = .39$ [Glass and Smith, 1979] down to $R^2 = .01$ [Williams et al., 1985]), it still seemed relatively weak, indicating a need for a better covariate and for inclusion of other significant factors.
The goal of the main study was to decide whether size had a significant effect on student achievement in calculus, and to see whether the teacher-size interaction was significant. This study was conducted at a large, private university in the western United States (University B).
We originally expected the ACT and the pretest to be linearly dependent (at least after accounting for other factors such as students' age). However, a test of this hypothesis showed no significant correlation between the ACT math score and the pretest. This is probably because, as explained above, they actually test different things. Consequently, we included both in the model.
The primary data consist of pretest and final exam scores for 1,984 students in first-semester calculus and 134 students in second-semester calculus, collected over two years at University B. The data also include the various other potential factors described in Section 3.1 that might influence student achievement.
The final exam was written by a departmental committee to represent the core topics and skills that were considered most important for students to know. This was considered a good measure of learning since it represented the consensus of a large number of mathematics instructors about what constitutes successful calculus learning.
For the purposes of this study, small classes are classes with 20-35 students, while large classes contain 150-240 students. Both kinds of classes included review sessions (20-35 students) twice a week with a teaching assistant.
Students who had taken the course previously were not included. Students who dropped the class were also not included, since they did not take the final exam. There was some concern that weak students might be more likely to drop from a large section, but a separate logistic regression showed that for a given teacher, and after adjusting for pretest scores, students drop essentially randomly (see Section 5).
The Calculus I data cover four semesters (Fall and Winter of 1997 and 1998), and 27 teachers. One teacher (Teacher Q) taught both small and large sections in Winter 1997, a different teacher (Teacher L) taught both small and large sections in Fall 1997, a third teacher (Teacher T) taught both small and large sections in Winter 1998, and a fourth teacher (Teacher AA) taught both small and large sections in Fall 1998. Some of these four taught large and small sections in other semesters (but not simultaneously). One other teacher (Teacher M) taught only large sections, and the remaining teachers taught only small sections. These other teachers are included for purposes of standardizing the pretest and final, and for estimating the relative magnitude of the teacher-size effect.
This set contains data for 1,984 students. They are divided into six ethnic groups and 11 major colleges (and also the option of an undeclared major).
One teacher taught both small and large sections of Calculus II in Fall 1997. The total number of students in the data set (after removing students who dropped or who had taken the course before) is 134. The same demographic data were included here as those included in the Calculus I data set.
Initially the model considered the potential effects of many factors, as described above. Class size alone was not significant, nor were most of its higher-order interactions (see Table 3). After a standard, systematic elimination of insignificant variables, the model had as its main factors ACT, pretest, teacher, semester, major college, and the teacher-size interaction. Unlike in the case of University A, at University B the interactions between teacher and ethnicity and between teacher and sex are not significant (see Table 4).
Table 3. University B: Calculus I, initial model (columns: Source, Pr > F).
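The systematic elimination of insignificant variables mentioned above is commonly carried out by backward elimination: fit the full model, drop the least significant term, refit, and repeat. The sketch below shows only that control flow. The function names, the 0.05 threshold, and the toy p-values are all hypothetical; a real analysis would refit the model at each step and would normally retain a main effect while any interaction containing it remains.

```python
def backward_eliminate(terms, fit_p_values, alpha=0.05, keep=()):
    """Repeatedly drop the term with the largest p-value above alpha.

    fit_p_values(terms) must refit the model on `terms` and return a
    dict mapping each term to its p-value. Terms in `keep` are never
    dropped, whatever their p-value.
    """
    current = list(terms)
    while True:
        p = fit_p_values(current)
        droppable = [t for t in current if t not in keep and p[t] > alpha]
        if not droppable:
            return current
        current.remove(max(droppable, key=p.get))


# Hypothetical stand-in for a real model fit: fixed, made-up p-values.
TOY_P = {"pretest": 0.001, "teacher": 0.01, "age": 0.20, "size": 0.60}


def toy_fit(terms):
    """Pretend to refit; a real fit would recompute every p-value."""
    return {t: TOY_P[t] for t in terms}


print(backward_eliminate(TOY_P, toy_fit))
print(backward_eliminate(TOY_P, toy_fit, keep=("size",)))
```

The `keep` argument reflects a design choice a class-size study might make: the factor under investigation can be forced to stay in the model so that its estimate is still reported even when it is not significant.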
Major college is probably significant because those who have aptitudes in mathematics are most likely to major in mathematically challenging fields, like engineering. But since interest and aptitude are not easily changed by the university, they are of relatively little interest to teachers and administrators.
The factor that is the most interesting is the (weakly significant) teacher-size interaction term. The fact that size itself is not significant and that this interaction term is weakly significant in the final model shows that the size effect, if there is any size effect at all, depends primarily upon characteristics of each individual teacher.
This final model had an R-squared value of .61--a substantial improvement over the preliminary (University A) study.
Table 4. University B: Calculus I, final model (columns: Source, Pr > F).
Again, a variety of different potential factors were included and then insignificant terms were systematically eliminated (see Table 5). And again, size is not significant. Teacher-size interaction is not measurable here, since only one teacher is involved.
Table 5. University B: Calculus II, final model (columns: Source, Pr > F).
Unlike in first-semester calculus, age seems to be a factor here, as does a student's current course load. The appearance of age as a factor may be due to the fact that many students appear to delay taking their second semester of calculus, whereas many students take their first semester of calculus in their freshman year. On the other hand, many students with high school background in calculus will take second-semester calculus immediately in their freshman year. This gap between some but not all students' first and second semesters of calculus seems the most likely explanation for the role that students' age plays. The interaction between ACT math and age may be significant because students who take second-semester calculus immediately in their freshman year have also just recently taken the ACT, whereas others who delay taking second-semester calculus may have taken the ACT several years earlier; thus its predictive value will likely vary somewhat with the student's age.
The variation from student course load is harder to explain, but a possible explanation would be that students in their first semester at the university will often simply take the recommended general education courses (including Calculus I) and the recommended total hours, whereas more experienced students will vary from the norm, perhaps because they know better what they want and how to accomplish it. Consequently, course load in the second-semester calculus courses may reflect a student's personal choices and attitudes toward school work, rather than reflecting the advice of the university or a counselor.
Major college plays no role in Calculus II, but it was significant in Calculus I. We conjecture that the reason is that many majors require only first-semester calculus, which also fulfills some university general education requirements, whereas most majors that require second-semester calculus are in either Engineering or Physical and Mathematical Sciences, which have relatively similar coefficients in the first-semester model (see Table 6). Moreover, few students take the course as an elective.
Class size was not significant, and even the teacher-size interaction effect was only weakly significant. No other interaction terms involving size were significant. This suggests that if there is any effect on students' achievement due to class size, it is a function of the individual teacher and her or his ability and attitude, rather than a function of the size alone.
The important question to ask about class size is whether it is in the students' and the university's best interest to increase or decrease class sizes. The insignificance of size as a factor in achievement is, taken alone, not enough to answer that question. In particular, we must ask whether some teachers in large classes are more effective than others in small classes. Also, it is important to know whether more students drop out of large classes, since their data could not be included in the study without final exam scores (failing students who did not drop were part of the main study). These two questions are the subject of the additional analyses described in Sections 4 and 5.
In order to decide whether a good teacher in a large section was more effective than other teachers in small sections, we solved for the (biased) coefficients in the previous Calculus I model.
Table 6. University B: Calculus I. Estimated coefficients.

| Teacher | Coefficient |
| Teacher F | 1.42747917 |
| Teacher G | 2.36437273 |
| Teacher H | 8.75866094 |
| Teacher I | 1.99462535 |
| Teacher J | -1.34914128 |
| Teacher K | 1.36048855 |
| Teacher L (large and small) | 5.36603618 |
| Teacher M (large only) | 3.67414925 |
| Teacher O | 7.33831105 |
| Teacher Q (large and small) | 10.34994654 |
| Teacher T (large and small) | 1.01064676 |
| Teacher U | -0.05621085 |
| Teacher Y | -0.33776313 |
| Teacher Z | 1.50086908 |
| Teacher AA (large and small) | 0.00000000 |

| Teacher-size interaction | Coefficient |
| Teacher L large | -6.08437615 |
| Teacher L small | 0.00000000 |
| Teacher Q large | -5.25948571 |
| Teacher Q small | 0.00000000 |
| Teacher T large | -3.17380207 |
| Teacher T small | 0.00000000 |
| Teacher AA large | 2.61853700 |
| Teacher AA small | 0.00000000 |

| Major college | Coefficient |
| biology | 3.09426616 |
| engineering | 0.95833944 |
| family science | -8.03393284 |
| phys/math science | 2.44752224 |
| social science | -0.36290885 |
| undeclared | 0.00000000 |
The results are listed in Table 6. We found that the best teachers in large sections (three of the five who taught large sections) were better for student achievement than all but four of the remaining 24 teachers who taught in small sections.
In particular, teacher Q had an effect of 10.3 and a teacher-size interaction effect of -5.2 for large classes, making a total effect of 5.1 to a student's final exam score in teacher Q's large section. Teacher M only taught large sections, and had an effect of 3.7, and teacher AA had a total effect of 2.62 in large classes. However, only four small-section teachers (B=14.1, H=8.8, L=5.4, and O=7.3) had a better effect in their small sections than these three teachers (Q, M and AA) of large sections. The remaining 20 teachers taught only small sections, and they had an effect that ranged from -5.8 up to 2.57.
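The arithmetic behind these totals can be reproduced directly from Table 6: a teacher's estimated effect in a large section is the teacher coefficient plus that teacher's large-section interaction coefficient. The sketch below transcribes the relevant Table 6 values. Treating Teacher M's interaction as zero is an assumption that follows the text's use of 3.7 as M's effect, since M taught only large sections.

```python
# Teacher main-effect coefficients from Table 6 (teachers with large sections).
TEACHER = {
    "L": 5.36603618,
    "M": 3.67414925,
    "Q": 10.34994654,
    "T": 1.01064676,
    "AA": 0.0,
}

# Teacher-by-size interaction coefficients for large sections, from Table 6.
# Teacher M has no interaction term (assumed zero here).
LARGE_INTERACTION = {
    "L": -6.08437615,
    "Q": -5.25948571,
    "T": -3.17380207,
    "AA": 2.61853700,
}


def large_section_effect(teacher):
    """Teacher coefficient plus the large-section interaction, if any."""
    return TEACHER[teacher] + LARGE_INTERACTION.get(teacher, 0.0)


# Rank the large-section teachers from most to least effective.
ranking = sorted(TEACHER, key=large_section_effect, reverse=True)
for name in ranking:
    print(f"Teacher {name}: {large_section_effect(name):.2f}")
```

This gives roughly 5.09 for Q, 3.67 for M, 2.62 for AA, -0.72 for L, and -2.16 for T, so Q, M, and AA are the three large-section teachers referred to in the comparison.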
Also note that the teacher-size coefficient for teacher AA is positive--indicating that teacher AA was actually more effective in the large class than the small one.
Finally, as one would perhaps expect, the variation due to class size (i.e. the variation among the teacher-size interaction terms) was small (6.08) compared to the variation due to teachers (19.97). This helps explain the differing conclusions (and poor fit between model and data) in many existing class size studies that do not account for variation due to teacher--any size effects are completely masked by teacher effects.
Using a simple logistic regression, we analyze the influence of class size on dropping in both first- and second-semester calculus. For each of the teachers in the University B Calculus I and Calculus II data sets who taught both large and small sections simultaneously, we let $D_{i(j)}$ denote the odds that student $i$ in class $j$ of type $k$ (large or small) will drop the class. Let $P_i$ denote student $i$'s pretest score, which is scaled by a linear factor, and let $S_k$ denote the effect on the odds of dropping due to being in a class of type $k$.
For each teacher we compare two models: model (2), in which the odds of dropping depend on the pretest score and on class size, and model (3), in which size is deleted.
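A logistic regression of this kind is usually linear in the log-odds. As a sketch, and assuming an intercept $\alpha$ and a pretest coefficient $\beta$ (symbols introduced here for illustration, not taken from the original), the two models being compared would take the form:

```latex
\begin{align}
\log D_{i(j)} &= \alpha + \beta P_i + S_k \tag{2} \\
\log D_{i(j)} &= \alpha + \beta P_i \tag{3}
\end{align}
```

Comparing the two fits then amounts to asking whether the $S_k$ term contributes anything once the pretest score is accounted for.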
For two of the four Calculus I teachers (teachers Q and AA), the total number of students who dropped was so small (3 of 183 and 8 of 238, respectively) that no conclusions about dropout rates could reasonably be drawn from their classes. For both of the remaining two teachers of Calculus I and the teacher of Calculus II, the Wald Chi-square indicated that size was not significant in the first model (2), and the value of c did not change much when size was deleted (model (3)). These results are summarized in Table 7.
[Table 7. Logistic regression results by teacher (Calculus I Teacher L, Calculus I Teacher T): c and Pr > Wald for each model. Reported c values: 0.587, 0.658, 0.541, 0.652.]
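The statistic c reported in Table 7 is presumably the concordance index that logistic regression software commonly reports alongside Wald tests; that reading is an assumption, since the text does not define c. Under that assumption, c is the fraction of (dropper, non-dropper) pairs in which the model assigns the higher predicted probability to the student who dropped, counting ties as one half, so 0.5 means no better than chance and 1.0 means perfect separation. A minimal sketch with made-up probabilities:

```python
def concordance_c(probabilities, dropped):
    """Concordance index for predicted drop probabilities.

    probabilities: predicted probability of dropping, one per student.
    dropped: truthy if the student actually dropped, falsy otherwise.
    """
    events = [p for p, d in zip(probabilities, dropped) if d]
    non_events = [p for p, d in zip(probabilities, dropped) if not d]
    if not events or not non_events:
        raise ValueError("need at least one dropper and one non-dropper")

    score = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                score += 1.0   # concordant pair
            elif p_event == p_non:
                score += 0.5   # tied pair counts as one half
    return score / (len(events) * len(non_events))


# Hypothetical predictions for four students; two of them dropped.
probs = [0.9, 0.8, 0.3, 0.2]
dropped = [1, 0, 1, 0]
print(concordance_c(probs, dropped))  # 0.75
```

Read this way, a value of c that barely moves when size is removed from the model suggests that size adds little to the model's ability to distinguish students who drop from those who do not.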
We conclude that, as in the case of achievement, the influence of class size on the odds of students' dropping is small or nonexistent. If it is a factor, it probably varies with the teacher, but it appears to be insignificant for the three teachers involved in this study.
Our main conclusion is that class size itself has no significant effect on performance or dropout rate. There was a mild teacher-size effect on student achievement, but good teachers in large classes were more effective than most of the teachers in small classes. Of the factors the university can control, the teacher is by far the most important--much more so than class size. Students are better served in a large class with a better teacher than in a small class with most teachers.
These results apply only to the difference between classes of about 30 and classes of about 180. It is very possible (and even likely, based on evidence from elementary schools [Glass and Smith, 1979]) that a significant difference might exist between very small classes (ten or fewer students) and those which are small for a university mathematics class (20-30 students).
It is also important to remember that, although both the preliminary and the main studies found no significant effect due to size of the lecture section, all classes in the main study were supplemented by small review sessions (held twice weekly with a teaching assistant). While the preliminary study showed no significant effect due to class size even without these small review sections, the review sections appeared to be helpful to students in both large and small sections alike. Also, as mentioned in the introduction, it is possible that the size of the review session has a significant impact on student achievement, although the size of the main lecture does not. Further research in this direction is warranted.
We conclude that, both in terms of student achievement and also in terms of cost to the university, probably the best strategy for universities offering calculus classes would be to seek out and reward the best teachers of large sections for their work, rather than hiring average or mediocre instructors to reduce class size. Any resources that otherwise might have been used to reduce class size should probably be used instead to reduce the size of review sections and to reward these expert teachers of large sections. This strategy will simultaneously provide better instruction to calculus students and reduce instruction costs.
I am grateful to Govinda Weerakoody and David Whiting for help with the models, and to Ralph Brown, Bruce Collings, Tamara Cooper, Missie Elkins, Pedro Geoffrey, Donald Jarvis, Matt Johnson, and Krishnaswamy Venkata for helpful discussions. Finally, I am grateful to Heidi Jarvis for help with typesetting and proofreading.