Do students in smaller classes perform better than students in larger classes?
The answer might seem obvious. After all, students in smaller classes receive more attention from teachers, and so they should perform better.
We cannot know for sure, however, without looking at actual data on class size and student performance. In order to do so, we could collect data on student performance from various schools whose class sizes vary and look at whether students in smaller classes perform better.
But that wouldn’t be enough to determine whether smaller classes actually cause students to perform better. Correlation is not causation, and it could be the case that high-performing students are assigned to smaller classes composed of similar students. Thus, finding a correlation between class size and student performance would not indicate that smaller classes cause students to perform better, only that school administrators tend to put high-performing students in the same classes.
So how are we to know whether smaller classes actually cause students to perform better? One way could be to create classes of varying sizes (say, classes of 15, 30, 45, and 60 students) and randomly assign students to a given class size at the beginning of the year. Then, we could collect data on student performance on a standardized year-end exam and test whether average student performance is better in smaller than in bigger classes. Unfortunately, such a nice, clean experiment isn’t always feasible.
Enter regression discontinuity designs (RDD), which Wikipedia defines as follows:
[A] regression discontinuity design is a design that elicits the causal effects of interventions by exploiting a given exogenous threshold determining assignment to treatment. By comparing observations lying closely on either side of the threshold, it is possible to estimate the local treatment effect in environments in which randomization was unfeasible.
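In other words, suppose there were a rule that said “In any given grade, a class is not to exceed 40 students; once the number of students in a given grade exceeds 40, the class is split into two classes of roughly equal size.” So if there were 43 students in a given grade, there would be two classes: one with 22 students and one with 21 students. A grade with 40 students, by contrast, would have a single class of 40. Since grades with 40 and 41 students are likely to be similar in most other respects, comparing student performance just below and just above the threshold tells us something about the effect of class size itself.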
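To make the arithmetic of such a rule concrete, here is a minimal sketch in Python. The function name predicted_class_sizes and the cap parameter are my own, made up for illustration, and the sketch assumes a grade is always split into the smallest number of classes that respects the cap, with students spread as evenly as possible. Actual school practice may well differ.

```python
import math
from typing import List


def predicted_class_sizes(enrollment: int, cap: int = 40) -> List[int]:
    """Class sizes implied by a hypothetical 'no more than `cap` per class' rule.

    Assumption: the grade is split into the fewest classes that respect
    the cap, and students are spread as evenly as possible across them.
    """
    if enrollment <= 0:
        return []
    n_classes = math.ceil(enrollment / cap)
    base, extra = divmod(enrollment, n_classes)
    # `extra` classes take one additional student, so sizes differ by at most one.
    return [base + 1] * extra + [base] * (n_classes - extra)


# Class sizes drop sharply each time enrollment crosses a multiple of the cap.
for enrollment in (39, 40, 41, 43, 80, 81):
    print(enrollment, predicted_class_sizes(enrollment))
```

With 43 students, the function returns one class of 22 and one class of 21, as in the example above. The jump from a single class of 40 to two classes of about 20 when enrollment goes from 40 to 41 is the discontinuity that the design exploits.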
It turns out that Israel has just such a rule, known as Maimonides’ Rule, which has allowed Angrist and Lavy (1999) to estimate the causal impact of smaller class sizes:
The twelfth century rabbinic scholar Maimonides proposed a maximum class size of 40. This same maximum induces a nonlinear and nonmonotonic relationship between grade enrollment and class size in Israeli public schools today. Maimonides’ rule of 40 is used here to construct instrumental variables estimates of effects of class size on test scores. The resulting identification strategy can be viewed as an application of Donald Campbell’s regression-discontinuity design to the class-size question. The estimates show that reducing class size induces a significant and substantial increase in test scores for fourth and fifth graders, although not for third graders.
The use of RDD is relatively straightforward in cases where there is a single, exogenously determined threshold, such as the maximum class size of 40 in Maimonides’ Rule. But in cases where there are multiple thresholds, it is difficult to roll all of them into one usable threshold. How to deal with those cases will be the topic of part 2 of this post, which will be published tomorrow.