
Category: Methods

Evaluating the Impact of Policies Using Regression Discontinuity Design, Part 1

Do students in smaller classes perform better than students in larger classes?

The answer might seem obvious. After all, students in smaller classes receive more attention from teachers, and so they should perform better.

We cannot know for sure, however, without looking at actual data on class size and student performance. In order to do so, we could collect data on student performance from various schools whose class sizes vary and look at whether students in smaller classes perform better.

But that wouldn’t be enough to determine whether smaller classes actually cause students to perform better. Correlation is not causation, and it could be that school administrators assign high-performing students to smaller classes composed of similar students. In that case, a correlation between class size and student performance would indicate not that smaller classes cause students to perform better, but only that high-performing students tend to be grouped together in smaller classes.

So how are we to know whether smaller classes actually cause students to perform better? One way could be to create classes of varying sizes (say, classes of 15, 30, 45, and 60 students) and randomly assign students to a given class size at the beginning of the year. Then, we could collect data on student performance on a standardized year-end exam and test whether average student performance is better in smaller classes than in larger ones. Unfortunately, such a nice, clean experiment isn’t always feasible.
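To make the logic concrete, here is a minimal sketch of the analysis such an experiment would allow, on simulated data. The effect size, score scale, and noise level below are invented for illustration; the point is only that random assignment makes a simple difference-in-means comparison meaningful.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical experiment: randomly assign 6,000 students to one of
# four class sizes, then compare mean scores on a year-end exam.
class_sizes = np.array([15, 30, 45, 60])
n_students = 6000
assigned = rng.choice(class_sizes, size=n_students)  # random assignment

# Simulated exam scores with an invented "true" effect of -0.2 points
# per additional student; real data would come from the exam itself.
scores = 80 - 0.2 * assigned + rng.normal(0, 10, size=n_students)

for size in class_sizes:
    group = scores[assigned == size]
    print(f"class size {size}: mean score {group.mean():.1f} (n={group.size})")

# Because assignment is random, a difference-in-means test comparing
# the smallest and largest classes estimates the causal effect.
t, p = stats.ttest_ind(scores[assigned == 15], scores[assigned == 60])
print(f"t = {t:.2f}, p = {p:.4f}")
```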

Taubes on the Weakness of Observational Studies, and a Methodological Rant

One caveat is observational studies, where you identify a large cohort of people – say 80,000 people like in the Nurses’ Health Study – and you ask them what they eat. You give them diet and food frequency questionnaires that are almost impossible to fill out and you follow them for 20 years. If you look and see who is healthier, you’ll find out that people who were mostly vegetarians tend to live longer and have less cancer and diabetes than people who get most of their fat and protein from animal products. The assumption by the researchers is that this is causal – that the only difference between mostly vegetarians and mostly meat-eaters is how many vegetables and how much meat they eat.

I’ve argued that this assumption is naïve almost beyond belief. In this case, vegetarians or mostly vegetarian people are more health conscious. That’s why they’ve chosen to eat like this. They’re better educated than the mostly meat-eaters, they’re in a higher socioeconomic bracket, they have better doctors, they have better medical advice, they engage in other health conscious activities like walking, they smoke less. There’s a whole slew of things that goes with vegetarianism and leaning towards a vegetarian diet. You can’t use these observational studies to imply cause and effect. To me, it’s one of the most extreme examples of bad science in the nutrition field.

That’s Gary Taubes in a FiveBooks interview over at The Browser. Taubes is better known for his book Good Calories, Bad Calories, in which he argues that a diet rich in carbohydrates is what makes us fat and, eventually, sick, and makes the case for an alternative diet rich in fats.

I really don’t know what kind of diet is best for weight loss, but I do want to stress Taubes’ point about the weakness of observational studies, even longitudinal ones. It is not uncommon for social science researchers to say “Well, we’ve been following these people over time, so we can use fixed effects to control for unobserved heterogeneity.” That is, because they observe each unit more than once, they can control for everything about that unit that remains constant over time. But fixed effects do nothing about unobserved factors that vary over time, so the problem Taubes describes does not simply go away. I have certainly been guilty of that.
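For readers unfamiliar with what that fixed-effects move buys you, here is a minimal sketch on made-up panel data (all variable names and parameters are invented), using the within transformation, one standard way to implement unit fixed effects. It also illustrates the limitation: the bias removed here comes from a constant trait, which is the only kind of confounding fixed effects can absorb.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Made-up panel: 500 people observed over 5 years, with an unobserved,
# time-invariant trait (think health consciousness) that drives both
# diet and health. This is exactly what fixed effects can absorb.
n, t = 500, 5
trait = rng.normal(0, 1, n)  # constant per person, never observed
df = pd.DataFrame({
    "person": np.repeat(np.arange(n), t),
    "veg_share": np.repeat(trait, t) + rng.normal(0, 1, n * t),
})
df["health"] = 2 * np.repeat(trait, t) + rng.normal(0, 1, n * t)
# Note: veg_share has NO true effect on health in this simulation.

# Pooled OLS slope (biased upward by the omitted trait):
x, y = df["veg_share"], df["health"]
pooled = np.cov(x, y)[0, 1] / x.var()

# Fixed effects via the within transformation: demean by person, which
# wipes out anything constant within a person, including the trait.
xd = x - df.groupby("person")["veg_share"].transform("mean")
yd = y - df.groupby("person")["health"].transform("mean")
fe = np.cov(xd, yd)[0, 1] / xd.var()

print(f"pooled OLS slope: {pooled:.2f}   fixed-effects slope: {fe:.2f}")
# The pooled slope is pushed well away from zero; the FE slope sits
# near the true value of zero. But if the confounder varied over time,
# demeaning would not remove it.
```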

Linear Regression and Causality for Neophytes

(This is an update on a post I had initially written at the start of the academic year. I figured it would come in handy, as many of us are busy writing our syllabi for the spring semester.)

If you teach in a policy school, a political science department, or in an economics department that grants Bachelor of Arts instead of Bachelor of Science degrees, chances are some of your students are not quite conversant in the quantitative methods used in the social sciences.

Many of the students who sign up for my fall seminar on the Microeconomics of International Development Policy or for my spring seminar on Law, Economics and Organization, for example, are incredibly bright, but they are not familiar with regression analysis, and so they don’t know how to read a regression table.

This makes it difficult to assign empirical papers in World Development for in-class discussion, let alone empirical papers in the Journal of Development Economics.

While I do not have the time to teach basic econometrics to students in those seminars, I have prepared two handouts that prepare them to read papers containing empirical results, and I thought I should make these available to anyone who would rather not spend precious class time teaching the basics of quantitative methods. I used both handouts in my development seminar last fall, and my students said they learned quite a bit from reading them.

The first handout is a primer on linear regression, which shows analytically and graphically (and hopefully painlessly) what a regression does, and why it is such a useful tool in the social sciences. Perhaps more importantly, this handout also explains how to read a regression table.
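As a small companion to that first handout, here is a sketch that produces, on simulated data, a regression table of the kind the handout explains how to read; the statsmodels output below has the usual anatomy (one row per variable, with coefficient, standard error, t-statistic, and p-value).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Simulated data: y depends on x1 (true slope of 2) and not on x2.
n = 200
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
y = 1 + 2 * x1 + rng.normal(0, 1, n)

X = sm.add_constant(np.column_stack([x1, x2]))  # add intercept column
results = sm.OLS(y, X).fit()

# The summary is the "regression table" readers encounter in papers:
# x1's coefficient should be near 2 and significant, x2's near 0.
print(results.summary())
```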

The second handout is a primer on the identification of causal relationships in the social sciences, which discusses the distinction between correlation and causation and explains two ways in which social scientists go about making causal statements (i.e., randomized controlled trials and instrumental variables), with a few examples. I suggest supplementing this handout with a reading of Jim Manzi’s “What Social Science Does–and Doesn’t–Know” in City Journal as well as with Esther Duflo’s TED talk.
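To give a flavor of the second of those strategies, here is a minimal two-stage least squares sketch on simulated data. The variables, parameter values, and instrument are all invented, and the exclusion restriction holds here only because the simulation makes it so.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)

# Setup: an unobserved confounder u drives both x and y, so OLS of y
# on x is biased. An instrument z shifts x but affects y only through
# x (the exclusion restriction, true here by construction).
n = 5000
u = rng.normal(0, 1, n)                        # unobserved confounder
z = rng.normal(0, 1, n)                        # instrument
x = z + u + rng.normal(0, 1, n)
y = 2.0 * x + 3.0 * u + rng.normal(0, 1, n)    # true effect of x is 2

# Naive OLS (biased by u):
ols = sm.OLS(y, sm.add_constant(x)).fit()

# Two-stage least squares by hand:
# Stage 1: regress x on z, keep the fitted values (the part of x
# driven by the instrument, and thus purged of u).
x_hat = sm.OLS(x, sm.add_constant(z)).fit().fittedvalues
# Stage 2: regress y on those fitted values.
iv = sm.OLS(y, sm.add_constant(x_hat)).fit()

print(f"OLS slope: {ols.params[1]:.2f}   2SLS slope: {iv.params[1]:.2f}")
# OLS overstates the effect; 2SLS recovers roughly 2. (Running the
# second stage by hand gets the standard errors wrong; dedicated IV
# routines correct them.)
```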

Of course, neither handout is a substitute for a course in econometrics or research design; they are intended primarily for undergraduates or Masters students with little to no quantitative background.

(Update: This post by Tom Pepinsky also offers a very good introduction to the identification of causal relationships. HT to Chris Blattman for this great find.)