I teach the second-year PhD research seminar in the Department, and it’s that time of year again when students have to submit a draft of their second-year paper. In case you are not familiar with the second-year paper, it is the widespread practice in applied economics and economics departments of having students who are done with their first-year courses write an entire publishable paper from start to finish.
As such, teaching the second-year paper involves reading a lot of drafts. One of the drafts I read last week did something that always baffles me when I see it. This might be a simple question whose answer is obvious, so bear with me, but the practice is so common that I thought I would ask readers whether it is me who is missing something. The practice is as follows (note that I am positing all of this for observational data, not experimental data):
- You have a random sample in which you estimate a relationship of interest. Say, you are interested in whether having a land title makes a plot more productive, or whether having a college degree makes an individual earn more money.
- You are interested in heterogeneous treatment effects. Say, you are interested in whether the effect of the land title differs by plot size (e.g., for plots smaller than one hectare versus plots larger than one hectare), or in whether having a college degree has different effects by race.
- To look at treatment heterogeneity, you split your sample by group and re-estimate your relationship of interest. So you re-estimate your productivity equation once for small plots and once for large plots, or you re-estimate your wage equation once for each race (see the sketch after this list).
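Here is a minimal sketch of that split-sample approach on simulated data, assuming Python with statsmodels; the variable names (title, small_plot, productivity) and the data-generating process are purely illustrative assumptions, not anything from an actual draft:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 2_000
df = pd.DataFrame({
    "title": rng.integers(0, 2, n),       # hypothetical treatment: has a land title
    "small_plot": rng.integers(0, 2, n),  # hypothetical group: plot < 1 ha
})
# Illustrative effect: a title raises productivity by 2 on small plots
# and by 1 on large plots, plus noise.
df["productivity"] = (
    df["title"] * np.where(df["small_plot"] == 1, 2.0, 1.0)
    + rng.normal(0, 1, n)
)

# The split-sample approach: re-estimate the same equation once per group.
fit_small = smf.ols("productivity ~ title", data=df[df["small_plot"] == 1]).fit()
fit_large = smf.ols("productivity ~ title", data=df[df["small_plot"] == 0]).fit()
print(fit_small.params["title"])  # roughly 2: effect for small plots
print(fit_large.params["title"])  # roughly 1: effect for large plots
```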
My problem is this: You start from a random sample. The minute you split that sample by group, however, your sub-samples are no longer random! Intuitively, it is unclear to me what the estimate you get for each sub-group means given the non-randomness of the sub-sample on which it is based.
To get at treatment heterogeneity, wouldn’t it be better to keep your sample as is, but to interact your treatment (i.e., land title, college degree, etc.) with group dummies (i.e., small and large plots, race, etc.), going so far as to omit the constant in order to retain a dummy for each group? If anything, you would get much better statistical power from preserving the larger sample. Does anyone have any insight as to whether my intuition is right regarding the presumed bias that comes from looking at sub-groups, or is the overall effect simply the weighted average of the group-specific effects? So, readers: Is the practice legitimate, and have I simply not seen good applications of it?
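For comparison, here is a sketch of that interacted, fully saturated regression on the same simulated data, under the same illustrative assumptions (df, title, and small_plot carry over from the block above):

```python
# Pooled regression with the constant omitted ("- 1") and the treatment
# fully interacted with the group dummies, so each group keeps its own
# intercept and its own treatment effect.
fit_full = smf.ols(
    "productivity ~ C(small_plot) + title:C(small_plot) - 1", data=df
).fit()
print(fit_full.params)
```

Because the model is fully saturated in the group-by-treatment interaction, the group-specific coefficients on title in this sketch reproduce the split-sample point estimates exactly; only the standard errors differ, since the pooled model imposes a common error variance across groups.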
Update: Many good comments below. My new colleague Jason Kerwin even came up with his own proof-by-Stata (see the link in his comment below). Comments like the ones on this post are one of the reasons why I love blogging.