‘Metrics Monday: Regressions as Ecosystems
My teaching, service, and editorial responsibilities don’t leave me much time for research, much less for blogging these days, so I thought I would write up a quick observation about econometrics.
An old friend (not an applied econometrician) writes (via Facebook, in case you wonder about the telegraphic style of the query):
Econometrics question – have a M.Sc. student doing a study on conservation agriculture (CA) and is developing instruments for CA component use. Any suggestions on appropriate instruments?
My (less-than-helpful) answer:
What’s the outcome of interest Y? What’s the treatment variable/variable of interest D? What controls X are included? All of those work as a kind of ecosystem–without knowing what its component parts are, I can’t come up with a good idea for an instrument Z.
It’s the regression-as-ecosystem comment that I wanted to discuss today. Indeed, if you are interested in causal effects–and who isn’t, these days?–you have to see any regression of interest as an ecosystem where things live or die as a function of other things in the system.
This is especially the case if you don’t have an experiment or a quasi-experiment, and you have to rely on an instrumental variable (IV) that is nonrandom. In the “cookbook econometrics” class I teach every other year to our doctoral students, I tell students that an IV lives and dies by the controls it is surrounded with, a point that is obvious once you start thinking about it, but which is made all too rarely. Indeed, here is something that I bet is taking place almost daily throughout the world in economics seminars:
- The presenter is interested in the causal relationship flowing from some treatment D to some outcome Y.
- The presenter recognizes that Y and D are jointly determined, and thus uses an instrument Z to identify that relationship.
- A clever member of the audience says: “Yes, but have you considered [channel through which Z violates the exclusion restriction]?”
- The presenter says: “You’re right–in principle. But because I have [specific variable] in my set of controls X, the exclusion restriction is still met.”
- Clever member of the audience: “Oh, okay.”
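A small simulation can make that seminar exchange concrete. What follows is only a sketch on made-up data: the coefficients, the variable names, and the helper `iv_estimate` are all assumptions for illustration, not anything from an actual paper. The instrument Z is correlated with a control X that also affects Y directly, so Z satisfies the exclusion restriction only once X is in the regression.

```python
import numpy as np

def iv_estimate(y, d, z, controls=None):
    """Just-identified 2SLS: return the coefficient on d, instrumenting d with z.

    Hypothetical helper for illustration; `controls` is an optional array of
    exogenous regressors that enter both stages.
    """
    n = len(y)
    exog = [np.ones(n)] if controls is None else [np.ones(n), controls]
    regressors = np.column_stack([d] + exog)    # endogenous D plus exogenous terms
    instruments = np.column_stack([z] + exog)   # Z stands in for D; the rest instrument themselves
    beta = np.linalg.solve(instruments.T @ regressors, instruments.T @ y)
    return beta[0]

rng = np.random.default_rng(0)
n = 200_000
true_effect = 1.0                   # assumed causal effect of D on Y

x = rng.normal(size=n)              # observed control X
u = rng.normal(size=n)              # unobserved confounder of D and Y
z = 0.8 * x + rng.normal(size=n)    # instrument Z, correlated with X
d = z + 0.5 * x + u + rng.normal(size=n)
y = true_effect * d + 2.0 * x + u + rng.normal(size=n)

without_x = iv_estimate(y, d, z)              # X omitted: Z reaches Y through X
with_x = iv_estimate(y, d, z, controls=x)     # X included: exclusion restriction holds

print(f"IV estimate without X: {without_x:.2f}")
print(f"IV estimate with X:    {with_x:.2f}")
```

Under these assumed parameters, the estimate that omits X should land well above the true effect of 1, while the one that includes X should sit close to it. Same Z, same D, same Y; only the controls changed.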
Here is a real-life example: In my food prices and food riots paper, in which I was interested in the causal effect of food prices on the extent of social unrest worldwide, I used natural disasters worldwide as an IV for food prices. A few times in seminars, I was asked: “Yes, but you don’t control for the income of food consumers, and that’s an omitted variable.” Notwithstanding the fact that natural disasters are also orthogonal to income (and that it is not clear that you want to include an obviously endogenous control such as income in the regression I was estimating), my response was: “Yes, but I am regressing on the real–not nominal–price of food, which controls for the overall price level and thus, presumably, for wages, which themselves determine most people’s income levels.”
At any rate, I’m not sure I have much more of a point than “All the pieces matter,” to quote fictional detective Lester Freamon: when thinking about causality, you have to consider Y = f(D(Z,X), X) + e as a whole, and not just D(Z) or even Y = f(D(Z)).
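To spell out what “as a whole” means, here is one way to write the system, assuming a linear specification. The linearity and the coefficient symbols are my simplification for illustration; the general notation above does not require them.

```latex
\begin{align}
D &= \pi_0 + \pi_1 Z + X'\pi_2 + v, \\
Y &= \beta_0 + \beta_1 D + X'\beta_2 + e, \\
\pi_1 &\neq 0 \quad \text{(relevance)}, \qquad
\operatorname{Cov}(Z, e \mid X) = 0 \quad \text{(exclusion, conditional on } X\text{)}.
\end{align}
```

The exclusion restriction is stated conditional on X, so adding or dropping a control changes what is left in e and therefore whether Z is still valid.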
If anything, that is where the use of directed acyclic graphs (DAGs) comes in handy, and why I advocate that our students (i) read (some of) Judea Pearl’s Causality, and (ii) use DAGs when they start thinking about an empirical problem.
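As a rough illustration of how a DAG can be used to vet an instrument, here is a sketch that checks d-separation by hand, using the standard ancestral-graph-and-moralization procedure. The graph itself is hypothetical (X causes both Z and Y, and an unobserved U confounds D and Y), and the criterion applied, that Z should be d-separated from Y given X once the D → Y edge is removed, is one common graphical reading of the exclusion restriction rather than the only one.

```python
def d_separated(edges, a, b, given=()):
    """Return True if nodes a and b are d-separated by `given` in the DAG `edges`.

    `edges` is a set of (parent, child) pairs. Method: keep only ancestors of
    the nodes involved, moralize (marry co-parents), drop edge directions,
    delete the conditioning set, then test whether a can still reach b.
    """
    given = set(given)

    # 1. Ancestral set of {a, b} and the conditioning nodes.
    relevant = {a, b} | given
    frontier = list(relevant)
    while frontier:
        node = frontier.pop()
        for parent, child in edges:
            if child == node and parent not in relevant:
                relevant.add(parent)
                frontier.append(parent)
    sub = [(p, c) for p, c in edges if p in relevant and c in relevant]

    # 2. Moralize: undirected edges, plus links between parents of a common child.
    neighbors = {node: set() for node in relevant}
    for p, c in sub:
        neighbors[p].add(c)
        neighbors[c].add(p)
    for node in relevant:
        parents = [p for p, c in sub if c == node]
        for p1 in parents:
            for p2 in parents:
                if p1 != p2:
                    neighbors[p1].add(p2)

    # 3. Remove conditioned nodes and search for any remaining path from a to b.
    seen, stack = {a}, [a]
    while stack:
        node = stack.pop()
        for nb in neighbors[node]:
            if nb not in seen and nb not in given:
                seen.add(nb)
                stack.append(nb)
    return b not in seen

# Hypothetical DAG with the D -> Y edge already removed.
graph = {("X", "Z"), ("X", "Y"), ("Z", "D"), ("U", "D"), ("U", "Y")}

exclusion_without_x = d_separated(graph, "Z", "Y")
exclusion_with_x = d_separated(graph, "Z", "Y", given={"X"})
relevance_with_x = not d_separated(graph, "Z", "D", given={"X"})

print("Exclusion holds without X:", exclusion_without_x)
print("Exclusion holds given X:  ", exclusion_with_x)
print("Z still relevant given X: ", relevance_with_x)
```

In this made-up graph, Z fails the check on its own because of the path through X, and passes once X is conditioned on, which is the “lives and dies by its controls” point in graphical form.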