Continuing the ‘Metrics Monday series, and following up on last week’s theme of control variables as discussed in the de Luca et al. working paper, I wanted to discuss endogenous control variables. Note that a lot of what follows is me thinking out loud, and I may well be mistaken about all of this. If so, I welcome comments exploring this topic.
As always, suppose you have observational data, and you are interested in estimating the causal effect of your variable of interest D on your outcome of interest Y, and that you also have access to a vector of control variables X. For the sake of argument, let’s assume there is only one control variable, so that the equation to estimate is
(1) Y = a + bX + cD + e.
The parameter of interest is c. If you have observational data, then you know that in most cases E(D’e) is different from zero; that is, D is endogenous in equation 1, and the estimate of c obtained by regressing Y on X and D does not capture the causal effect of D on Y.
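To make the endogeneity point concrete, here is a minimal simulation sketch, assuming Python with NumPy. The data-generating process, the parameter values (a = 1, b = 0.5, c = 2), and the helper names `simulate` and `ols_coefficients` are hypothetical choices for illustration only; nothing here comes from the de Luca et al. paper. The knob `rho` controls how strongly D loads on the error term e: with rho = 0, D is exogenous and OLS on equation 1 recovers c; with rho different from zero, E(D’e) is different from zero and the OLS estimate of c is biased. Note that the control X is kept exogenous in this sketch.

```python
import numpy as np


def ols_coefficients(y, X, D):
    """Return (a_hat, b_hat, c_hat) from an OLS regression of y on a constant, X, and D."""
    Z = np.column_stack([np.ones_like(y), X, D])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef


def simulate(n, rho, a=1.0, b=0.5, c=2.0, seed=0):
    """Simulate Y = a + bX + cD + e (equation 1) with hypothetical parameters.

    rho = 0 makes D exogenous; rho != 0 makes D correlated with e,
    so that E(D'e) != 0 and D is endogenous.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=n)   # control variable, independent of e by construction
    e = rng.normal(size=n)   # structural error term
    u = rng.normal(size=n)   # D's own idiosyncratic noise
    # D depends on the control X, on its own noise u, and (if rho != 0) on e.
    D = 0.5 * X + u + rho * e
    Y = a + b * X + c * D + e
    return Y, X, D


if __name__ == "__main__":
    for rho in (0.0, 0.8):
        Y, X, D = simulate(n=100_000, rho=rho)
        a_hat, b_hat, c_hat = ols_coefficients(Y, X, D)
        print(f"rho={rho}: c_hat={c_hat:.3f} (true c = 2.0)")
```

Under these assumed values, the probability limit of the OLS estimate of c is c + Cov(D̃, e)/Var(D̃), where D̃ = u + rho * e is the part of D not explained by X. With rho = 0.8, that is 2 + 0.8/1.64, or roughly 2.49, so the regression systematically overstates the effect of D no matter how large the sample.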