‘Metrics Monday: Why You Should Show a Regression of Y on Z
Happy New Year to everyone once again. Here is a quick ‘Metrics Monday post to start the year on the right foot.
Dave Giles had a neat post, sandwiched between Christmas and New Year, titled “Correlation Isn’t Necessarily Transitive.” If you are not familiar with the concept of transitivity,* what this means is that when Y is correlated with D, and D is correlated with Z, Y isn’t necessarily correlated with Z.
From my choice of labels for those variables, you have probably guessed why this matters for applied econometrics: It is perfectly possible that in a regression of Y on D where you are interested in the causal relationship flowing from D to Y, you have an otherwise valid instrument Z (i.e., a variable that is plausibly exogenous to outcome Y and which is also relevant in explaining D). Obviously, Z being relevant means that it is correlated with D. Assuming that D is also correlated with Y, the fact that correlation isn’t necessarily transitive means that the IV is not necessarily correlated with the outcome.
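If you want to convince yourself that correlation need not be transitive, here is a minimal simulation sketch in Python. The data-generating process is entirely made up for illustration (Z and Y are drawn independently, and D is simply their sum), so it is a purely statistical example rather than a model of any actual IV setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process, chosen only for illustration:
# Z and Y are independent draws, and D is their sum.
Z = rng.standard_normal(n)
Y = rng.standard_normal(n)
D = Z + Y


def corr(a, b):
    """Sample Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


corr_yd = corr(Y, D)  # Y and D are correlated by construction
corr_dz = corr(D, Z)  # D and Z are correlated by construction
corr_yz = corr(Y, Z)  # Y and Z are independent by construction

print(f"corr(Y, D) = {corr_yd:.3f}")
print(f"corr(D, Z) = {corr_dz:.3f}")
print(f"corr(Y, Z) = {corr_yz:.3f}")
```

In this setup, Y is strongly correlated with D, and D is strongly correlated with Z, yet the sample correlation between Y and Z should be close to zero.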
What this means in practice is that when doing IV, you should always show a reduced-form regression of Y on Z, whose purpose is to reassure your readers that your IV actually affects your outcome variable. This is a point that is made by Angrist and Pischke in Mostly Harmless Econometrics (and also most likely in Mastering ‘Metrics, but I can’t remember for sure).
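For concreteness, here is a rough sketch of what the reduced form looks like next to the first stage, using simulated data and plain NumPy. Everything about the setup is an assumption made for illustration: the unobserved confounder U, the coefficient values (0.8 in the first stage, 2.0 for the effect of D on Y), and the helper name `ols_slope` are all hypothetical. With one instrument and one endogenous variable, the IV estimate equals the reduced-form slope divided by the first-stage slope, which is one more reason the reduced form is worth looking at.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical data-generating process (all coefficients are assumptions):
U = rng.standard_normal(n)                # unobserved confounder
Z = rng.standard_normal(n)                # instrument, independent of U
D = 0.8 * Z + U + rng.standard_normal(n)  # first stage: Z shifts D
Y = 2.0 * D + U + rng.standard_normal(n)  # Z affects Y only through D


def ols_slope(y, x):
    """Slope from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[1])


first_stage = ols_slope(D, Z)    # regression of D on Z
reduced_form = ols_slope(Y, Z)   # regression of Y on Z
naive_ols = ols_slope(Y, D)      # biased upward here because of U
iv_estimate = reduced_form / first_stage  # just-identified IV estimate

print(f"first stage  (D on Z): {first_stage:.3f}")
print(f"reduced form (Y on Z): {reduced_form:.3f}")
print(f"naive OLS    (Y on D): {naive_ols:.3f}")
print(f"IV = reduced form / first stage: {iv_estimate:.3f}")
```

Under these assumptions, the reduced-form slope should land near 0.8 × 2.0 = 1.6 and the ratio near the assumed effect of 2.0, whereas the naive OLS slope overshoots because D is correlated with U.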
You might be tempted to think that finding that the coefficient on Z is not statistically significant in a reduced-form regression of Y on Z is a good thing, because it proves that the IV is uncorrelated with the outcome of interest. Drawing such a conclusion would be misguided, however. First, failing to reject the null that the coefficient on Z is zero is not definitive proof that the two are uncorrelated (null results can be about evidence of absence, but they can also be about absence of evidence). Second, what we want here is not for Z to be uncorrelated with Y. Though people often say that a good IV is uncorrelated with Y, what we actually want is for the IV to be correlated with Y, but only through D.
So what should you do in cases where your reduced-form regression of Y on Z shows that the two do not appear to be correlated? Much like there are three ways to get to play at Carnegie Hall (practice, practice, and practice), you should probably strive to explain, explain, and explain why the IV is still valid in such cases. One way out of this might be to explain that failing to reject the null in this case is simply absence of evidence, but this is probably only convincing in cases where your sample is small.
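To see why the absence-of-evidence argument is more plausible in small samples, here is a small Monte Carlo sketch. The true reduced-form slope of 0.2, the sample sizes, and the function name `rejection_rate` are assumptions picked purely for illustration. The function counts how often a two-sided test rejects a zero coefficient on Z when the true coefficient is in fact nonzero.

```python
import numpy as np


def rejection_rate(n, slope=0.2, reps=500, seed=0):
    """Share of simulated samples in which |t| > 1.96 for the slope on Z.

    The 1.96 cutoff is the normal approximation to the 5% critical value;
    it is slightly too lenient for very small n.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        z = rng.standard_normal(n)
        y = slope * z + rng.standard_normal(n)  # true reduced-form slope is nonzero
        X = np.column_stack([np.ones(n), z])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - 2)        # residual variance estimate
        se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
        if abs(beta[1] / se) > 1.96:
            rejections += 1
    return rejections / reps


power_small = rejection_rate(n=30)
power_large = rejection_rate(n=3000)

print(f"rejection rate with n = 30:   {power_small:.2f}")
print(f"rejection rate with n = 3000: {power_large:.2f}")
```

With the small sample, the test should miss the nonzero slope most of the time, whereas with the large sample it should almost always detect it. That is why a null reduced-form result is much harder to explain away when your sample is large.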
* Most economics students are typically exposed to the concept of transitivity when they first learn about how preferences are (generally) transitive, meaning that if someone prefers consumption bundle X to consumption bundle Y, and they prefer consumption bundle Y to consumption bundle Z, they will necessarily prefer consumption bundle X to consumption bundle Z.