‘Metrics Monday: 2SLS–Chronicle of a Death Foretold?

Last week I discussed how it is generally not possible to compare 2SLS estimates with OLS estimates because the two estimates apply to different groups of observations. Given that, it makes sense that I should write this week about a new working paper by Alwyn Young that has been making the rounds these past few months.

The paper is titled “Consistency without Inference: Instrumental Variables in Practical Application.” In it, Young uses the bootstrap to conduct a meta-analysis of 1,400 2SLS coefficients across 32 papers published in the AEA journals, and to essentially ask: “Is 2SLS all that it is cracked up to be?”

‘Metrics Monday: You Can’t Compare OLS with 2SLS

“Apples and Oranges” (1899) by Paul Cézanne.

Suppose you are interested in the effect of a treatment variable D on some outcome Y, and you have some controls X. You can thus estimate the following equation by ordinary least squares (OLS):

(1) Y = a + bX + cD + e.

As is so often the case in the social sciences, the problem is that it is not true that E(D’e) = 0, i.e., D is endogenous to Y, and so estimating equation 1 by OLS means that the estimated coefficient–let’s call it c_{OLS}, for simplicity–is biased, meaning that its expected value will not equal the true value c of the coefficient.

Suppose further that you have an instrumental variable (IV) Z for the (endogenous) treatment variable D. Assume Z is a valid IV: it explains enough of the variation in D (i.e., it is not weak) and, perhaps more importantly, it meets the exclusion restriction in that it only affects Y through D. You can thus estimate the following two equations by two-stage least squares (2SLS):

(2) D = f + gX + hZ + u, and

(3) Y = a’ + b’X + c’D + e.

Let’s re-label the coefficient c’ and call it c_{2SLS} for simplicity.
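
To make equations 1 to 3 concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration: the sample size, the coefficient values, the unobserved confounder v that makes D endogenous, and the helper names simulate and ols. It assumes a constant effect c = 1 for everyone, runs OLS on equation 1, and then runs the two stages by hand.

```python
import numpy as np

def simulate(n=50_000, c=1.0, seed=0):
    """Draw data where D is endogenous: v enters both D and the error e."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=n)                       # control
    Z = rng.normal(size=n)                       # instrument, independent of e
    v = rng.normal(size=n)                       # unobserved confounder (assumed)
    D = 0.5 * X + 1.0 * Z + v + rng.normal(size=n)
    e = v + rng.normal(size=n)                   # correlated with D through v
    Y = 2.0 + 0.5 * X + c * D + e
    return Y, D, X, Z

def ols(y, regressors):
    """Least-squares coefficients, with the intercept first."""
    A = np.column_stack([np.ones(len(y))] + regressors)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

Y, D, X, Z = simulate()

# Equation 1: OLS of Y on X and D.
c_ols = ols(Y, [X, D])[2]

# Equation 2: first stage, then fitted values of D.
f, g, h = ols(D, [X, Z])
D_hat = f + g * X + h * Z

# Equation 3: second stage, with D replaced by its fitted values.
c_2sls = ols(Y, [X, D_hat])[2]

print(f"c_OLS  = {c_ols:.3f}")
print(f"c_2SLS = {c_2sls:.3f}")
```

In this setup c_{OLS} lands well above the true value of 1, while c_{2SLS} lands close to it. Running the second stage by hand like this gives the right point estimate but the wrong standard errors, so in practice you would use a canned 2SLS routine.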

One thing I still read in manuscripts or hear in seminars way too often is people comparing c_{OLS} and c_{2SLS} as though they estimate the same thing.

It usually goes something like this: Someone presents OLS and 2SLS results, and then they (or someone in the audience) will compare the OLS and 2SLS coefficients. If c_{OLS} > (<) c_{2SLS}, they will say something like “Ignoring endogeneity concerns leads to overstating (understating) the relationship between D and Y.”

The problem is that you can’t compare OLS and 2SLS coefficients. At least not that way.
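
One way to see the issue is with a toy simulation in which the effect of D differs across groups of observations. The sketch below is built entirely on assumptions of my own choosing: a binary instrument, a binary treatment, three hypothetical groups ("complier", "always", "never"), and made-up effect sizes. There is no confounding here at all, yet the two coefficients differ, because each one averages the effect over a different group.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical groups: compliers take D only when Z = 1,
# always-takers take D regardless, never-takers never do.
kind = rng.choice(["complier", "always", "never"], size=n, p=[0.5, 0.25, 0.25])
Z = rng.integers(0, 2, size=n)                     # randomly assigned instrument
D = np.where(kind == "complier", Z, np.where(kind == "always", 1, 0))

# Assumed effects of D on Y: 2.0 for compliers, 0.5 for everyone else.
effect = np.where(kind == "complier", 2.0, 0.5)
Y = 1.0 + effect * D + rng.normal(size=n)

# With a single binary regressor, the OLS slope is a difference in means.
c_ols = Y[D == 1].mean() - Y[D == 0].mean()

# With a single binary instrument and no controls, 2SLS is the Wald ratio.
c_2sls = (Y[Z == 1].mean() - Y[Z == 0].mean()) / (D[Z == 1].mean() - D[Z == 0].mean())

print(f"c_OLS  = {c_ols:.3f}")    # averages over everyone with D = 1
print(f"c_2SLS = {c_2sls:.3f}")   # averages over compliers only
```

Under these assumptions c_{OLS} comes out near 1.25 (the average effect among the treated, half of whom are always-takers and half compliers), and c_{2SLS} comes out near 2 (the effect among compliers). Neither number is "overstated" relative to the other; they simply answer different questions.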

‘Metrics Monday: When (Not) to Cluster?

I had a few hours of free time this weekend, which I used to read a new working paper by Abadie et al. (2017) titled “When Should You Adjust Standard Errors for Clustering?” I’m a little bit late to the party–David McKenzie blogged about this almost a full month ago–but doing all of my teaching in one semester leaves me considerably behind on reading new papers, which then leaves me behind on blogging.

This is a very nice paper, and it has seriously changed my understanding of clustering. Abadie et al. start with two common misconceptions regarding clustering:

  1. Clustering matters only if the residuals and the regressors are both correlated within clusters, and
  2. If clustering makes a difference in your standard errors, you should cluster.