Never Too LATE, Part 3: Observational Data

Last week I wrote two posts about the local average treatment effect (LATE). Part 1 discussed the difference between the average treatment effect (ATE) and the LATE. Part 2 discussed the difficulty of comparing results across studies when different studies rely on different instrumental variables (IVs).

This brings me to the topic of this post. After I posted part 2 last week, a reader — an economist who has been out of school for some time — emailed me with the following:

I can’t recall learning about this while in grad school. Surely it was mentioned and it’s just receded into a dark corner of my memory? It seems like a pretty important concept to consider, although I guess it’s a bigger concern for experimental economics?

The emphasis is mine. An equally emphatic answer would be: “No, it’s actually a huge problem with nonexperimental data.”

Wages, Education, and the Vietnam War

To see this, consider the classic IV example: Angrist’s (1990) study of the impact of education on wages. Wages and education are jointly determined. If anything, there is reverse causality, because people choose how much time to spend in school based on the expectation of a higher wage. For that reason, Angrist used a respondent’s Vietnam draft lottery number as an IV for the respondent’s education.

How does this work? Recall that a valid IV has two properties. First, it must explain a significant amount of the variation in the endogenous variable (here, education). Second, it must be exogenous, i.e., uncorrelated with the unobserved determinants of the dependent variable (here, wage). Put another way, this second property says that the IV affects the dependent variable only through the endogenous variable. Here, a respondent’s Vietnam draft lottery number should affect his wage only through how much education he chose to acquire.
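To make the two properties concrete, here is a minimal simulation sketch. It is not Angrist’s data or method: the variable names, the coefficients, and the constant effect beta = 2 are all made up for illustration. The instrument z shifts the endogenous variable d (first property) and never enters the outcome equation directly (second property), while an unobserved confounder u drives both d and y.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def simulate(n=50_000, beta=2.0, seed=0):
    """Simulate (z, d, y) with a hypothetical constant effect beta of d on y."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        zi = 1.0 if rng.random() < 0.5 else 0.0   # instrument, assigned at random
        ui = rng.gauss(0, 1)                      # unobserved confounder
        di = 1.0 * zi + ui + rng.gauss(0, 1)      # first stage: z moves d
        yi = beta * di + 3.0 * ui + rng.gauss(0, 1)  # z does not appear here
        z.append(zi)
        d.append(di)
        y.append(yi)
    return z, d, y


z, d, y = simulate()

# Naive regression slope of y on d: contaminated by the confounder u.
ols_slope = cov(d, y) / cov(d, d)

# IV (Wald) estimate: reduced-form covariance over first-stage covariance.
iv_slope = cov(z, y) / cov(z, d)

print(f"OLS slope: {ols_slope:.2f}")
print(f"IV slope:  {iv_slope:.2f}")
```

In this setup the naive slope is pulled away from 2 by the confounder, while the IV ratio should land close to 2. That clean result depends on the effect being the same for everyone, which is exactly the assumption the rest of this post questions.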

Okay, but how does a respondent’s draft lottery number affect his education to begin with? The lower your lottery number, the likelier you were to get drafted to serve in Vietnam. If you served in Vietnam and lived to tell about it, you were eligible for a specific amount of time in college paid for by the US government under the provisions of the GI Bill.

But then, using a respondent’s Vietnam draft lottery number as an IV for his education ought to yield the ATE of education on wages, right?

Unfortunately, no. Recall the definition of the LATE I gave in the first post in this series. There, I used a fictitious randomized trial in which people are randomly assigned either to a treatment group, in which they are instructed to eat breakfast, or to a control group, in which they are instructed to skip breakfast:

The LATE is the ATE for those subjects who were (i) induced to eat breakfast because they are assigned to the treatment group, or (ii) induced to skip breakfast because they are assigned to the control group. In other words, the LATE is the ATE for the sub-population of compliers.

In the Angrist (1990) example, what you would obtain is the LATE: the ATE for compliers, i.e., those respondents who were induced to acquire education by the fact that they served in Vietnam.

In other words, you wouldn’t know anything about those respondents who did not acquire education even though they served in Vietnam, or about those respondents who acquired education even though they did not serve in Vietnam, i.e., the noncompliers.
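The point can be illustrated with a small simulation sketch in which the effect of the treatment differs across compliance types. The shares and effect sizes below are invented for illustration and have nothing to do with the actual draft lottery data. Here z stands for the instrument (say, a low lottery number) and d for the treatment (say, acquiring education).

```python
import random

# Hypothetical population shares and treatment effects by compliance type.
TYPES = {
    "complier":     {"share": 0.3, "effect": 1.0},
    "always_taker": {"share": 0.2, "effect": 3.0},
    "never_taker":  {"share": 0.5, "effect": 0.5},
}


def mean(xs):
    return sum(xs) / len(xs)


def simulate_late(n=100_000, seed=1):
    """Return a list of (z, d, y) rows; the compliance type itself is not returned."""
    rng = random.Random(seed)
    names = list(TYPES)
    weights = [TYPES[t]["share"] for t in names]
    rows = []
    for _ in range(n):
        t = rng.choices(names, weights)[0]      # latent type, unobserved in practice
        z = 1 if rng.random() < 0.5 else 0      # randomly assigned instrument
        if t == "always_taker":
            d = 1                               # takes the treatment regardless of z
        elif t == "never_taker":
            d = 0                               # never takes it regardless of z
        else:
            d = z                               # complier: does what z induces
        y = TYPES[t]["effect"] * d + rng.gauss(0, 1)
        rows.append((z, d, y))
    return rows


rows = simulate_late()

y1 = mean([y for z, d, y in rows if z == 1])
y0 = mean([y for z, d, y in rows if z == 0])
d1 = mean([d for z, d, y in rows if z == 1])
d0 = mean([d for z, d, y in rows if z == 0])

first_stage = d1 - d0                  # estimates the share of compliers
wald = (y1 - y0) / first_stage         # IV (Wald) estimate

# Population ATE implied by the made-up shares and effects above.
true_ate = sum(v["share"] * v["effect"] for v in TYPES.values())

print(f"IV estimate: {wald:.2f}")
print(f"True ATE:    {true_ate:.2f}")
```

Under these assumptions the IV estimate should settle near the compliers’ effect rather than near the population ATE, because always-takers and never-takers behave the same way whatever z is and so drop out of the comparison. Note also that the returned rows contain only z, d, and y: a row with z = 1 and d = 1 could belong to a complier or to an always-taker, and nothing in the data says which.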

The problem is that it is very difficult, if not impossible, to tell compliers from noncompliers. And so the problem of LATE is far from specific to experimental data.