Last summer, Advice to Writers, one of my favorite blogs, had a post titled “Read Bad Stuff.” Given that Advice to Writers posts are usually very short, I reproduce the post here in full:
If you are going to learn from other writers don’t only read the great ones, because if you do that you’ll get so filled with despair and the fear that you’ll never be able to do anywhere near as well as they did that you’ll stop writing. I recommend that you read a lot of bad stuff, too. It’s very encouraging. “Hey, I can do so much better than this.” Read the greatest stuff but read the stuff that isn’t so great, too. Great stuff is very discouraging. — Edward Albee.
This applies to many other areas of life, and academic research is no exception.
Over the years, I have found that besides learning by doing (i.e., writing your own papers), one of the best ways to improve as a researcher is to learn from others. Obviously, this means that you should read good papers–but not good papers exclusively.
The issue, as I see it, is that doctoral courses tend to have students read only the very best papers on any given topic. At best, a doctoral course will have students referee current working papers as an assignment, but even then, those working papers tend to be selected from those of researchers who produce high-quality work.
If you were interested in knowing what makes some people poor and others not, you would need to sample both poor people and people who aren’t poor. Likewise, if you are interested in knowing what makes a piece of research good and another one not as good, it helps to read widely, and to make some time for reading bad papers. For most people, this comes in the form of refereeing, especially early on in their career.
(When I started out, a journal editor told me that “like referees like,” and I’ve found that to be true. That is, early-career researchers often review the work of other early-career researchers, and senior researchers often review the work of other senior researchers. So if you have ever asked yourself “When will I get better papers to referee?,” the answer is generally “Just wait,” assuming of course that the quality of academic output increases with time spent in a discipline.)
Many scholars–economists, in particular–see refereeing as an unfortunate tax they need to pay in order to get their own papers reviewed and published. Unlike a tax, however, there is almost always something to be learned from refereeing, and from refereeing bad papers in particular.
My last post in this series, on how to use Pearl’s front-door criterion in a regression context, generated lots of page views as well as lots of commentary on Twitter–enough so that I thought a follow-up post might be useful.
Recall that with outcome Y, treatment X, mechanism M, and an unobserved confounder U affecting both Y and X but not M, the method I outlined in the post is pretty simple:
1. Regress M on X, get b_{MX}, the coefficient on X.
2. Regress Y on X and M, get b_{YM}, the coefficient on M.
3. The product of b_{MX} and b_{YM} is the effect of treatment X on outcome Y estimated by the front-door criterion.
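For readers who do not use Stata, here is a minimal sketch of that procedure in Python. The data-generating process is made up for illustration (it mirrors the simulation later in this post), so the variable names and coefficients are assumptions, not anything from a real application; by construction, the true effect of X on Y is -0.3 * 0.5 = -0.15.

```python
import numpy as np

rng = np.random.default_rng(123456789)
n = 100_000

# Simulated DGP: U confounds X and Y but leaves M alone.
u = rng.standard_normal(n)                # unobserved confounder
x = u + rng.standard_normal(n)            # treatment, confounded by u
m = -0.3 * x + rng.standard_normal(n)     # mechanism, unaffected by u
y = 0.5 * m + u + rng.standard_normal(n)  # outcome; true effect of x is -0.15

def ols(dep, *regressors):
    """OLS with a constant; returns the slope coefficients only."""
    X = np.column_stack([np.ones(len(dep)), *regressors])
    return np.linalg.lstsq(X, dep, rcond=None)[0][1:]

b_mx = ols(m, x)[0]       # step 1: coefficient on X in M ~ X
b_ym = ols(y, x, m)[1]    # step 2: coefficient on M in Y ~ X + M
front_door = b_mx * b_ym  # step 3: front-door estimate

naive = ols(y, x)[0]      # naive Y ~ X regression, biased by u
print(front_door, naive)
```

Under this particular DGP, the naive regression converges to 0.35 rather than -0.15, so it gets the sign of the effect wrong, while the front-door product recovers something close to -0.15.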
One of the things that came up on Twitter was whether someone should use the procedure outlined above, or do the following instead:
1. Regress M on X, get \hat{e}, the residual.
2. Regress Y on \hat{e}, get b_{Ye}, the coefficient on \hat{e}.
3. Regress M on X, get b_{MX}, the coefficient on X.
4. The product of b_{Ye} and b_{MX} is the effect of treatment X on outcome Y estimated by the front-door criterion.
Note that the two methods yield the exact same estimate of the treatment effect. Here is a Kerwinian proof by Stata:
clear
set obs 1000
set seed 123456789
gen u = rnormal(0,1)
gen treat = u + rnormal(0,1)
gen mech = -0.3 * treat + rnormal(0,1)
gen outcome = 0.5 * mech + u + rnormal(0,1)
* Method 1: seemingly unrelated regression plus the delta method
sureg (mech treat) (outcome mech treat)
nlcom [mech]_b[treat]*[outcome]_b[mech]
* Method 2: the residual-based procedure
reg mech treat
predict e, residuals
reg outcome e
matrix a = _b[e]
reg mech treat
matrix b = _b[treat]
matrix c = a*b
matrix list c
Note how the estimates produced by the line that begins with -nlcom- and by the line that begins with -matrix list- are identical. In terms of implementation, I prefer the (somewhat old-fashioned, I realize) use of seemingly unrelated regression, since it allows the error terms to be correlated across the two component regressions.
To reiterate what I said at the end of my last post: Caveat emptor. When it comes to observational data, rare is the scenario where one can claim that the mechanism M whereby an endogenous treatment X affects outcome Y is entirely unaffected by the unobserved confounders U that simultaneously affect X and Y. So this post and the previous one are meant to illustrate something that might work in some rare situations, not to encourage applying the front-door criterion unthinkingly as a means of identifying a causal relationship on the cheap. In this as in so many things, TINSTAAFL.
On Twitter, Daniel Millimet adds:
Nice post, but in addition to the caveat at the end about M being indpt of U, imho you should state that this estimates only one aspect of the TE unless one can specify all Ms thru which X affects Y.
(Update: There was a mistake in the original post. Thanks to Peter Hull, Paul Hünermund, Vincent Arel-Bundock, and Daniel Millimet, who provided enlightening comments on the original post, the proper procedure is given at the bottom, along with some ideas for implementation in Stata.)
If you have been reading this blog for a while, you are undoubtedly familiar with the usual methods used by economists to identify causal relations (e.g., randomized controlled trials, instrumental variables, difference-in-differences, and so on).
One method that you may not have heard of, or that you might only have heard of in passing, is Pearl’s (2000) front-door criterion, which Pearl discusses in a more intuitive way in The Book of Why, the popular-press book he has recently published in which he discusses his work on causality (Pearl, 2018). In fact, in The Book of Why, Pearl goes so far as to assert that the use of the front-door criterion might help end the hegemony of randomized controlled trials when it comes to identifying causal impacts!
Consider the following figure, where X denotes a treatment variable, Y denotes the outcome of interest, M denotes a mechanism through which X causes Y, and U represents unobserved confounders.
Directed acyclic graph illustrating the front-door criterion.
Let’s ignore M for a minute. If you have ever seen a graph like the one in the figure above–a directed acyclic graph, or DAG–you know that causality researchers use this type of graph to represent the structure underlying a causal model. Here, the identification problem comes from the fact that U affects both X and Y (i.e., there are arrows from U to both X and Y). That is what makes it difficult to identify the causal relationship flowing from X to Y: because of the presence of U, any correlation between X and Y cannot be argued to be causal. In such cases, an economist’s first instinct is often to find a variable Z which is correlated with X but which is not affected by U–a setup which would allow identifying the causal effect of X on Y, and which you have probably recognized as an instrumental variable (IV) setup.
Pearl, however, came up with a clever way of identifying the causal effect of X on Y which tends to be somewhat less demanding than having to find a credible IV. Looking at the figure above, Pearl’s method involves finding a mechanism M whereby X causes Y, but which is itself not affected by unobserved confounders. (Indeed, notice that there is no arrow from U to M in the figure above.)
That is essentially the idea of the front-door criterion: To find a mechanism M whereby X causes Y but which is not itself affected by unobserved confounders. In an old post, Alex Chinco, an assistant professor of finance at the University of Illinois, explains how even in the presence of self-selection of units into treatment, if you can credibly make the case that treatment intensity is not affected by the unobserved confounders that drive both the uptake of treatment and your outcome of interest, you can identify the effect of the treatment on that outcome.
Intuitively, the front-door criterion does something similar to what IV does, except that the variable that purges the variation in X of its correlation with U sits in front of X (hence the name front-door criterion), i.e., between X and Y, so that you have X -> M -> Y, instead of behind it, as in a traditional 2SLS setup where you have Z -> X -> Y.
It took me a while to sit down and write this post, because the idea behind this series is to present things one can use in a regression context, and whatever I have read from Pearl usually presents the front-door criterion in a simple binary-treatment, binary-outcome, binary-mechanism example–smoking as treatment, lung cancer as outcome, and the rate of tar accumulation in the lungs as mechanism–in which case you can recover the treatment effect by multiplying conditional probabilities.
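In that all-binary case, the front-door adjustment formula is P(y|do(x)) = sum_m P(m|x) sum_x' P(y|x',m) P(x'). Here is a small Python sketch of that formula applied to a made-up joint distribution (all the probabilities below are assumptions chosen purely for illustration); because we generate the distribution ourselves, we know U and can check the formula against the truth:

```python
import numpy as np

# Made-up structural probabilities: U -> X, X -> M, (M, U) -> Y.
# Crucially, U does not affect M.
p_u1 = 0.6
p_x1_given_u = {0: 0.2, 1: 0.8}  # P(X=1 | U=u)
p_m1_given_x = {0: 0.3, 1: 0.7}  # P(M=1 | X=x)

def p_y1_given_mu(m, u):
    return 0.1 + 0.5 * m + 0.3 * u  # P(Y=1 | M=m, U=u)

# Build the full joint P(u, x, m, y), then the observed joint P(x, m, y).
joint = np.zeros((2, 2, 2, 2))
for u in (0, 1):
    for x in (0, 1):
        for m in (0, 1):
            for y in (0, 1):
                pu = p_u1 if u else 1 - p_u1
                px = p_x1_given_u[u] if x else 1 - p_x1_given_u[u]
                pm = p_m1_given_x[x] if m else 1 - p_m1_given_x[x]
                py = p_y1_given_mu(m, u) if y else 1 - p_y1_given_mu(m, u)
                joint[u, x, m, y] = pu * px * pm * py
obs = joint.sum(axis=0)  # P(x, m, y): all the analyst actually sees

def p_y1_do_x(x):
    """Front-door adjustment, computed from the observed joint only."""
    p_x = obs.sum(axis=(1, 2))  # marginal P(x')
    total = 0.0
    for m in (0, 1):
        p_m_given_x = obs[x, m, :].sum() / obs[x, :, :].sum()
        inner = sum(obs[xp, m, 1] / obs[xp, m, :].sum() * p_x[xp]
                    for xp in (0, 1))  # sum_x' P(y=1|x',m) P(x')
        total += p_m_given_x * inner
    return total

def true_p_y1_do_x(x):
    """Ground truth, computable only because we know U."""
    return sum((p_m1_given_x[x] if m else 1 - p_m1_given_x[x]) *
               sum(p_y1_given_mu(m, u) * (p_u1 if u else 1 - p_u1)
                   for u in (0, 1))
               for m in (0, 1))

effect = p_y1_do_x(1) - p_y1_do_x(0)
print(effect)
```

Even though X and Y are confounded by U, the adjustment formula recovers P(y|do(x)) exactly, because M satisfies the front-door conditions in this DAG.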
But applied economists are usually interested in examples that involve more than just binary variables, and it took me a while to find a discussion of how to do this in a regression context. Even the recent paper by Glynn and Kashin (2017) comparing the front-door and back-door criteria does not go into the details of how to do that. Luckily, the Alex Chinco post I refer to above does, discussing a procedure based on two regressions. In a discussion on Twitter about how to implement this in practice, here is what came up (though reddit user /u/unistata came up with it six months before):
1. Regress M on X and a constant.
2. Regress Y on M, X, and a constant.
3. Multiply the coefficient on X in step 1 by the coefficient on M in step 2.

The result of step 3 is then the front-door criterion estimate of the causal effect of treatment X on outcome Y.
How would you implement this in Stata? A quick and dirty way to do it would be to estimate
. sureg (m x) (y m x)
. nlcom [m]_b[x]*[y]_b[m]
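If you want a standard error for that product but would rather not rely on -nlcom-'s delta method, a nonparametric bootstrap is one alternative. Here is a hedged sketch in Python; since I have no real application at hand, the data are simulated (true effect -0.15 by construction), and everything in it is illustrative rather than a recommendation:

```python
import numpy as np

def front_door(x, m, y):
    """Point estimate: (coef on X in M ~ X) * (coef on M in Y ~ X + M)."""
    ones = np.ones_like(x)
    b_mx = np.linalg.lstsq(np.column_stack([ones, x]), m, rcond=None)[0][1]
    b_ym = np.linalg.lstsq(np.column_stack([ones, x, m]), y, rcond=None)[0][2]
    return b_mx * b_ym

# Simulated data standing in for a real application.
rng = np.random.default_rng(0)
n = 5_000
u = rng.standard_normal(n)
x = u + rng.standard_normal(n)
m = -0.3 * x + rng.standard_normal(n)
y = 0.5 * m + u + rng.standard_normal(n)

estimate = front_door(x, m, y)

# Nonparametric bootstrap: resample rows with replacement, re-estimate.
reps = 500
draws = np.empty(reps)
for r in range(reps):
    idx = rng.integers(0, n, n)
    draws[r] = front_door(x[idx], m[idx], y[idx])
se = draws.std(ddof=1)
print(estimate, se)
```

In Stata itself, the same idea is available by wrapping the two regressions in a program and calling the -bootstrap- prefix on it.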
As always, there is no free lunch, and in order to apply the front-door criterion, one has to make the case that M really is not affected by U the way X and Y are, which might be a difficult case to make. But if you have an application where self-selection into treatment compromises the identification of the causal effect of treatment on your outcome of interest and you can find a variable that measures the intensity of that treatment which is not driven by the same confounding factors as those affecting treatment and outcome, you might have a good case for using the front-door criterion.