Last Monday’s post, in which I ranted a bit about the opposition to estimating linear probability models (LPM) instead of probits and logits, turned out to be very popular. In fact, that post is now in my top three most popular posts ever.
Last Monday morning, when my wife left for work, I told her I was expecting a meager number of page views that day given my choice of post topic. I was wrong: people really care about binary dependent variables.
The post generated quite a bit of commentary. Some said that if you have experimental data, you would not want to run a regression. But that’s not completely true. Sure, with experimental data, you can run a t-test comparing the means of the control and treatment groups. But I can think of many cases where you would still want to run a regression: controlling for pre-treatment covariates absorbs residual variance, which increases the precision of your estimate of the treatment effect (see the sketch below).
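Here is a minimal sketch of what I mean, in Stata, with hypothetical variables y (the outcome), treat (the randomized treatment indicator), and age (a pre-treatment covariate):

```
* Difference in means between the treatment and control groups:
ttest y, by(treat)

* The same comparison as a regression -- the coefficient on treat
* is exactly the difference in means:
reg y treat

* Controlling for a pre-treatment covariate absorbs residual
* variance, which tightens the standard error on treat:
reg y treat age
```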
The most interesting response came from Penn State’s Christopher Zorn in a post on his blog. If you want to read the remainder of this post, I suggest you read his post and come back for mine (and make sure to add his blog to your RSS feed while you’re at it).
Done? Okay, here goes:
- I did not read the King and Roberts working paper Christopher links to (with my upcoming move halfway across the country, I need all the time I can get to work on my own research), but in the comments to my post, Conner responds: “The King and Roberts results are more relevant for the case when identification of all parameters of interest requires that we have the correct model, e.g., forecasting probabilities. This isn’t the case when looking at binary treatment assignment and [we] are interested in estimating average treatment effects. You just need the expectation of the error term to be the same in the treatment and control groups. King and Roberts more or less make this point themselves on page 3 of their paper.” Moreover, the probit standard errors model one kind of variance (that due to the Bernoulli structure of the dependent variable), but they are not robust to other kinds of heteroskedasticity. And under heteroskedasticity, the probit and logit coefficients are inconsistent, even with robust standard errors (ht: Tim Beatty).
- I must insist that forecasting probabilities is not what I am interested in. Most of the time, I’m interested in getting as close as possible to the average treatment effect. If you are interested in forecasting probabilities, then by all means estimate a probit or a logit.
- On nonlinear functional forms, in the example Christopher gives (the likelihood that extremely poor or extremely wealthy people will purchase a TV will not change much if their income increases by $1,000, but the change will be much larger for someone with an average income), the nonlinearity can be captured within an LPM by including both income and its square as regressors (see the sketch after this list). But even then, this assumes that we know the exact shape of the nonlinear relationship.
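To make the quadratic-in-income idea concrete, here is a sketch in Stata with hypothetical variables tv (an indicator for owning a television) and income (in thousands of dollars):

```
* Include income and its square so the LPM can pick up the
* flattening of the relationship at both tails:
gen income2 = income^2
reg tv income income2

* Equivalently, using factor-variable notation:
reg tv c.income##c.income
```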
And in response to Christopher’s last two points, I do not use R, but I plugged his example into Stata. What happens there is that the LPM gives a coefficient estimate of 0.6, but the logit omits x altogether because y is perfectly predicted whenever x = 1, i.e., whenever x = 1, y = 1. In one of my own applications, I compared an LPM with fixed effects to a conditional logit with fixed effects, and the latter does the same thing.
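For the record, here is a minimal dataset consistent with those numbers (my own construction, not necessarily Christopher’s exact example): whenever x = 1, y = 1, and y = 1 for 40 percent of the observations for which x = 0, so the difference in means is 0.6.

```
clear
input y x
0 0
0 0
0 0
1 0
1 0
1 1
1 1
1 1
1 1
1 1
end

* The LPM keeps all ten observations and returns 0.6:
reg y x

* The logit drops x -- and the five x = 1 observations -- with a
* note that x != 0 predicts success perfectly:
logit y x
```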
I don’t know that keeping those observations is a bad thing, though: even in Christopher’s example, they contain a lot of information about the relationship between x and y, namely that the two are highly positively correlated. To see this, suppose you wanted to know the likelihood that people who smoke will die of lung cancer. You collect individual data on smoking and on causes of death, and you find that everyone who smokes dies of lung cancer, but that only about half of the people who don’t smoke do.
If you wanted to know how the decision to start smoking changes the likelihood that someone will die of lung cancer, would you throw away all the observations for which an individual is a smoker? I wouldn’t, as they contain valuable information that helps us quantify the marginal impact of the decision to smoke on the likelihood of dying from lung cancer.
With that said, I add the caveat that I am not an econometrician and that, to paraphrase a soon-to-be-colleague, I have strong opinions that are weakly held. It looks as though one’s preferred estimator for binary dependent variables is really a matter of disciplinary cultural norms (economists love probits; other social scientists, not so much), if not of field-specific norms within disciplines.
So ultimately (and this deserves to be in bold): **because no estimator is perfect, you should always estimate all three (LPM, probit, and logit) and compare their results to make sure nothing is amiss.**
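Concretely, and with the caveat that probit and logit coefficients are not on the same scale as the LPM’s, one quick way to run that comparison in Stata (again with hypothetical y and x) is to put the LPM coefficient next to the probit and logit average marginal effects:

```
* LPM with heteroskedasticity-robust standard errors:
reg y x, robust

* Probit and logit, each followed by average marginal effects,
* which are on the same (probability) scale as the LPM coefficient:
probit y x
margins, dydx(x)

logit y x
margins, dydx(x)
```

If the LPM coefficient and the two average marginal effects are close, you can be reasonably confident that nothing is amiss.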