Goodness of Fit in Binary Choice Models [Technical]

In econometrics, goodness-of-fit measures tell us what percentage of the variation in a dependent variable is explained by the explanatory variables. If you’ve ever taken a statistics class, you are almost surely familiar with the R-square measure. In a regression of, say, the logarithm of wage on age, gender, and education level, the R-square is simply the fraction of the total variation in log wage that is explained by variation in age, gender, and education level.

Given the foregoing, you’d think R-square is a great measure, right? I mean, it tells you how much of the variation in Y all of your X’s explain! Yeah, no… R-square is actually not all that interesting, because you can throw in any variable on the right-hand side (for example, the color of one’s underwear in the log wage regression above) and R-square can never decrease; in practice it will almost always increase, because there is bound to be a (spurious) correlation between the color of one’s underwear and one’s wage. Even the adjusted R-square, which corrects for how many variables there are in X, isn’t that great, since that correction is somewhat arbitrary.

With binary outcomes (i.e., when Y can only be equal to one or zero, such as in questions answered by “Yes” or “No”), people often like to use the percentage of ones and zeroes correctly predicted, and report that as a measure of goodness-of-fit. Kennedy, in his classic econometric intuition-building treatise, argued that this was not a very good measure:

It is tempting to use the percentage of correct predictions as a measure of goodness of fit. This temptation should be resisted: a naïve predictor, for example that every y = 1, could do well on this criterion. A better measure along these lines is the sum of the fraction of zeros correctly predicted plus the fraction of ones correctly predicted, a number which should exceed unity if the prediction method is of value. See McIntosh and Dorfman (1992).

This could use a bit of explanation: Suppose we have Y = (0, 1, 1, 1, 1, 1, 1, 1), and we have a vector of predicted values equal to (0, 1, 1, 0, 1, 1, 1, 0). The usual percentage-of-correct-predictions measure would be 0.75, since 75% of observations are correctly predicted, or 6 out of 8. But one can do even better by guessing “all ones.” Indeed, if I were to guess all ones, I’d get 87.5% goodness of fit, or 7 out of 8.

What McIntosh and Dorfman (1992) suggested instead was to add up (i) the fraction of correctly predicted zeros (in my example, 100%, or 1 out of 1) and (ii) the fraction of correctly predicted ones (in my example, about 71%, or 5 out of 7). In my example, then, the total McIntosh-Dorfman goodness-of-fit measure would be about 1.71 which, by the McIntosh-Dorfman criterion, would be deemed a good fit, since it exceeds 1. The naïve “all ones” guess, by contrast, gets 0% of the zeros and 100% of the ones right, for a total of exactly 1, which does not exceed 1.
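For readers who want to check the arithmetic, here is a minimal Python sketch of both measures, applied to the toy vectors above. The function name `fit_measures` and the variable names are mine, purely for illustration; this is not code from McIntosh and Dorfman (1992) or from any package. Note that the toy data contain one zero and seven ones, so the fraction of ones correctly predicted is 5 out of 7.

```python
def fit_measures(actual, predicted):
    """Return (share of correct predictions, McIntosh-Dorfman sum).

    Both arguments are equal-length sequences of 0s and 1s.
    `fit_measures` is a hypothetical helper written for this example.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    n_ones = sum(1 for y in actual if y == 1)
    n_zeros = len(actual) - n_ones
    if n_ones == 0 or n_zeros == 0:
        # The measure needs both outcomes present to compute both fractions.
        raise ValueError("actual must contain at least one 0 and one 1")

    # Count the ones and zeros that the prediction gets right.
    ones_hit = sum(1 for y, p in zip(actual, predicted) if y == 1 and p == 1)
    zeros_hit = sum(1 for y, p in zip(actual, predicted) if y == 0 and p == 0)

    share_correct = (ones_hit + zeros_hit) / len(actual)
    mcintosh_dorfman = ones_hit / n_ones + zeros_hit / n_zeros
    return share_correct, mcintosh_dorfman


y = [0, 1, 1, 1, 1, 1, 1, 1]
model = [0, 1, 1, 0, 1, 1, 1, 0]
all_ones = [1] * len(y)

print(fit_measures(y, model))     # share correct 0.75, sum about 1.71
print(fit_measures(y, all_ones))  # share correct 0.875, sum exactly 1.0
```

The naïve guess wins on the share of correct predictions but only ties the "no value" benchmark of 1 on the summed measure, which is the point of the Kennedy quote.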

Now, if your reaction to the above was this:

[Image: Peter Griffin]

Consider the following example from a referee report I received on my 2012 World Development article about the welfare impacts of participation in contract farming.

In that referee report, the reviewer faulted me for low pseudo R-square measures on a probit and suggested that I report the percentage of correct predictions. Notwithstanding the fact that pseudo R-square measures are pretty bad (see Estrella, 1998 on that point), I responded with the Kennedy quote above, and in the published version of my paper, in table 5, I actually report three measures: the pseudo R-square (0.081), the percentage of correct predictions (0.63), and the McIntosh-Dorfman measure (1.29). Note that although the percentage of correct predictions and the McIntosh-Dorfman measure are consistent with one another (if I assume I predicted 63% of both ones and zeros correctly, I get a McIntosh-Dorfman of 1.26), the pseudo R-square tells me that only 8 percent of the variance in the dependent variable is explained by my right-hand side variables, which does strike me as misleading in this case.
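To spell out how the two prediction-based measures relate, here is a short sketch in notation that is my own rather than the paper's: \pi is the share of observations with y = 1, s_1 is the fraction of ones correctly predicted, and s_0 is the fraction of zeros correctly predicted.

```latex
\begin{align*}
\text{share of correct predictions} &= \pi\, s_1 + (1 - \pi)\, s_0, \\
\text{McIntosh-Dorfman measure}     &= s_1 + s_0 .
\end{align*}
% If s_1 = s_0 = 0.63, the share of correct predictions is 0.63 for any \pi,
% and the McIntosh-Dorfman measure is 2 \times 0.63 = 1.26.
```

The first line is a weighted average, so a lopsided sample lets a naïve guess score well on it; the second line weights the two outcomes equally, which is why it is harder to game.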


3 comments

  1. Nathan P.

    Does our friend from College Station (Stata) have a McIntosh-Dorfman test? And what does one do if the data are continuous in nature, rather than ordinal?

  2. Marc F. Bellemare

    Nathan, thanks for your questions. Stata does not have a McIntosh-Dorfman command, but it’d be a pretty simple thing to code (you only need to compute the proportion of ones and zeros correctly predicted and sum them up, which can be done with the predict, generate and count commands, if I’m not mistaken). As for a continuous dependent variable, this wouldn’t work given that the likelihood of y taking any particular value in this case is equal to zero.

  3. Kindred Winecoff

    In poli sci the norm seems to be to use ROC curves of true and false positives, along with AUC scores. I’ve recently started using separation plots (Ward and Greenhill, AJPS in I think 2011), as I find them to be a bit more intuitive, with the added virtue of saving space for discussion of substantive results. More and more I’m seeing out-of-sample testing, which is a great thing to do when available.

    Either way, getting out of the world of R-squared is almost always a good idea.