## Goodness of Fit in Binary Choice Models [Technical]

In econometrics, goodness-of-fit measures tell us what percentage of the variation in a dependent variable is explained by the explanatory variables. If you’ve ever taken a statistics class, you are almost surely familiar with the R-square measure. In a regression of, say, the logarithm of wage on age, gender, and education level, the R-square is simply the fraction of the total variation in log wage that is explained by variation in age, gender, and education level.

Given the foregoing, you’d think R-square is a great measure, right? I mean, it tells you how much of the variation in Y all of your X‘s explain! Yeah, no… R-square is actually not all that interesting, because you can throw in any variable on the right-hand side — for example, the color of one’s underwear in the log wage regression above — and R-square can only increase, because there is bound to be some (spurious) correlation between the color of one’s underwear and one’s wage. Even the adjusted R-square, which penalizes for how many variables there are in X, isn’t that great, since that penalty is somewhat arbitrary.
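A quick simulation makes the point. The sketch below uses hypothetical data (education driving log wage, plus noise) and a made-up junk regressor standing in for “underwear color”; the claim it illustrates is just the mechanical fact that adding any regressor cannot lower R-square.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical data: log wage depends on education plus noise.
educ = rng.normal(12, 2, n)
log_wage = 1.0 + 0.1 * educ + rng.normal(0, 0.5, n)

def r_squared(y, X):
    """R-square from an OLS fit of y on X (intercept included)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    tss = (y - y.mean()) @ (y - y.mean())
    return 1 - (resid @ resid) / tss

r2_base = r_squared(log_wage, educ)

# An irrelevant regressor -- the "underwear color" of the example:
junk = rng.integers(0, 5, n).astype(float)
r2_junk = r_squared(log_wage, np.column_stack([educ, junk]))

print(r2_base, r2_junk)  # r2_junk is never smaller than r2_base
```

Re-run this with any junk regressor you like; R-square never falls, which is exactly why it rewards kitchen-sink regressions.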

With binary outcomes (i.e., when Y can only be equal to one or zero, such as in questions answered by “Yes” or “No”), people often like to use the percentage of ones and zeroes correctly predicted, and report that as a measure of goodness-of-fit. Kennedy, in his classic econometric intuition-building treatise, argued that this was not a very good measure:

> It is tempting to use the percentage of correct predictions as a measure of goodness of fit. This temptation should be resisted: a naïve predictor, for example that every y = 1, could do well on this criterion. A better measure along these lines is the sum of the fraction of zeros correctly predicted plus the fraction of ones correctly predicted, a number which should exceed unity if the prediction method is of value. See McIntosh and Dorfman (1992).

This could use a bit of explanation. Suppose we have Y = (0, 1, 1, 1, 1, 1, 1, 1), and we have a vector of predicted values equal to (0, 1, 1, 0, 1, 1, 1, 0). The usual percentage-of-correct-predictions measure would be 0.75, since 6 out of 8 observations — that is, 75% — are correctly predicted. But one can do even better by guessing “all ones.” Indeed, if I were to guess all ones, I’d get 87.5% goodness of fit, or 7 out of 8.
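The arithmetic above is easy to check directly. A minimal sketch, using the example vectors from the text:

```python
y    = [0, 1, 1, 1, 1, 1, 1, 1]   # actual outcomes
yhat = [0, 1, 1, 0, 1, 1, 1, 0]   # model's predictions

# Usual percentage-of-correct-predictions measure:
pct_correct = sum(a == p for a, p in zip(y, yhat)) / len(y)
print(pct_correct)  # 0.75

# The naive "all ones" predictor does even better:
naive = [1] * len(y)
pct_naive = sum(a == p for a, p in zip(y, naive)) / len(y)
print(pct_naive)  # 0.875
```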

What McIntosh and Dorfman (1992) suggested instead was to add up (i) the fraction of correctly predicted zeros (in my example, 1 out of 1, or 100%) and (ii) the fraction of correctly predicted ones (in my example, 5 out of 7, or about 71.4%). In my example, then, the total McIntosh-Dorfman goodness-of-fit measure would be about 1.71 which, by the McIntosh-Dorfman criterion, would be deemed a good fit, since it exceeds 1. Note that the naïve “all ones” guess gets exactly 1 on this measure — 0% of zeros plus 100% of ones — so it earns no credit.