In econometrics, goodness-of-fit measures tell us what fraction of the variation in a dependent variable is explained by the explanatory variables. If you’ve ever taken a statistics class, you are almost surely familiar with the R-square measure. In a regression of, say, the logarithm of wage on age, gender, and education level, the R-square is simply the fraction of the total variation in log wage that is explained by variation in age, gender, and education level.
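To make that concrete, here is a minimal sketch with simulated data (the variable names, coefficients, and sample size are all made up for illustration). It just computes R-square as one minus the ratio of residual variation to total variation in log wage:

```python
import numpy as np

# Hypothetical data: log wage regressed on age, a gender dummy, and years of education.
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(20, 60, n)
female = rng.integers(0, 2, n)
educ = rng.integers(8, 21, n)
log_wage = 1.0 + 0.01 * age - 0.1 * female + 0.08 * educ + rng.normal(0, 0.4, n)

# OLS with an intercept; R-square = 1 - (residual variation) / (total variation),
# i.e. the fraction of the variation in log wage explained by the X's.
X = np.column_stack([np.ones(n), age, female, educ])
beta, *_ = np.linalg.lstsq(X, log_wage, rcond=None)
resid = log_wage - X @ beta
r2 = 1 - resid.var() / log_wage.var()
print(f"R-square: {r2:.3f}")
```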
Given the foregoing, you’d think R-square is a great measure, right? I mean, it tells you how much of the variation in Y all of your X's explain! Yeah, no… R-square is actually not all that interesting, because you can throw in any variable on the right-hand side — for example, the color of one’s underwear in the log wage regression above — and R-square can only increase, because there is bound to be some (spurious) in-sample correlation between the color of one’s underwear and one’s wage. Even the adjusted R-square, which penalizes you for the number of variables in X, isn’t that great, since that correction is somewhat arbitrary.
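A quick way to see both points is to add a pure-noise regressor to the same kind of simulated regression and compare R-square with adjusted R-square. This is only a sketch on made-up data, not any particular dataset; the "underwear color" variable here is literally random integers:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
age = rng.uniform(20, 60, n)
female = rng.integers(0, 2, n)
educ = rng.integers(8, 21, n)
log_wage = 1.0 + 0.01 * age - 0.1 * female + 0.08 * educ + rng.normal(0, 0.4, n)

# Baseline regression of log wage on age, gender, and education.
X = sm.add_constant(np.column_stack([age, female, educ]))
base = sm.OLS(log_wage, X).fit()

# Add a regressor that has nothing to do with wages ("underwear color," coded 0..4).
junk = rng.integers(0, 5, n)
bloated = sm.OLS(log_wage, np.column_stack([X, junk])).fit()

print(f"R-square:     {base.rsquared:.4f} -> {bloated.rsquared:.4f}  (can only go up)")
print(f"Adj R-square: {base.rsquared_adj:.4f} -> {bloated.rsquared_adj:.4f}  (may go down)")
```

Adding the junk regressor nudges R-square up even though it explains nothing real, while adjusted R-square can fall because it docks you for the extra degree of freedom.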