Suppose you have the following estimable equation:

(1) [math]y_{it} = \alpha_{i} + \beta {x}_{it} + \epsilon_{it}[/math].

This is a pretty standard equation when dealing with panel data: [math]i[/math] denotes an individual in the set [math]i \in \{1,\ldots,N\}[/math], [math]t[/math] denotes the time period in the set [math]t \in \{1,\ldots,T\}[/math], [math]y[/math] is an outcome of interest (say, wage), [math]x[/math] is a variable of interest (say, an indicator variable for whether someone has a college degree), [math]\alpha[/math] is an individual fixed effect, and [math]\epsilon[/math] is an error term with mean zero. Normally with longitudinal data, it is the case that [math]N > T[/math], so that there are more individuals in the data than there are time periods. (If [math]T > N[/math], you are likely dealing more with a time-series problem than with a typical applied micro problem.)

Though we are normally interested in estimating and identifying the relationship between the variable of interest [math]x[/math] and the outcome variable [math]y[/math], I wanted to focus today on heteroskedasticity.*

Under ideal circumstances, the variance of the error term is constant: [math]Var(\epsilon_{i}|x_{i}) = \sigma^{2}[/math]. This is what we mean when we say that the errors are “spherical,” which alludes to the shape of the scatter plot around the regression line. In this case, the variance of the outcome variable [math]y[/math] around the regression line is said to be constant across the range of values taken by [math]x[/math].

It is most often the case, however, that [math]Var(\epsilon_{i}|x_{i}) = \sigma_{i}^{2}[/math]. That is, it often happens that the errors are not spherical, and that the variance of the outcome variable [math]y[/math] around the regression line is not constant across the range of values taken by [math]x[/math].

As I said, we are almost always interested in estimating equation (1) above and calling it a day. In such cases, in the presence of heteroskedasticity, the conventional standard errors of the estimates of [math]\alpha[/math] and [math]\beta[/math] are wrong, and so our inferences can be mistaken.

Luckily, this is easily corrected by using the Huber-White sandwich standard error correction. (The names Huber and White refer to a statistician and an econometrician, respectively; the name “sandwich” refers to how an estimate of the variance of the errors is “sandwiched” between two copies of the same matrix in the estimator of the coefficients’ variance-covariance matrix. For applied econometricians, the classic read is White, 1980.)
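
To make the sandwich concrete, here is a minimal sketch in Python of the simplest version of the correction (often labeled HC0), which estimates the variance of the coefficients as [math](X'X)^{-1} X' \mathrm{diag}(\hat{\epsilon}^{2}) X (X'X)^{-1}[/math]. Everything in it is an assumption made for illustration: the data are simulated, the regression is a plain cross-section without fixed effects, and the function name `ols_with_robust_se` is my own. With real panel data you would more likely rely on your statistical package's robust or clustered option.

```python
import numpy as np

def ols_with_robust_se(X, y):
    """OLS coefficients with conventional and HC0 (Huber-White) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)           # the "bread" of the sandwich
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Conventional standard errors, valid only under homoskedasticity.
    sigma2 = resid @ resid / (n - k)
    se_conventional = np.sqrt(np.diag(sigma2 * XtX_inv))

    # The "meat": X' diag(resid^2) X, built without forming an n x n matrix.
    meat = (X * resid[:, None] ** 2).T @ X
    se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_conventional, se_robust

# Simulated (hypothetical) data: x is a dummy equal to 1 for about 20% of
# observations, and the error is less variable when x = 1.
rng = np.random.default_rng(0)
n = 5000
x = (rng.random(n) < 0.2).astype(float)
eps = rng.normal(size=n) * np.where(x == 1, 1.0, 2.0)
y = 1.0 + 0.5 * x + eps

X = np.column_stack([np.ones(n), x])
beta, se_conventional, se_robust = ols_with_robust_se(X, y)
print("coefficients:    ", beta)
print("conventional SEs:", se_conventional)
print("robust SEs:      ", se_robust)
```

In this particular simulation the conventional standard error on the dummy is too large, because the smaller group is also the less noisy one; in other designs the conventional standard error can just as well be too small.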

But every so often, it happens that heteroskedasticity has empirical content–that is, studying the variance of the error term [math]Var(\epsilon_{i}|x_{i})[/math], and how it varies as [math]x[/math] varies, can tell us something useful about the world.

In the wage-education example above, the variance of the error term has useful empirical content. Indeed, the regression

(2) [math]\hat{\epsilon}^2_{it} = \delta_{i} + \gamma {x}_{it} + \nu_{it}[/math],

which linearly projects the squared residuals from equation (1), our estimate of the variance of the error term, on the regressors of equation (1), can be useful in studying how variable an individual’s wage is depending on whether that person has a college degree. Again, if the outcome variable in equation (1) is an individual’s wage and the variable of interest is a dummy variable for whether that person has a college degree, rejecting the null hypothesis [math]H_{0} : \gamma = 0[/math] in favor of the alternative hypothesis [math]H_{A} : \gamma < 0[/math] is useful, in that it tells us that having a college degree is associated with a less variable wage. (The intuition here is to picture the scatter and regression line for equation (1), in which case it is easy to see that the variance of the error term–how its distance from the regression line varies across the domain of the variable of interest–is the variance of the outcome variable in the same dimension.) For someone who is risk-averse and who prefers a stable income to an unstable one, this is useful information when deciding whether to go to college.
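
The two steps can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the panel is simulated, wages are less variable once someone holds a degree by construction, the fixed effects in both equations are handled by subtracting individual means (the within transformation), and the helper names `demean_by_individual` and `slope` are hypothetical. The sketch reports point estimates only and does not carry out the test of [math]H_{0} : \gamma = 0[/math].

```python
import numpy as np

def demean_by_individual(a):
    """Within transformation: subtract each individual's mean over time.

    Rows are individuals, columns are time periods.
    """
    return a - a.mean(axis=1, keepdims=True)

def slope(x, y):
    """OLS slope of y on a single regressor x, both already demeaned."""
    return (x * y).sum() / (x * x).sum()

# Simulated (hypothetical) panel: N individuals observed over T periods.
rng = np.random.default_rng(42)
N, T = 2000, 6
alpha = rng.normal(size=(N, 1))                    # individual fixed effects
grad = rng.integers(0, T + 1, size=(N, 1))         # period in which the degree is obtained
x = (np.arange(T)[None, :] >= grad).astype(float)  # 1 once the degree is held
sd = np.where(x == 1, 1.0, 2.0)                    # less variable wage with a degree
y = alpha + 0.5 * x + rng.normal(size=(N, T)) * sd

# Step 1: estimate equation (1) by the within estimator and keep the residuals.
x_w = demean_by_individual(x)
y_w = demean_by_individual(y)
beta_hat = slope(x_w, y_w)
resid = y_w - beta_hat * x_w

# Step 2: estimate equation (2) by regressing the squared residuals on x,
# again sweeping out individual fixed effects.
gamma_hat = slope(x_w, demean_by_individual(resid ** 2))

print("beta_hat: ", beta_hat)
print("gamma_hat:", gamma_hat)
```

The sign of `gamma_hat` is the object of interest here. Its magnitude should be read with caution when [math]T[/math] is small: if the errors are independent over time, demeaning the residuals within individual shrinks the estimated gap in variances by a factor of roughly [math]1 - 2/T[/math].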

The general idea is that although in most cases we are interested in the first moment [math]E(y|x)[/math], it is sometimes the case that the second central moment [math]Var(y|x)[/math] is useful in and of itself. There are many such possible applications, in which heteroskedasticity can be exploited to generate useful empirical content. I am currently working on such an application with two of my doctoral students, wherein we show that smallholder farmers who participate in modern agricultural value chains as growers not only have higher incomes, they also have less variable incomes. This is surprising: If you believe in efficient markets, there shouldn’t be any assets (here, the contract is the asset) that have both higher mean *and* lower variance.

* It can also be spelled “heteroscedasticity.” My understanding is that, much like for “skeptic” vs. “sceptic,” this is primarily a difference between American and British English, which respectively refer to the concept as heteroskedasticity and heteroscedasticity. But what does a guy named Marc whose name is routinely misspelled “Mark” know?