Suppose you have the following regression model:

(1) Y = a + b_{1}X_{1} + b_{2}X_{2} + … + b_{K}X_{K} + e.

You have *N* observations with which to estimate the regression. If *N* < *K*, you will not be able to estimate the vector of parameters b = (b_{1}, b_{2}, …, b_{K}). That’s because you have fewer equations than you have unknowns in your system. Recall from your middle-school algebra classes that you need at least as many equations as you have unknowns in order to solve for those unknowns. So in econometrics, setting aside the intercept a (which adds one more unknown), *N* < *K* means that you cannot “solve” for b because infinitely many values of b fit the data equally well (i.e., the system is under-determined), and *N* = *K* means that your system generally has a unique solution for b (i.e., it is exactly determined). Finally, *N* > *K* means that you have more equations than unknowns (i.e., the system is over-determined). In that case, no value of b will generally fit every observation exactly, which is why the error term e is there and why you estimate b by least squares instead of solving for it.
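The counting argument above can be checked numerically. What follows is only a minimal sketch under stated assumptions: the data are made up, the intercept is dropped so that there are exactly *K* unknowns, and the helper names `rank` and `can_estimate` are hypothetical, not taken from any library. It relies on the standard fact that b is pinned down only when the *N* × *K* matrix of regressors has rank *K*, which is impossible when *N* < *K*.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix, by Gaussian elimination in exact arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0  # number of pivots found so far
    for c in range(len(m[0])):
        # Look for a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate column c from every row below the pivot row.
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def can_estimate(X):
    """True if the K slope coefficients are uniquely determined by X."""
    k = len(X[0])
    return rank(X) == k


# Made-up regressor values: each row is one observation, K = 3 columns.
X = [
    [1, 2, 3],
    [2, 1, 0],
    [0, 1, 1],
    [1, 1, 4],
]

for n in (2, 3, 4):
    print(f"N = {n}, K = 3: can estimate b? {can_estimate(X[:n])}")
```

With these particular numbers, the first case (*N* = 2) cannot be estimated, while *N* = 3 and *N* = 4 can. Exact fractions are used so that the rank is not blurred by floating-point rounding. In applied work you would more likely reach for a numerical library.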

Multicollinearity is the problem that arises when *N* is too small relative to *K*, or what Arthur Goldberger called “micronumerosity,” referring to too small a number of observations relative to the number of parameters. The most extreme version of multicollinearity is *N* < *K*, in which case you cannot estimate anything.