One of the things I often tell students when discussing whether to use linear regression or a more complicated nonlinear (i.e., maximum likelihood-based) procedure is that one advantage of linear regression is that it prevents identification by functional form.
By “identification via functional form,” what I mean is that the distributional or functional form assumptions made in the context of more complicated nonlinear procedures can lead you to estimate a coefficient that is identified purely by those assumptions rather than by variation in the data.
I always had a hard time clearly explaining the intuition behind this, until my colleague Arne Henningsen, with whom I co-taught my advanced econometrics class at the University of Copenhagen, gave a really good example to the class. Here is that example.
Suppose you want to estimate a two-stage setup where the first stage is an instrumenting regression, which conditions your endogenous treatment variable on a presumably exogenous instrumental variable (IV) or set of IVs, and where the second stage regresses your outcome of interest on the exogenized (i.e., instrumented) version of your treatment variable.
Should you choose to estimate, say, a Heckman selection model or a treatment regression (respectively heckman and treatreg in Stata), you can actually estimate the whole thing without there being a variable in the first stage which is excluded from the second stage. If you do that, your treatment effect will be identified via functional form instead of from the data.
By contrast, if you choose to estimate a 2SLS (i.e., linear) specification (ivreg or ivreg2 in Stata), the procedure will not work if you choose not to exclude anything in the first stage: the predicted value of your treatment variable will then be a linear combination of your control variables, and thus perfectly collinear with your vector of controls in the second stage, which is the very definition of collinearity.
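A quick simulation makes the mechanics concrete. The sketch below (hypothetical data, pure NumPy; the variable names are mine, not Stata's) runs a “first stage” with no excluded instrument and then checks the rank of the second-stage design matrix, which turns out to be deficient:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)            # exogenous control variable
d = 0.5 * x + rng.normal(size=n)  # endogenous treatment variable
const = np.ones(n)

# "First stage" with no excluded instrument: regress the treatment
# on the constant and the control only.
Z = np.column_stack([const, x])
b_first, *_ = np.linalg.lstsq(Z, d, rcond=None)
d_hat = Z @ b_first               # fitted (instrumented) treatment

# Second-stage design matrix: constant, control, and fitted treatment.
# Because d_hat is an exact linear combination of the first two columns,
# the matrix has rank 2, not 3 -- perfect collinearity.
X2 = np.column_stack([const, x, d_hat])
rank = int(np.linalg.matrix_rank(X2))
print(rank)  # 2
```

With an excluded instrument in Z, d_hat would contain variation not spanned by the second-stage controls, and the rank check would return 3.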
The foregoing offers a cautionary tale: Even with a good reason to go for a nonlinear, maximum likelihood-based procedure, it is always best to at least benchmark the results of that procedure against a linear specification, to ensure that there isn’t anything funny going on.
And since we are on the topic of identification, Arthur Lewbel (Boston College) has a paper forthcoming in the Journal of Economic Literature titled “The Identification Zoo,” about the many meanings of the word “identification” in econometrics. (Reader beware: Set aside at least a few hours to go through the whole thing, as the working paper is 112 pages!)
UPDATE: New reader Daumantas Bloznelis writes:
I would like to point out a detail that might (…) merit a more careful treatment. A model’s functional form is conceptually rather different from the estimation technique employed. For example, a linear model y=a+bx+e can be estimated by many different techniques, e.g. by maximizing the likelihood, minimizing the sum of squared or absolute errors, etc. A nonlinear model, say, y=f(x,b)+e, can also be estimated using the same techniques, although some details might differ (OLS becomes NLS or similar).
Crucially, the linearity of the model by itself has little to say about the estimation technique that should be used. Similarly, knowing the estimation technique does little in helping identify what functional form of the model is used.
Correct, and once again, this speaks to my difficulty in translating my intuition into language on this issue. Most intro courses will show students how to estimate a linear regression by OLS, by MLE, and by GMM, and the estimation technique is distinct from the functional form.
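To make that distinction concrete, here is a minimal sketch (hypothetical simulated data) in which the same linear model y = a + bx + e is estimated two ways: by OLS in closed form, and by numerically maximizing a normal likelihood via gradient descent (for fixed error variance, that is equivalent to minimizing the sum of squared errors). Same functional form, two estimation techniques, same answer:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)  # true intercept 1, slope 2
X = np.column_stack([np.ones(n), x])

# Technique 1 -- OLS: solve the normal equations (X'X) b = X'y.
b_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Technique 2 -- MLE under normal errors: with sigma concentrated out,
# maximizing the log-likelihood is the same as minimizing the mean
# squared error, done here by plain gradient descent.
b_mle = np.zeros(2)
for _ in range(5000):
    grad = -2 * X.T @ (y - X @ b_mle) / n  # gradient of the MSE
    b_mle -= 0.1 * grad

same = np.allclose(b_ols, b_mle, atol=1e-6)
print(same)  # True: identical estimates for the identical linear model
```

Swap in a nonlinear f(x, b) and both techniques still apply; the functional form and the estimation technique vary independently.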