
‘Metrics Monday: Lagged Variables as Instruments

A few years ago, Taka Masaki, Tom Pepinsky, and I published an article in the Journal of Politics titled “Lagged Explanatory Variables and the Estimation of Causal Effects,” where we looked at the phenomenon (then relatively widespread in political science, less so in economics) of lagging an explanatory variable in an effort to exogenize it–that is, the phenomenon whereby one replaces x_{it} by x_{it-1}.

That article was well-received, and it has since become well-cited, but one of the things we did not touch upon was the phenomenon (seemingly more widespread in economics, less so in political science) of lagging an explanatory variable to use it as an instrument for itself–that is, the phenomenon whereby one instruments x_{it} with x_{it-1}.
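For concreteness, here is a minimal Stata sketch of the two practices, assuming a panel data set with (hypothetical) unit and time identifiers id and year and variables y and x:

* Declare the panel structure (id and year are made-up names)
xtset id year

* Replacing x with its lag in an effort to exogenize it
reg y L.x

* Using the lagged explanatory variable as an instrument for itself
ivregress 2sls y (x = L.x)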

In work written concurrently with our 2017 Journal of Politics article, Reed (2015) had also looked at the use of lags as controls, but he concluded by suggesting the use of lagged variables as instrumental variables (IVs) instead of as controls.

In a new working paper with my PhD student Yu Wang titled “Lagged Variables as Instruments,” we build on the work in Reed (2015) and on the structure of my earlier work to look at the use of lagged variables as IVs. Whereas Reed only considers simultaneity as a source of bias, we generalize to look at any unobserved confounder.

Here is the abstract of this new paper:

Lagged explanatory variables remain commonly used as instrumental variables (IVs) to address endogeneity concerns in empirical studies with observational data. Few theoretical studies, however, address whether lagged IVs mitigate endogeneity. We develop a theoretical setup in which dynamics between the endogenous explanatory variable and the unobserved confounders cannot be ruled out and look at the consequences of lagged IVs for bias and the root mean square error (RMSE). We then use Monte Carlo simulations to illustrate our analytical findings. We show that when lagged explanatory variables have no direct causal effect on the dependent variable or on the unobserved confounders, the lagged IV method mitigates the endogeneity problem by reducing both bias and the RMSE relative to the naïve OLS case for specific parameter values. If either or both of the causal relationships above are present, however, lagged IVs increase both bias and the RMSE relative to OLS, and they blow up the likelihood of a Type I error to virtually one.

‘Metrics Monday: Least Squares Is But One Approach to Linear Regression

One of the things I learned as an undergraduate at Montreal is the equivalence of ordinary least squares (OLS), maximum likelihood (ML), and the generalized method of moments (GMM) when it comes to linear regression.

This is something which I suspect a lot of people have lost track of in the wake of the Credibility Revolution, which has emphasized the use of linear methods.

(For instance, when I taught my causal inference with observational data class last semester, I remember showing my students a likelihood function and asking them whether they had covered maximum likelihood estimation in their first-year courses, only to draw a number of blank stares.)

The idea is simple. When estimating the equation

(1) y = bx + e,

one has a choice of estimator, viz. OLS, ML, or GMM.

Intuitively,

  1. OLS picks b so as to minimize the sum of squared residuals,
  2. ML picks b so as to maximize the likelihood that the estimation sample is a random sample from a population of interest, and
  3. GMM picks b by solving what is known as a moment condition which, in the case of OLS, is E(x'e) = 0. That is, it chooses b to solve E(x'(y – xb)) = 0. Note that this simply assumes that the regressors are uncorrelated with the errors, i.e., an exogeneity assumption.

If e is distributed normally, the OLS and ML estimators of b in equation 1 are identical. If the observations are independent and identically distributed (iid), the OLS and GMM estimators of b in equation 1 are equivalent.
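To see where these equivalences come from, here is a quick sketch for the one-regressor case of equation 1, with the constant suppressed for brevity. The sample analogue of the GMM moment condition is

(1/n) Σ x_i(y_i – b x_i) = 0,

which solves for b = Σ x_i y_i / Σ x_i^2, the familiar OLS slope formula, so the two point estimates coincide. Likewise, if e is normal with variance σ^2, the log likelihood is –(n/2) ln(2πσ^2) – (1/(2σ^2)) Σ (y_i – b x_i)^2, and maximizing it with respect to b amounts to minimizing the sum of squared residuals, which is exactly what OLS does.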

Here is a bit of code that offers proof by Stata:

* Linear Regression Three Ways

clear
drop _all
set obs 1000
set seed 123456789
gen x = rnormal(0,1)
gen y = rnormal(5,1) + 10*x + rnormal(0,1)

* Ordinary Least Squares

reg y x

* Maximum Likelihood

* Likelihood evaluator: log likelihood of a linear model with normal errors
capture program drop ols
program ols
  args lnf xb lnsigma
  local y "$ML_y1"
  quietly replace `lnf' = ln(normalden(`y', `xb', exp(`lnsigma')))
end

ml model lf ols (xb: y = x) (lnsigma:)
ml maximize

* Generalized Method of Moments
 
gmm (y - x*{beta} - {alpha}), instruments(x) vce(unadjusted)

In that code, I generate a variable x distributed N(0,1), and a variable y equal to a constant distributed N(5,1), plus 10 times x, plus an error term e distributed N(0,1). Note that, by default, Stata's gmm command is set up like an IV estimator; in the OLS case, this collapses to x serving as an instrument for itself.

The OLS, ML, and GMM estimators all yield the same point estimates of 5.014798 for the constant and 9.932382 for the slope coefficient. The standard errors in the example above are identical for the MLE and GMM cases, and they differ only very, very slightly for the OLS case.

A few things to note:

  • MLE is a special case of GMM in which a specific distribution is imposed on the data; in the ML example above, that distribution is the normal.
  • Both GMM and MLE are iterative procedures, meaning that they start from a guess as to the value of b and update that guess until their objective function is optimized. In contrast, OLS does not need to guess: its closed-form formula directly solves for the value of b that minimizes the sum of squared residuals (see the short Mata sketch below).
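For completeness, here is a quick sketch of that closed form using Mata (Stata's matrix language), run after the simulation code above; it computes (X'X)^{-1}X'y directly and should return the same slope and constant as reg y x, up to rounding:

* Closed-form OLS computed directly in Mata
mata:
    st_view(y=., ., "y")            // dependent variable from the data in memory
    st_view(x=., ., "x")            // regressor
    X = x, J(rows(x), 1, 1)         // append a column of ones for the constant
    b = invsym(X'X)*X'y             // (X'X)^{-1} X'y
    b                               // first row: slope on x; second row: constant
end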