# ‘Metrics Monday: Dealing with Duration Data

It sometimes happens that in the general regression equation

(1) $y_{i}=\alpha+\beta{x}_{i}+\epsilon_{i}$,

your outcome of interest will be a length of time, or duration. Classic examples from labor economics are the duration of individual unemployment spells, or the duration of a strike.

The problem with duration data is that they do not look like the continuous outcome variable ranging from minus to plus infinity (ideally normally distributed) found in most introductory textbooks. In the unemployment spell example, we typically know when someone loses their job, and we know when they find another one. Sometimes, however, the duration is censored; that is, we know when someone loses their job, but they remain unemployed when we record the data.

In both cases, the data look nothing like the textbook outcome variable, and so special care might be required in how we deal with a duration on the left-hand side of equation (1). Typically, this is done with duration analysis, as it is known in economics. Duration models are also known as survival models–a term that comes from biostatistics, in which researchers are often interested in how long someone survives after some event of interest happens–but that is only one of the many names given to duration analysis.*
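To make the censoring problem described above concrete, here is a minimal sketch in Python. It is not a method taken from the post: it assumes, purely for illustration, that unemployment spells follow an exponential distribution with a constant hazard, and that every spell still ongoing after 12 months is recorded as censored. The function names `simulate_spells` and `exponential_hazard_mle`, the sample size, and the parameter values are all hypothetical. Under those assumptions, the maximum likelihood estimate of the hazard is the number of completed spells divided by the total time at risk, which uses the censored spells instead of discarding them.

```python
import numpy as np


def simulate_spells(n, hazard, censor_time, seed=0):
    """Simulate right-censored unemployment spells (illustrative assumption:
    exponential durations, common censoring time for everyone)."""
    rng = np.random.default_rng(seed)
    true_duration = rng.exponential(scale=1.0 / hazard, size=n)
    # We only ever observe the spell up to the date the data are recorded.
    observed = np.minimum(true_duration, censor_time)
    # True if the person found a job before the data were recorded.
    completed = true_duration <= censor_time
    return observed, completed


def exponential_hazard_mle(observed, completed):
    """MLE of a constant hazard with right-censoring:
    completed spells divided by total time at risk."""
    return completed.sum() / observed.sum()


# Hypothetical values: hazard of 0.1 per month, data recorded after 12 months.
observed, completed = simulate_spells(n=5000, hazard=0.1, censor_time=12.0)

# Naive approach: average only the completed spells, dropping censored ones.
naive_mean = observed[completed].mean()

# Censoring-aware approach: keep everyone and account for censoring.
hazard_hat = exponential_hazard_mle(observed, completed)
implied_mean = 1.0 / hazard_hat

print(f"share censored:           {1 - completed.mean():.3f}")
print(f"naive mean duration:      {naive_mean:.2f} months")
print(f"implied mean (1/hazard):  {implied_mean:.2f} months")
```

In this simulation the naive mean understates the typical spell length, because the longest spells are exactly the ones that get censored. Real applications rarely justify a constant hazard, so this should be read as an illustration of why censoring matters, not as a recommended estimator.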

# ‘Metrics Monday: Heteroskedasticity and Its Content

Suppose you have the following estimable equation:

(1) $y_{it}=\alpha_{i}+\beta{x}_{it}+\epsilon_{it}$.

This is a pretty standard equation when dealing with panel data: $i$ denotes an individual in the set $i\in\{1,\ldots,N\}$, $t$ denotes the time period in the set $t\in\{1,\ldots,T\}$, $y$ is an outcome of interest (say, wage), $x$ is a variable of interest (say, an indicator variable for whether someone has a college degree), $\alpha_{i}$ is an individual fixed effect, and $\epsilon$ is an error term with mean zero. Normally with longitudinal data, it is the case that $N>T$, so that there are more individuals in the data than there are time periods. (If $T>N$, you are likely dealing more with a time-series problem than with a typical applied micro problem.)
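As a rough illustration of how an equation like (1) is usually estimated, here is a short Python sketch of the within (fixed effects) estimator on simulated data. Everything in it is an assumption made for the example and not something taken from the post: the function name `within_estimator`, a continuous $x$ instead of an indicator, a true $\beta$ of 2, homoskedastic normal errors, and a fixed effect that is correlated with $x$ by construction.

```python
import numpy as np


def within_estimator(y, x):
    """Fixed effects (within) estimate of beta for balanced panels.

    y and x are arrays of shape (N, T): one row per individual,
    one column per time period.
    """
    # Subtracting each individual's own mean removes alpha_i.
    y_dm = y - y.mean(axis=1, keepdims=True)
    x_dm = x - x.mean(axis=1, keepdims=True)
    return (x_dm * y_dm).sum() / (x_dm ** 2).sum()


# Hypothetical data-generating process with N > T.
rng = np.random.default_rng(0)
N, T, beta = 500, 5, 2.0
alpha = rng.normal(size=(N, 1))          # individual fixed effects
x = alpha + rng.normal(size=(N, T))      # x is correlated with alpha
eps = rng.normal(size=(N, T))            # mean-zero error term
y = alpha + beta * x + eps

beta_fe = within_estimator(y, x)

# Pooled OLS slope that ignores the fixed effects, for comparison.
x_c = x - x.mean()
y_c = y - y.mean()
beta_pooled = (x_c * y_c).sum() / (x_c ** 2).sum()

print(f"within estimate: {beta_fe:.3f}")
print(f"pooled estimate: {beta_pooled:.3f}")
```

Because the simulated $\alpha_{i}$ is correlated with $x_{it}$, the pooled slope is biased away from the true $\beta$, while demeaning within each individual removes $\alpha_{i}$ before the slope is computed.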

Though we are normally interested in estimating and identifying the relationship between the variable of interest $x$ and the outcome variable $y$, I wanted to focus today on heteroskedasticity.*

# ‘Metrics Monday: Dealing with Imperfect Instruments II

Last week, in the first half of this two-part post, I talked about the method developed by Conley et al. (2012) to deal with departures from the assumption of strict exogeneity of an instrumental variable (IV)–that is, to deal with what Conley et al. (2012) refer to as “plausibly exogenous” IVs.

How to deal with an imperfect instrument was an idea whose time apparently had come in 2012: In the same volume of the same journal, Nevo and Rosen (2012) develop an alternative method for dealing with imperfect IVs, which is what I wanted to discuss this week.