# ‘Metrics Monday: Dealing with Duration Data

It sometimes happens that in the general regression equation

(1) $y = \alpha + \beta x + \epsilon$,

your outcome of interest $y$ will be a length of time, or duration. Classic examples from labor economics are the duration of an individual unemployment spell or the duration of a strike.

The problem with duration data is that they do not look like the continuous outcome variable ranging from minus to plus infinity (ideally normally distributed) found in most introductory textbooks: a duration can never be negative. In the unemployment spell example, we typically know when someone loses their job, and we know when they find another one. Sometimes, however, the duration is censored; that is, we know when someone lost their job, but they are still unemployed when we record the data, so we only know that the spell has lasted at least that long.

In both cases, the data look nothing like the textbook outcome variable, and so special care might be required in how we deal with a duration on the left-hand side of equation (1). Typically, this is done with duration analysis, as it is known in economics. Duration models are also known as survival models (a term that comes from biostatistics, in which researchers are often interested in how long someone survives after some event of interest happens), but that is only one of the many names given to duration analysis.*
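
To make the difference between a completed spell and a censored one concrete, here is a minimal Python sketch of how the outcome might be coded before any duration analysis. The function name `spell_duration`, the dates, and the survey date are all hypothetical, made up for illustration; the only idea taken from the discussion above is that a spell still ongoing when the data are recorded is flagged as censored rather than treated as complete.

```python
from datetime import date

def spell_duration(job_loss, job_found, survey_date):
    """Return (duration_in_days, censored) for one unemployment spell.

    job_found is None when the person is still unemployed at survey_date.
    """
    if job_found is None or job_found > survey_date:
        # Still unemployed when the data are recorded: the spell is censored,
        # so the recorded duration is only a lower bound on the true one.
        return (survey_date - job_loss).days, True
    # Both the start and the end of the spell are observed.
    return (job_found - job_loss).days, False

# Hypothetical data: (date of job loss, date a new job was found or None).
spells = [
    (date(2020, 1, 15), date(2020, 4, 15)),  # completed spell
    (date(2020, 3, 1), None),                # still unemployed at the survey
]
survey = date(2020, 6, 30)

records = [spell_duration(lost, found, survey) for lost, found in spells]
for days, censored in records:
    print(days, "censored" if censored else "completed")
```

Each observation therefore carries two pieces of information, a duration and a censoring flag, which is what sets it apart from the single textbook outcome variable.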