‘Metrics Monday: Fixed Effects, Random Effects, and (Lack of) External Validity

Very early mornings, before our entire household is awake, are when I get all of my professional reading done. Last Monday, I read a recently published paper in my discipline. I am remaining purposely vague about that paper, because the research question was interesting and the findings pretty useful; it’s just that the econometrics weren’t great.

Anyway, at some point the authors make the following argument:

  • Our random effects findings are almost identical to our fixed effects findings;
  • Random effects should be used with a random sample from a population of interest and fixed effects in the absence of such a random sample;
  • This means our (small, highly selected) sample is representative of the population of interest;
  • Thus, we can use findings from our (small, highly selected) sample to make inferences about the population as a whole.
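
What a comparison of fixed effects and random effects estimates responds to can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the method of the paper discussed above: the data are simulated, the panel is balanced with a single regressor, the random effects slope uses a Swamy-Arora-style feasible GLS, and the names `simulate`, `fe_re_slopes`, and `rho` are hypothetical. Here `rho` controls how strongly the unit effects are correlated with the regressor.

```python
import random
from statistics import fmean


def ols_fit(x, y):
    """Bivariate OLS with an intercept: return (slope, sum of squared residuals)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    ssr = sum(((b - my) - slope * (a - mx)) ** 2 for a, b in zip(x, y))
    return slope, ssr


def fe_re_slopes(panel):
    """panel: list of units, each a list of (x, y) pairs (balanced, T >= 2)."""
    n, t = len(panel), len(panel[0])
    xbar = [fmean([x for x, _ in unit]) for unit in panel]
    ybar = [fmean([y for _, y in unit]) for unit in panel]

    # Fixed effects (within) estimator: demean each unit's data.
    xw = [x - xbar[i] for i, unit in enumerate(panel) for x, _ in unit]
    yw = [y - ybar[i] for i, unit in enumerate(panel) for _, y in unit]
    beta_fe, ssr_within = ols_fit(xw, yw)

    # Variance components from the within and between regressions.
    sigma_e2 = ssr_within / (n * (t - 1) - 1)
    _, ssr_between = ols_fit(xbar, ybar)
    sigma_a2 = max(ssr_between / (n - 2) - sigma_e2 / t, 0.0)

    # Random effects (feasible GLS) estimator: quasi-demean with weight theta.
    theta = 1 - (sigma_e2 / (sigma_e2 + t * sigma_a2)) ** 0.5
    xq = [x - theta * xbar[i] for i, unit in enumerate(panel) for x, _ in unit]
    yq = [y - theta * ybar[i] for i, unit in enumerate(panel) for _, y in unit]
    beta_re, _ = ols_fit(xq, yq)
    return beta_fe, beta_re


def simulate(rho, n=500, t=5, beta=1.0, seed=0):
    """Simulated panel: y = beta * x + a_i + noise, with x = rho * a_i + noise."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        a = rng.gauss(0, 1)  # unit-specific effect
        unit = []
        for _ in range(t):
            x = rho * a + rng.gauss(0, 1)
            unit.append((x, beta * x + a + rng.gauss(0, 1)))
        panel.append(unit)
    return panel


fe_u, re_u = fe_re_slopes(simulate(rho=0.0))  # effects uncorrelated with x
fe_c, re_c = fe_re_slopes(simulate(rho=0.8))  # effects correlated with x
print(f"rho=0.0: FE={fe_u:.3f}, RE={re_u:.3f}")
print(f"rho=0.8: FE={fe_c:.3f}, RE={re_c:.3f}")
```

In this sketch the two estimates are close when `rho` is zero and drift apart when it is not. The code does not model how units were sampled at all, so the gap between the two estimates reflects only the correlation between the unit effects and the regressor.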

Farmers Markets and Food-Borne Illness, Finally Finished

A little over a year ago, I published an op-ed in the New York Times titled “Farmers Markets and Food-Borne Illness.”

That op-ed was based on the findings of a similarly titled working paper of mine, which one of the New York Times editors had gotten wind of after I first discussed it on this blog during the summer of 2015.

In my op-ed, however, I mentioned that I would soon post an updated version of our paper. But things got busy, and though I worked quite a bit on it here and there, I did not get to finish it until a few weeks ago.

(And by “finish,” I mean “stop working on it until it is returned to us with reviewer comments about how to improve it before it can get published.”)

Here is the new version. The major innovation is that we now exploit both the longitudinal nature of the data and a source of plausibly exogenous variation in the number of farmers markets in a given state in a given year. This obviously makes for much stronger results than we used to have. Here is the abstract of this latest version:

‘Metrics Monday: Dealing with Duration Data

Salvador Dali, “The Persistence of Memory.”

It sometimes happens that in the general regression equation

(1) [math]y_{i} = \alpha + \beta {x}_{i} + \epsilon_{i}[/math],

your outcome of interest will be a length of time, or duration. Classic examples from labor economics are the duration of individual unemployment spells, or the duration of a strike.

The problem with duration data is that they do not look like the continuous outcome variable ranging from minus to plus infinity (ideally normally distributed) found in most introductory textbooks. In the unemployment spell example, we typically know when someone loses their job, and we know when they find another one. Sometimes, however, the duration is censored; that is, we know when someone loses their job, but they remain unemployed when we record the data.
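
To make censoring concrete, here is a minimal simulated sketch. Everything in it is an assumption chosen for illustration: spell lengths are exponential with a mean of 10 months, the survey happens between 2 and 20 months after job loss, and `simulate_spells` is a hypothetical helper. It compares a naive average of recorded spell lengths with the exponential maximum-likelihood estimate, which divides total time at risk by the number of completed spells.

```python
import random


def simulate_spells(n=5000, mean_duration=10.0, seed=1):
    """Return (recorded_length, ended) pairs for n simulated unemployment spells."""
    rng = random.Random(seed)
    spells = []
    for _ in range(n):
        true_length = rng.expovariate(1.0 / mean_duration)  # months until a new job
        time_to_survey = rng.uniform(2.0, 20.0)  # months from job loss to data collection
        ended = true_length <= time_to_survey  # False means the spell is censored
        spells.append((min(true_length, time_to_survey), ended))
    return spells


spells = simulate_spells()
total_time = sum(length for length, _ in spells)
completed = sum(1 for _, ended in spells if ended)

# Naive: treat every recorded length as if the spell had ended.
naive_mean = total_time / len(spells)

# Exponential MLE with right-censoring: hazard = completed spells / total time at risk,
# so the implied mean duration is total time at risk / completed spells.
mle_mean = total_time / completed

print(f"naive mean: {naive_mean:.2f} months")
print(f"censoring-aware mean: {mle_mean:.2f} months (simulated truth: 10)")
```

Whether a spell is complete or censored, it enters the data as a nonnegative length plus a flag saying whether its end was observed.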

In both cases, the data look nothing like the textbook outcome variable, and so special care might be required in how we deal with a duration on the left-hand side of equation (1). Typically, this is done with duration analysis, as it is known in economics. These models are also known as survival models, a term that comes from biostatistics, where researchers are often interested in how long someone survives after some event of interest happens, but that is only one of the many names given to duration analysis.*
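
One common starting point in duration (survival) analysis is the Kaplan-Meier estimator of the survival function, which handles censored spells without assuming a distribution. The sketch below is illustrative only: the six spells are made-up numbers, and `kaplan_meier` is a hypothetical helper written for this example rather than a library function.

```python
def kaplan_meier(spells):
    """spells: list of (duration, ended) pairs; ended=False marks a censored spell.

    Returns [(t, S(t))] at each distinct time at which a spell is seen to end.
    """
    event_times = sorted({d for d, ended in spells if ended})
    survival, curve = 1.0, []
    for t in event_times:
        # Spells still ongoing just before t, including those censored exactly at t.
        at_risk = sum(1 for d, _ in spells if d >= t)
        # Spells observed to end at t.
        events = sum(1 for d, ended in spells if ended and d == t)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Made-up unemployment spells in months; False means still unemployed at the survey.
spells = [(2, True), (3, False), (4, True), (4, True), (5, False), (7, True)]
curve = kaplan_meier(spells)
for t, s in curve:
    print(f"S({t}) = {s:.3f}")
# S(2) = 5/6, S(4) = 5/12, S(7) = 0
```

The censored spells at 3 and 5 months never count as exits, but they do count in the at-risk pool until they drop out, which is how the estimator uses the partial information they carry.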