Farmers Markets and Food-Borne Illness, Finally Finished

A little over a year ago, I published an op-ed in the New York Times titled “Farmers Markets and Food-Borne Illness.”

That op-ed was based on the findings of a similarly titled working paper of mine, which one of the New York Times editors had gotten wind of after I first discussed it on this blog during the summer of 2015.

In my op-ed, I mentioned that I would soon post an updated version of our paper. But things got busy, and although I worked on it quite a bit here and there, I did not finish it until a few weeks ago.

(And by “finish,” I mean “stop working on it until it is returned to us with reviewer comments about how to improve it before it can get published.”)

Here is the new version. The major innovation is that we now exploit both the longitudinal nature of the data and a source of plausibly exogenous variation in the number of farmers markets in a given state in a given year. This obviously makes for much stronger results than we used to have. Here is the abstract of this latest version:

‘Metrics Monday: Dealing with Duration Data

Salvador Dali, “The Persistence of Time.”

It sometimes happens that in the general regression equation

(1) [math]y_{i} = \alpha + \beta {x}_{i} + \epsilon_{i}[/math],

your outcome of interest will be a length of time, or duration. Classic examples from labor economics are the duration of individual unemployment spells, or the duration of a strike.

The problem with duration data is that they do not look like the continuous outcome variable ranging from minus infinity to plus infinity (ideally normally distributed) found in most introductory textbooks: a duration cannot be negative. In the unemployment spell example, we typically know when someone loses their job, and we know when they find another one. Sometimes, however, the duration is censored; that is, we know when someone lost their job, but they are still unemployed when we record the data.

In both cases, the data look nothing like the textbook outcome variable, and so special care might be required in how we deal with a duration on the left-hand side of equation (1). Typically, this is done with duration analysis, as it is known in economics. Duration models are also known as survival models (a term that comes from biostatistics, where researchers are often interested in how long someone survives after some event of interest), but that is only one of the many names given to duration analysis.*
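To see why censoring calls for special care, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration: the spells are drawn from an exponential distribution (a constant hazard), every spell is censored at the same date, and the function names are hypothetical. It compares a naive average of the recorded durations with the maximum-likelihood estimate that counts censored spells as time at risk without treating them as completed.

```python
import random


def simulate_spells(n, rate, censor_time, seed=0):
    """Draw n exponential durations and right-censor them at censor_time.

    Returns a list of (recorded_duration, completed) pairs, where
    completed is False when the spell was still ongoing at censor_time.
    """
    rng = random.Random(seed)
    spells = []
    for _ in range(n):
        true_duration = rng.expovariate(rate)
        recorded = min(true_duration, censor_time)
        completed = true_duration <= censor_time
        spells.append((recorded, completed))
    return spells


def naive_mean_duration(spells):
    """Average the recorded durations as if none were censored."""
    return sum(duration for duration, _ in spells) / len(spells)


def exponential_mle_mean_duration(spells):
    """Censoring-aware MLE of the mean duration under a constant hazard.

    The estimated hazard is (completed spells) / (total time at risk),
    and the mean duration is the reciprocal of that hazard.
    """
    completed_spells = sum(1 for _, completed in spells if completed)
    time_at_risk = sum(duration for duration, _ in spells)
    return time_at_risk / completed_spells


# Assumed parameters: hazard of 0.5 per period, so the true mean is 2 periods.
spells = simulate_spells(n=20000, rate=0.5, censor_time=2.0)
naive = naive_mean_duration(spells)
mle = exponential_mle_mean_duration(spells)
print(f"naive mean: {naive:.2f}, censoring-aware mean: {mle:.2f}, true mean: 2.00")
```

The naive average understates the true mean because every censored spell is recorded as shorter than it really is. The constant-hazard assumption is the simplest possible one; it is used here only because its likelihood has a closed form.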

‘Metrics Monday: Heteroskedasticity and Its Content

Suppose you have the following estimable equation:

(1) [math]y_{it} = \alpha_{i} + \beta {x}_{it} + \epsilon_{it}[/math].

This is a pretty standard equation when dealing with panel data: [math]i \in \{1,\ldots,N\}[/math] denotes an individual, [math]t \in \{1,\ldots,T\}[/math] denotes a time period, [math]y[/math] is an outcome of interest (say, wage), [math]x[/math] is a variable of interest (say, an indicator variable for whether someone has a college degree), [math]\alpha_{i}[/math] is an individual fixed effect, and [math]\epsilon_{it}[/math] is an error term with mean zero. Normally with longitudinal data, [math]N > T[/math], so that there are more individuals in the data than there are time periods. (If [math]T > N[/math], you are likely dealing more with a time-series problem than with a typical applied micro problem.)
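For readers who like to see equation (1) in action, here is a minimal sketch in Python of how the individual fixed effect can be removed by demeaning each variable within individual (the within estimator). The data are simulated, the regressor is continuous rather than the college-degree indicator mentioned above, the errors are homoskedastic, and all function names are hypothetical; it is an illustration under those assumptions, not a substitute for a proper panel-data routine.

```python
import random


def simulate_panel(n_individuals, n_periods, beta, seed=0):
    """Simulate y_it = alpha_i + beta * x_it + eps_it as (i, x, y) rows."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_individuals):
        alpha_i = rng.gauss(0.0, 1.0)
        for _ in range(n_periods):
            # x is correlated with alpha_i, so ignoring alpha_i biases OLS.
            x = alpha_i + rng.gauss(0.0, 1.0)
            eps = rng.gauss(0.0, 1.0)
            rows.append((i, x, alpha_i + beta * x + eps))
    return rows


def pooled_ols_slope(rows):
    """Slope from regressing y on x and a constant, ignoring alpha_i."""
    n = len(rows)
    x_bar = sum(x for _, x, _ in rows) / n
    y_bar = sum(y for _, _, y in rows) / n
    num = sum((x - x_bar) * (y - y_bar) for _, x, y in rows)
    den = sum((x - x_bar) ** 2 for _, x, _ in rows)
    return num / den


def within_slope(rows):
    """Slope after subtracting each individual's own means of x and y."""
    totals = {}
    for i, x, y in rows:
        sum_x, sum_y, count = totals.get(i, (0.0, 0.0, 0))
        totals[i] = (sum_x + x, sum_y + y, count + 1)
    num = 0.0
    den = 0.0
    for i, x, y in rows:
        sum_x, sum_y, count = totals[i]
        x_dev = x - sum_x / count
        y_dev = y - sum_y / count
        num += x_dev * y_dev
        den += x_dev ** 2
    return num / den


# Assumed design: N = 1000 individuals, T = 5 periods, true beta = 1.
rows = simulate_panel(n_individuals=1000, n_periods=5, beta=1.0)
pooled = pooled_ols_slope(rows)
within = within_slope(rows)
print(f"pooled OLS: {pooled:.2f}, within estimator: {within:.2f}, true beta: 1.00")
```

Because the simulated [math]x[/math] is built to be correlated with [math]\alpha_{i}[/math], the pooled slope is biased upward, while the within slope lands close to the true coefficient.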

Though we are normally interested in estimating and identifying the relationship between the variable of interest [math]x[/math] and the outcome variable [math]y[/math], I wanted to focus today on heteroskedasticity.*