Category: Methods

A Nifty Fix for When Your Treatment Variable Is Measured with Error (Technical)

One of the advantages of having really smart colleagues — the kind who exhibit genuine intellectual curiosity, and who are truly interested in doing things well — is that you get to learn a lot from them.

I was recently having a conversation with my colleague and next-door office neighbor Joe Ritter in which we were discussing the possibility that the (binary) treatment variable in a paper I am working on might suffer from some misclassification. That is, my variable D = 1 if an individual has received the treatment and D = 0 otherwise, but it is possible that some people for whom D = 1 actually report D = 0, and that some people for whom D = 0 actually report D = 1.
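To see why this kind of misclassification matters, here is a minimal simulation sketch. It is not the method from Bollinger's paper, and everything in it is an assumption made for illustration: a continuous outcome, a true treatment effect of 2, a 50 percent treatment rate, and a 10 percent chance of misreporting in each direction. It simply compares the difference in mean outcomes computed with the true D to the one computed with the reported D.

```python
import random


def simulate_misclassification(n=20000, effect=2.0, p_miss_1=0.1, p_miss_0=0.1, seed=0):
    """Simulate a true binary treatment, a misreported version of it, and an outcome.

    All parameter values are illustrative assumptions, not taken from any paper.
    p_miss_1: probability that a truly treated person (D = 1) reports D = 0.
    p_miss_0: probability that a truly untreated person (D = 0) reports D = 1.
    """
    rng = random.Random(seed)
    true_d, reported_d, y = [], [], []
    for _ in range(n):
        d = 1 if rng.random() < 0.5 else 0
        flip_prob = p_miss_1 if d == 1 else p_miss_0
        r = 1 - d if rng.random() < flip_prob else d
        true_d.append(d)
        reported_d.append(r)
        # The outcome depends on the TRUE treatment status, not the reported one.
        y.append(1.0 + effect * d + rng.gauss(0.0, 1.0))
    return true_d, reported_d, y


def diff_in_means(d, y):
    """Mean outcome among d == 1 minus mean outcome among d == 0."""
    treated = [yi for di, yi in zip(d, y) if di == 1]
    control = [yi for di, yi in zip(d, y) if di == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


true_d, reported_d, y = simulate_misclassification()
effect_true = diff_in_means(true_d, y)
effect_reported = diff_in_means(reported_d, y)
print(f"Estimate using true D:     {effect_true:.3f}")
print(f"Estimate using reported D: {effect_reported:.3f}")
```

Under these assumed rates, 90 percent of those reporting D = 1 are truly treated and 10 percent of those reporting D = 0 are truly treated, so the estimate based on reported D is pulled toward zero (about 1.6 in expectation, rather than 2).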

When the possibility that my treatment variable might suffer from misclassification (or measurement error) arose, Joe recalled that he’d read a paper by Christopher R. Bollinger about this a while back. A few hours later, he sent me an email to which he’d attached the paper. Here is the abstract:

Love It or Logit, or: Man, People *Really* Care About Binary Dependent Variables

Last Monday’s post, in which I ranted a bit about the opposition to estimating linear probability models (LPM) instead of probits and logits, turned out to be very popular. In fact, that post is now in my top three most popular posts ever.

(Credit: xkcd.)

Last Monday morning, when my wife left for work, I told her I was expecting a meager number of page views that day given my choice of post topic. I was wrong: people really care about binary dependent variables.

A Rant on Estimation with Binary Dependent Variables (Technical)

Suppose you are trying to explain some outcome [math]y[/math], where [math]y[/math] is equal to 0 or 1 (e.g., whether someone is a nonsmoker or a smoker). You also have data on a vector of explanatory variables [math]x[/math] (e.g., someone’s age, their gender, their level of education, etc.) and on a treatment variable [math]D[/math], which we will also assume is binary, so that [math]D[/math] is equal to 0 or 1 (e.g., whether someone has attended an information session on the negative effects of smoking).
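To make this setup concrete, here is a rough sketch that simulates such a data set and fits a linear probability model by ordinary least squares. Every number in it is an assumption chosen for illustration: a single covariate (age) standing in for [math]x[/math], a randomly assigned [math]D[/math], and an assumed 10-percentage-point drop in the probability of smoking from attending the session. The small `ols` helper is hypothetical and written by hand so that the example needs nothing beyond the standard library.

```python
import random


def simulate(n=20000, seed=1):
    """Simulate (y, age, D) rows; all parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(18, 65)
        d = 1 if rng.random() < 0.5 else 0  # attended the information session
        # Assumed smoking probability: falls with age and by 0.10 if D = 1.
        p_smoke = 0.40 - 0.002 * (age - 18) - 0.10 * d
        y = 1 if rng.random() < p_smoke else 0  # 1 = smoker, 0 = nonsmoker
        rows.append((y, age, d))
    return rows


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


rows = simulate()
y = [row[0] for row in rows]
X = [[1.0, row[1], float(row[2])] for row in rows]  # constant, age, D
coefs = ols(X, y)
print(f"Intercept: {coefs[0]:.3f}, age: {coefs[1]:.4f}, D: {coefs[2]:.3f}")
```

In a linear probability model, the coefficient on [math]D[/math] reads directly as a change in the probability that [math]y = 1[/math], which is why the estimate should land near the assumed value of -0.10.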

If you are interested in knowing the effect of attending the information session on the likelihood that someone is a smoker, i.e., the impact of [math]D[/math] on [math]y[/math], the equation of interest is