Last week the Midwest Economics Association (MEA) meetings took place in Minneapolis. Because a few friends were presenting at MEA, I decided to go check out their sessions.
At one of the sessions I attended, a graduate student presented a very cool paper in which he had run a randomized controlled trial to determine the effect of a treatment variable D on an outcome Y. He randomized D and collected data on Y as well as on a number of control variables X.
The graduate student came from a good department, so he carefully motivated his paper by talking about the policy relevance of the relationship between D and Y, explaining that policy makers cared deeply about that relationship and made a big deal of it.
When presenting his results, the presenter did what we commonly do in economics, which is to show a table presenting several specifications of the regression of interest, from the most parsimonious (i.e., a simple regression of Y on just D) to the least parsimonious (i.e., a complex regression of Y on D and all the available controls X).
The problem, however, was that the R-squared (the regression’s coefficient of determination) for the simple regression of Y on just D, the most parsimonious specification, was about 0.01, meaning that the treatment variable D explained only about 1 percent of the variation in the outcome of interest.
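To make that number concrete, here is a minimal simulation sketch of how a simple regression of Y on a randomized D can produce an R-squared near 0.01. Everything in it is an assumption made for illustration: the sample size, the binary treatment, the effect size of 0.2, and the standard-normal noise are hypothetical choices, not the presenter's data or design.

```python
import numpy as np


def simple_ols_r2(d, y):
    """Regress y on a constant and d; return (slope on d, R-squared)."""
    X = np.column_stack([np.ones_like(d, dtype=float), d])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = float(resid @ resid)                 # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())   # total variation in y
    return float(beta[1]), 1.0 - ss_res / ss_tot


# Hypothetical data-generating process (all numbers are assumptions).
rng = np.random.default_rng(0)
n = 10_000
d = rng.integers(0, 2, size=n).astype(float)  # randomized binary treatment
noise = rng.normal(0.0, 1.0, size=n)          # everything else that moves y
true_effect = 0.2                             # assumed treatment effect
y = 1.0 + true_effect * d + noise

slope, r2 = simple_ols_r2(d, y)
print(f"estimated effect of D: {slope:.3f}")
print(f"R-squared: {r2:.4f}")
```

Under these assumptions the variance of Y that comes from D is 0.2² × 0.25 = 0.01, against a noise variance of 1, so the population R-squared is roughly 0.01 even though D has a real, recoverable effect on Y.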