
Marc F. Bellemare Posts

Dietary Carbohydrate Intake and Mortality: Not All that Glitters is Gold

A few months ago, The Lancet Public Health published a much-ballyhooed article by Seidelmann et al. linking dietary carbohydrate consumption that is either too low or too high with an increased risk of mortality.

I first heard about the study when I saw a piece about it on CNN.com which, as is often the case with popular-press coverage of splashy public health findings, played fast and loose with the passage from correlation to causation.

When I read the CNN article, I didn’t think much of it, but my Allegheny College colleague Amelia Finaret got in touch to ask whether I’d be interested in co-writing a short piece commenting on the Seidelmann et al. study for submission to The Lancet, the gist of which would be the difficulty of making causal inferences from observational data.

After a few days of iterating on our manuscript, we submitted it to The Lancet. I was happy to see that it was published yesterday, alongside several other comments on the Seidelmann et al. study. Best of all is the fact that our comment is open-access, meaning anyone with an Internet connection can read it. (Thank goodness for the fact that the publication costs for comments are cross-subsidized by the authors of original research articles!)

Here is a link to the .pdf of our comment; here is a link to the web version. The gist of our argument is that because of the presence of unobserved confounders, one cannot make a causal statement about the relationship between carbohydrate consumption and mortality. In other words, not all that glitters is gold.

 

The Art of Research Discovery and Writing Good Articles

Tom Reardon is one of my favorite agricultural economists. Not only is he incredibly productive (he has published over 150 articles, and his work has garnered over 27,000 Google Scholar citations), his work also has real-world policy impact (he was the first agricultural economist invited to the World Economic Forum in Davos). Over the years, Tom has been a wonderful mentor, and he has become a very good friend.

To know Tom is to love him, and if you know Tom well, you know that he has laser-like focus when it comes to his research, but that it can be hard to get him to focus on anything that is not the writing of whatever he is currently working on. Over dinner, he is likely to go from discussing the etymology of an obscure French word to how e-commerce is disrupting food systems to how he has been struggling to make good brisket sous vide… all within five minutes!

So I was particularly happy to receive an email from Tom earlier this week linking to a 75-minute talk he gave on the art of research discovery and writing good articles. If you are a researcher, whether early-career or seasoned, this is one of those rare occasions where a master craftsman takes the time to generously share some deep insights into his craft.

‘Metrics Monday: Goodness of Fit with Panel Data in Stata

With panel data, it is not uncommon to present regression results by starting with a pooled ordinary least squares (OLS) regression, then moving on to a specification with fixed effects (FE). If anything, this helps the reader see how important time-invariant unobserved heterogeneity is to your coefficient estimates.

Let y denote your outcome variable, x denote your control variables, and unit denote the unit of observation within which you have variation. If you use Stata, one of the problems that comes from using

xtreg y x, fe i(unit)

instead of

reg y x i.unit

is that the R-square measures returned by Stata after the former are in no way comparable to the R-square returned by Stata after the latter. From the “Assessing goodness of fit” section of the xtreg entry in the Stata manual:

[Screenshot of the “Assessing goodness of fit” section of the xtreg entry in the Stata manual]

What this means in practice is that if you don’t pay attention to what is going on when making tables of results, you often end up with tables where the R-square in your OLS specification is higher than the R-square in your FE specification. But this is impossible: with the same outcome and control variables, including unit FEs will necessarily raise the R-square, since a (usually much) higher percentage of the variation in the outcome is explained by the variables on the RHS when using FEs.
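To see why the R-square cannot fall, here is a minimal Python sketch (the simulated data and all variable names are mine, not from the Stata examples above) comparing an OLS fit of y on x alone with an LSDV fit that adds unit dummies, on the same sample:

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods = 20, 10
unit = np.repeat(np.arange(n_units), n_periods)
alpha = rng.normal(0, 2, n_units)                 # unit-level (time-invariant) heterogeneity
x = rng.normal(size=n_units * n_periods)
y = 1.5 * x + alpha[unit] + rng.normal(size=n_units * n_periods)

def r_squared(y, X):
    """R-square from a least-squares fit of y on X (X includes a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

const = np.ones_like(x)
X_pooled = np.column_stack([const, x])
# Unit dummies, dropping the first unit to avoid collinearity with the constant
dummies = (unit[:, None] == np.arange(1, n_units)).astype(float)
X_fe = np.column_stack([const, x, dummies])

r2_pooled = r_squared(y, X_pooled)
r2_fe = r_squared(y, X_fe)
# Adding regressors (here, the unit dummies) can never lower the R-square
assert r2_fe >= r2_pooled
```

Because the pooled specification is nested in the LSDV one, the sum of squared errors can only shrink when the dummies are added, so r2_fe is always at least r2_pooled.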

This isn’t too bad in and of itself, but of course the first time I noticed this was when someone asked me in a seminar: “Why is your R-square going down instead of up when you include fixed effects?” and I had no good answer other than “I’ll have to check and get back to you on this,” which is seminar-speak for “Beats me.”

Here is a simple (if not terribly elegant) workaround I have come up with and have used and reused in papers where I use the xtreg set of commands. After estimating

xtreg y x, fe i(unit)

I add the following lines of code

egen ybar = mean(y)      // grand mean of the outcome
gen y2 = (y - ybar)^2    // squared deviations from the grand mean
predict resid, e         // idiosyncratic (within) residuals
gen e2 = resid^2         // squared residuals
drop resid
egen sse = sum(e2)       // sum of squared errors
egen sst = sum(y2)       // total sum of squares
gen r2 = 1 - sse/sst     // R-square comparable to reg y x i.unit
sum r2
drop sse sst y2 e2 ybar r2

The variable r2 then contains the “right” (i.e., comparable-to-OLS) R-square.
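For readers who want to see exactly what the workaround computes, here is a Python translation (simulated data and helper names are my own assumptions, not part of the post): it forms the within (FE) residuals, computes R-square as 1 - SSE/SST against the overall variation in y, and checks that the result equals the R-square from the LSDV regression reg y x i.unit:

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 15, 8
unit = np.repeat(np.arange(n_units), n_periods)
x = rng.normal(size=n_units * n_periods)
y = 2.0 * x + rng.normal(0, 1.5, n_units)[unit] + rng.normal(size=n_units * n_periods)

def demean(v, groups):
    """Subtract group means, i.e., the within transformation."""
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]

# Within (FE) estimation: OLS on unit-demeaned data
y_w, x_w = demean(y, unit), demean(x, unit)
beta = (x_w @ y_w) / (x_w @ x_w)
e = y_w - beta * x_w                    # idiosyncratic residuals (Stata's predict resid, e)

sse = e @ e                             # egen sse = sum(e2)
sst = (y - y.mean()) @ (y - y.mean())   # egen sst = sum(y2)
r2 = 1 - sse / sst                      # gen r2 = 1 - sse/sst

# Sanity check: this matches the R-square from the LSDV regression reg y x i.unit
D = (unit[:, None] == np.arange(n_units)).astype(float)  # full set of unit dummies
X = np.column_stack([x, D])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
r2_lsdv = 1 - (resid @ resid) / sst
assert np.isclose(r2, r2_lsdv)
```

The equality is no accident: by the Frisch–Waugh–Lovell theorem, the within residuals are identical to the LSDV residuals, so dividing their sum of squares by the overall SST recovers the LSDV R-square.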