As I have mentioned a few times on this blog, I teach the second-year qualifying research paper seminar in our PhD program.
As is the case in most other agricultural and applied economics (or even economics) programs worth their salt, once they are done with the bulk of their coursework, our students must demonstrate that they have attained suitable proficiency as researchers by writing an entire research paper from start to finish, with the guidance of a faculty advisor of their choosing.
Because our program is an applied economics program, and because the bulk of agricultural and applied economics nowadays consists of applied microeconomics of the empirical kind, one of the questions I often have to answer in student emails is of the form: “Should I do A or B?” Specifically, questions like
- Should I estimate a linear probability model, a probit, or a logit?
- Should I use sampling weights or not?
- Should I cluster my standard errors or not?
- Should I take the logarithm of my dependent variable, or just use its level?
- Should I estimate my spline regression with three, four, five, or more knots?
- Should I estimate this in levels or in first differences?
- Should I express my variables in per capita terms, or just include them as is and control for population size?
Often, the question is asked as though there is a single answer. This might be because, fresh off two years of theory-heavy coursework (even in econometrics, they mostly go through the theory of econometrics), our students are conditioned to think there is a single answer to every question. To be fair, the most important realization I had during my own first-year coursework was that there is more than one answer to a research question, even a theoretical one, depending on the assumptions one is willing to make.
But this is where economics becomes more art than science, more rhetoric than dialectic, and where students have to learn that there is more than one way to skin a cat. In applied work, especially in applied work relying on observational data (a second-year paper rarely allows one the time to collect one’s own experimental data), the key to convincing your readers that X causes Y is to show that your core result holds over and over, no matter how you slice the data, and to show that if there are some cases where X does not seem to cause Y, you have a good story for why that is the case.
But often, your core specification involves many interchangeable parts. For example, you might be regressing a dichotomous variable for whether a district elected a Democrat over the last five elections on a number of right-hand side variables (RHS; for example, district characteristics) expressed in per capita terms. So there are at least six specifications you could estimate here:
- Linear probability model with RHS variables in per capita terms
- Linear probability model with RHS variables in levels
- Probit with RHS variables in per capita terms
- Probit with RHS variables in levels
- Logit with RHS variables in per capita terms
- Logit with RHS variables in levels
“But wait, there’s more!” as they say in late-night infomercials. You would also want to show your results with and without clustering at the district level (for a total of 12 specifications), and you might want to show this with no fixed effects, with state fixed effects, and with district fixed effects (for a total of 36 specifications). So what to do?
My answer is almost always “Do both”: put your preferred results in the paper, and put the robustness checks in an appendix “for reviewers’ eyes only.”
Just as there is the Magic Appendix Trick when giving a talk (you can put a lot of stuff in an appendix that is not for presentation, but only there in case someone asks), there is also a Magic Appendix Trick when submitting papers for review, and it consists of having an appendix for reviewers where you put everything you have done. Running additional regressions is virtually costless, as is formatting those regressions for inclusion in a paper, thanks to commands like -outreg- or -outtex- in Stata. Having such an appendix sends a signal that you conducted careful empirical work, that you have not cherry-picked your results and, ultimately, that your finding is credible.
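To make the bookkeeping concrete, here is a minimal sketch in Python (not Stata) of how one might enumerate the specifications from the district example so that none is forgotten when filling the appendix. The labels and the function name `specification_grid` are hypothetical, chosen only for illustration. The sketch also assumes that the fixed-effects choice comes in three flavors (none, state, district), which is what yields 36 combinations. It only lists the specifications; estimating each one is left to whatever software you use.

```python
from itertools import product

# Hypothetical labels for the modeling choices in the district example.
ESTIMATORS = ["lpm", "probit", "logit"]
RHS_SCALING = ["per_capita", "levels"]
CLUSTERING = ["clustered_by_district", "unclustered"]
# Assumption: three fixed-effects options, treating "state and district"
# as equivalent to district fixed effects alone.
FIXED_EFFECTS = ["none", "state", "district"]


def specification_grid():
    """Return every combination of modeling choices as a list of dicts."""
    keys = ("estimator", "rhs_scaling", "clustering", "fixed_effects")
    choices = (ESTIMATORS, RHS_SCALING, CLUSTERING, FIXED_EFFECTS)
    return [dict(zip(keys, combo)) for combo in product(*choices)]


if __name__ == "__main__":
    grid = specification_grid()
    # 3 estimators x 2 scalings x 2 clustering options x 3 FE options
    print(f"{len(grid)} specifications to estimate")
    for number, spec in enumerate(grid, start=1):
        print(number, spec)
```

Looping over a grid like this, rather than typing each regression by hand, makes it harder to skip a combination by accident and easier to label each appendix table consistently.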