Between the Introduction and the Conclusion: The “Middle Bits” Formula for Applied Papers
Perfect. Now we just need a formula for the middle bits and we are done.
— Rob Greer (@robgreer1) January 12, 2018
Rob was most likely joking, but there actually is a formula for the so-called “middle bits,” at least for the kind of paper I usually write.
Let’s look at the outline of the typical paper. When I write a new paper, the first thing I do in LaTeX is create the following sections:
- Theoretical Framework
- Empirical Framework
- Data and Descriptive Statistics
- Results and Discussion
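As a minimal sketch, the skeleton above might look like this in a LaTeX file. The document class and the introduction and conclusion sections are assumptions added for completeness; only the four middle sections come from the list above.

```latex
% Minimal sketch of the paper skeleton; the document class is an assumption.
\documentclass{article}
\begin{document}

\section{Introduction}                      % assumed; not part of the middle bits
\section{Theoretical Framework}
\section{Empirical Framework}
\section{Data and Descriptive Statistics}
\section{Results and Discussion}
\section{Conclusion}                        % assumed; not part of the middle bits

\end{document}
```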
The “middle bits” are everything that is not the introduction or the conclusion, so here is a formula for those. Now, I am not saying “here is the right formula for those”; this is something that has worked for me by reducing the amount of uncertainty I face when writing the typical paper. Let’s flesh out each section.
Theoretical Framework
- Primitives: What are the preferences and/or technology like?
- Variables: What are the choice (i.e., theoretically endogenous) variables? What are the parameters (i.e., theoretically exogenous variables)?
- Assumptions: What assumptions are you making about preferences and/or technology? What assumptions are you making about the choice variables? What assumptions are you making about the parameters?
- Maximization Problem: What are the agents you are studying maximizing? What is the Lagrangian?
- First-Order Conditions: Self-explanatory. In cases where it is not obvious whether you are solving for a maximum or a minimum, you’ll want to show the second-order conditions as well.
- Testable Prediction: State your main testable prediction. Generally, this should map one-to-one with the empirical framework.
- Proof: Prove your main testable prediction. Here, go for simplicity rather than elegance: why go for a proof by construction when a proof by contradiction will do just fine?
- Other Results and Proofs: There might be some side results you can both demonstrate in theory and test empirically. Generally, I think one paper should do one big thing–but there are exceptions.
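To make the steps above concrete, here is a deliberately simple, hypothetical example (not a model from any particular paper): a producer chooses an input $x$ at price $w$ to produce output $f(x)$ sold at price $p$. The symbols $f$, $x$, $p$, and $w$ are all assumptions made for illustration. Because the problem is unconstrained, no Lagrangian is needed here.

```latex
% Hypothetical illustration of the theoretical framework steps.
\begin{align*}
&\text{Primitives and assumptions:} && f'(x) > 0, \quad f''(x) < 0, \quad p > 0, \quad w > 0 \\
&\text{Choice variable and parameters:} && x \text{ is chosen; } p \text{ and } w \text{ are taken as given} \\
&\text{Maximization problem:} && \max_{x \geq 0} \; \pi(x; p, w) = p f(x) - w x \\
&\text{First-order condition:} && p f'(x^{*}) - w = 0 \\
&\text{Second-order condition:} && p f''(x^{*}) < 0 \\
&\text{Testable prediction:} && \frac{\partial x^{*}}{\partial w} = \frac{1}{p f''(x^{*})} < 0
\end{align*}
```

The proof is one line: differentiating the first-order condition with respect to $w$ gives $p f''(x^{*}) \, \partial x^{*} / \partial w - 1 = 0$, and the sign follows from $p > 0$ and $f''(x^{*}) < 0$. The empirical counterpart would be a regression of input use on the input price, with a test of whether the coefficient on the price is negative.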
Empirical Framework
- Estimation Strategy: What equations will you estimate? How will you estimate them? How will you treat the standard errors? What is the hypothesis test of interest for your main testable prediction? This is why there should generally be a one-to-one mapping from the main testable prediction to the empirical framework. If your outcome variable or variable of interest needs to be constructed or estimated, this is where you’d discuss it.
- Identification Strategy: What would the ideal data set look like to study your question? How close are you to that ideal, and what prevents you from getting closer? Then, discuss in turn how your identification strategy deals (or fails to deal) with (i) unobserved heterogeneity, (ii) reverse causality or simultaneity, and (iii) measurement error. Also think about what a violation of the stable unit treatment value assumption looks like here (does one observation getting treated somehow affect the outcome of another observation?), and whether you can somehow test for it.
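The estimation questions above (which equation, which estimator, which standard errors, which hypothesis test) can be sketched as follows. This is a minimal illustration under stated assumptions: the data are simulated, the equation is a simple regression of an outcome `y` on a binary variable of interest `d`, and the standard error uses the heteroskedasticity-robust HC1 formula. The function name `ols_robust` and all variable names are hypothetical, and the sketch says nothing about identification.

```python
import math
import random


def ols_robust(y, d):
    """Fit y = alpha + beta * d + e by OLS; return beta with an HC1 robust SE."""
    n = len(y)
    d_bar = sum(d) / n
    y_bar = sum(y) / n
    sxx = sum((di - d_bar) ** 2 for di in d)
    beta = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y)) / sxx
    alpha = y_bar - beta * d_bar
    resid = [yi - alpha - beta * di for yi, di in zip(y, d)]
    # Sandwich variance for the slope, with the HC1 small-sample factor n / (n - 2).
    meat = sum((di - d_bar) ** 2 * ei ** 2 for di, ei in zip(d, resid))
    se = math.sqrt(meat / sxx ** 2 * n / (n - 2))
    return {"alpha": alpha, "beta": beta, "se": se, "t": beta / se}


# Simulated data (an assumption for illustration): the true effect of d on y is 2.
random.seed(0)
d = [1.0 if random.random() < 0.5 else 0.0 for _ in range(500)]
y = [1.0 + 2.0 * di + random.gauss(0, 1) for di in d]

result = ols_robust(y, d)
# H0: beta = 0 is rejected at the 5 percent level (normal approximation) if |t| > 1.96.
reject_null = abs(result["t"]) > 1.96
print(result, reject_null)
```

In an actual paper you would more likely rely on a statistical package and add controls, clustering, or fixed effects as your setting requires; the point is only that the equation, the estimator, the standard errors, and the test are each stated explicitly.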
Data and Descriptive Statistics
- Data: When was it collected? Where? Why? By whom? How was the sample selected? Who was interviewed, or how were the data collected? What is the sample size? How does it compare to the population of interest? Do you lose any observations? Why? Did you have to impute any values and, if so, how did you do it? Are any variables proxies for the real thing? What does each variable measure, exactly, or how was it constructed?
- Descriptive Statistics: This is simple enough. If you choose to describe the contents of your table of descriptive statistics, tell a story about them; don’t just write up a boring enumeration of means.
- Balance Tests: In cases where you’re looking at a dichotomous (or categorical) variable of interest, how do the treatment and comparison sub-samples differ in the means of the variables discussed under the previous sub-section?
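A balance table of the kind described above could be sketched like this. The covariates and their values are made up for illustration, the helper name `balance_row` is hypothetical, and the test statistic is a Welch (unequal-variance) t-statistic, which is one common choice among several.

```python
import math
from statistics import mean, variance


def balance_row(treated, comparison):
    """Compare the means of one covariate across treatment and comparison groups."""
    diff = mean(treated) - mean(comparison)
    # Welch standard error: sample variances are not assumed equal across groups.
    se = math.sqrt(variance(treated) / len(treated)
                   + variance(comparison) / len(comparison))
    return {
        "mean_treated": mean(treated),
        "mean_comparison": mean(comparison),
        "diff": diff,
        "t": diff / se,
    }


# Made-up covariate values, for illustration only.
covariates = {
    "age": ([34, 41, 29, 38, 45, 31], [36, 40, 30, 39, 44, 33]),
    "household_size": ([4, 5, 3, 6, 4, 5], [5, 4, 4, 6, 3, 5]),
}

balance_table = {name: balance_row(t, c) for name, (t, c) in covariates.items()}
for name, row in balance_table.items():
    print(name, row)
```

Each row reports the two group means, their difference, and a t-statistic, which is roughly the information a reader looks for when judging whether the two sub-samples are comparable.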
Results and Discussion
- Preliminary (Nonparametric?) Results: An image is worth 1,000 words. If you can somehow plot the relationship of interest in a two-way scatter with a regression line fit through it, or using kernel density estimates for treatment and comparison, it helps people see for themselves that there is a difference in outcomes in response to your variable of interest.
- Core (Parametric) Results: This is your core test of your main testable prediction. Here, there is no need to go into a discussion of the sign of each significant control variable, unless such a discussion is somehow germane to your core testable prediction.
- Robustness Checks: These are as important as your core results. Do not neglect them. Slice and dice the data in as many ways as possible, sticking many of these results in an appendix, to show that the main testable prediction is supported by the data and that you haven’t cherry-picked your results. If you use an IV, this is where you’d entertain potential violations of the exclusion restriction, and test for them one by one. Or maybe you can test for the mechanisms through which your variable of interest affects your outcome of interest.
- Extensions: This is where I might explore treatment heterogeneity, or split my sample between men and women, rural and urban, or by industry.
- Limitations: No empirical result is perfect. How is internal validity limited? How is external validity limited? What are your results not saying, i.e., what mistakes might people make in interpreting them?
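The “slice and dice” and sample-splitting steps above amount to re-estimating the same coefficient on different sub-samples and reporting the estimates side by side. Here is a minimal sketch on simulated data; the variable names (`y`, `d`, `rural`), the sample definitions, and the helper `slope` are all assumptions made for illustration.

```python
import random


def slope(y, x):
    """OLS slope from a simple regression of y on x (with an intercept)."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx


# Simulated data (an assumption): the true effect of d on y is 2 in every subgroup.
random.seed(1)
rows = []
for _ in range(400):
    treated = random.random() < 0.5
    rural = random.random() < 0.5
    y = 1.0 + 2.0 * treated + 0.5 * rural + random.gauss(0, 1)
    rows.append({"y": y, "d": float(treated), "rural": rural})

# Each entry maps a label to a rule for keeping an observation in the sample.
samples = {
    "full sample": lambda r: True,
    "rural only": lambda r: r["rural"],
    "urban only": lambda r: not r["rural"],
}

estimates = {}
for label, keep in samples.items():
    sub = [r for r in rows if keep(r)]
    estimates[label] = slope([r["y"] for r in sub], [r["d"] for r in sub])

for label, estimate in estimates.items():
    print(f"{label}: {estimate:.3f}")
```

Keeping the sample definitions in one dictionary makes it easy to add further cuts (by gender, by industry, and so on) and to send the resulting table to an appendix.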
There it is. It does not get much more complicated than that, and the above skeleton is the right structure for about 90 percent of the papers I write. Note:
- No separate “literature review” section. In a thesis, you would definitely want a literature review between the introduction and the theoretical framework. But in a paper to be submitted to a journal, your literature review should be a one-paragraph affair in your introduction explaining how your work relates to the closest five to seven studies on the topic.
- You might want to have a section titled “Background” between the introduction and the theoretical framework. This is especially so when you study a legislative change, a new policy whose details are important or, in an IO paper, the features of the market you are studying. This can either be a substitute for or a complement to the theoretical framework.
- You might not need a theoretical framework. Some questions are old (e.g., the effects of land rights on agricultural productivity) and the theory behind them is well documented and does not need to be restated.
- The order between the “Empirical Framework” and “Data and Descriptive Statistics” sections can sometimes be switched. Go with what is logical here.
- You might have noticed that I list “limitations” both under “Results and Discussion” and in my conclusion formula. I really think limitations should be emphasized that way. This is especially true if your work has any policy relevance; you don’t want anyone to interpret your results in ways they should not be interpreted.