I was sitting in my office on Friday afternoon when one of our third-year PhD students dropped by with an applied econometric question: “When should I use weights?”
After telling her to go read Solon et al.’s 2015 piece in the JHR symposium on empirical methods, I decided to reread that paper for myself and blog about it this week. In the near future, in part II, I’m hoping to tackle Andrews and Oster’s new NBER working paper on weighting for external validity.
Before I begin, some clarification: throughout this post, I’ll be discussing the use of sampling weights. If you are a Stata user, this refers to that statistical package’s -pweight-, i.e., “weights that denote the inverse of the probability that the observation is included because of the sampling design.” I have never had to rely on -aweight-, -fweight-, or -iweight-, so I wouldn’t know when to use them.
Suppose you oversample a specific group in order to get more precise estimates for that group. For instance, suppose you are interested in the opinion of LGBTQ students. If you randomly sample individuals from a given population of students, you may not have enough LGBTQ respondents in your sample, and so whatever descriptive statistics you come up with for that sub-group might be too noisy. Thus, you may wish to oversample LGBTQ respondents in order to improve precision. What I mean by this is that you would randomly sample respondents from each group (LGBTQ and non-LGBTQ) until you have the number you want from each. So if you target a sample size of n = 100 and you’d like 50% of respondents to come from each group, you split the population into two groups (assuming that’s easy to do; in the case of LGBTQ students, it might not be) and sample from each until each group has 50 observations.
Here, sampling weights are easy to compute: population proportion divided by sample proportion. So if your sample has 50% LGBTQ respondents and 50% non-LGBTQ respondents but the population has 10% LGBTQ respondents and 90% non-LGBTQ respondents, the weight on an LGBTQ observation is equal to 0.10/0.50 = 0.2 and the weight on a non-LGBTQ observation is equal to 0.90/0.50 = 1.8.
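In a sample of n = 100, this means that the sample mean of the sampling weight is equal to (0.2*50 + 1.8*50)/100 = 1. When weights are computed this way, as the ratio of population proportion to sample proportion, the mean of your sampling weight variable should be equal to one, which makes for a handy sanity check.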
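The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `sampling_weights` and the dictionary layout are my own assumptions, and the shares are the made-up 10%/90% population and 50/50 sample from the example.

```python
# Hypothetical shares from the example in the text (not real data).
population_share = {"lgbtq": 0.10, "non_lgbtq": 0.90}
sample_counts = {"lgbtq": 50, "non_lgbtq": 50}


def sampling_weights(population_share, sample_counts):
    """Weight for each group = population proportion / sample proportion."""
    n = sum(sample_counts.values())
    return {
        group: population_share[group] / (count / n)
        for group, count in sample_counts.items()
    }


weights = sampling_weights(population_share, sample_counts)

# Sanity check: averaged over all observations, the weights should be one.
n = sum(sample_counts.values())
mean_weight = sum(weights[g] * sample_counts[g] for g in sample_counts) / n

print(weights)
print(mean_weight)
```

If the mean of the weight variable comes out far from one, either the weights were normalized differently (which is fine for most estimators) or something went wrong in their construction.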
So when should you use sampling weights? Solon et al. divide empirical work into two rough categories, viz. descriptive statistics and causal inference. For descriptive statistics, when you have a sample that is non-random because some groups were oversampled for precision, as in my LGBTQ example, you need to use sampling weights if you want to compute descriptive statistics for the entire population.
For descriptive statistics, Solon and his coauthors have a really good analogy involving the Panel Study of Income Dynamics (PSID):
“A visualization of how this works is that the PSID sample design views the US population through a funhouse mirror that exaggerates the low-income population. Weighted estimation views the sample through a reverse funhouse mirror that undoes the original exaggeration.”
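To make the funhouse-mirror point concrete, here is a small Python sketch using the oversampling example from earlier. The outcome data are entirely made up for illustration (a yes/no opinion question where 80% of the 50 LGBTQ respondents and 40% of the 50 non-LGBTQ respondents answer yes), and `weighted_mean` is a hypothetical helper, not a library function.

```python
def weighted_mean(values, weights):
    """Weighted average: sum of w * v divided by the sum of the weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


# Made-up binary answers (1 = yes, 0 = no), 50 respondents per group.
lgbtq_answers = [1] * 40 + [0] * 10
non_lgbtq_answers = [1] * 20 + [0] * 30
values = lgbtq_answers + non_lgbtq_answers

# Sampling weights from the example: 0.10/0.50 and 0.90/0.50.
weights = [0.2] * 50 + [1.8] * 50

# The unweighted mean describes the distorted sample.
unweighted = sum(values) / len(values)

# The weighted mean undoes the oversampling.
weighted = weighted_mean(values, weights)

print(unweighted)
print(weighted)
```

Under these assumed numbers, the unweighted share of yes answers is 0.60, whereas the weighted share is about 0.44, which matches what the population shares imply (0.10 × 0.80 + 0.90 × 0.40).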
For causal inference, Solon et al. list three reasons you’d want to use sampling weights in your estimations:
- Precision: Weights can be used to correct for heteroskedasticity. Most students learn about this in their first econometrics class–this is the weighted least squares (WLS) estimator–but they soon forget about it once they learn about the White (1980) correction for heteroskedasticity–the famous “comma robust” of Stata lore. A recent Journal of Econometrics article by Romano and Wolf purports to resurrect WLS. Here, Solon et al. suggest comparing OLS, WLS, and OLS with robust (either White or clustered) standard errors and discussing the differences in precision when conducting applied work.
- Consistency: If you have endogenous sampling, that is, if units of observation are selected on the basis of your outcome of interest, you need to weight in order to get consistent estimates. In my job-market paper and 2012 Land Economics article, for instance, I selected units of observation on the basis of their choice of land-rental contract, which was my outcome of interest. There is a slight caveat in the Solon et al. article for cases where your model is correctly specified, but… when does that actually happen?
- Identifying Average Partial Effects: This is for cases where you’re interested in a particular average of heterogeneous treatment effects. Since I have little to no applied experience doing this, I won’t be discussing it beyond encouraging you to read that part if that’s what you’re interested in.
As always, there are exceptions to those rules, and Solon et al. encourage you to always “do both,” that is, to show results with and without weights, even in cases where weights are undoubtedly necessary. The article is very enlightening, even if you have spent a significant amount of time studying sampling weights (in one of the applied econometrics courses I took during my PhD, we spent what felt like a whole month on sampling weights, and I still learned a whole lot from the Solon et al. article).
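The “do both” advice can be sketched with a bare-bones weighted regression of y on a single regressor x. Everything here is an assumption made for illustration: `weighted_fit` is a hypothetical helper rather than anyone’s published code, and the data are contrived so that the slope is 1 in the oversampled group and 3 in the other. With unit weights the function reduces to ordinary least squares.

```python
def weighted_fit(x, y, w):
    """Weighted least squares of y on a constant and x.

    Returns (intercept, slope). With all weights equal, this is plain OLS.
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Contrived data: same x values in both groups, different slopes.
x_group = [0.0, 1.0, 2.0, 3.0]
x = x_group + x_group
y = [1.0 * xi for xi in x_group] + [3.0 * xi for xi in x_group]

# Sampling weights from the oversampling example: 0.2 and 1.8.
w_sampling = [0.2] * 4 + [1.8] * 4
w_unit = [1.0] * len(x)

_, ols_slope = weighted_fit(x, y, w_unit)
_, wls_slope = weighted_fit(x, y, w_sampling)

print(ols_slope)
print(wls_slope)
```

Under these assumed numbers the unweighted slope is 2 and the weighted slope is about 2.8, so reporting both tells the reader that something (here, heterogeneous slopes) differs across the oversampled and undersampled groups. The fact that 2.8 equals the population-share-weighted average of the two slopes is an artifact of x having the same distribution in both groups in this toy setup; it should not be read as a general result.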