{"id":12676,"date":"2017-09-25T05:00:02","date_gmt":"2017-09-25T10:00:02","guid":{"rendered":"http:\/\/marcfbellemare.com\/wordpress\/?p=12676"},"modified":"2017-09-25T06:39:21","modified_gmt":"2017-09-25T11:39:21","slug":"metrics-monday-good-things-come-to-those-who-weight-part-i","status":"publish","type":"post","link":"https:\/\/marcfbellemare.com\/wordpress\/12676","title":{"rendered":"&#8216;Metrics Monday: Good Things Come to Those Who Weight&#8211;Part I"},"content":{"rendered":"<p>I was sitting in my office on Friday afternoon when one of our third-year PhD students dropped by with an applied econometric question: &#8220;When should I use weights?&#8221;<\/p>\n<p>After telling her to go read <a href=\"http:\/\/jhr.uwpress.org\/content\/50\/2\/301.short\">Solon et al.&#8217;s 2015 piece in the JHR symposium on empirical methods<\/a>, I decided to reread that paper for myself and blog about it this week. In the near future, in part II, I&#8217;m hoping to tackle Andrews and Oster&#8217;s new NBER\u00a0<a href=\"https:\/\/www.brown.edu\/research\/projects\/oster\/sites\/brown.edu.research.projects.oster\/files\/uploads\/paper.pdf\">working paper on weighting for external validity<\/a>.<\/p>\n<p>Before I begin, some clarification: throughout this post, I&#8217;ll be discussing the use of sampling weights. If you are a Stata user, this refers to that statistical package&#8217;s -pweight-, i.e., &#8220;weights that denote the inverse of the probability that the observation is included because of the sampling design.&#8221; I have never had to rely on -aweight-, -fweight-, or -iweight-, so I wouldn&#8217;t know when to use them.<\/p>\n<p>Suppose you oversample a specific group in order to get more precise estimates for that group. For instance, suppose you are interested in the opinion of LGBTQ students. 
If you randomly sample individuals from a given population of students, you may not have enough LGBTQ respondents in your sample, and so whatever descriptive statistics you come up with for that sub-group might be too noisy. Thus, you may wish to over-sample LGBTQ respondents in order to improve precision. What I mean by this is that you would randomly sample respondents from each group&#8211;LGBTQ and non-LGBTQ&#8211;until you have the right number. So if you target a sample size of n=100 and you&#8217;d like 50% of respondents from each group, you split the population into two groups (assuming that&#8217;s easy to do; in the case of LGBTQ students, it might not be) and sample from each until each group has 50 observations.<!--more--><\/p>\n<p>Here, sampling weights are easy to compute: population proportion divided by sample proportion. So if your sample has 50% LGBTQ respondents and 50% non-LGBTQ respondents but the population has 10% LGBTQ respondents and 90% non-LGBTQ respondents, the weight on an LGBTQ observation is equal to 0.10\/0.50 = 0.2 and the weight on a non-LGBTQ observation is equal to 0.90\/0.50 = 1.8.<\/p>\n<p>In a sample of n = 100, this means that the sample mean of the sampling weight is equal to (0.2*50 + 1.8*50)\/100 = 1. With weights defined this way, the mean of your sampling weight variable should always be equal to one.<\/p>\n<p>So when should you use sampling weights? Solon et al. divide empirical work into two rough categories, viz. descriptive statistics and causal inference. 
For <em>descriptive statistics<\/em>, when you have a sample that is non-random because some groups were oversampled for precision as in my LGBTQ example, if you want to compute descriptive statistics for the entire population, you need to use sampling weights.<\/p>\n<p>For descriptive statistics, Solon and his coauthors have a really good analogy involving the Panel Study of Income Dynamics (PSID):<\/p>\n<blockquote><p>A visualization of how this works is that the PSID sample design views the US population through a funhouse mirror that exaggerates the low-income population. Weighted estimation views the sample through a reverse funhouse mirror that undoes the original exaggeration.<\/p><\/blockquote>\n<p>For causal inference, Solon et al. list three reasons you&#8217;d want to use sampling weights in your estimations:<\/p>\n<ol>\n<li><em>Precision<\/em>: Weights can be used to correct for heteroskedasticity. Most students learn about this in their first econometrics class&#8211;this is the weighted least squares (WLS) estimator&#8211;but they soon forget about it once they learn about the White (1980) correction for heteroskedasticity&#8211;the famous &#8220;comma robust&#8221; of Stata lore. A recent <em>Journal of Econometrics<\/em> article by Romano and Wolf purports to\u00a0<a href=\"http:\/\/www.sciencedirect.com\/science\/article\/pii\/S030440761630197X\">resurrect WLS<\/a>. Here, Solon et al. 
suggest comparing OLS, WLS, and OLS with robust (either White or clustered) standard errors and discussing the differences in precision when conducting applied work.<\/li>\n<li><em>Consistency<\/em>: If you have endogenous sampling&#8211;that is, if units of observation are selected on the basis of your outcome of interest&#8211;you need to weight in order to get consistent estimates. In <a href=\"http:\/\/le.uwpress.org\/content\/88\/1\/155.short\">my job-market paper and 2012 <em>Land Economics<\/em> article<\/a>, for instance, I selected units of observation on the basis of their choice of land-rental contract, which was my outcome of interest. There is a slight caveat in the Solon et al. article in cases where your model is correctly specified, but&#8230; when does that actually happen?<\/li>\n<li><em>Identifying Average Partial Effects<\/em>: This is for cases where you&#8217;re interested in a particular average of heterogeneous treatment effects. Since I have little to no applied experience doing this, I won&#8217;t be discussing it beyond encouraging you to read that part if that&#8217;s what you&#8217;re interested in.<\/li>\n<\/ol>\n<p>As always, there are exceptions to those rules, and Solon et al. encourage you to always &#8220;do both,&#8221; that is, to show results with and without weights, even in cases where they are undoubtedly necessary. But the article is very enlightening, even if you have spent a significant amount of time studying sampling weights (in one of the applied econometrics courses I took during my PhD, we spent what felt like a whole month on sampling weights, and I still learned a whole lot from the Solon et al. 
article).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I was sitting in my office on Friday afternoon when one of our third-year PhD students dropped by with an applied econometric question: &#8220;When should<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/marcfbellemare.com\/wordpress\/12676\">Continue reading<span class=\"screen-reader-text\">&#8216;Metrics Monday: Good Things Come to Those Who Weight&#8211;Part I<\/span><\/a><\/div>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[1],"tags":[],"class_list":["post-12676","post","type-post","status-publish","format-standard","hentry","category-uncategorized","entry"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p1gPg8-3is","_links":{"self":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/12676","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/comments?post=12676"}],"version-history":[{"count":8,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/12676\/revisions"}],"predecessor-version":[{"id":12685,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/12676\/revisions\/12685"}],"wp
:attachment":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/media?parent=12676"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/categories?post=12676"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/tags?post=12676"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}