{"id":11037,"date":"2015-06-08T05:00:57","date_gmt":"2015-06-08T09:00:57","guid":{"rendered":"http:\/\/marcfbellemare.com\/wordpress\/?p=11037"},"modified":"2015-06-07T12:06:52","modified_gmt":"2015-06-07T16:06:52","slug":"control-variables-more-isnt-necessarily-better","status":"publish","type":"post","link":"https:\/\/marcfbellemare.com\/wordpress\/11037","title":{"rendered":"Control Variables: More Isn&#8217;t Necessarily Better"},"content":{"rendered":"<p>My experience with blogging tells me that a post on applied econometrics is always a good way to start the week by generating a large number of views, so let me do a\u00a0&#8216;Metrics Monday yet again this week.<\/p>\n<p>A few weeks ago, Google Scholar alerted me that a new <a href=\"http:\/\/www.eief.it\/files\/2015\/05\/wp-05-on-the-ambiguous-consequences-of-omitting-variables.pdf\">working paper<\/a>\u00a0by Giuseppe de Luca, Jan Magnus, and Franco Peracchi\u00a0might be of interest to me, given the research topics associated with my profile. (I was under the impression that their paper was forthcoming in the <em>Journal of Labor Economics<\/em>, but I somehow cannot find any evidence that this is so. No matter, this is an important contribution.)<\/p>\n<p>Let me first present the abstract of the article, which I found cryptic enough that I had a hard time making heads or tails of it even after three readings. Then, I will present\u00a0the first few paragraphs of the article, which illustrate the point much better. I&#8217;ll then go into the results, which are actually pretty important for applied econometrics.<\/p>\n<p>Here is de Luca et al.&#8217;s abstract:<\/p>\n<blockquote><p>This paper studies what happens when we move from a short\u00a0regression to a long regression (or vice versa), when the long regression is\u00a0shorter than the data-generation process. 
In the special case where the long\u00a0regression equals the data-generation process, the least-squares estimators\u00a0have smaller bias (in fact zero bias) but larger variances in the long regression\u00a0than in the short regression. But if the long regression is also misspecified,\u00a0the bias may not be smaller. We provide bias and mean squared error comparisons\u00a0and study the dependence of the differences on the misspecification\u00a0parameter.<\/p><\/blockquote>\n<p>Somewhat cryptic, at least to my applied mind. The first two paragraphs of the introduction provide a better idea of what&#8217;s going on:<!--more--><\/p>\n<blockquote><p>Ludwig van Beethoven composed nine symphonies. Suppose a tenth symphony\u00a0is discovered. There is no full score, only three parts are available:\u00a0first violin, cello, and clarinet. This version is recorded and creates a big hit.\u00a0Of course everybody realizes that many instruments are missing \u2014 still, it\u00a0seems one gets a good idea of Beethoven\u2019s tenth. Now the trumpet part is\u00a0discovered and a new recording is made. The new recording is received less\u00a0enthusiastically than the first recording and music experts claim that adding\u00a0the trumpet moves us away from how the real symphony should sound.<\/p>\n<p>This creates a puzzle and a debate among scientists of various disciplines.\u00a0How is it possible that getting closer to the true instrumentation does not\u00a0get us closer to the true sound? Of course, adding all instruments to the\u00a0score creates the true sound, but it seems that adding only some of them\u00a0may not lead to an improvement. An addition in itself is not necessarily an\u00a0improvement, it must be a \u2018balanced addition\u2019.<\/p><\/blockquote>\n<p>So here is what&#8217;s going on: Suppose the true data-generating process (DGP) for an outcome variable\u00a0<em>Y<\/em>\u00a0is composed of three variables, viz. <em>X1<\/em>, <em>X2<\/em>, and <em>X3<\/em>. 
Ideally, you would want to have all three variables, because regressing <em>Y<\/em> on all three of them yields an unbiased estimate (albeit one that has larger variance, but that is something one can live with) of the coefficient on <em>X1<\/em>. But if you only have access to <em>Y<\/em> and <em>X1<\/em>, your estimate of the coefficient on the latter will generally be biased, unless <em>X1<\/em> happens to be uncorrelated with the omitted variables.<\/p>\n<p>Now suppose you get your hands on <em>X2<\/em>. &#8220;Sweet!&#8221; you think. &#8220;Throwing this new variable in will reduce the bias in the estimated coefficient for <em>X1<\/em>.&#8221;\u00a0Right?<\/p>\n<p>Not necessarily, actually.\u00a0The point that de Luca et al. make in their paper is that this new addition&#8211;here, <em>X2<\/em>&#8211;has to be &#8220;balanced,&#8221; a notion their paper aims to define. Otherwise, its addition might actually increase both the variance\u00a0<em>and<\/em> the bias of your coefficient of interest.<\/p>\n<p>I probably won&#8217;t surprise anyone by saying this is\u00a0actually really important for the practice of econometrics. And in a way, that is something that we intuitively understand and try to insure against when we present increasingly complex sets of results. Ever notice how common it is to present, say, three to five\u00a0columns of results, from the most parsimonious specification&#8211;say, a regression of <em>Y<\/em> on just <em>D<\/em>, your variable of interest&#8211;in the first column to the least parsimonious specification&#8211;say, a regression of <em>Y<\/em> on <em>D<\/em>, but also different groups of controls, e.g., plot-, individual-, and household-specific controls&#8211;in the last column? 
We do this to assess how stable our estimated coefficient is, i.e., to check that it does not swing wildly as controls are added.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>My experience with blogging tells me that a post on applied econometrics is always a good way to start the week by generating a large<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/marcfbellemare.com\/wordpress\/11037\">Continue reading<span class=\"screen-reader-text\">Control Variables: More Isn&#8217;t Necessarily Better<\/span><\/a><\/div>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[1],"tags":[],"class_list":["post-11037","post","type-post","status-publish","format-standard","hentry","category-uncategorized","entry"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p1gPg8-2S1","_links":{"self":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/11037","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/comments?post=11037"}],"version-history":[{"count":4,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/11037\/revisions"}],"predecessor-version":[{"id":11041,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/11037\/revisions\/11041"}],"wp:attachment":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/media?parent=11037"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/categories?post=11037"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/tags?post=11037"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}