{"id":11057,"date":"2015-06-15T05:00:18","date_gmt":"2015-06-15T09:00:18","guid":{"rendered":"http:\/\/marcfbellemare.com\/wordpress\/?p=11057"},"modified":"2015-06-14T12:36:18","modified_gmt":"2015-06-14T16:36:18","slug":"metrics-monday-what-to-do-with-endogenous-control-variables","status":"publish","type":"post","link":"https:\/\/marcfbellemare.com\/wordpress\/11057","title":{"rendered":"&#8216;Metrics Monday: What to Do with Endogenous Control Variables?"},"content":{"rendered":"<p>Continuing the &#8216;Metrics Monday series, and continuing on last week&#8217;s theme of control variables discussed in the de Luca et al. working paper, I wanted to discuss endogenous control variables. Note that\u00a0a lot of what follows is me thinking out loud, and I may well be mistaken about all of this. If so, I welcome comments exploring this topic.<\/p>\n<p>As always, suppose you have observational data, and you are interested in estimating the causal effect of your variable of interest D on your outcome of interest Y, and you also have access to a vector of control variables X. For the sake of argument, let&#8217;s assume there is only one control variable in the equation<\/p>\n<p>(1) Y = a + bX + cD + e.<\/p>\n<p>The parameter of interest is c. If you have observational data, then you know that in most cases E(D&#8217;e) is different from zero&#8211;that is, D is endogenous to Y in equation 1, and the estimate of c does not capture the causal effect of D on Y.<!--more--><\/p>\n<p>But what about X? It often happens that X is also obviously endogenous to Y&#8211;say, because X is a decision variable which is determined by each individual respondent&#8217;s expectation of Y, which would constitute a case of reverse causality.<\/p>\n<p>In terms of the peer-review process, one thing I would not encourage you to do is\u00a0to try to find an instrumental variable for X. Why is that? 
To put it simply, if a bit cynically: Because D is your variable of interest, and dealing with the fact that D is endogenous is difficult enough already&#8211;how well you do so will determine how well your paper is received by reviewers and editors&#8211;so that attempting to deal with the endogeneity of your control variable as well multiplies the\u00a0number of reasons why your reviewers might recommend that your paper be rejected.<\/p>\n<p>Seriously, I still\u00a0sometimes see papers where the authors are looking at the effect of some variable of interest D on some outcome of interest Y, but where they spend a considerable amount of time trying to deal with X (generally, those authors are also waist-deep in likelihood procedures like the Heckman selection model, too, so dealing with X is only one of a laundry list of things they burden the reader with).\u00a0 But that is really beside the point, because it is D that is the variable of interest, not X.<\/p>\n<p>So how do we\u00a0deal with endogenous controls? First, let&#8217;s think about what an endogenous control means:<\/p>\n<ul>\n<li>An endogenous control X means that E(X&#8217;e) is different from zero, which obviously means that the estimated b in equation 1 will be biased.<\/li>\n<li>An endogenous control X also means that the OLS estimator for c&#8211;the parameter of interest&#8211;will be biased, since X appears in the formula for the OLS estimator of c (see <a href=\"http:\/\/faculty.cas.usf.edu\/mbrannick\/regression\/Reg2IV.html\">here<\/a> for the OLS estimator in a simple, two-variable case). Moreover, see\u00a0<a href=\"http:\/\/onlinelibrary.wiley.com\/doi\/10.1111\/j.1751-5823.2008.00045.x\/abstract\">this article<\/a>\u00a0by Fr\u00f6lich (2008) for a discussion of how both OLS and 2SLS will be inconsistent in the presence of endogenous controls. 
That is, they do not converge to the true value of the parameter of interest.<\/li>\n<li>Excluding the endogenous control X means that X is now in the error term e, and so if X is correlated with D, then your estimate of c is <em>also\u00a0<\/em>biased.<\/li>\n<\/ul>\n<p>This suggests the following: If D and X are uncorrelated, then it is better to leave X out of your regression altogether, because in that case, it does not bias your estimate of c, <em>no matter how much variation in Y is explained by X<\/em>.<\/p>\n<p>If D and X are correlated, then you\u00a0have a problem either way. Omitting X means that you have an omitted variable bias. Including it means that your estimates are inconsistent. (See <a href=\"https:\/\/en.wikipedia.org\/wiki\/Consistent_estimator#Bias_versus_consistency\">here<\/a> for an enlightening, short discussion of bias vs. consistency.) What should you do? I think the middle-of-the-road approach is the usual &#8220;do both,&#8221; that is, to present results both with and without the endogenous control, and see what changes. But even that is not terribly satisfactory, since there is bias in both cases, and &#8220;get a better research design&#8221; is even less helpful.<\/p>\n<p>Ideally, you would find a good (i.e., valid and relevant) IV for X, but those are difficult to find, and if the IVs used\u00a0for endogenous variables of interest D\u00a0in the papers I have seen trying to tackle the problem of endogenous controls X were usually not the best, the IVs used for those endogenous controls were even worse.<\/p>\n<p>Also see <a href=\"http:\/\/economics.stackexchange.com\/questions\/3194\/what-happens-if-the-control-variables-are-also-endogenous\">here<\/a> for a discussion of this issue, which\u00a0I found a bit difficult to follow given the many voices involved (and a few typos, I think). 
There is also this <a href=\"http:\/\/www.sciencedirect.com\/science\/article\/pii\/S0167715207002131\">article<\/a>\u00a0by Lechner (2008), but it seems specifically geared towards matching methods.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Continuing the &#8216;Metrics Monday series, and continuing on last week&#8217;s theme of control variables discussed in the de Luca et al. working paper, I wanted<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/marcfbellemare.com\/wordpress\/11057\">Continue reading<span class=\"screen-reader-text\">&#8216;Metrics Monday: What to Do with Endogenous Control Variables?<\/span><\/a><\/div>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[1],"tags":[],"class_list":["post-11057","post","type-post","status-publish","format-standard","hentry","category-uncategorized","entry"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p1gPg8-2Sl","_links":{"self":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/11057","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/comments?post=11057"}],"version-history":[{"count":10,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/po
sts\/11057\/revisions"}],"predecessor-version":[{"id":11067,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/11057\/revisions\/11067"}],"wp:attachment":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/media?parent=11057"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/categories?post=11057"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/tags?post=11057"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}