{"id":10793,"date":"2015-04-08T05:00:43","date_gmt":"2015-04-08T09:00:43","guid":{"rendered":"http:\/\/marcfbellemare.com\/wordpress\/?p=10793"},"modified":"2015-05-24T10:49:41","modified_gmt":"2015-05-24T14:49:41","slug":"the-use-and-misuse-of-r-squared-technical","status":"publish","type":"post","link":"https:\/\/marcfbellemare.com\/wordpress\/10793","title":{"rendered":"The Use and Misuse of R-Squared [Technical]"},"content":{"rendered":"<p>Last week the Midwest Economics Association (MEA) meetings were taking place in Minneapolis. Because a few friends were presenting at MEA, I decided to go check out the sessions at which they were presenting.<\/p>\n<p>At one of the sessions I attended, a graduate student presented a very cool paper in which he had run a randomized controlled trial to determine\u00a0the effect of a treatment variable <em>D<\/em>\u00a0on an outcome <em>Y<\/em>, randomizing <em>D <\/em>and\u00a0collecting information on a number of control variables\u00a0<em>X\u00a0<\/em>in addition to collecting information on\u00a0<em>Y<\/em>.<\/p>\n<p>The graduate student came from a good department, so he carefully motivated his paper by talking about the policy relevance of the relationship between\u00a0<em>D\u00a0<\/em>and\u00a0<em>Y<\/em>, explaining that policy makers cared deeply about said relationship, and how they made a big deal of it.<\/p>\n<p>When presenting his results, the presenter did what we commonly do in economics, which is to show a table presenting several specifications of the regression of interest, from the most\u00a0parsimonious (i.e., a simple regression of\u00a0<em>Y\u00a0<\/em>on just\u00a0<em>D<\/em>) to the least\u00a0parsimonious (i.e., a complex regression of\u00a0<em>Y<\/em> on\u00a0<em>D<\/em> and all the available controls\u00a0<em>X<\/em>).<\/p>\n<p>The problem, however, was that the R-squared measure&#8211;the regression&#8217;s <a title=\"Coefficient of Determination\" 
href=\"http:\/\/en.wikipedia.org\/wiki\/Coefficient_of_determination\" target=\"_blank\">coefficient of determination<\/a>&#8211;for the simple regression of\u00a0<em>Y\u00a0<\/em>on just\u00a0<em>D <\/em>(i.e., the most parsimonious specification)\u00a0was about 0.01, meaning that the treatment variable\u00a0<em>D<\/em> explained about 1 percent of the variation in the outcome of interest.<!--more--><\/p>\n<p>I commented\u00a0that this was interesting, given that if policy makers\u00a0made a big deal about the relationship\u00a0between\u00a0<em>D\u00a0<\/em>and\u00a0<em>Y<\/em>, one of the points\u00a0of the paper (which the author had not made)\u00a0should be\u00a0that policy makers should really spend their time on other things. Indeed, if\u00a0<em>D\u00a0<\/em>explains only 1 percent of the variation in\u00a0<em>Y<\/em>, focusing on\u00a0<em>D<\/em> in order to stimulate\u00a0<em>Y\u00a0<\/em>is unlikely to be cost-effective. In other words, there are other factors out there that explain the remaining 99 percent of the variation in\u00a0<em>Y<\/em>, and it is likely that among those factors, at least one or two will play a significant role&#8211;or at least, a role that is much more important than\u00a0<em>D<\/em>.<\/p>\n<p>The foregoing strikes me as a\u00a0legitimate use of R-squared, but the\u00a0measure is misused more often than not, especially by neophytes and by people outside of economics. 
Indeed, in his <em><a href=\"http:\/\/www.amazon.com\/gp\/product\/1405182571\/ref=as_li_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=1405182571&amp;linkCode=as2&amp;tag=marfbel-20&amp;linkId=4MV6RHQ3GVCAHGYV\">Guide to Econometrics<\/a><\/em>, which is still my favorite econometrics text, the late Peter Kennedy noted:<\/p>\n<p><a href=\"http:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2015\/04\/Kennedy.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-10797\" src=\"http:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2015\/04\/Kennedy.jpg\" alt=\"Kennedy\" width=\"610\" height=\"199\" srcset=\"https:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2015\/04\/Kennedy.jpg 610w, https:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2015\/04\/Kennedy-580x189.jpg 580w\" sizes=\"auto, (max-width: 610px) 100vw, 610px\" \/><\/a><\/p>\n<p>Indeed, I have frequently received referee reports where a reviewer noted that my regressions had a low R-squared because it was somewhere between 0.05 and 0.30. But in applied micro, what we typically care about is identification (which the grad student presenting at MEA had in droves) rather than how good our regression is at cranking out accurate predictions (which is essentially what R-squared tells you).<\/p>\n<p>Besides, in applied micro, any R-squared around 0.25 is considered very good. Given how much unobserved heterogeneity we deal with, anything more than 0.30 is a crazy high R-squared when using cross-sectional data. 
Time series econometricians, however, often deal with R-squared measures of 0.85 or above, because their variables often tend to move together over time.<\/p>\n<p>At any rate, unless you have a very good reason to do so such as the one I discuss above, you shouldn&#8217;t care either way about the size of your R-squared.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Last week the Midwest Economics Association (MEA) meetings were taking place in Minneapolis. Because a few friends were presenting at MEA, I decided to go<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/marcfbellemare.com\/wordpress\/10793\">Continue reading<span class=\"screen-reader-text\">The Use and Misuse of R-Squared [Technical]<\/span><\/a><\/div>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-10793","post","type-post","status-publish","format-standard","hentry","category-uncategorized","entry"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p1gPg8-2O5","_links":{"self":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/10793","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/comments?post=10793"}],"version-h
istory":[{"count":7,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/10793\/revisions"}],"predecessor-version":[{"id":10926,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/10793\/revisions\/10926"}],"wp:attachment":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/media?parent=10793"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/categories?post=10793"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/tags?post=10793"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}