{"id":13183,"date":"2018-10-22T05:00:51","date_gmt":"2018-10-22T10:00:51","guid":{"rendered":"http:\/\/marcfbellemare.com\/wordpress\/?p=13183"},"modified":"2018-10-21T10:25:15","modified_gmt":"2018-10-21T15:25:15","slug":"metrics-monday-goodness-of-fit-with-panel-data-in-stata","status":"publish","type":"post","link":"https:\/\/marcfbellemare.com\/wordpress\/13183","title":{"rendered":"&#8216;Metrics Monday: Goodness of Fit with Panel Data in Stata"},"content":{"rendered":"<p>With panel data, it is not uncommon to present regression results by starting with a pooled ordinary least squares (OLS) regression, then moving on to a specification with fixed effects (FE). If anything, this helps the reader see how important time-invariant unobserved heterogeneity is to your coefficient estimates.<\/p>\n<p>Let <em>y<\/em> denote your outcome variable, <em>x<\/em> denote your control variables, and <em>unit<\/em> denote the unit of observation within which you have variation. If you use Stata, one of the problems that comes from using<\/p>\n<pre>xtreg y x, fe i(unit)<\/pre>\n<p>instead of<\/p>\n<pre>reg y x i.unit<\/pre>\n<p>is that none of the R-square measures returned by Stata after the former is comparable to the R-square returned by Stata after the latter. 
From the &#8220;Assessing goodness of fit&#8221; section of the xtreg entry in the Stata manual (click on the image to enlarge it):<\/p>\n<p>&nbsp;<\/p>\n<p><a href=\"http:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2018\/10\/xtreg.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-13185\" src=\"http:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2018\/10\/xtreg-580x358.jpg\" alt=\"\" width=\"580\" height=\"358\" srcset=\"https:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2018\/10\/xtreg-580x358.jpg 580w, https:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2018\/10\/xtreg-768x475.jpg 768w, https:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2018\/10\/xtreg-940x581.jpg 940w, https:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2018\/10\/xtreg.jpg 1249w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>What this means in practice is that if you don&#8217;t pay attention to what is going on when making tables of results, you often end up with tables where the R-square in your OLS specification is higher than the R-square in your FE specification. But this is impossible&#8211;with the same outcome and control variables, including unit FEs will necessarily raise the R-square since a (usually much) higher percentage of the variation in the outcome is explained by variables on the RHS when using FEs.<\/p>\n<p>This isn&#8217;t too bad in and of itself, but of course the first time I noticed this was when someone asked me in a seminar: &#8220;Why is your R-square going <em>down<\/em>\u00a0instead of up when including fixed effects?&#8221; and I had no good answer other than &#8220;I&#8217;ll have to check and get back to you on this,&#8221; which is seminar-speak for &#8220;Beats me.&#8221;<\/p>\n<p>Here is a simple (if not terribly elegant) workaround I have come up with and have used and reused in papers where I use the xtreg set of commands. 
After estimating<\/p>\n<pre>xtreg y x, fe i(unit)<\/pre>\n<p>I add the following lines of code<\/p>\n<pre>egen ybar = mean(y) if e(sample)\r\ngen y2 = (y - ybar)^2\r\npredict resid, e\r\ngen e2 = resid^2\r\ndrop resid\r\negen sse = sum(e2)\r\negen sst = sum(y2)\r\ngen r2 = 1 - sse\/sst\r\nsum r2\r\ndrop sse sst y2 e2 ybar r2<\/pre>\n<p>The variable r2 is then the &#8220;right&#8221; (i.e., comparable to OLS) R-square.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>With panel data, it is not uncommon to present regression results by starting with a pooled ordinary least squares (OLS) regression, then moving on to&#8230;<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/marcfbellemare.com\/wordpress\/13183\">Continue reading<span class=\"screen-reader-text\">&#8216;Metrics Monday: Goodness of Fit with Panel Data in Stata<\/span><\/a><\/div>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-13183","post","type-post","status-publish","format-standard","hentry","category-uncategorized","entry"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p1gPg8-3qD","_links":{"self":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/13183","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/comments?post=13183"}],"version-history":[{"count":4,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/13183\/revisions"}],"predecessor-version":[{"id":13188,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/13183\/revisions\/13188"}],"wp:attachment":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/media?parent=13183"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/categories?post=13183"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/tags?post=13183"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}