{"id":11763,"date":"2016-03-07T05:00:06","date_gmt":"2016-03-07T11:00:06","guid":{"rendered":"http:\/\/marcfbellemare.com\/wordpress\/?p=11763"},"modified":"2016-03-06T10:21:54","modified_gmt":"2016-03-06T16:21:54","slug":"metrics-monday-interpreting-coefficients-ii","status":"publish","type":"post","link":"https:\/\/marcfbellemare.com\/wordpress\/11763","title":{"rendered":"&#8216;Metrics Monday: Interpreting Coefficients II"},"content":{"rendered":"<p>Picking up where I left off at the end of\u00a0<a href=\"http:\/\/marcfbellemare.com\/wordpress\/11740\">last week&#8217;s &#8216;Metrics Monday post<\/a>, I wanted to continue discussing the interpretation of coefficients this week.<\/p>\n<p>Specifically, I wanted to discuss the interpretation of coefficients on dummy variables in semi-logarithmic equations. What&#8217;s a semi-logarithmic equation? It&#8217;s an equation of the form<\/p>\n<p>[math]\\ln{y} = \\alpha + \\beta{D} + \\gamma{x} + \\epsilon[\/math],*<!--more--><\/p>\n<p>where [math]y[\/math] is the dependent variable,\u00a0 [math]D[\/math] is a binary (i.e., zero or one) treatment variable, [math]x[\/math] is a vector of control variables, and\u00a0 [math]\\epsilon[\/math] is an error term whose mean is equal to zero. To take a classic example,\u00a0[math]y[\/math] could be an individual&#8217;s wage,\u00a0[math]D[\/math] a variable equal to one if they have a college degree and equal to zero otherwise, and\u00a0[math]x[\/math] their age, gender, etc. The equation above is called &#8220;semi-log&#8221; because we take the logarithm of only one side of the equation.<\/p>\n<p>A log-log equation would regress a logarithm on the left-hand side on a logarithm on the right-hand side, in which case the estimated coefficient is directly interpretable as an elasticity, i.e., a percentage change in\u00a0[math]y[\/math] for a 1% increase in the variable of interest. 
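Since the post's own implementation is in Stata, here is a minimal Python sketch of the same point, showing how far reading the dummy coefficient directly as a percent can drift from the exact effect and from Kennedy's corrected estimate (the values of b_hat and its variance below are made up for illustration, not estimates from real data):

```python
import math

# Percentage effect of a dummy D in ln(y) = a + b*D + c*x + e.
# Reading b_hat directly as a percent is only a small-b approximation:
# the exact discrete-change effect is exp(b_hat) - 1 (Halvorsen and
# Palmquist 1980), and Kennedy (1981) further subtracts half the
# estimated variance of b_hat inside the exponential.
# b_hat and var_b are made-up illustrative numbers.
b_hat, var_b = 0.40, 0.01

naive = b_hat                                # reading b_hat as "40%"
exact = math.exp(b_hat) - 1                  # exact percentage effect
g_hat = math.exp(b_hat - 0.5 * var_b) - 1    # Kennedy's correction

print(f"naive:   {100 * naive:.1f}%")        # 40.0%
print(f"exact:   {100 * exact:.1f}%")        # 49.2%
print(f"kennedy: {100 * g_hat:.1f}%")        # 48.4%
```

The gap between the naive reading and the exact effect grows with the size of the coefficient; for small coefficients the three numbers are close.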
It is unfortunately not possible to take the log of a binary treatment like [math]D[\/math] above,** because the log of zero is undefined.<\/p>\n<p>As an aside, if you are interested in the question of why we log some variables (e.g., wages, prices, incomes) but not others (e.g., age, years of education, etc.), see <a href=\"http:\/\/stats.stackexchange.com\/questions\/298\/in-linear-regression-when-is-it-appropriate-to-use-the-log-of-an-independent-va\">this discussion<\/a>.<\/p>\n<p>Perhaps because of the foregoing, a common mistake in interpreting [math]\\beta[\/math] in the equation above is to treat it as a percentage. That is, to claim that [math]\\hat{\\beta}[\/math] tells us by how much [math]y[\/math] changes in percentage terms when an observation goes from untreated to treated, i.e., when [math]D[\/math] goes from zero to one.<\/p>\n<p>In what is perhaps the shortest paper I have ever read, <a href=\"https:\/\/ideas.repec.org\/a\/aea\/aecrev\/v71y1981i4p801.html\">Kennedy (1981)<\/a>, correcting a mistake in an earlier paper by <a href=\"https:\/\/ideas.repec.org\/a\/aea\/aecrev\/v70y1980i3p474-75.html\">Halvorsen and Palmquist (1980)<\/a>,\u00a0derived a formula for the effect of the treatment in percentage terms:<\/p>\n<p>[math]{\\hat{g}} = \\exp[\\hat{\\beta} - \\frac{1}{2}\\hat{V}(\\hat{\\beta})] - 1[\/math],<\/p>\n<p>where [math]{\\hat{g}}[\/math] is, in Kennedy&#8217;s words, &#8220;the percentage impact of the dummy variable on the variable being explained.&#8221;<\/p>\n<p>I thought this was reasonably well known, but I still review too many papers that directly interpret the coefficient on a dummy in a semi-log equation as the percentage change in [math]y[\/math].<\/p>\n<p>As with so many other things I talk about in this series, Dave Giles had a <a href=\"http:\/\/davegiles.blogspot.com\/2011\/03\/dummies-for-dummies.html\">nice long post<\/a> on 
this (and other related topics) five years ago. In his post, Dave links to two papers of his: a 1982 paper in which he corrects a slight mistake in Kennedy&#8217;s analysis, and a 2011 paper in which he discusses exact distributional results for coefficients in semi-log equations. (Among other things, one interesting point Dave&#8217;s post makes is that when you have a log on the left-hand side, discrete changes in one of the explanatory variables will have asymmetric effects.)<\/p>\n<p>Kennedy&#8217;s [math]\\hat{g}[\/math] is easy to implement in Stata:<\/p>\n<pre>. reg y D x\r\n. nlcom exp(_b[D]-0.5*((_se[D])^2))-1<\/pre>\n<p>* Note the spiffy use of TeX in this post. After someone on Reddit noted that the math was hard to understand in my <a href=\"http:\/\/marcfbellemare.com\/wordpress\/11740\">last post<\/a>, I decided to up my blog notation game.<\/p>\n<p>** A common alternative, used instead of a log, is the inverse hyperbolic sine (IHS) transformation, which behaves like a log but accommodates zero and negative values. In my <a href=\"https:\/\/ideas.repec.org\/a\/aea\/aecrev\/v71y1981i4p801.html\">2013 <em>AJAE<\/em> article with Barrett and Just on the welfare impacts of price volatility<\/a>, we used the IHS transformation extensively, because our dependent variables were the net sales of each crop, which can take positive, zero, or negative values. The IHS is now <a href=\"https:\/\/chrisblattman.com\/2011\/11\/15\/if-you-know-what-ln1income-is-and-why-its-a-headache-you-should-read-this-post\/\">much more common and acceptable<\/a> than adding 1, 0.01, etc. 
to your variable of interest.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Picking up where I left off at the end of\u00a0last week&#8217;s &#8216;Metrics Monday post, I wanted to continue discussing the interpretation of coefficients this week.<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/marcfbellemare.com\/wordpress\/11763\">Continue reading<span class=\"screen-reader-text\">&#8216;Metrics Monday: Interpreting Coefficients II<\/span><\/a><\/div>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-11763","post","type-post","status-publish","format-standard","hentry","category-uncategorized","entry"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p1gPg8-33J","_links":{"self":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/11763","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/comments?post=11763"}],"version-history":[{"count":23,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/11763\/revisions"}],"predecessor-version":[{"id":11786,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/11763\/revisions\/11786"}],"wp:
attachment":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/media?parent=11763"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/categories?post=11763"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/tags?post=11763"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}