{"id":12856,"date":"2018-02-26T05:00:47","date_gmt":"2018-02-26T11:00:47","guid":{"rendered":"http:\/\/marcfbellemare.com\/wordpress\/?p=12856"},"modified":"2018-02-27T09:33:36","modified_gmt":"2018-02-27T15:33:36","slug":"metrics-monday-what-to-do-instead-of-logx-1","status":"publish","type":"post","link":"https:\/\/marcfbellemare.com\/wordpress\/12856","title":{"rendered":"&#8216;Metrics Monday: What to Do Instead of log(x +1)"},"content":{"rendered":"<p>I was in Helsinki last week for the <a href=\"https:\/\/www.wider.unu.edu\/event\/waves-varhs-data\">UNU-WIDER workshop on the\u00a0Vietnam Access to Resources Household Survey (VARHS) data<\/a>, presenting work that my coauthors and I have been doing using these data.<\/p>\n<p>One thing that I saw a few instances of during the workshop was the following. A researcher wants to include a variable x in a regression, but that variable needs to be logged. Because there are many zero-valued observations of x, and because log(0) is undefined, the author simply uses log(x +1), or log(x + 0.001), or log(x + 0.00001), and so on.<\/p>\n<p>This post is about what to do in such cases. There are many instances in development where you&#8217;d like to include a financial variable&#8211;say, the value of chemical fertilizer used on a given plot&#8211;where many observations will be zero-valued. In the chemical fertilizer example, not everyone in the data will use chemical instead of organic fertilizer, and so they will report a zero when you ask them what was the value of chemical fertilizer used on any of their plots.<\/p>\n<p>When you want to log a variable x but that x has many zero-valued observations, there are four things you can do in principle:<!--more--><\/p>\n<ol>\n<li>Use log(x), and let the chips fall where they may, meaning that you just drop those observations for which x = 0. If x is assigned experimentally, there is no harm in doing that. 
But in most cases, x will not be as good as random, which means that merely dropping those observations for which x = 0 will introduce selection in your sample, which limits the external validity of your findings.<\/li>\n<li>Just use x. Presumably, this is not an option if you are here reading this post! More seriously, there are some cases where you really need to take a log. My colleague Jason Kerwin&#8217;s rule of thumb is to log all financial variables, for instance, or you might want to estimate a Cobb-Douglas or translog production function.<\/li>\n<li>Use log(x + 1), log(x + 0.001), or some variant thereof. This is a method that <a href=\"http:\/\/www.jstor.org\/stable\/pdf\/1837175.pdf\">MaCurdy and Pencavel introduced in a 1986 JPE article<\/a>, and it has long been the workhorse way to deal with those wayward zero-valued observations.<\/li>\n<li>Use the newer, widely accepted way of doing things, which is to take an inverse hyperbolic sine (IHS) transformation of x.<\/li>\n<\/ol>\n<p>The IHS transformation of x&#8211;formally denoted arsinh x, but I usually denote it IHS(x)&#8211;is such that<\/p>\n<p>(1) IHS(x) = ln(x + \\sqrt(x^2 + 1)),<\/p>\n<p>which you can see <a href=\"https:\/\/en.wikipedia.org\/wiki\/Inverse_hyperbolic_functions#Inverse_hyperbolic_sine\">here<\/a> if you have difficulty deciphering the above. If you use Stata, this is what I do to take IHS(x):<\/p>\n<p>. gen IHS_x = ln(x + ((x^2 +1)^0.5))<\/p>\n<p>The beauty of the IHS transformation is that (i) it behaves similarly to a log, and (ii) it allows retaining zero-valued observations. In cases where it is useful, it even (iii) allows retaining negative-valued observations; in <a href=\"https:\/\/academic.oup.com\/ajae\/article-abstract\/95\/4\/877\/93643\">Bellemare et al. 
(2013)<\/a>, for instance, we needed to regress marketable surplus (production minus consumption; this can take negative, zero, or positive values) on a number of variables, and so the IHS transformation came in very handy.<\/p>\n<p>When I brought this up to a colleague at the Helsinki workshop, he said: &#8220;You should write a &#8216;Metrics Monday post about this!&#8221; My initial reaction was &#8220;Why? <a href=\"http:\/\/worthwhile.typepad.com\/worthwhile_canadian_initi\/2011\/07\/a-rant-on-inverse-hyperbolic-sine-transformations.html\">Everyone knows about this<\/a>!&#8221; But clearly not everyone does know about this, so I came to the conclusion that this post might be worthwhile. This is especially so given that I have had to mention the IHS transformation more times than I can remember when reviewing articles for journals.<\/p>\n<p>Relative to the old-fashioned ln(x + 1) way of dealing with this problem, the IHS transformation is a newer, widely accepted way of dealing with ln(0). Note, however, that it does have its detractors; Martin Ravallion, for instance, <a href=\"https:\/\/economicsandpoverty.com\/read\/measurement-tools\/\">notes that the IHS transformation is not concave everywhere<\/a>, which leads to violations of the Pigou-Dalton transfer axiom when measuring poverty and inequality. For the rest of us, however, taking an IHS transformation is good enough. This is especially so when you apply it to a control variable, i.e., when you apply it neither to your dependent variable nor to your outcome of interest.<\/p>\n<p>In writing this post, I looked in vain (albeit <em>very<\/em> quickly) to see if anyone had written anything about what to do with the coefficient on an IHS-transformed variable to recover an elasticity, but could not find anything directly about this. 
If you know of such an article or resource, let me know, and I&#8217;ll be happy to update the post and credit you.<\/p>\n<p>For more about the IHS transformation itself, see <a href=\"http:\/\/www.jstor.org\/stable\/2288929\">Burbidge et al. (JASA, 1988)<\/a> and <a href=\"http:\/\/www.jstor.org\/stable\/2526842\">MacKinnon and Magee (IER, 1990)<\/a>. For applications, see <a href=\"https:\/\/academic.oup.com\/ajae\/article-abstract\/75\/4\/1056\/59823\">Moss and Shonkwiler (AJAE, 1993)<\/a>,\u00a0<a href=\"http:\/\/www.jstor.org\/stable\/pdf\/1243958.pdf\">Yen and Jones (AJAE, 1997)<\/a>,\u00a0<a href=\"https:\/\/www.degruyter.com\/view\/j\/bejeap.2005.5.issue-1\/bejeap.2006.5.1.1430\/bejeap.2006.5.1.1430.xml\">Pence (beJEAP, 2006)<\/a>, and the aforementioned\u00a0<a href=\"https:\/\/academic.oup.com\/ajae\/article-abstract\/95\/4\/877\/93643\">Bellemare et al. (AJAE, 2013)<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I was in Helsinki last week for the UNU-WIDER workshop on the\u00a0Vietnam Access to Resources Household Survey (VARHS) data, presenting work that my coauthors and<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/marcfbellemare.com\/wordpress\/12856\">Continue reading<span class=\"screen-reader-text\">&#8216;Metrics Monday: What to Do Instead of log(x 
+1)<\/span><\/a><\/div>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[1],"tags":[],"class_list":["post-12856","post","type-post","status-publish","format-standard","hentry","category-uncategorized","entry"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p1gPg8-3lm","_links":{"self":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/12856","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/comments?post=12856"}],"version-history":[{"count":10,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/12856\/revisions"}],"predecessor-version":[{"id":12866,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/12856\/revisions\/12866"}],"wp:attachment":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/media?parent=12856"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/categories?post=12856"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/tags?post=12856"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}