{"id":11879,"date":"2016-05-02T05:00:48","date_gmt":"2016-05-02T10:00:48","guid":{"rendered":"http:\/\/marcfbellemare.com\/wordpress\/?p=11879"},"modified":"2016-05-02T12:05:41","modified_gmt":"2016-05-02T17:05:41","slug":"metrics-monday-estimating-nonlinear-relationships","status":"publish","type":"post","link":"https:\/\/marcfbellemare.com\/wordpress\/11879","title":{"rendered":"&#8216;Metrics Monday: Estimating Nonlinear Relationships"},"content":{"rendered":"<p>Last week I <a href=\"http:\/\/marcfbellemare.com\/wordpress\/11856\">discussed<\/a> U-shaped relationships, and how to test for them. This week, I would like to discuss higher-order nonlinear relationships, or relationships that are &#8220;more nonlinear&#8221; than U-shaped relationships.<\/p>\n<p>There are many ways one can approach the estimation of nonlinear relationships. I will focus only on a handful of them in this post, from least to most nonlinear, and from semiparametric to nonparametric.<\/p>\n<p>A good first step beyond the estimation of a U-shaped relationship would be to estimate the equation<!--more--><\/p>\n<p>(1) [math]y = \\alpha + f(D) + \\beta{X} + \\epsilon[\/math],<\/p>\n<p>where [math]y[\/math] is the outcome of interest,\u00a0[math]D[\/math] is your treatment variable,\u00a0[math]X[\/math] is a vector of control variables, and [math]\\epsilon[\/math] is an error term with mean zero. I assume for the time being that\u00a0[math]D[\/math] is as good as randomly assigned, so that identification is guaranteed.<\/p>\n<p>The difference between equation (1) and the usual linear regression is the term\u00a0[math]f(D)[\/math], where the outcome\u00a0variable\u00a0[math]y[\/math] is\u00a0related to the treatment variable\u00a0[math]D[\/math] in a nonlinear\u00a0fashion by way of the functional form\u00a0[math]f(\\cdot)[\/math].<\/p>\n<p>In my own work, one estimator I like to use to model such nonlinear relationships is a restricted cubic spline. 
Before anything, I should perhaps render unto Caesar the things that are Caesar&#8217;s, and note that I learned how to use restricted cubic splines from <a href=\"http:\/\/maartenbuis.nl\/presentations\/bonn09.pdf\">this set of slides by Maarten Buis<\/a>, which includes Stata code that you can readily adapt for your own work.<\/p>\n<p>Briefly, when using a restricted cubic spline, you get &#8220;a continuous smooth function that is linear before the first knot, a piecewise cubic polynomial between adjacent knots, and linear again after the last knot&#8221; (p.1311, <em>Stata Base Reference Manual<\/em>, Release 13). This is in contrast to\u00a0a linear spline, which imposes piecewise linear components between the knots; restricted cubic splines should be used in cases where the relationship of interest is &#8220;more nonlinear&#8221; than what a linear spline allows.<\/p>\n<p>What does a restricted cubic spline look like? Something like this:<\/p>\n<figure id=\"attachment_11882\" aria-describedby=\"caption-attachment-11882\" style=\"width: 537px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2016\/05\/FarmersMarkets.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-11882 size-full\" src=\"http:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2016\/05\/FarmersMarkets.jpg\" alt=\"FarmersMarkets\" width=\"537\" height=\"393\" \/><\/a><figcaption id=\"caption-attachment-11882\" class=\"wp-caption-text\">Source: Bellemare, King, and Nguyen (2016).<\/figcaption><\/figure>\n<p>The above figure is from the newest version of my paper on farmers markets and food-borne illness, which I will blog about soon. Because the estimated coefficients from a restricted cubic spline are difficult to interpret by merely looking at them, a picture is literally worth a thousand words when estimating such splines. 
The above figure, which overlays a scatter plot for\u00a0[math]y[\/math] and [math]D[\/math], shows that even when taking into account the nonlinear relationship between those two variables, that relationship looks pretty monotonic (especially considering that there are five knots here, and thus\u00a0four cubic components\u00a0between two linear components).<\/p>\n<p>An even cooler thing you can do with the code provided by Buis in his slides is to estimate and plot\u00a0[math]\\frac{\\partial{y}}{\\partial{D}}[\/math], along with its confidence interval, which is the restricted cubic spline analog\u00a0of the estimated coefficient for\u00a0[math]D[\/math] and its associated confidence interval in the context of a linear regression. For the restricted cubic spline above,\u00a0the marginal effect looks like this:<\/p>\n<figure id=\"attachment_11884\" aria-describedby=\"caption-attachment-11884\" style=\"width: 580px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2016\/05\/MarginalEffect.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-11884 size-medium\" src=\"http:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2016\/05\/MarginalEffect-580x433.jpg\" alt=\"MarginalEffect\" width=\"580\" height=\"433\" srcset=\"https:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2016\/05\/MarginalEffect-580x433.jpg 580w, https:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2016\/05\/MarginalEffect.jpg 669w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/a><figcaption id=\"caption-attachment-11884\" class=\"wp-caption-text\">Source: Author&#8217;s own estimations.<\/figcaption><\/figure>\n<p>The interpretation of the above figure is as follows: The marginal effect of farmers markets per capita on the number of outbreaks of food-borne illness per capita is everywhere positive, but it is only significant at less than the 5 percent level just a little bit below the mean of the 
standardized distribution of the treatment variable.<\/p>\n<p>In cases where you want to go full nonlinear, you can use lowess smoothing, which estimates\u00a0a locally weighted regression of\u00a0[math]y[\/math] on [math]D[\/math]. If you are interested in those, the Stata reference manual has a good discussion <a href=\"http:\/\/www.stata.com\/manuals13\/rlowess.pdf\">here<\/a>. Without any additional options, estimating the relationship in figure 1 by lowess instead of by a restricted cubic spline gives the following:<\/p>\n<figure id=\"attachment_11886\" aria-describedby=\"caption-attachment-11886\" style=\"width: 580px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2016\/05\/Lowess.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-11886\" src=\"http:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2016\/05\/Lowess-580x433.jpg\" alt=\"Source: Author's own estimations.\" width=\"580\" height=\"433\" srcset=\"https:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2016\/05\/Lowess-580x433.jpg 580w, https:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2016\/05\/Lowess.jpg 669w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/a><figcaption id=\"caption-attachment-11886\" class=\"wp-caption-text\">Source: Author&#8217;s own estimations.<\/figcaption><\/figure>\n<p>With that said, I want to reiterate that linear splines, restricted cubic splines, and lowess smoothing are only\u00a0a handful of a number of potential estimators you can use to estimate nonlinear relationships. 
If you are interested in reading more on the topic, here is a very partial reading list, in no particular order:<\/p>\n<ul>\n<li>Henderson and Parmeter (2015), <a href=\"http:\/\/www.amazon.com\/gp\/product\/0521279682\/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0521279682&amp;linkCode=as2&amp;tag=marfbel-20&amp;linkId=SUWIXNK43UDNKDUF\"><em>Applied Nonparametric Econometrics<\/em><\/a>.<\/li>\n<li>H\u00e4rdle (1992), <em><a href=\"http:\/\/www.amazon.com\/gp\/product\/0521429501\/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0521429501&amp;linkCode=as2&amp;tag=marfbel-20&amp;linkId=ZVIHIYFLSBKYEJMN\">Applied Nonparametric Regression<\/a><\/em>.<\/li>\n<li>Pagan and Ullah (1999), <a href=\"http:\/\/www.amazon.com\/gp\/product\/0521586119\/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0521586119&amp;linkCode=as2&amp;tag=marfbel-20&amp;linkId=JC3PSAQQ3IVWATGG\"><em>Nonparametric Econometrics<\/em><\/a>.<\/li>\n<li>Yatchew (2003), <a href=\"http:\/\/www.amazon.com\/gp\/product\/B000VDKAII\/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=B000VDKAII&amp;linkCode=as2&amp;tag=marfbel-20&amp;linkId=XSE6E4O7B6X3FCBB\"><em>Semiparametric Regression for the Applied Econometrician<\/em><\/a>.<\/li>\n<\/ul>\n<p>In closing, I would also like to offer a word of caution. As with any &#8220;fancy&#8221; procedure (e.g., tobit, Poisson, multinomial logit, etc.) aimed at properly modeling the DGP, there is an inherent danger that once one has learned to use the nonlinear procedures described above, one starts to see everything as a nail. 
Don&#8217;t fall into this trap.<\/p>\n<p>As I have described before, there is an unspoken ontological order in which things are to be tackled in applied econometrics, and in most social-scientific applications, it will be much more important to have a reasonable shot at causal identification than it is to accurately model nonlinearities in your data. This means that the procedures described above should be reserved\u00a0for those cases where you have experimental data, a selection-on-observables design, etc., which yields plausible identification.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Last week I discussed U-shaped relationships, and how to test for them. This week, I would like to discuss higher-order nonlinear relationships, or relationships that<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/marcfbellemare.com\/wordpress\/11879\">Continue reading<span class=\"screen-reader-text\">&#8216;Metrics Monday: Estimating Nonlinear Relationships<\/span><\/a><\/div>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[1],"tags":[],"class_list":["post-11879","post","type-post","status-publish","format-standard","hentry","category-uncategorized","entry"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p1gPg8-35B","_links":{"self":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/11879","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/
wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/comments?post=11879"}],"version-history":[{"count":9,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/11879\/revisions"}],"predecessor-version":[{"id":11902,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/11879\/revisions\/11902"}],"wp:attachment":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/media?parent=11879"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/categories?post=11879"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/tags?post=11879"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}