{"id":13285,"date":"2019-01-21T05:00:17","date_gmt":"2019-01-21T11:00:17","guid":{"rendered":"http:\/\/marcfbellemare.com\/wordpress\/?p=13285"},"modified":"2019-01-20T12:57:15","modified_gmt":"2019-01-20T18:57:15","slug":"metrics-monday-learning-machine-learning","status":"publish","type":"post","link":"https:\/\/marcfbellemare.com\/wordpress\/13285","title":{"rendered":"&#8216;Metrics Monday: Learning Machine Learning"},"content":{"rendered":"\n<p>A long time ago I promised myself that I would not become one of those professors who gets too comfortable knowing what he already knows. This means that I do my best to keep up-to-date about recent developments in applied econometrics. <\/p>\n\n\n\n<p>So my incentives in writing this series of post isn&#8217;t entirely selfless: Because good (?) writing is clear thinking made visible, doing so helps me better understand econometrics and keep up with recent developments in applied econometrics.<\/p>\n\n\n\n<p>By &#8220;applied econometrics,&#8221; I mean applied econometrics almost exclusively of the  causal-inference-with-observational-data variety. I haven&#8217;t really thought about time-series econometrics since the last time I took a doctoral-level class on the subject in 2000, but that&#8217;s mostly because I don&#8217;t foresee doing anything involving those methods in the future.<\/p>\n\n\n\n<p>One thing that I don&#8217;t necessarily foresee using but that I really don&#8217;t want to ignore, however, is machine learning (ML), especially since ML methods are now being combined with causal inference techniques. So having been nudged by Dave Giles&#8217; <a href=\"https:\/\/davegiles.blogspot.com\/2019\/01\/machine-learning-econometrics.html\">post on the topic<\/a> earlier this week, I figured 2019 would be a good time&#8211;my only teaching this spring is our department&#8217;s second-year paper seminar, and I&#8217;m on sabbatical in the fall, so it really is now or never.<\/p>\n\n\n\n<p>I&#8217;m not a theorem prover, so I really needed a gentle, intuitive introduction to the topic. Luckily, my friend and office neighbor <a href=\"http:\/\/www.stevejmiller.com\">Steve Miller<\/a> also happens to teach our PhD program&#8217;s ML class and to do some work in this area (see his forthcoming <em>Economics Letters<\/em> article on <a href=\"https:\/\/papers.ssrn.com\/sol3\/papers.cfm?abstract_id=2966194\">FGLS using ML<\/a>, for instance), and he recommended just what I needed: <em><a href=\"https:\/\/www.amazon.com\/gp\/product\/1461471370\/ref=as_li_qf_asin_il_tl?ie=UTF8&amp;tag=marfbel-20&amp;creative=9325&amp;linkCode=as2&amp;creativeASIN=1461471370&amp;linkId=eab014ad9576495d22fa5676b4884fee\">Introduction to Statistical Learning<\/a><\/em>, by James et al. (2013).<\/p>\n\n\n\n<p>The cool thing about James et al. is that it also provides an introduction to R for newbies. Being such a neophyte, going through this book will provide a double learning dividend for me. Even better is the fact that the book is available for free on the companion website (which features R code, data sets, etc.) <a href=\"http:\/\/www-bcf.usc.edu\/~gareth\/ISL\/\">here<\/a>.<\/p>\n\n\n\n<p>I&#8217;m only in chapter 2, but I have already learned some new things. Most of those things have to do with new terminology (e.g., supervised learning wherein you have a dependent variable, vs. 
But here is one thing that was genuinely new to me: the idea that there is a tradeoff between flexibility and interpretability.

[Figure: the flexibility-interpretability tradeoff across statistical learning methods. Source: James et al. (2013), An Introduction to Statistical Learning, Springer.]

Specifically, the tradeoff says this: the more flexible your estimation method, the less interpretable it is. OLS, for instance, is relatively inflexible: it imposes a linear relationship between Y and X, which is rather restrictive. But it is also easy to interpret, since the coefficient on X is an estimate of the change in Y associated with a one-unit increase in X. And so in the figure above, OLS sits low on flexibility but high on interpretability.

Conversely, really flexible methods (those that are very good at accounting for the specific features of the data) tend to be harder to interpret. Think, for instance, of kernel density estimation. You get a nice graph approximating the distribution of the variable you're interested in, whose smoothness depends on the specific bandwidth you chose, but that's it: you only get a graph, and there is little in the way of interpretation to be provided beyond "Look at this graph."
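Here is a quick R sketch of that contrast (again my own illustration, not the book's): the OLS fit boils down to a single slope with a plain-English reading, while a kernel density estimate hands back an entire curve whose shape hinges on an essentially arbitrary bandwidth choice.

```r
set.seed(1)

# A right-skewed variable and an outcome that is linear in it
x <- rexp(500)
y <- 3 * x + rnorm(500)

# Inflexible but interpretable: the coefficient on x says that a
# one-unit increase in x is associated with a ~3-unit increase in y
coef(lm(y ~ x))

# Flexible but hard to interpret: kernel density estimates of x at
# two bandwidths. Each is a whole curve, not a number; about all you
# can do with it is plot it and say "look at this graph"
plot(density(x, bw = 0.05), main = "KDE of x at two bandwidths")
lines(density(x, bw = 0.50), lty = 2)
```

Both density() calls fit the same data; only the bandwidth differs, and that is exactly the smoothness choice mentioned above.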
Bonus: throughout all the reading I've done this week, I also came across the following joke (apologies for forgetting where I saw it):

> Q: What's a data scientist?
>
> A: A statistician who lives in San Francisco.