{"id":11833,"date":"2016-04-11T05:00:22","date_gmt":"2016-04-11T10:00:22","guid":{"rendered":"http:\/\/marcfbellemare.com\/wordpress\/?p=11833"},"modified":"2016-04-12T09:56:44","modified_gmt":"2016-04-12T14:56:44","slug":"metrics-monday-robustness-check-or-data-mining","status":"publish","type":"post","link":"https:\/\/marcfbellemare.com\/wordpress\/11833","title":{"rendered":"&#8216;Metrics Monday: Robustness Check or Data Mining?"},"content":{"rendered":"<figure id=\"attachment_11838\" aria-describedby=\"caption-attachment-11838\" style=\"width: 580px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2016\/04\/Mining.jpg\" rel=\"attachment wp-att-11838\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-11838\" src=\"http:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2016\/04\/Mining-580x387.jpg\" alt=\"&quot;There be three asterisks.&quot; (Source: Wikimedia Commons).\" width=\"580\" height=\"387\" srcset=\"https:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2016\/04\/Mining-580x387.jpg 580w, https:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2016\/04\/Mining-768x512.jpg 768w, https:\/\/marcfbellemare.com\/wordpress\/wp-content\/uploads\/2016\/04\/Mining-940x627.jpg 940w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/a><figcaption id=\"caption-attachment-11838\" class=\"wp-caption-text\">&#8220;There be three asterisks.&#8221; (Source: Wikimedia Commons).<\/figcaption><\/figure>\n<p>Last month, Ben Chapman and Don Schaffner, who host\u00a0the Food Safety Talk podcast,\u00a0<a href=\"http:\/\/foodsafetytalk.com\/food-safety-talk\/2016\/3\/6\/food-safety-talk-93-does-your-dog-poop-outside\">discussed<\/a>\u00a0my January Gray Matter\u00a0<a href=\"http:\/\/www.nytimes.com\/2016\/01\/17\/opinion\/sunday\/farmers-markets-and-food-borne-illness.html\">column<\/a>\u00a0in the\u00a0<em>New York Times<\/em>,\u00a0in which I described my 
work on farmers markets and food-borne illness.<\/p>\n<p>Their discussion was even-handed,\u00a0and Don (I think it was him; I listened to the segment only once, over a month ago) demonstrated a surprising understanding of\u00a0the <a href=\"http:\/\/krugman.blogs.nytimes.com\/2013\/04\/22\/understanding-the-nber\/\">working paper culture<\/a>\u00a0in economics, wherein we circulate working papers well ahead of submitting them for publication so as to improve our work with a view to publishing it in better journals. But the one part that made my ears perk up was when Ben asked Don (or the other way around; again, it&#8217;s been a while since I listened) why my coauthors and I had looked at the relationship between farmers markets and all those seemingly irrelevant illnesses, and Don said (and I&#8217;m paraphrasing), &#8220;I don&#8217;t know, it looks like data mining.&#8221;<!--more--><\/p>\n<p>This made me conscious once again of the gap that exists between economics and other disciplines when it comes to empirical work. <a href=\"http:\/\/medical-dictionary.thefreedictionary.com\/bench+research\">Bench scientists<\/a> think papers in economics are much too long: &#8220;Why do you need to describe the data in so much detail?&#8221; or &#8220;Why do you need all those tables that show the same thing over and over?&#8221;<\/p>\n<p>Briefly, on why we need to describe the data in so much detail, much of the\u00a0cleavage between economics and bench science comes from the fact that bench scientists deal exclusively with experimental data. In economics, however, for all this talk of field experiments and lab-in-the-field experiments, experimental data are still the exception and observational data the norm. And when\u00a0you are dealing with observational\u00a0data, chances are you are dealing with survey data, in which case\u00a0you are almost surely\u00a0dealing with responses provided by human beings whose answers are not always the most reliable. 
In such cases, it is helpful to explain where, when, and how the data were collected so your readers can tell whether there might be\u00a0anything hinky going on with your results.<\/p>\n<p>On why we need all those tables, that too has to do with the fact that much of economics deals with observational data. Unlike experimental data, which often allow for a simple comparison of means between treatment and control groups, observational data require\u00a0one to slice the data in many different ways to make sure that a given finding is not spurious, and that the researchers have\u00a0not cherry-picked their\u00a0findings and reported the one specification in which what\u00a0they wanted to\u00a0find\u00a0turned out to be there.<\/p>\n<p>As such, all those tables of robustness checks are there\u00a0to do the exact opposite of data mining. And as for why I\u00a0look at seven different types of food-borne illness in my work on farmers markets and food-borne illness, that&#8217;s because when you find a positive relationship between, say, D and Y = Y_{1} + Y_{2} + &#8230; + Y_{7}, it makes sense to want to know whether that positive relationship comes from Y_{1}, Y_{2}, &#8230;, or Y_{7}, i.e., the different constituent parts of Y.<\/p>\n<p>On the one hand, finding that none of the constituent parts of Y\u00a0is associated with D\u00a0would call the initial finding that there is a relationship between D\u00a0and Y\u00a0into question. If I had found that outbreaks or cases of <em>none<\/em> of the top seven food-borne illnesses reported by the CDC were associated with the number of farmers markets, my main finding would have been pretty weak. On the other hand, finding that one or more of the constituent parts of Y\u00a0is associated with D\u00a0is interesting in its own right. 
In my work, this means that it is interesting to know that the positive relationship between farmers markets and food-borne illness in general is due to similar positive relationships between (i) farmers markets and norovirus and (ii) farmers markets and campylobacter, since this can guide future research and policy-making efforts.<\/p>\n<p>That said, I completely get where Ben and Don were coming from with the comment that this looked like data mining. When you deal primarily (if not exclusively) with experimental data, looking at the relationship between your treatment variable and\u00a0an increasing number of outcomes will eventually yield significance. Even if there is no causal relationship, you would expect, on average, one spuriously significant result at the 1, 5, or 10 percent level when looking at the relationship between your treatment and 100, 20, or 10 a priori unrelated outcomes, respectively. In that case, looking at more outcomes would indeed be data mining. That is why economists have recently started advocating\u00a0for submitting a\u00a0<a href=\"https:\/\/ideas.repec.org\/a\/aea\/jecper\/v29y2015i3p61-80.html\">pre-analysis plan<\/a>\u00a0when doing experimental work, i.e., a document you submit before you begin collecting data in which you explain exactly which outcomes you will be looking at and how. 
(See <a href=\"http:\/\/www.ingentaconnect.com\/content\/aea\/jep\/2015\/00000029\/00000003\/art00005\">here<\/a> for a counterpoint that argues that pre-analysis plans are not always that useful, especially in cases where one can do replication research.)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Last month, Ben Chapman and Don Schaffner, who host\u00a0the Food Safety Talk podcast,\u00a0discussed\u00a0my January Gray Matter\u00a0column\u00a0in the\u00a0New York Times,\u00a0in which I described my work<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/marcfbellemare.com\/wordpress\/11833\">Continue reading<span class=\"screen-reader-text\">&#8216;Metrics Monday: Robustness Check or Data Mining?<\/span><\/a><\/div>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[1],"tags":[],"class_list":["post-11833","post","type-post","status-publish","format-standard","hentry","category-uncategorized","entry"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p1gPg8-34R","_links":{"self":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/11833","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/comments?post=
11833"}],"version-history":[{"count":9,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/11833\/revisions"}],"predecessor-version":[{"id":11839,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/11833\/revisions\/11839"}],"wp:attachment":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/media?parent=11833"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/categories?post=11833"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/tags?post=11833"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}