Last updated on May 26, 2014
I have been working on a paper about food-borne illness lately, and one of the things I have learned is that for a specific outbreak of food-borne illness to show up in the data of the Centers for Disease Control and Prevention (CDC), the stars need to be properly aligned.
Specifically, you have to get sick enough that you see a doctor about it. Then, upon determining that your illness was food-borne, your doctor needs to notify the health authorities of the county you live in. Finally, the health authorities in your county need to notify the CDC. Because cases can drop out at each of these steps, the CDC data on food-borne illness are necessarily an undercount, and how systematic this undercounting is varies by state… which poses a number of econometric problems for this researcher.
All this to say that I was particularly excited about this New York Times article discussing a pilot project of the New York City Department of Health and Mental Hygiene:
Using a software program developed by Columbia University, city researchers combed through 294,000 Yelp reviews for restaurants in the city over a period of nine months in 2012 and 2013, searching for words like “sick,” “vomit” and “diarrhea,” along with other details. After investigating those reports, the researchers substantiated three instances when 16 people had been sickened. Those people had eaten the house salad, shrimp and lobster cannelloni, and macaroni and cheese spring rolls at three restaurants that the agencies are not identifying.
Now, this approach would also lead to undercounting (how many people actually write Yelp reviews?), and it might only work in big cities where people can remain anonymous; in smaller towns, it might be easier for restaurateurs to remember just who ordered the E. coli-tainted spinach salad. But it is nonetheless a clever use of big data.
ht: Janet.