Big Ag and Big Data

I am always skeptical of people who bandy big data about. As I have discussed earlier on this blog, big data is great if you’re in the business of forecasting, but it is not so great if your goal is to do social science. I raised two points. First:

[W]ithout the identification of causal relationships, there can be no science, social or otherwise. This means that no matter how large a dataset, if it does not allow answering questions of the form “Does X cause Y?,” that dataset is worthless to scientists.

Second:

There is a fundamental difference between estimating causal relationships and forecasting. The former requires a research design in which X is plausibly exogenous to Y. The latter only requires that X include as much stuff as possible.

When it comes to forecasting, big data is unbeatable. With an ever larger number of observations and variables, it should become very easy to forecast all kinds of things …

But when it comes to doing science, big data is dumb. It is only when we think carefully about the research design required to answer the question “Does X cause Y?” that we know which data to collect, and how much of them. The trend in the social sciences over the last 20 years has been toward identifying causal relationships, and away from observational data — big or not.*
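To make the distinction between forecasting and causal inference concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made up for illustration: the variable names, the coefficients, and the confounder Z (think of something unobserved like soil quality) that drives both X and Y. By construction, X has no causal effect on Y at all.

```python
import random

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var

def simulate(n, randomized, seed=0):
    """Draw n observations; the true causal effect of X on Y is zero."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)          # hypothetical unobserved confounder
        if randomized:
            x = rng.gauss(0, 1)      # X assigned independently of Z (exogenous)
        else:
            x = z + rng.gauss(0, 1)  # X partly driven by Z (observational data)
        y = 2.0 * z + rng.gauss(0, 1)  # Y depends on Z only, never on X
        xs.append(x)
        ys.append(y)
    return xs, ys

# Observational data: the population slope is cov(X, Y) / var(X) = 2 / 2 = 1,
# so X is a useful predictor of Y even though it has no effect on Y.
obs_slope = ols_slope(*simulate(20000, randomized=False))

# Randomized X: the population slope is 0, which is the true causal effect.
exp_slope = ols_slope(*simulate(20000, randomized=True))

print(f"observational slope: {obs_slope:.3f}")
print(f"randomized slope:    {exp_slope:.3f}")
```

In this toy setup, the observational slope is perfectly good for forecasting Y from X, and adding more observations only makes it a more precise estimate of the wrong causal answer. Only the design in which X is exogenous recovers the zero effect.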

In agriculture, however, the causal relationships are pretty well known (experimental agriculture is an old field of investigation, and we know which inputs to mix together in order to grow crops), and there is a lot more value in reducing the scope of our uncertainty by accurately forecasting yields, weather, and so on. This is where The Economist’s Schumpeter column last week comes in:

Monsanto’s prescriptive-planting system, FieldScripts, had its first trials last year and is now on sale in four American states. Its story begins in 2006 with a Silicon Valley start-up, the Climate Corporation. Set up by two former Google employees, it used remote sensing and other cartographic techniques to map every field in America (all 25 million of them) and superimpose on that all the climate information that it could find. By 2010 its database contained 150 billion soil observations and 10 trillion weather-simulation points.

The Climate Corporation planned to use these data to sell crop insurance. But last October Monsanto bought the company for about $1 billion—one of the biggest takeovers of a data firm yet seen. Monsanto, the world’s largest hybrid-seed producer, has a library of hundreds of thousands of seeds, and terabytes of data on their yields. By adding these to the Climate Corporation’s soil-and-weather database, it produced a map of America which says which seed grows best in which field, under what conditions.

In short, those big data are an agricultural economist’s wet dream: Imagine knowing exactly what to plant in your field (and where to plant different crops in your field) in order to maximize yields. Imagine knowing exactly what the likelihood of crop failure is so that you can provide the ideal insurance contract for each and every farmer. This is exactly the kind of innovation that makes me so optimistic about the future of food and that makes me think the neo-Malthusians, just like the Malthusians of old, are wrong.

Oh, and if heaven forbid I ever leave academia for the private sector, you’ll know where to find me…

* Even then, some people are working at bridging the gap between big data and causal inference. See this recent post by Emanuela Galasso on the Development Impact blog for a good discussion, as well as this paper by Belloni et al. in the Journal of Economic Perspectives on big data and causal inference.

3 comments

  1. am

    Would you be found somewhere working on the Monsanto datasets?

    In Africa a lot of seed recommendations are based on average rainfall in the area and take no account of the different soil types in that area. This means that if an area is semi-arid, short- or early-maturity seeds are recommended. However, some soils that are clay and sand based can actually grow medium- and long-term seeds with much higher yields, as they hold rain longer than silt or gravelly soils. Some farmers do that because of personal experiments with different seed types on their soils, but others won’t go against the blanket recommendations that prevail. So roll that satellite over here.

  2. GP

    Big data, moreover, can lead us astray by generating plenty of spuriously (statistically) significant relationships. One could then cherry-pick relationships and make up a story to fit one’s objectives. This is already a well-known danger in social science, but big data only makes it worse.

  3. Pingback: Big Ag and Big Data | LKS Blog Watch | Scoop.it
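GP's point in the comments about spurious significance can be illustrated with a small simulation. This is only a sketch under stated assumptions: the outcome and all 1,000 candidate predictors are pure, independent noise, the sample size of 100 is arbitrary, and the cutoff of 0.197 is the approximate critical value of |r| for a two-sided 5% test with 100 observations.

```python
import random

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(42)
n_obs, n_vars = 100, 1000
R_CRIT = 0.197  # approximate |r| cutoff for p < 0.05 (two-sided) at n = 100

# The outcome is pure noise, so no predictor is truly related to it.
y = [rng.gauss(0, 1) for _ in range(n_obs)]

false_hits = 0
for _ in range(n_vars):
    x = [rng.gauss(0, 1) for _ in range(n_obs)]  # an unrelated noise variable
    if abs(pearson_r(x, y)) > R_CRIT:
        false_hits += 1  # "significant" purely by chance

print(f"{false_hits} of {n_vars} noise variables look significant at the 5% level")
```

With a 5% threshold, about 5% of the unrelated variables should clear the bar by chance alone, so the more variables a dataset contains, the more such relationships there are to cherry-pick from.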