Impact Evaluation


24
Apr 12

Identifying Causal Relationships vs. Ruling Out All Other Possible Causes

Portrait of Aristotle (Source: Wikimedia Commons.)

I was in Washington last month to discuss my work on food prices, in which I look at whether food prices cause social unrest, at an event whose goal was to discuss the link between climate change and conflict.

As many readers of this blog know, disentangling causal relationships from mere correlations is the goal of modern science, social or otherwise, and though it is easy to test whether two variables x and y are correlated, it is much more difficult to determine whether x causes y.

So while it is easy to test whether increases in the level of food prices are correlated with episodes of social unrest, it is much more difficult to determine whether food prices cause social unrest.

In my work, I try to do so by using natural disasters as a source of variation in food prices. To make a long story short, if you believe that natural disasters affect social unrest only through food prices, then the estimated relationship between food prices and social unrest is purged of any variation that is not due to the effect of food prices on social unrest. In other words, the estimated relationship between the two variables is causal. This technique is known as instrumental variables estimation.
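To make the logic of instrumental variables concrete, here is a minimal simulated sketch in Python. Everything in it is an assumption made for illustration: the variable names, the true effect of 0.5, and the data-generating process are hypothetical and are not taken from the food prices study. An unobserved confounder drives both prices and unrest, so the naive regression slope is biased, while the simple instrumental variables ratio (covariance of instrument and outcome over covariance of instrument and regressor) should land close to the true effect as long as the instrument affects the outcome only through prices.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, true_effect=0.5, seed=0):
    """Hypothetical data: disasters shift prices, a confounder shifts both
    prices and unrest, and prices have a causal effect of `true_effect`."""
    rng = random.Random(seed)
    disasters, prices, unrest = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # instrument (assumed to affect unrest only via prices)
        u = rng.gauss(0, 1)  # unobserved confounder
        x = z + u + rng.gauss(0, 1)                      # food prices
        y = true_effect * x + 2.0 * u + rng.gauss(0, 1)  # social unrest
        disasters.append(z)
        prices.append(x)
        unrest.append(y)
    return disasters, prices, unrest


def ols_slope(x, y):
    """Naive regression slope of y on x; biased when x is confounded."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Instrumental variables estimate with a single instrument z."""
    return covariance(z, y) / covariance(z, x)


if __name__ == "__main__":
    z, x, y = simulate()
    print("naive OLS slope:", round(ols_slope(x, y), 3))
    print("IV slope:       ", round(iv_slope(z, x, y), 3))
```

In this setup the naive slope overstates the effect because the confounder pushes prices and unrest in the same direction. The instrumental variables estimate is only as credible as the assumption that the instrument has no other channel to the outcome.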

Identifying Causal Relationships vs. Ruling Out All Other Causes

As with almost any other discussion of a social-scientific issue nowadays, the issue of causality came up during one of the discussions we had at that event in Washington. It was at that point that someone implied that it did not make sense to talk of causality by bringing up the following analogy: Continue reading →


19
Mar 12

Slides of My Keynote Lecture at Last Weekend’s “Economics and Management of Risk in Agriculture and Natural Resources” Conference

I was trained as an agricultural and applied economist, so I have spent a lot of time doing research on risk as it relates to agriculture and development (see here and here for published articles).

Because of this, I have been involved with the annual Economics and Management of Risk in Agriculture and Natural Resources conference for the past few years.

I first presented at that conference in 2009, and since I had then volunteered to organize the conference, I was in charge of the conference program in 2010 and of logistics in 2011.

This year, I was asked to give the keynote lecture, in which I chose to discuss what the “credibility revolution” that took place in economics over the past ten years or so (and which has led economists to adopt stricter standards of evidence and of statistical identification) means for agricultural and applied economics as a field.

In case you have an interest in this topic, I am making the slides of my keynote lecture available. I think the content of those slides is especially relevant for current graduate students of agricultural and applied economics.

The Economics and Management of Risk in Agriculture and Natural Resources conference is usually held somewhere on the Gulf Coast. This year, it was held in Pensacola, FL. I took the picture on top of this post while walking along the beach early Saturday morning.


16
Feb 12

Randomization and Inference

Experiments have become an increasingly common tool for political science researchers over the last decade, particularly laboratory experiments performed on small convenience samples. We argue that the standard normal theory statistical paradigm used in political science fails to meet the needs of these experimenters and outline an alternative approach to statistical inference based on randomization of the treatment. The randomization inference approach not only provides direct estimation of the experimenter’s quantity of interest — the certainty of the causal inference about the observed units — but also helps to deal with other challenges of small samples. We offer an introduction to the logic of randomization inference, a brief overview of its technical details, and guidance for political science experimenters about making analytic choices within the randomization inference framework. Finally, we reanalyze data from two political science experiments using randomization tests to illustrate the inferential differences that choosing a randomization inference approach can make.

That’s the abstract of a forthcoming American Journal of Political Science article by Luke Keele, Corrine McConnaughy, and Ismail White.

That being said, I really can’t wait for summer to arrive so I can finally get through my “Documents to Read” folder.
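For readers who have not seen randomization inference before, the idea in the abstract above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: it assumes complete randomization with a fixed number of treated units, tests the sharp null hypothesis of no effect for any unit, and uses the difference in means as the test statistic. The eight outcome values are made up.

```python
from itertools import combinations


def diff_in_means(outcomes, treated_idx):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated = [y for i, y in enumerate(outcomes) if i in treated_idx]
    control = [y for i, y in enumerate(outcomes) if i not in treated_idx]
    return sum(treated) / len(treated) - sum(control) / len(control)


def randomization_p_value(outcomes, treated_idx):
    """Exact two-sided p-value under the sharp null of no effect.

    Under that null, outcomes are fixed and only the assignment is random,
    so we recompute the statistic for every possible assignment with the
    same number of treated units and count how often it is at least as
    extreme as the one observed.
    """
    treated_idx = frozenset(treated_idx)
    observed = diff_in_means(outcomes, treated_idx)
    as_extreme = 0
    total = 0
    for assignment in combinations(range(len(outcomes)), len(treated_idx)):
        stat = diff_in_means(outcomes, frozenset(assignment))
        if abs(stat) >= abs(observed) - 1e-12:  # tolerance for float ties
            as_extreme += 1
        total += 1
    return observed, as_extreme / total


if __name__ == "__main__":
    # Hypothetical small experiment: units 0-3 treated, units 4-7 control.
    outcomes = [12, 15, 14, 16, 9, 10, 11, 8]
    observed, p_value = randomization_p_value(outcomes, {0, 1, 2, 3})
    print("observed difference:", observed)
    print("randomization p-value:", p_value)
```

Full enumeration is only feasible for small samples like this one; with larger samples one would typically draw a random subset of assignments instead.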


9
Feb 12

On the (Mis)Use of Regression Analysis: Country Music and Suicide

This article assesses the link between country music and metropolitan suicide rates. Country music is hypothesized to nurture a suicidal mood through its concerns with problems common in the suicidal population, such as marital discord, alcohol abuse, and alienation from work. The results of a multiple regression analysis of 49 metropolitan areas show that the greater the airtime devoted to country music, the greater the white suicide rate. The effect is independent of divorce, southernness, poverty, and gun availability. The existence of a country music subculture is thought to reinforce the link between country music and suicide. Our model explains 51 percent of the variance in urban white suicide rates.

That’s the abstract of an article published in 1992 in Social Forces, a top-10 journal in sociology.

Before my snark gets me into trouble: Yes, I do realize that the article was published in 1992, back when most social science researchers only had a flimsy grasp of identification and causality. I also realize it would be foolish to impose on the authors of the above-referenced article the same standards of identification we impose upon ourselves today.

Yet, I cannot help but think that someone with a lesser understanding of causality than the average reader of this blog is bound to eventually stumble upon the abstract, think “Hey, that totally makes sense!” and run with it.

I’m sure there are also examples of such findings in other disciplines. If you know of any, please share.

(HT: Friend and former student Norma Padron, who is doing her PhD at Yale and has just launched a nice health economics blog.)


1
Feb 12

Is It Time for a T Party in Impact Evaluation?

When we write a dynamic model in economics, we typically use the subscript t to denote a given time period, and we usually say that t = 1, 2, …, T, where T denotes the last time period considered by our model. Likewise, we usually use T to denote the number of time periods considered in a longitudinal data set.

With that in mind, the World Bank’s David McKenzie argues for more T in the experiments conducted by development economists in a forthcoming article in the Journal of Development Economics:

The vast majority of randomized experiments in economics rely on a single baseline and single follow-up survey. If multiple follow-ups are conducted, the reason is typically to examine the trajectory of impact effects, so that in effect only one follow-up round is being used to estimate each treatment effect of interest. While such a design is suitable for study of highly autocorrelated and relatively precisely measured outcomes in the health and education domains, this article makes the case that it is unlikely to be optimal for measuring noisy and relatively less autocorrelated outcomes such as business profits, household incomes and expenditures, and episodic health outcomes. Taking multiple measurements of such outcomes at relatively short intervals allows one to average out noise, increasing power. When the outcomes have low autocorrelation and budget is limited, it can make sense to do no baseline at all. Moreover, I show how for such outcomes, more power can be achieved with multiple follow-ups than allocating the same total sample size over a single follow-up and baseline. I also highlight the large gains in power from ANCOVA analysis rather than difference-in-differences analysis when autocorrelations are low and a baseline is taken. This article discusses the issues involved in multiple measurements, and makes recommendations for the design of experiments and related non-experimental impact evaluations.

This brings to mind what one of my friends who works in the microfinance industry had told me the last time we argued about the effects of microfinance on poverty: “It can take a long time to get out of poverty even in the best of scenarios, so evaluating the impact of microfinance after just one or two years tends to shortchange microfinance.”

Moreover, this makes me less worried about not having had the luxury of conducting a baseline survey for the randomized controlled trial I am conducting with Michael Carter and Catherine Guirkinger on the impacts of crop insurance on the welfare of cotton producers in southern Mali. Thankfully, we will be doing at least two rounds of follow-up surveys in order to study the dynamic effects of our intervention, and we are working on finding funding for a third round.
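The point about averaging out noise can be illustrated with a small Python sketch. It is not David's derivation. It rests on simplifying assumptions of my choosing: no baseline, two arms of equal size, and an outcome with variance sigma2 whose repeated measurements on the same unit all share the same correlation rho. Under those assumptions, the variance of a difference in means that averages over several follow-up rounds is (2 * sigma2 / n) * (1 + (rounds - 1) * rho) / rounds, and the simulation is there only to check that formula. All function names and parameter values are hypothetical.

```python
import random


def post_estimator_variance(n_per_arm, rounds, rho, sigma2=1.0):
    """Variance of the treatment-control difference in means when each
    unit's outcome is averaged over `rounds` equicorrelated follow-ups."""
    return (2.0 * sigma2 / n_per_arm) * (1.0 + (rounds - 1) * rho) / rounds


def simulated_variance(n_per_arm, rounds, rho, reps=1000, seed=0):
    """Monte Carlo check of the formula, with unit-variance outcomes and
    a true treatment effect of zero."""
    rng = random.Random(seed)

    def arm_mean():
        total = 0.0
        for _ in range(n_per_arm):
            persistent = rng.gauss(0, 1)  # component shared across rounds
            for _ in range(rounds):
                noise = rng.gauss(0, 1)   # round-specific noise
                total += rho ** 0.5 * persistent + (1 - rho) ** 0.5 * noise
        return total / (n_per_arm * rounds)

    estimates = [arm_mean() - arm_mean() for _ in range(reps)]
    mean_est = sum(estimates) / reps
    return sum((e - mean_est) ** 2 for e in estimates) / (reps - 1)


if __name__ == "__main__":
    for rho in (0.3, 0.9):
        for rounds in (1, 2, 3, 4):
            v = post_estimator_variance(n_per_arm=100, rounds=rounds, rho=rho)
            print(f"rho={rho}, rounds={rounds}: variance={v:.5f}")
```

With low autocorrelation, each extra round brings a sizable drop in variance, whereas with high autocorrelation extra rounds add little, which is consistent with the abstract's contrast between noisy outcomes such as business profits and more persistent ones.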

UPDATE: In the time between the moment I wrote this post on Sunday morning and the moment it was published, David offered his own blog post on his paper on the World Bank’s Development Impact blog.