Last updated on May 6, 2012
From a new paper (link opens a .pdf file) by Oxford’s Tessa Bold and her coauthors:
The recent wave of randomized trials in development economics has provoked criticisms regarding external validity and the neglect of political economy. We investigate these concerns in a randomized trial designed to assess the prospects for scaling-up a contract teacher intervention in Kenya, previously shown to raise test scores for primary students in Western Kenya and various locations in India. The intervention was implemented in parallel in all eight Kenyan provinces by a nongovernmental organization (NGO) and the Kenyan government. Institutional differences had large effects on contract teacher performance. We find a significant, positive effect of 0.19 standard deviations on math and English scores in schools randomly assigned to NGO implementation, and zero effect in schools receiving contract teachers from the Ministry of Education. We discuss political economy factors underlying this disparity, and suggest the need for future work on scaling up proven interventions to work within public sector institutions.
Bold et al.’s finding points to an important problem with many randomized controlled trials (RCTs): No matter how careful one is in ensuring that subjects are randomly assigned to the treatment and control groups, almost all RCTs rely on a single implementing partner.
Implementation Bias
Take the RCT my co-authors and I are currently running in Mali, for example. Insurance markets are nonexistent in most developing countries, and agriculture is a risky business due to uncertain weather, among other risks. So we wanted to know whether having access to insurance significantly affected the production decisions (and, ultimately, the welfare) of cotton producers in southern Mali.
In order to answer our research question, we randomly assigned cotton producer cooperatives to a treatment group (i.e., offered the insurance) and a control group (i.e., not offered the insurance). And because we cannot force people to buy insurance, we randomly offered discounts (of 0, 25, or 50 percent) on the price of the insurance to coops within the treatment group so as to encourage take-up.
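For concreteness, here is a minimal sketch of that two-stage randomization in Python. The cooperative IDs, sample size, and even split are hypothetical stand-ins, not the actual study design.

```python
import random

random.seed(42)  # fix the seed so the assignment is reproducible

# Hypothetical cooperative IDs; the study's actual sampling frame differs.
coops = [f"coop_{i:03d}" for i in range(100)]
random.shuffle(coops)

# Stage 1: split cooperatives evenly between treatment and control.
half = len(coops) // 2
treatment, control = coops[:half], coops[half:]

# Stage 2: within the treatment group, assign discounts of 0, 25, or 50
# percent off the insurance premium in balanced proportions.
levels = [0, 25, 50]
pool = (levels * (len(treatment) // len(levels) + 1))[:len(treatment)]
random.shuffle(pool)
discounts = dict(zip(treatment, pool))
```

Randomizing the discount within the treatment group is what makes this an encouragement design: take-up remains voluntary, but the price a given coop faces is exogenous.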
So far, so good? Not quite. In order to implement this project, we partnered with an NGO, the implementing partner in this context, whose job is to sell the insurance. Ideally, however, we would have worked with at least one other implementing partner and then randomized which partner implemented the insurance in which village.
Why? Because with only one implementing partner, it is impossible to tell whether the success or failure of an intervention is due to the intervention itself or to the perception people in the treatment group have of the implementing partner. Put more succinctly: Why should farmers trust a health NGO when it offers them supposedly improved seeds? The answer lies in having more than one implementing partner and in randomizing which partner implements in which village.
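Cross-randomizing partners is easy to express in the same hypothetical sketch as above. Because partner assignment is made orthogonal to treatment status, partner effects can then be estimated separately from the treatment effect itself; the village and partner names below are purely illustrative.

```python
import itertools
import random

random.seed(7)

# Hypothetical villages and implementing partners.
villages = [f"village_{i:02d}" for i in range(30)]
partners = ["NGO_A", "NGO_B", "government"]

random.shuffle(villages)
treated, control = villages[:15], villages[15:]

# Cycle through the partner list so each partner implements in roughly
# the same number of treated villages, then shuffle to break the ordering.
pool = list(itertools.islice(itertools.cycle(partners), len(treated)))
random.shuffle(pool)
implementer = dict(zip(treated, pool))
```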
I realize that implementation bias is an external validity problem, and that the usual solution to external validity problems is replication. But as long as researchers in the social sciences have little to no incentive to replicate, that solution remains a bit of a cop-out. Maybe we will converge toward a situation where replication is a necessary condition for publication.
In a post on the World Bank’s Development Impact blog, Gabriel Demombynes discusses how he expects the Bold et al. finding to be a kind of Rorschach test for people’s perceptions of RCTs.