Randomized controlled trials and the debate over them, explained.
The Nobel Prize in Economics awarded to Esther Duflo, Abhijit Banerjee, and Michael Kremer — formally given to them yesterday at the Nobel ceremony in Sweden — was a big win for a scientific approach they’ve championed: randomized controlled trials (RCTs).
Randomized controlled trials are experiments that apply an intervention to only a randomly selected portion of the target population, so that you can compare the effects of the intervention against a group that didn’t receive it. As the Nobel committee noted, their use in development economics has helped better direct aid and public policy, transforming millions of lives.
The introduction of the RCT-driven approach didn’t just change how development economists answered their questions; it also changed which questions they posed. From a focus largely on nation-scale problems with no clear answers (like why countries are poor), the field has shifted to a focus largely on smaller problems that can be definitively answered: Do textbooks improve student outcomes? Does deworming work? How about microfinance? The “Randomistas” were posing, and answering, questions that hadn’t previously been considered.
But RCTs have their critics. Some worry that RCTs are dehumanizing, treating people like science experiments. Some worry that they tell us less than their proponents claim, and that they’ve become too pervasive rather than being one good tool in a toolbox. Others have been critical of what they see as RCTs steering economists toward asking small questions instead of big ones, like the causes of productivity or poverty.
This might seem like a debate without many implications outside academia, but it’s actually vitally important in everyday life. We use science to set public policy. In medicine, clinical trials (usually RCTs) determine whether drugs will be legal to prescribe in the US. In development economics, millions in aid dollars are distributed based on the outcomes of RCTs.
The Nobel Prize given to Duflo, Banerjee, and Kremer has brought these disagreements to the fore. Here’s what the debate is all about — and why, on balance, we should consider the rise of RCTs a huge step forward for development economics.
The rise of the randomized trial
Answering even simple research questions in medicine or public policy can be surprisingly challenging. To see why, imagine trying to answer a question like, “Does drinking soda make people less healthy?”
One way to answer that would be to ask people how much soda they drink and measure how healthy they are.
But this approach doesn’t get you a satisfying answer. Maybe people who drink soda were already unhealthy to begin with. Or maybe soda helps with the symptoms of some common health problems (like fatigue and low energy), so people with those health problems will consume more soda. Maybe conscientious, health-aware people have read that soda is bad for them and so they avoid it — and whether or not soda is bad for them, being conscientious and health-conscious is good for them and will likely produce good health outcomes.
The point is: Just asking people how much soda they drink and then measuring their health won’t tell you much about the causal relationship between soda drinking and being healthy.
Lots of types of studies exist to try to get around this fundamental challenge. You could look at a neighborhood where a convenience store recently opened up, assume that people are now drinking more soda, and measure changes in their health. You could look at whether coupons for soda increase purchases, and whether that has effects on health outcomes.
But one of the most useful solutions, if you can do it, is to randomize exposure to your variable of interest (in this case, soda). Here’s what that means: You find some people interested in participating in your study. Half of them go about their daily lives as before. The other half, you attempt to persuade to drink less soda. Then you measure whether they do it, and whether there are health benefits. If you get results, you can be much more confident that your results are caused by soda drinking, instead of just associated with it.
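To make the logic concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption for illustration: the health score, the size of soda's true effect, and the "conscientious" trait standing in for the confounders described above are all invented, and the randomized arm simplifies the design by assigning soda drinking directly rather than persuading people to cut back. It compares a survey-style estimate with a randomized one.

```python
import random
import statistics

# Assumed for illustration: a soda habit lowers a made-up health score by 2 points.
TRUE_EFFECT = -2.0


def simulate(n=20000, seed=0):
    """Return (observational_estimate, randomized_estimate) of soda's effect."""
    rng = random.Random(seed)
    obs_soda, obs_no_soda = [], []
    rct_soda, rct_no_soda = [], []

    for _ in range(n):
        # Hypothetical confounder: conscientious people are healthier
        # regardless of what they drink.
        conscientious = rng.random() < 0.5
        baseline = 70 + (10 if conscientious else 0) + rng.gauss(0, 5)

        # Survey world: people choose for themselves, and conscientious
        # people are less likely to drink soda.
        drinks = rng.random() < (0.2 if conscientious else 0.8)
        health = baseline + (TRUE_EFFECT if drinks else 0.0)
        (obs_soda if drinks else obs_no_soda).append(health)

        # Randomized world: a coin flip decides, so soda drinking is
        # unrelated to conscientiousness.
        assigned = rng.random() < 0.5
        health = baseline + (TRUE_EFFECT if assigned else 0.0)
        (rct_soda if assigned else rct_no_soda).append(health)

    observational = statistics.mean(obs_soda) - statistics.mean(obs_no_soda)
    randomized = statistics.mean(rct_soda) - statistics.mean(rct_no_soda)
    return observational, randomized


if __name__ == "__main__":
    observational, randomized = simulate()
    print(f"Assumed true effect:     {TRUE_EFFECT:.2f}")
    print(f"Survey-style estimate:   {observational:.2f}")
    print(f"Randomized estimate:     {randomized:.2f}")
```

With these invented numbers, the survey-style comparison should overstate the harm several times over, because soda drinkers were less healthy to begin with, while the coin-flip comparison should land close to the assumed true effect.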
That’s the concept of an RCT. They’re key to our understanding of science in fields like medicine. In fact, while an impressive 22 million people were enrolled in a social science RCT from 2007 to 2017 (in fields ranging from psychology to economics), that’s dwarfed by the 360 million enrolled in a medical RCT.
In the case of development economics — Duflo, Banerjee, and Kremer’s field — RCTs emerged in the late 1990s. There were lots of foreign aid interventions already happening, but they were mostly proposed and implemented without a clear sense of which of the interventions were most helpful. To improve outcomes in a poor school, should you send textbooks? Computers? Bonuses for teachers? Scholarships for students?
Enter the RCT. In a 2005 article, Banerjee described the state of affairs in development economics that he and others were trying to change. He discussed a publication from the World Bank:
The Sourcebook is meant to be a catalogue of what, according to the bank, are the right strategies for poverty reduction. These are also, we presume, strategies into which the bank is prepared to put its money. It provides a very long list of recommended projects, which include: computer kiosks in villages; cell phones for rent in rural areas; scholarships targeted toward girls who go to secondary school; schooling voucher programmes for poor children; joint forest management programmes; water users’ groups; citizen report cards for public services; participatory poverty assessments; internet access for tiny firms; land titling; legal reform; micro-credit based on group lending; and many, many, others.
While many of these are surely good ideas, the book does not tell us how we know that they work. Indeed […] this is not a primary concern of the authors.
We now know that some of these schemes work dramatically better than others. We know that because development economics researchers conducted studies. They found that microfinance has modest effects, but not the transformative ones initially advertised; that computer kiosks have lots of implementation problems; that scholarships can motivate students to stay in school if properly targeted; and lots more. And we found out because social scientists started using RCTs to test these interventions’ effectiveness.
The critics of randomized trials
There’s not much nostalgia for the days when we had no idea which development interventions actually worked. But there are researchers with reservations about the role of RCTs in development. Many of them have raised those concerns in papers and talks over the last few years, and many more raised them when the Nobel Prize was announced.
There are four broad categories of complaint:
External validity
One complaint is that RCTs often don’t generalize as well as you’d assume. A study of textbooks in one region in Kenya will tell you a lot about the advisability of providing those textbooks in that region in Kenya, but might tell you fairly little about what textbooks will achieve in Bangladesh. This concern is called “external validity.”
Economist Peter Dorman writes:
There are two specific aspects of experimentalism that raise questions on this front, the tendency for experiments to be small, local and time-bound (like a set of schools in one state in India in the mid-00’s) and the effects of experimental control itself, when a sort of artificiality creeps in. I’m familiar with the literature on experimentally designed conditional income transfers, for instance, where every new study, with a new country location, time period or set of design tweaks seems to alter the bottom line of what works and how.
Princeton University’s Angus Deaton has published on these issues, arguing that it’s a mistake to expect external validity from RCTs (and that there are important limitations on what they can teach us).
This complaint is basically accurate and fair — but it’s important to note that it’s not a problem unique to RCTs. Economist Eva Vivalt told me, “A lot of the criticisms of RCTs are a little bit misguided, in that many of the same complaints can be made about non-RCTs. External validity in general is pretty low, and that’s really disappointing — but in terms of RCTs versus other kinds of studies, they actually have the same external validity concerns.”
It’s not that RCTs have uniquely bad external validity — it’s just that generalizing from any study to guess implications for different programs elsewhere on the planet is hard work. Overall, the shift of the development economics movement toward conducting lots of empirical research has been good for our ability to generalize about the effects of programs — even if some people overgeneralize and wrongly assume RCTs have more external validity than they do.
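The external validity worry can be sketched with a toy simulation. The Python below is only an illustration under loudly stated assumptions: the two regions' "true" textbook effects are numbers made up for this example, not findings from any study, and `run_trial` is a hypothetical helper. It shows that a well-run trial recovers the effect where it was run, which can still be a poor guess for somewhere else.

```python
import random
import statistics

# Hypothetical true effects of free textbooks on test scores, in points.
# These values are invented for illustration only.
TRUE_EFFECTS = {"kenya_region": 5.0, "bangladesh_region": 0.5}


def run_trial(true_effect, n=10000, seed=0):
    """Randomize n students into textbook and no-textbook groups and
    return the estimated effect (difference in mean scores)."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        score = rng.gauss(50, 10)  # made-up score distribution
        if rng.random() < 0.5:     # coin-flip assignment
            treated.append(score + true_effect)
        else:
            control.append(score)
    return statistics.mean(treated) - statistics.mean(control)


if __name__ == "__main__":
    kenya_estimate = run_trial(TRUE_EFFECTS["kenya_region"], seed=1)
    print(f"Estimate from the Kenya-region trial: {kenya_estimate:.2f}")
    for site, effect in TRUE_EFFECTS.items():
        gap = abs(kenya_estimate - effect)
        print(f"Gap if applied to {site}: {gap:.2f}")
```

The trial is internally sound in both cases; the gap for the second region comes entirely from the assumed difference between the places, which is a problem any study design would face.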
Answering small questions
The second line of criticism is that the success of RCTs has driven researchers to only answer the sort of small, straightforward questions that lend themselves to an RCT.
Stefan Dercon summed up the “randomista” approach this way: “Everything has to be inductive, experimental. Lots of little solutions will move us forward. They have no big theory of what causes low growth, no big questions, just ‘a technocratic agenda of fixing small market failures.’”
Or, as Dorman puts it:
The strategy of experimental design virtually requires a reductionist, small-bore approach to social change. A more sweeping, structural approach to poverty and inequality introduces too many variables and defeats experimental control. Thus, without any explicit ideological justification, we end up with incremental reformism when the entire social configuration may be the true culprit.
One way to think about this criticism is this: Imagine we sent some development economists to 17th-century Italy to see why it was so poor. They conducted RCTs of efforts to treat malaria, tried loaning people small amounts of money, and checked whether reminder bracelets could improve immunization rates.
Those interventions might improve matters! But fundamentally, historians think 17th-century Italy was poor because of big, structural things — the fact that modernization was happening elsewhere but not in Italy, that the Catholic Church was reasserting its power, that inequality was rising, and so on.
And fundamentally, in a sense, 17th-century Italy was poor because the ways to be rich hadn’t been invented yet. High-productivity modern agricultural techniques hadn’t been invented yet, nor had electricity or the internet and all of the industries they would make possible. You could definitely improve conditions with small-scale tests, but nearly all of the differences between 17th-century Italy and modern Italy are dramatic differences in the whole character and structure of the society and its industries and technology.
So it stands to reason that modern poverty might be the same way. While we can do good with small-scale solutions, the overall forces driving poverty and prosperity will be big institutional and cultural ones.
Defenders of RCTs tend to respond that, yes, “big questions” are important. But we do not know how to answer them. We know how to save lives, improve schools, increase incomes, and give people more control over their futures; we have no idea how to fundamentally transform the trajectories of nations. And there are serious ethical problems with trying to change whole societies at random; you can’t run an RCT to see if socialism is better than capitalism, say. In other words, it’s not really the presence of RCTs that’s holding us back from studying “big questions” — it’s the lack of promising and meaningful ways to study them.
Concerns about participant consent and agency
Most members of the public feel kind of uneasy about RCTs, with a significant minority rating them immoral. Often, people will say that it’s okay to do something — say, give out malaria nets — to a whole population, or it’s okay to sell the malaria nets for a low price. But they are much more divided over whether it’s ethical to give out nets in some regions but not in others in order to test whether doing so produces higher use of nets (and consequently lower rates of malaria).
It makes sense for people to be averse to “experimenting on” others, especially experimenting on many of the world’s poorest people. But in practice, the alternative to a controlled experiment is usually an uncontrolled experiment — interventions being carried out haphazardly where there are the resources for them, without anyone checking whether they work. And aid distributed without checking whether it works often ends up being a disaster for the people it’s trying to help.
Conducting a study is an imperfect way of making aid efforts accountable to the people they’re supposed to help, but it certainly beats making no such effort at all.
Crowding out other problem-solving
The last major criticism? RCTs are expensive. They are regarded as a uniquely valuable and high-quality way to do research. Some researchers complain that as a result, the field has pivoted to focus too obsessively on RCTs, at the expense of research that uses other methods to answer other questions, or cheaper research that can effectively get at many of the same answers.
There’s one problem with this complaint — RCTs really haven’t swamped development economics all that much, at least not nearly as much as the discussions about them would suggest. A 2016 analysis by the World Bank’s David McKenzie found that RCTs remain a minority of published development economics research. They make up a disproportionate share of research in top journals in the field, but even there, they were only about 30 percent of the papers in 2015.
One tool in a toolbox?
Reconciling these complaints, some analysts say that RCTs should be viewed as “one tool in the toolbox” — that the problem is less the approach itself, which has certainly done a lot of good, and more the people claiming that it’s the “gold standard” for research and a singularly good way of advancing knowledge.
It seems likely that there are people who have overstated the power of RCTs, or unfairly dismissed studies conducted other ways. And it’s certainly good to respond with nuance to a Nobel Prize, keeping in mind the limitations of the prize-winning research even while recognizing its significance.
But “one tool in the toolbox” is an overcorrection. The shift in development economics to focusing on answerable questions and ensuring that development advice was focused on what worked was an enormous and beneficial change. Far from making aid more arbitrary, it made it less arbitrary and more accountable.
RCTs certainly won’t tell us everything, but they’re more than just another tool in the toolbox — they’re a pretty essential way to rigorously answer questions that would otherwise be impossible to answer, and they’ve improved many lives as a result.
Author: Kelsey Piper