Women learning about health issues in a village in Sahre Bocar, Senegal. (Photo Credit: Jonathan Torgovnik/Reportage by Getty Images)

The positions are polarized. The debates are divisive. Arguments mischaracterize opponents’ views. Am I talking about the U.S. presidential election? Nope. I’m talking about the repetitive, tendentious quarrels over the merits and disadvantages of random assignment methods for assessing “what works” in social programs in developing countries. Seriously.

For the past 15 years or so, evaluation methods originally inspired by tests of new medicines have been applied to answer very different kinds of questions in the developing world.  Randomized controlled trials, or RCTs, have been conducted to measure the effectiveness of social programs, which provide resources—health care, schooling, job training or even cash—in particular ways to individuals or households with the expectation that those interventions will improve specific outcomes.

Evaluations of program impact using random assignment methods try to find out whether a particular program really made a difference. Did the job training get young people jobs or would they have been hired anyway? Will community oversight improve the quality of local infrastructure projects so that roads and water systems last?

In general terms these evaluations are asking: What is the net effect of the program? And were the assumptions about what it would take to improve social outcomes correct? Finding answers to these questions is of great—and shared—interest to those who fund, design, implement and potentially benefit from programs. How to get those answers is where opinions diverge.

On one side we have academic researchers who design and conduct studies using a method that tries to separate the effects of a particular intervention from changes that would have occurred anyway. Think: Michael Kremer at Harvard, Abhijit Banerjee and Esther Duflo at MIT, Paul Gertler at UC Berkeley, Dean Karlan at Yale, and others.

On the other side are academic researchers who have multiple and varied critiques of the method and its application. Think: Princeton’s Nobel laureate Angus Deaton, and Lant Pritchett and Ricardo Hausmann, both at Harvard.

For reasons that are beyond my understanding, the fight is intense, personal and confusing to those of us who see a dispute that is framed as “either-or” when it could (and should) be “both-and.”

Three basic concerns are leveled at the use of RCTs, although they are often woven together in perplexing ways.

1. It’s not the only way to know what works

RCTs are not the only legitimate source of knowledge – even their strongest proponents will agree with that. RCTs require a specific intervention and a defined population. That excludes lots of important policy changes that are nationwide, like civil service reforms, or untargeted, like mass media campaigns. We care about the effectiveness of those efforts, too, and non-RCT analyses can shed light on whether they are working, even if “with the program” cannot be compared to an actual population “without the program.”

Even when an RCT might be the strongest way to estimate net impact, an experimental design may not be feasible because of practical constraints.  In those instances, no one would ignore the insights that observational studies and other sources of evidence can provide – although we should do our best to figure out if there are alternative explanations for what’s observed. And when RCTs are possible, the findings don’t answer every important question. Complementary analyses of the quality of program implementation are invaluable.

It is true enough that RCTs do not provide the answer to all questions, but is that a reason to reject them? We don’t set the bar that high for any other methods in which we invest time and money. So let’s not do that for RCTs.

2. It’s the wrong way to know what works

Far more than for drugs, context crucially affects how social programs are implemented and what impact they have.  So, some argue, findings from a randomized evaluation in one context won’t apply to others.

We have a lot of evidence, however, about the value of social experiments from both high-income and developing settings. We’ve benefited from randomized evaluations of social programs for decades, and in the United States they have contributed to both accountability and learning in early childhood education, social protection, and job training.

RAND’s health insurance experiment in the 1970s showed that neither critics nor proponents were entirely correct about how people would respond to subsidized premiums. RCTs of “Scared Straight,” a program designed to reduce juvenile delinquency, revealed that it had exactly the opposite effect. A social experiment with conditional cash transfers in Mexico demonstrated not only that the program improved the health and education of children, but also that fears that it would encourage domestic abuse did not materialize.

These social experiments provided convincing evidence of net impact. But they also helped program designers clarify their theories of human behavior and reveal key assumptions. The knowledge the experiments generated helped refine and improve the programs.

Yes, RCTs should be conducted in settings as close to the real world as possible, and we shouldn’t overspend on boutique experiments that could never be implemented at significant scale. But randomized evaluations remain a valuable tool for generating knowledge that is sorely needed to figure out how money and other inputs can turn into better health, education and employment outcomes.

3. It’s driving us to emphasize the wrong type of development programs 

This is a critique that invokes the specter of a Food and Drug Administration for development interventions. In that world, funders would only support a subset of discrete (rather than system-wide) interventions aimed at a single development outcome (rather than a whole constellation of them). You get “deworming” and “chlorine dispensers” rather than “address the structural drivers of poverty.” This competes directly with systems thinking, complexity theory, multisectoral work, and a whole set of approaches that resist the “if x then y” thinking intrinsic to impact evaluation. It is a critique of a nonexistent world in which RCTs not only crowd out other types of inquiry but also crowd out programs that cannot be evaluated through RCTs.

But let’s be real. We don’t live in that world and never will, no matter how many RCTs are conducted. The majority of official and private development dollars are spent on programs that are not and will not be subject to RCTs. But by providing insights into individual and community behaviors, RCTs can generate information that is useful even for questions that involve complexity.

And the existence of evidence from RCTs – the fact that the effectiveness of some programs can be established in a systematic, scientific way – does raise the bar in a healthy way for the use of the strongest possible data and evidence in all decisions. In an age when RCTs have visibility and even cachet, there is less space for pure political discretion and greater incentive to find ways to use data and evidence to decide where to spend precious dollars.

Advocating for greater attention to what is happening rather than what we hope will happen – the essence of rigorous evaluation – is where both sides can meet. You don’t have to choose methodologies to agree on the value of transparent, consistent measurement that neither ignores the complexity of context nor uses it as an excuse not to ask hard questions. In some circumstances, RCTs will be the best way to answer those questions. In others, the best approach may be to design programs that continuously search for improvements in a dynamic try-measure-learn-fix-try-again cycle.

All of the people involved in the disputes about whether RCTs are or are not the “right” methodology are themselves brilliant champions for the integration of reason, logic and evidence into public policy.  That brilliance will shine brighter if they come together to support the generation and use of evidence of many kinds for the many types of policy decisions that matter.