This post from our friends at Apolitical, a platform highlighting and connecting the work of public servants, explores the potential pitfalls of evidence-based policymaking and what to do about them.
At first glance, it seems almost nonsensical to question whether governments should make policy based on evidence. Why wouldn’t we expect programs to stand up to rigorous testing? Isn’t this just the norm?
For supporters of “evidence-based policymaking,” it’s pretty straightforward. “The dirty secret of almost all government is that we don’t know whether we’re doing the right thing,” David Halpern, founder of the UK’s Behavioural Insights Team, told me when I interviewed him for Apolitical. “Whether what we’re doing is helping the patient or not helping the patient — or even killing him. People are often running blind, overconfident, and dispensing massive sums of money.”
Over the past 20 years, momentum has been gathering behind the idea that governments need to apply scientific standards of proof when making decisions about policy.
In the UK in the late 1990s, Tony Blair’s Labour Party ran on the platform that “what matters is what works,” proposing a pragmatic, rational answer to ideology-driven politics. Since then, the UK has set up 10 What Works Centres to evaluate policy evidence, each devoted to a different area of expertise, from educational disadvantage to crime reduction. Together, they are now responsible for around £200 billion ($280 billion) of decision-making.
This movement has been accelerated by behavioral insights and big data, both of which promise to let policymakers measure their impact more precisely than ever before.
In the US, more than 100 cities, from Albuquerque, New Mexico, to Winston-Salem, North Carolina, have pledged to use evidence under the “what works” slogan. In a marked distancing from partisan politics, the organizers say: “World-class cities come in all shapes and sizes. But they share the same mission: to serve residents in the most effective ways possible.”
The drive towards evidence-based policymaking is a drive towards a rationalist dream: that hard evidence can remove partisan wrangling from the equation and, instead, turn policymaking into a scientific process, guided by numbers, run like a lab and devoted not to ideology but to the simple question: what works?
The hope is that even the most contentious political issues could be soothed and moved forward by better evidence. Gun violence in the US is a perfect example.
“There are policies that experts on both sides of the gun debate believe have opposite effects,” Andrew Morral, head of gun violence research at the RAND Corporation, told one of my colleagues, citing the idea of gun-free zones. Some believe they invite attackers who know they won’t meet armed resistance; others believe that fewer guns equals fewer shootings.
“They aren’t both right,” said Morral. “This is a factual matter and it could be resolved with good science.”
The classic examples come from the UK, where David Halpern’s “Nudge Unit,” founded in 2010, pioneered using behavioral insights to push people towards certain actions.
In the West Midlands region, police were struggling with how to deter people from unsafe driving. The language in the letters they sent to drivers caught speeding was too complex — as a result, offenders did not pay their fines and ended up in court.
“We started showing a picture of the aftermath of a car accident, where flowers and a teddy bear had been left on a lamppost, alongside the text: ‘Last year, 700 children were killed in this area because of speeding drivers.’ It demonstrated that our motivation is good: it’s not about collecting fines; it’s about saving lives. And that’s when we saw the response,” Alex Murray, Assistant Chief Constable for the West Midlands Police, told one of my colleagues.
Drivers who received the revised letter paid their fines 20 percent more quickly and, as a result, ended up in court 41 percent less often, saving the region about £1.5 million ($2.1 million) a year in court fees. The behavioral science-based approach has also cut reoffending by 20 percent.
If only all policymaking were so clear-cut.
When “what works” doesn’t work
As the use of rigorous evidence, such as the findings of randomized controlled trials (RCTs), has spread, we at Apolitical have identified a small number of problems and limitations with the approach:
- The replication problem. A great deal of time and money can be spent testing a solution that simply does not work in a different setting. For example, the Nurse-Family Partnership, a home visiting program, achieved excellent results in an RCT in California. When it was replicated in other states and the UK, the effects were insignificant. Even though the reason it didn’t work in the UK hasn’t been fully established, it may be that the scheme was designed with parameters that don’t make sense in the context of the UK’s public health system.
- The target problem. “When the NHS decided that a major problem was that people were having to wait too long to be admitted to emergency wards, they declared that hospitals would be evaluated on the extent to which patients were admitted within four hours,” Professor Jerry Muller of the Catholic University of America told one of my colleagues. “Some hospitals responded by having ambulances with patients circle the hospital until they could be admitted within the four-hour window. There are infinite varieties of gaming of that sort.”
- The data gap problem. Not everything useful can be measured, and when budgets are allocated, good work that doesn’t create good data can be neglected. “In an A/B test, one thing will always prove better than the other; in diagrams, data curves always change over time,” Thomas Prehn, who was, until recently, director of Denmark’s MindLab, wrote on Apolitical. “But are these tools really adequate for making policy amid messy politics where the qualitative indicators outnumber the quantitative?”
- The politics problem. The ultimate limitation of what works is that our societies have not agreed on what it is they’re trying to achieve, nor on who should pay for it. The problems policymakers are working on are contested — and probably always will be. And great leaps of progress are often made not through rational refining of previous successes, but through feeling, persuasion, and a vision that society’s aims should be different. To put it bluntly, they are often made by politicians.
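Prehn’s observation about A/B tests — that one option will always come out ahead — can be made concrete with a toy simulation. The sketch below (my own illustration, not from Apolitical or MindLab) compares two arms drawn from the *same* distribution, so any “winner” is pure noise; yet a naive comparison declares a winner every time:

```python
import random

random.seed(0)

def run_ab_test(n=500):
    """Simulate an A/B test where both arms have identical true effects.

    Outcomes for each arm are drawn from the same normal distribution,
    so there is no real difference to detect -- but one sample mean will
    always edge out the other by chance.
    """
    arm_a = [random.gauss(0.0, 1.0) for _ in range(n)]
    arm_b = [random.gauss(0.0, 1.0) for _ in range(n)]
    mean_a = sum(arm_a) / n
    mean_b = sum(arm_b) / n
    # A naive comparison always crowns a winner, even under a null effect.
    return "A" if mean_a > mean_b else "B"

# Repeat the "experiment" 20 times: every run produces a winner,
# despite the two policies being indistinguishable by construction.
winners = [run_ab_test() for _ in range(20)]
print(winners)
```

This is why a raw comparison of means is not enough: without a significance test and, more importantly, a qualitative account of *why* an effect should exist, the numbers will always nominate a “better” policy, deserved or not.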
It works if you work it
The replication problem, and the failure of the Nurse-Family Partnership outside California, gets at the heart of the matter: RCTs are now so highly valued that there is a cliché for describing them — “the gold standard.”
A well-executed RCT says something about the intervention then and there and on that population. In that regard, they are the gold standard. But beyond that specific context, they have no special privilege over other kinds of evidence. An RCT done in one place has no real bearing on how that program would work somewhere else. A more universal standard of evidence would be provided by RCTs done simultaneously in different locales, the more varied from one another, the better.
An RCT can demonstrate that an intervention works — in one specific instance, in one specific context. What policymakers need to do to replicate, scale up, or even just hone their interventions is to use the RCT to try to understand why something works. Without that depth of understanding, the “gold standard” can be no more than a means of keeping something funded.
Ultimately, evidence is only useful if it’s held by people with expertise, who can evaluate the numbers and go beyond them for the greater good. There is no substitute for smart, dedicated civil servants with a deep understanding of their field.