IN SEARCH OF A GOLD STANDARD FOR SOCIAL PROGRAMS
Op-ed, Boston Globe, February 18, 2000

by Lisbeth B. Schorr and Daniel Yankelovich



The presidential candidates differ on how large a role they advocate for government in solving social problems, but on one point they sound a consistent theme: Whether in education, welfare-to-work, or crime prevention, everyone agrees we must be far more rigorous in focusing our investments on what works. Yet nowhere does confusion reign more than in how to determine what works.

The confusion stems from the mistaken assumption that we can evaluate social programs with the same methods that led to the nation’s great medical advances. Economist Alan Krueger argues that we should test education reforms the way the Food and Drug Administration tests drugs. A Brookings Institution symposium concludes that finding out what works requires randomized field trials in which "one person gets the pill and the other person gets the placebo."

Unfortunately, evaluating complex social programs is not like testing a new drug. The interventions needed to rescue inner-city schools, strengthen families, and rebuild neighborhoods are not stable chemicals administered in standardized doses.

Social programs are sprawling efforts with multiple components requiring constant mid-course corrections, the involvement of committed human beings, and flexible adaptation to local circumstances. Paradoxically, the very nature of successful programs makes them almost impossible to evaluate as one would a new drug.

For example, when policy makers want to know what works in preventing youth violence, academic evaluators are not likely to refer them to the work of the Ten Point Coalition, an alliance of black ministers that has helped curb gang warfare in Boston. The coalition mobilized a combination of initiatives, ranging from neighborhood probation patrols to safe havens for recreation, and stopped the killings cold. But because such initiatives don’t lend themselves to "one person getting the pill, the other the placebo," many evaluators reject them as "merely anecdotal" or the work of "charismatic leaders."

Similarly, because community initiatives that combine family support, preschool and after-school programs, and radically improved classroom instruction can’t be assessed with pill-testing methods, they are rarely assessed at all. While the gurus insist on accepting only the most orthodox methods, we miss learning from bold and effective initiatives to restore neighborhood life, educate children, and strengthen families.

Insistence on irrefutable scientific proof of causal connections has become an obstacle to finding what works, frustrating the nation’s hunger for evidence that social programs are on the right path. Ironically, the methods considered most "scientific" can actually defeat thoughtful assessments of promising interventions.

Why is this so? It is because scientific experiments are best equipped to study isolated interventions, while the most promising social programs don’t consist of discrete, circumscribed pieces. The current fascination with reducing class size derives not only from its intuitive appeal but from the fact that class size is one of the few elements of education reform lending itself to controlled experiments. The method by which it has been studied elevates it to the top of the reform agenda.

Many new approaches are now becoming available for evaluating whether complex programs work. What they lack in certainty they make up for in richness of understanding that builds over time and across initiatives.

Quarrels over which method represents "the gold standard" make no more sense than arguing about whether hammers are superior to saws. The choice depends on whether you want to drive in a nail or cut a board. FDA methods should be used to assess one-change-at-a-time interventions, not to evaluate complex, evolving, interactive, community-driven initiatives.

We should stop treating methodological issues as religious quarrels and put our energies into developing and applying new ways of learning from what works, so that the public can keep score on whether social programs are on track to achieve their goals. This is the road to more effective social policies and programs and to wiser investment of our dollars, our energies, and our hopes.