Critiques and claims regarding Evidence-based Policy

Man of straw

It seems to me that Evidence-based Policy’s description of external validity as a “rules system” is something of a straw man. I doubt[1] that researchers are rule-based automata applying the dictum of external validity unthinkingly. When evaluating whether the population, time and place are “similar” enough for the original study to have external validity, researchers surely interpret the direction and degree of similarity with care.

Frustratingly, EBP offers no real description of these supposed rules of external validity[2]. The closest I can find to a systematized procedure is Khorsan and Crawford (2014). Which is not very close. It’s just three domains, each rated on a three-point scale. And rating those domains requires considerable human judgment.

Tu quoque

If EBP were to back off from its straw man and allow that people think about external validity with discretion, we’d see that all the critiques of external validity apply similarly (see what I did there?) to the EBP approach with causal principles.

In the summary, I reorganized their critique of external validity a bit. To ensure that I’m not critiquing a distortion, I’ll match their original presentation here.

The advice is vague

EBP complains that external validity’s guidance to apply the “same treatment” is vague. It only works if “you have identified the right description for the treatment”. But this complaint can be applied to the EBP approach too. An intervention only travels from there to here via the effectiveness argument if we find the right formulation of the causal (sub)principle. This is exactly what vertical search was about!

The Tamil Nadu Integrated Nutrition Program (TINP) worked and the Bangladesh Integrated Nutrition Program didn’t, and it doesn’t much matter whether you say that’s because “same treatment” was too vague or because vertical search failed to turn up the right description of the causal principle at work.

On either approach, mechanical application fails and discretion is required for success.

Similarity is too demanding

EBP makes fun of a study that says:

Thus [Moving to Opportunity] data … are strictly informative only about this population subset—people residing in high-rise public housing in the mid-1990’s, who were at least somewhat interested in moving and sufficiently organized to take note of the opportunity and complete an application. The MTO results should only be extrapolated to other populations if the other families, their residential environments, and their motivations for moving are similar to those of the MTO population. (Ludwig et al. 2008)

If our bar for similarity is this high, why even bother with a study that will never travel, EBP asks. But I think the above conclusion is actually semi-reasonable.

First, the authors are clearly being conservative in some regards. They don’t actually mean that the information expired with the mid-1990s. That’s a shorthand for a variety of factors which they expect are relevant but haven’t individuated. It will be up to future policymakers and researchers to use their dIsCrEtIoN to determine whether all those implicit factors are present in new circumstances, and this intelligent interpretation is an expected part of external validity, not a gross breach.

Second, it sounds a lot to me like the authors of the critiqued study are trying to identify the support factors that EBP loves. We could rewrite this in EBPese: “The intervention only plays a positive causal role if it’s supported by dissatisfaction with current housing and sufficient conscientiousness.”

Finally, we could say that identifying support factors in the EBP approach is too demanding. If we just listed off every fact we knew about the context of the original intervention and called it a support factor, it would clearly be extremely demanding: nowhere else would have this precise combination of support factors. It’s only by filtering proposed support factors through human judgment that we get a more manageable set and escape the demandingness critique. But if we move away from the straw man version of external validity and allow ourselves to apply judgment there too, then we can say that an intervention context only has to be similar in certain ways, thereby escaping the demandingness critique.

Similarity is the wrong idea

EBP says that similarity is just the wrong idea. By this, I think it means that similarity is demanded without an underlying rationale. It returns to the MTO example excerpted above and says that it demands a random, nonsensical assortment of similarities. I think this is just plain uncharitable. I can think of many worse similarities to demand. For example, we could claim the MTO study only has external validity if the follow-up policy:

  • Starts on the same day of the month as the original policy
  • Is also implemented under a president with the first name Bill
  • Is also called the “Moving to Opportunity Experiment”

Even if we accept the argument that the list of similarities required for external validity by the MTO study is a bad one, the EBP approach doesn’t inoculate us against the problem of making bad demands with respect to effectiveness. EBP says that the MTO study should have tried to explicitly identify support factors. But there’s no guarantee that this would succeed! Just thinking in terms of causal principles and support factors doesn’t mean we’ll automatically get them right. We could still end up with missing or extraneous support factors and EBP would still make fun of us unless we used the magic words, I guess.

In other words, it seems just about as hard to determine which contextual factors are actually support factors as it is to determine which similarities are important and which are irrelevant.

Similarity is wasteful

As we mentioned in a footnote last time, EBP complains that external validity’s demand for similarity is wasteful. What we really want is conditions that are at least as favorable. I have to think that adherents of the external validity approach would readily acknowledge that deviations from similarity in certain directions are tolerable.

But this is an extra degree of freedom and with freedom comes responsibility. Our intuitions and models aren’t always correct about the sign of association between two factors. We might think that an intervention is more likely to work the poorer the target population is and be wrong.

So with either the EBP approach or the external validity approach, reasonable adherents would allow us to apply the intervention in circumstances more favorable than the original trial. But in both cases we’d have to be careful to know what “more favorable” actually means.

Biting bullets

These bullets are delicious:

Here we show how the orthodoxy[3], which is a rules system, discourages decision makers from thinking about their problems, because the aim of rules is to reduce or eliminate the use of discretion and judgment, and deliberation requires discretion and judgment. The aim of reducing discretion comes from a lack of trust in the ability of operatives to exercise discretion well.


To tell people that they have to follow your rules for assessing evidence for effectiveness requires that you think that the rules will produce a better result than allowing them to think. This requires some combination of lack of confidence in their ability to think, and high confidence in the general applicability of the rules.

I think the replication crisis is very clear evidence that trusting the discretion and judgment of operatives is not a winning strategy. Not necessarily because the operatives are ignorant or malignant, but because the task at hand is apparently very hard. And the replication crisis is mostly about internal validity. External validity seems like an even harder problem where discretion can be even more problematic.

On the other hand, I have already said that I am skeptical of characterizing the external validity approach as a rules system. But this is lamentable! I think we should be pushing toward a world where we can codify these procedures and eliminate discretion, not promoting discretion as EBP does.


Obviously, there are a lot of complaints here. What’s left after we sort through them?

Both EBP and I agree that it’s currently ill-advised to apply rules without exercising discretion. EBP’s proposal is to embrace discretion and mine is to improve the rules.

EBP describes the external validity orthodoxy as a “rules system”. I think this is inaccurate. If it’s inaccurate and we admit that adherents of the external validity approach also apply discretion, then all of EBP’s other complaints about external validity (vague, too demanding, wrong, wasteful) apply equally to the EBP approach. Neither approach is strictly superior on these grounds.

I also interpret EBP as arguing for the value of theory and models in addition to raw empiricism. I’m fully on board with this. But I don’t think theories and models are incompatible with the external validity approach. This is perhaps the key contribution of EBP for me: emphasizing that considerations of external validity (e.g. ecological validity, population validity) ought to foreground possible causal principles.

Beyond that, it mostly seems like the EBP approach and external validity are different languages for talking about the same problem. Use the EBP language if it helps you think about the problem more clearly.

Campbell, Donald T. 1986. “Relabeling Internal and External Validity for Applied Social Scientists.” New Directions for Program Evaluation 1986 (31). Wiley Online Library: 67–77.

Khorsan, Raheleh, and Cindy Crawford. 2014. “External Validity and Model Validity: A Conceptual Approach for Systematic Review Methodology.” Evidence-Based Complementary and Alternative Medicine 2014. Hindawi.

Ludwig, Jens, Jeffrey B Liebman, Jeffrey R Kling, Greg J Duncan, Lawrence F Katz, Ronald C Kessler, and Lisa Sanbonmatsu. 2008. “What Can We Learn About Neighborhood Effects from the Moving to Opportunity Experiment?” American Journal of Sociology 114 (1). The University of Chicago Press: 144–88.

  1. My claims about the typical practice of active researchers in general are, of course, largely speculative. But EBP also gives little evidence to support that researchers think about external validity in the way they suggest.

  2. In fact, given external validity’s central role in the book as a target of critique, EBP’s description of it is fairly high-level and minimal.

  3. It’s not obvious to me that external validity is the orthodoxy that EBP makes it out to be. At a minimum, there has been thoughtful consideration of its complexities for quite a while (Campbell 1986).