• ## Conditioning in causal graphs

As mentioned in the warnings on the first post on graphical causal models, I’ve been lying to you so far. But it was for a good reason: that sweet, sweet expository simplicity. So far, all our definitions, algorithms, etc. have proceeded without any acknowledgment of the social scientists’ favorite statistical tool: controlling for a variable1.

In this post, we’ll introduce the concept of conditioning to our graphical causal models framework and see how it both complicates things and offers new possibilities. (This post deliberately mirrors the structure of that one so it may be handy to have it open in a second tab/window for comparison purposes.)

### Causal triplets, again

We started out by talking about three types of causal triplets: chains, forks and inverted forks. For convenience, here is the summary table we ended up with:

| Triplet | Structure | A and C |
| --- | --- | --- |
| Chain | A → B → C | dependent |
| Fork | A ← B → C | dependent |
| Inverted fork | A → B ← C | independent |

When we add the possibility of conditioning, things change dramatically:

| Triplet | Structure | A and C | A and C, conditional on B |
| --- | --- | --- | --- |
| Chain | A → B → C | dependent | independent |
| Fork | A ← B → C | dependent | independent |
| Inverted fork | A → B ← C | independent | dependent |

The complete reversal of in/dependence occasioned by conditioning on the middle vertex may be a bit surprising. There’s a certain reflex that says whenever you want to draw a clean story out of messy data, conditioning on more stuff will help you. But as we see here, that’s not generally true2.
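The reversal is easy to see in simulation. Below is a minimal numpy sketch (the coefficients and noise scales are illustrative assumptions, not from the post). Conditioning on a collider, crudely approximated here by restricting attention to a narrow slice of B's values, induces dependence between two independent causes; conditioning on a mediator destroys the dependence between a cause and its downstream effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

def corr(x, y):
    return float(np.corrcoef(x, y)[0, 1])

# Inverted fork A -> B <- C: A and C start out independent...
a, c = rng.normal(size=n), rng.normal(size=n)
b = a + c + rng.normal(size=n)
collider_uncond = corr(a, c)
# ...but "conditioning" on the collider B (crudely: keeping only samples
# where B is near zero) makes them dependent.
near_zero = np.abs(b) < 0.1
collider_cond = corr(a[near_zero], c[near_zero])

# Chain A -> B -> C: A and C are dependent unconditionally,
# but (approximately) independent once we condition on the mediator B.
a = rng.normal(size=n)
b = a + rng.normal(size=n)
c = b + rng.normal(size=n)
chain_uncond = corr(a, c)
near_zero = np.abs(b) < 0.1
chain_cond = corr(a[near_zero], c[near_zero])

print(f"collider: {collider_uncond:+.2f} unconditional, {collider_cond:+.2f} given B")
print(f"chain:    {chain_uncond:+.2f} unconditional, {chain_cond:+.2f} given B")
```

The slice-of-B trick is a stand-in for proper conditioning, but it makes the flip vivid: the collider pair goes from roughly zero correlation to strongly negative, while the chain pair goes from strongly positive to roughly zero.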

Full post

• ## Instrumental variables on causal graphs

Last time we talked about viewing d-separation as a tool for model selection. But we’re pretty limited in the causal models we can distinguish between by only observing our variables of interest—any two graphs with the same set of d-separations are indistinguishable. Instrumental variables are a common tool for trying to get around the limitations of purely observational data.

### Instrumental variables

Instrumental variables (IV) are variables that we’re not intrinsically interested in but that we look at in an attempt to suss out causality. The instrument must be correlated with our cause, but its only impact on the effect should be via the cause.

The classic example is about—you guessed it—smoking. Because running an RCT on smoking is ethically verboten, we’re limited to observational data. How can we determine if smoking causes lung cancer from observational data alone? An instrumental variable! To reiterate, we want a factor that affects smoking prevalence but (almost certainly) does not affect lung cancer in other ways. Finding an instrument that satisfies the IV criteria generally seems to require substantial creativity. Can you think of an instrument for the causal effect of smoking on lung cancer?

An instrument that meets these criteria is a tax on cigarettes. We expect smoking to decrease as taxes increase, but it seems hard to imagine a cigarette tax otherwise having an effect on lung cancer.

### Instrumental variables on causal graphs

Okay, so that’s what IVs are at a high level. But what are they concretely in the graphical causal model setting we’ve been developing?

#### A brief notational interlude

We’ll get this out of the way here:

• $$\perp\!\!\!\perp$$ is the symbol for d-separation
• Once we add the strikethrough, $$\not\!\!{\perp\!\!\!\perp}$$ means d-connected.
• If $$G$$ is a graph, $$G_{\overline{X}}$$ is $$G$$ in which all the edges pointing to vertex X have been removed1.

#### Defined

We’ll start with the definition and then try to build up a feel for it. An instrumental variable X for the causal effect of Y on Z in graph G must be:

1. d-connected to our cause Y—$$(X \not\!\!{\perp\!\!\!\perp} Y)_G$$
2. d-separated from our effect Z after severing the cause Y from all its parents—$$(X \perp\!\!\!\perp Z)_{G_\overline{Y}}$$
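To see what these conditions buy us, here is a hedged numpy sketch in which the structural equations, coefficients and variable names are all assumptions for illustration: an unobserved confounder U biases the naive regression of the effect Z on the cause Y, while the simple Wald-style IV estimate, which uses only the variation in Y induced by the instrument X, recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical structural model: instrument X (say, a cigarette tax),
# unobserved confounder U, cause Y (smoking), effect Z (lung cancer risk).
x = rng.normal(size=n)          # instrument: affects Z only through Y
u = rng.normal(size=n)          # confounder: affects both Y and Z
y = x + u + rng.normal(size=n)
true_effect = 0.5
z = true_effect * y + u + rng.normal(size=n)

# Naive regression of Z on Y is biased upward by the confounder U.
naive = np.cov(y, z)[0, 1] / np.var(y, ddof=1)

# The Wald/IV estimate: ratio of the instrument's covariance with the
# effect to its covariance with the cause.
iv = np.cov(x, z)[0, 1] / np.cov(x, y)[0, 1]

print(f"true: {true_effect}  naive: {naive:.2f}  iv: {iv:.2f}")
```

The two d-separation conditions above are exactly what make the Wald ratio work: X must actually move Y (nonzero denominator), and X must have no path to Z that bypasses Y (so the numerator carries only Y's effect).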
Full post

• ## Flip it and reverse it

Last time we found the d-separations that correspond to a graph. This time, we find the graphs that correspond to a set of d-separations. Which is more useful because we generally know d-separations and generally don’t know graphs.

Last time we talked about causal graphs, what d-separation and d-connection mean, and how to infer these properties from a causal graph. But this isn’t terribly useful because it requires that we have a fully specified causal graph. If we’re performing research in new or uncertain areas, we have data rather than a causal graph. And this data tells us about d-separations (variables that are independent of each other) and d-connections (variables that are correlated). So our work last time was exactly backwards: graphs to d-separations. This time we’ll go from d-separations to graphs.

### Model selection

One way to think about d-separation and d-connection is as helping us with model selection. Last time we presented the fork

yellow fingers ← smoking → lung cancer

as one possible causal model regarding smoking. But it’s not the only possibility. We might also be worried that the true causal structure looks like this (just go with it):

smoking → yellow fingers ← lung cancer

How can we tell them apart? Can we use observational data alone? In this case, observational data alone is enough to distinguish between these two causal models! The key is that the two models have different sets of d-separations. In the original model, all the vertices are d-connected and there are no d-separations (this must be the case since there are no colliders). In the second (silly) model, “smoking” and “lung cancer” are d-separated because “yellow fingers” is a collider between them. If our data show that smoking and lung cancer are independent, we must rule out the first model and prefer the second. If the two variables are correlated, we must rule out the second model and prefer the first.

This is a procedure that works generally:

1. Draw out the plausible graphical causal models that include all the variables you have data on
2. Determine the d-separations for each plausible model
3. Determine the variables in your data that are independent
4. Retain the models from step 1 whose d-separations in step 2 are compatible with the data analysis in step 3

The ideal is that there’s only one model left at the end of step 4. However, it’s possible to end up with none. This means that step 1 wasn’t permissive enough and more models need to be considered. It’s also possible to end up with more than one model. Not all models are distinguishable by observational data alone. This occurs whenever two models have the same set of d-separations.
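Steps 3 and 4 of the procedure can be sketched in code. The snippet below is a toy sketch (the coefficient and the independence threshold are illustrative): it generates data from the first smoking model, tests whether smoking and lung cancer are independent, and retains the compatible model.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Generate data from the first model: smoking -> lung cancer (plus noise).
smoking = rng.normal(size=n)
cancer = 0.8 * smoking + rng.normal(size=n)

# Step 3: determine which variables are independent in the data.
r = float(np.corrcoef(smoking, cancer)[0, 1])
independent = abs(r) < 0.01   # crude threshold standing in for a real test

# Step 4: retain only the models whose d-separations fit that finding.
# Model 1 (smoking causes lung cancer) predicts dependence; model 2
# (the collider model) predicts independence.
retained = "model 2" if independent else "model 1"
print(f"corr = {r:.2f} -> retain {retained}")
```

In practice step 3 would use a proper independence test rather than a raw correlation threshold, but the logic of ruling out incompatible graphs is the same.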

Full post

can the human brain deal with the complexity to control an extra limb and yield advantages from it? […] Anatomical MRI of the supernumerary finger (SF) revealed that it is actuated by extra muscles and nerves, and fMRI identified a distinct cortical representation of the SF. […] Polydactyly subjects were able to coordinate the SF with their other fingers for more complex movements than five fingered subjects, and so carry out with only one hand tasks normally requiring two hands.

In summary, most of the biggest claims made by Wilkinson and Pickett in The Spirit Level look even weaker today than they did when the book was published. Only one of the six associations stand up under W & P’s own methodology and none of them stand up when the full range of countries is analysed. In the case of life expectancy - the very flagship of The Spirit Level - the statistical association is the opposite of what the hypothesis predicts.

If The Spirit Level hypothesis were correct, it would produce robust and consistent results over time as the underlying data changes. Instead, it seems to be extremely fragile, only working when a very specific set of statistics are applied to a carefully selected list of countries.

The allure of “meta” and “axiomatic first principles” is that it’s kinda like get-rich-quick thinking but for epistemics. Get a few abstractions really right and potentially earn more than you would grinding as an object-level wage slave for decades.

Trying to identify the best policy is different from estimating the precise impact of every individual policy: as long as we can identify the best policy, we do not care about the precise impacts of inferior policies. Yet, despite this, most experiments follow protocols that are designed to figure out the impact of every policy, even the obviously inferior ones.

Cambiaso rode six different horses to help his team win. […] What is noteworthy is that all six horses were clones of the same mare—they’re named Cuartetera 01 through 06. […] “Every scientist that deals with epigenetics told me this would never work,” says Meeker

Full post

• ## Baby's first graphical causal models

We can describe causal models with directed graphs. The graph perspective allows us to specify precise procedures for determining when variables (vertices) are independent (d-separated) and dependent (d-connected).

### Causal graphs

We can represent causal models as directed graphs. The vertices in the graph represent different random variables—causes and effects—and the edges represent causal relationships. If two vertices do not have an edge between them, there is no direct causal relationship between them. For example:

Some technical details:

• These graphs must be acyclic. In a strict sense, something can’t be both a cause and an effect of something else. Thing A at time 1 can affect thing B at time 2, which affects thing A at time 3. Causation only flows forward in time and time is acyclic.
• A path on a directed graph is a sequence of edges joining a sequence of vertices. We can ignore the direction of the edges when forming a path.
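As a concrete sketch, a causal graph can be stored as a plain adjacency list and the acyclicity requirement checked with a topological sort. The graph below is a hypothetical extension of the post's smoking example, not a model the post commits to.

```python
from collections import deque

# A causal graph as an adjacency list: edge u -> v means "u causes v".
graph = {
    "smoking": ["yellow fingers", "lung cancer"],
    "yellow fingers": [],
    "lung cancer": ["death"],
    "death": [],
}

def is_acyclic(g):
    """Kahn's algorithm: a directed graph is acyclic iff every vertex
    can be peeled off in topological order."""
    indegree = {v: 0 for v in g}
    for targets in g.values():
        for v in targets:
            indegree[v] += 1
    queue = deque(v for v, d in indegree.items() if d == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in g[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return removed == len(g)

print(is_acyclic(graph))
```

A two-vertex cycle like `{"a": ["b"], "b": ["a"]}` would fail this check, which is exactly the "cause and effect of each other" situation the acyclicity rule forbids.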

### Causal triplets

Now that we’ve presented the basic idea of modeling causal systems with graphs, we can start to use graphs as a tool to analyze causal models. We’ll start by looking at the smallest interesting part of a graph—a triplet consisting of three vertices and two edges. Such a triplet can be configured in one of three ways1. We give a name to each triplet and to the center vertex in each triplet.

Chains
Chains are the most straightforward. If A causes B and B causes C (A → B → C), then A causes C2. We call the central vertex B a mediator or a traverse. For example, if smoking causes (increased risk of) cancer and cancer causes (increased risk of) death, then smoking causes (increased risk of) death.
Forks
The next possible triplet configuration is what we call a fork. If B causes both A and C (A ← B → C), then A and C will not be independent in light of their common cause. For example, if smoking causes both yellowed fingers and lung cancer, we’d expect lung cancer and yellowed fingers to be correlated.
Inverted forks
The final possible triplet configuration is what we call an inverted fork. If A causes B and C causes B (A → B ← C), then A and C will be independent. We call the central vertex B a collider. For example, if smoking causes lung cancer and exposure to high doses of radiation also causes lung cancer, we wouldn’t expect smoking and exposure to high doses of radiation to be correlated.

So we can determine the causal and non-causal dependence between three factors by turning them into a causal graph and looking at the configuration of the edges.
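The three triplets are easy to simulate. In this minimal numpy sketch (coefficients and noise scales are illustrative assumptions), the endpoints A and C come out correlated for the chain and the fork but uncorrelated for the inverted fork:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
noise = lambda: rng.normal(size=n)

def corr(x, y):
    return float(np.corrcoef(x, y)[0, 1])

# Chain: A -> B -> C. A and C are dependent via the mediator B.
a = noise(); b = a + noise(); c = b + noise()
chain_corr = corr(a, c)

# Fork: A <- B -> C. The common cause B induces dependence between A and C.
b = noise(); a = b + noise(); c = b + noise()
fork_corr = corr(a, c)

# Inverted fork: A -> B <- C. A and C are marginally independent.
a = noise(); c = noise(); b = a + c + noise()
collider_corr = corr(a, c)

print(f"chain: {chain_corr:.2f}  fork: {fork_corr:.2f}  collider: {collider_corr:.2f}")
```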

Full post

• ## Critiques and claims regarding Evidence-based Policy

### Man of straw

It seems to me that Evidence-based Policy’s description of external validity as a “rules system” is something of a straw man. I doubt1 that researchers are rule-based automata applying the dictum of external validity unthinkingly. When evaluating whether the population, time and place are “similar” enough for the original study to have external validity, researchers surely interpret the direction and degree of similarity with care.

Frustratingly, EBP offers no real description of these supposed rules of external validity2. The closest I can find to a systematized procedure is (Khorsan and Crawford 2014). Which is not very close. It’s just three domains each rated on a three point scale. And rating those domains requires considerable human judgment.

### Tu quoque

If EBP were to back off from its straw man and allow that people think about external validity with discretion, we’d see that all the critiques of external validity apply similarly (see what I did there?) to the EBP approach with causal principles.

In the summary, I reorganized their critique of external validity a bit. To ensure that I’m not critiquing a distortion, I’ll match their original presentation here.

EBP complains that external validity’s guidance to apply the “same treatment” is vague. It only works if “you have identified the right description for the treatment”. But this complaint can be applied to the EBP approach too. An intervention only travels from there to here via the effectiveness argument if we find the right formulation of the causal (sub)principle. This is exactly what vertical search was about!

The Tamil Nadu Integrated Nutrition Program (TINP) worked and the Bangladesh Integrated Nutrition Program (BINP) didn’t, and it doesn’t much matter if you say that’s because “same treatment” was too vague or if you say it’s because vertical search failed to turn up the right description of the causal principle at work.

On either approach, mechanical application fails and discretion is required for success.

#### Similarity is too demanding

EBP makes fun of a study that says:

Thus [Moving to Opportunity] data … are strictly informative only about this population subset—people residing in high-rise public housing in the mid-1990’s, who were at least somewhat interested in moving and sufficiently organized to take note of the opportunity and complete an application. The MTO results should only be extrapolated to other populations if the other families, their residential environments, and their motivations for moving are similar to those of the MTO population. (Ludwig et al. 2008)

If our bar for similarity is this high, why even bother with a study that will never travel, EBP asks. But I think the above conclusion is actually semi-reasonable.

First, the authors are clearly being conservative in some regards. They don’t actually mean that the information expired with the mid-1990s. That’s a shorthand for a variety of factors which they expect are relevant but they haven’t individuated. It will be up to future policymakers and researchers to use their dIsCrEtIoN to determine whether all those implicit factors are present in new circumstances and this intelligent interpretation is an expected part of external validity—not a gross breach.

Second, it sounds a lot to me like the authors of the critiqued study are trying to identify the support factors that EBP loves. We could rewrite this in EBPese: “The intervention only plays a positive causal role if it’s supported by dissatisfaction with current housing and sufficient conscientiousness.”.

Finally, we could say that identifying support factors in the EBP approach is too demanding. If we just listed off every fact we knew about the context of the original intervention and called it a support factor, it would clearly be extremely demanding—nowhere else would have this precise combination of support factors. It’s only by filtering proposed support factors through human judgment that we get a more manageable set and escape the demandingness critique. But if we move away from the straw man version of external validity and allow ourselves to apply judgment there too, then we can say that an intervention context only has to be similar in certain ways—thereby escaping the demandingness critique.

Full post

• ## Optimism, regret and indifference

Last time, we continued our discussion of decision rules that apply in information-poor settings. In particular, we’ve been focusing on “decisions under ignorance”—decisions where the probabilities associated with various states of the world are unknown.

This time, we’ll look at three final decision rules in this category.

### Optimism-pessimism

The first decision rule we’ll look at is the optimism-pessimism rule.

#### Prose

Conceptually, optimism-pessimism is a generalization of the maximin and maximax rules. The maximin rule tells us to make the decision which has the best worst case outcome. The maximax rule tells us to make the decision which has the best best case outcome.

The optimism-pessimism rule tells us to look at both the best outcome which may come to pass after taking a particular action and the worst outcome which may come to pass. Then we should take a weighted average of the best and worst case outcome for each action and take the action that has the best such average. The weighting used in the decision rule is a parameter that the decision maker is free to choose based on how optimistic or pessimistic they are. So really the optimism-pessimism rule is a family of rules parameterized by a weighting factor.

Also, it’s worth noting that the optimism-pessimism family of rules no longer works in the fully general setting we were working with in previous posts. While we still don’t have probabilities associated with states of the world, we will need to move from an ordinal scale of outcomes to an interval scale. This shift is necessary because it doesn’t make sense to take a weighted average of ordinal data.

#### Example

You have the choice of two alternative routes to work. In good conditions, the first route takes 10 minutes and the second route 5 minutes. But the second route is prone to traffic and on bad days takes 20 minutes while the first route still takes 10 minutes.

With these numbers, the optimism-pessimism rule is indifferent between the two routes when your optimism weight is exactly 2/3. If you’re more pessimistic than that, you should take route 1–just like in maximin. If you’re more optimistic than that, you should take route 2–just like in maximax.
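Here is a small sketch of the rule (sometimes called the Hurwicz criterion) applied to the commute example, with outcomes taken to be negated travel times so that shorter is better. With these travel times the two routes tie at an optimism weight of 2/3: below that, maximin-style pessimism favors route 1; above it, maximax-style optimism favors route 2.

```python
def optimism_pessimism(outcomes, alpha):
    """Optimism-pessimism value of an action: weight the best case by
    alpha (optimism) and the worst case by 1 - alpha (pessimism)."""
    return alpha * max(outcomes) + (1 - alpha) * min(outcomes)

# Utilities are negated travel times in minutes (shorter is better).
route_1 = [-10, -10]   # 10 minutes whether conditions are good or bad
route_2 = [-5, -20]    # 5 minutes in good conditions, 20 in traffic

for alpha in (0.0, 2 / 3, 1.0):
    v1 = optimism_pessimism(route_1, alpha)
    v2 = optimism_pessimism(route_2, alpha)
    if abs(v1 - v2) < 1e-9:
        choice = "indifferent"
    else:
        choice = "route 1" if v1 > v2 else "route 2"
    print(f"alpha = {alpha:.2f}: {v1:.1f} vs {v2:.1f} -> {choice}")
```

At alpha = 0 the rule collapses to maximin, at alpha = 1 to maximax, which is the sense in which it generalizes both.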

Full post

• ## YAAS evidence-based policy

RCTs are good for producing claims like “It worked there.”. But what we really care about is “Will it work here?”. The standard answer to this question is external validity, but EBP disapproves of this answer for a variety of reasons. Instead, they propose that we answer the question of generalization by thinking about causal principles, causal roles and support factors. To find support factors and the right causal principle, we must engage in, respectively, horizontal search and vertical search up and down the ladder of abstraction.

### Vignette

The Tamil Nadu Integrated Nutrition Project (TINP) was a program to reduce child malnutrition among the rural poor of India’s Tamil Nadu State. Core elements of the program were: nutrition counseling for pregnant mothers and supplementary food for the most deprived young children. TINP succeeded in reducing malnutrition substantially (Weaving 1995). Seeing this success, policymakers in Bangladesh launched the Bangladesh Integrated Nutrition Project (BINP) modeled on TINP. Unfortunately, six years later, childhood malnutrition in Bangladesh continued unabated (Weaving 1995).

Why did BINP fail where TINP succeeded?1 One of the main problems was that, while mothers did indeed learn about childhood nutrition, mothers often weren’t the primary decision makers in matters of nutrition. In rural Bangladesh, men typically do the shopping. And if a mother lived with her mother-in-law, that mother-in-law was the final authority on issues within the women’s domain. Because the decision-makers hadn’t received BINP’s nutrition counseling, they often used supplemental food from BINP as a substitute and reallocated other food away from mother and child.

### Effectiveness

#### The problem: Generalization

Even in the absence of that holy grail—the RCT—there’s considerable confidence that TINP worked. But BINP didn’t. Evidence-Based Policy takes problems like this as its central concern. How do we move from “It worked there.”—efficacy—to “It will work here.”—effectiveness?

#### The standard solution: External validity

The standard way of thinking about this problem is external validity. Briefly, a study is internally valid when it provides strong reasons to believe its conclusions. A study has external validity when it can be generalized to other contexts—different times, places and populations.

But EBP disdains external validity. Claims of external validity usually take a shape like “Study A gives us reason to believe that intervention B—which worked on population C at time and place D—will also work on similar (to population C) population E at similar (to time and place D) time and place F.” But the word “similar” is doing all the work here. What does it mean?

“Similar” can’t mean “identical”—then all studies would be pointless because we would never have external validity and could never generalize. But “similar” also shouldn’t be construed too permissively. If you insist that the population of Manhattanites and the population of rural Tibetans are “similar” because they both consist of humans living in communities with hierarchies of esteem under a system of hegemonic capitalism on planet Earth, you’ll find yourself perpetually surprised when your interventions fail to replicate.

Furthermore, similarity means radically different things in different contexts. If the original study is about reducing Alzheimer’s for high risk populations, similarity means biomedical similarity and certain elderly people in rural Tibet may in fact be more similar to certain elderly people in Manhattan than either subpopulation is to their neighbors. On the other hand, if the study is about the welfare effects of exposure to pervasive advertising, rural Tibet and Manhattan count as pretty dissimilar.

So “similar” has to mean similar in the right ways and to the right degree2. The external validity claim then becomes something like “Study A gives us reason to believe that intervention B—which worked on population C at time and place D—will also work on the right population E at the right time and place F.” But this is pretty tautological. To be a useful tool, external validity should transform a hard problem into a simpler one. But it turns out that, once we unpack things, it’s hard to know what “similar” means other than “right” and we’re back where we started—we have to rely on outside knowledge to know if we can translate “It worked there.” to “It will work here.”.

#### The Evidence-based Policy solution

To get a good argument from “it works somewhere” to “it will work here” facts about causal principles here and there are needed.
Full post

Over the past few decades, labor force participation has sharply dropped for men ages 20-34. Theories about the root cause range from indolence, to a lack of skills and training, to offshoring, to (perhaps most interestingly) the increasing attractiveness and availability of leisure and media entertainment. In this essay, we propose that the drop in labor participation rate of young men is a result of a combination of factors: (i) a decrease in the cost of access to media entertainment leisure, (ii) increases in the availability and (iii) the quality of media entertainment leisure, and (iv) a decrease in the marginal signalling utility of (conspicuous) consumption goods for all but the highest earners.

Analyses of the genre preferences of over 3,000 individuals revealed a remarkably clear factor structure. Using multiple samples, methods, and geographic regions, data converged to reveal five entertainment-preference dimensions: Communal, Aesthetic, Dark, Thrilling, and Cerebral.

• Engineering part 1: station construction methods
• Engineering part 2: mezzanines
• Management part 1: procurement
• Management part 2: conflict resolution
• Management part 3: project management
• Management part 4: agency turf battles
• Institutions part 1: political lading with irrelevant priorities
• Institutions part 2: political incentives
• Institutions part 3: global incuriosity

And when it comes to journalism, committed capitalists are always better materialists than the liberals. And that’s why I read FT. Sure, they’re rooting for the other team, but at least they know the game.

Related: Searching for “alan dershowitz martha’s vineyard nytimes” turns up five different items on the very important story of Dershowitz being shunned by his fellow Vintners.

We check in with people at each stage of the cash transfer process to see how things are going. Take a look at some of their stories as they appear here in real-time.

Full post
