## Instrumental variables on causal graphs

Last time we talked about viewing d-separation as a tool for model selection. But we’re pretty limited in the causal models we can distinguish between by only observing our variables of interest—any two graphs with the same set of d-separations are indistinguishable. Instrumental variables are a common tool for trying to get around the limitations of purely observational data.

### Instrumental variables

Instrumental variables (IVs) are variables that we’re not intrinsically interested in but that we look at in an attempt to suss out causality. The instrument must be correlated with our cause, but its only impact on the effect should be via the cause.

The classic example is about—you guessed it—smoking. Because running an RCT on smoking is ethically verboten, we’re limited to observational data. How can we determine if smoking causes lung cancer from observational data alone? An instrumental variable! To reiterate, we want a factor that affects smoking prevalence but (almost certainly) does not affect lung cancer in other ways. Finding an instrument that satisfies the IV criteria generally seems to require substantial creativity. Can you think of an instrument for the causal effect of smoking on lung cancer?

…

An instrument that meets these criteria is a tax on cigarettes. We expect smoking to decrease as taxes increase, but it seems hard to imagine a cigarette tax otherwise having an effect on lung cancer.

### Instrumental variables on causal graphs

Okay, so that’s what IVs are at a high level. But what are they concretely in the graphical causal model setting we’ve been developing?

#### A brief notational interlude

We’ll get this out of the way here:

- \(\perp\!\!\!\perp\) is the symbol for d-separation
- Once we add the strikethrough, \(\not\!\!{\perp\!\!\!\perp}\) means d-connected
- If \(G\) is a graph, \(G_{\overline{X}}\) is \(G\) with all the edges pointing into vertex \(X\) removed^{1}.
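To make the last bit of notation concrete, here is a minimal sketch of that edge removal in Python. It assumes a graph is stored as a set of `(parent, child)` pairs. The function name `remove_incoming_edges` and the toy smoking graph (including the unobserved confounder `U`) are made up for illustration.

```python
def remove_incoming_edges(edges, node):
    """Return a copy of the edge set with every edge pointing into `node` dropped.

    This is the G-with-an-overline-on-X operation from the notation list.
    """
    return {(parent, child) for (parent, child) in edges if child != node}


# Hypothetical graph: tax -> smoking -> cancer, plus an assumed
# unobserved confounder U that affects both smoking and cancer.
G = {
    ("tax", "smoking"),
    ("smoking", "cancer"),
    ("U", "smoking"),
    ("U", "cancer"),
}

G_bar_smoking = remove_incoming_edges(G, "smoking")
print(sorted(G_bar_smoking))
# [('U', 'cancer'), ('smoking', 'cancer')]
```

Only the arrows into `smoking` disappear. The arrows leaving it, and every other edge, are untouched.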

#### Defined

We’ll start with the definition and then try to build up a feel for it. An instrumental variable X for the causal effect of Y on Z in graph G must be:

- d-connected to our cause Y: \((X \not\!\!{\perp\!\!\!\perp} Y)_G\)
- d-separated from our effect Z after severing the cause Y from all its parents: \((X \perp\!\!\!\perp Z)_{G_{\overline{Y}}}\)
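The two conditions can be checked mechanically. Below is a rough sketch, not a reference implementation. It tests d-separation with the standard ancestral moral graph construction: keep only the ancestors of the nodes involved, connect parents that share a child, drop edge directions, and look for a path. Graphs are assumed to be sets of `(parent, child)` pairs, and all function names, the toy graph, and the confounder `U` are hypothetical.

```python
from itertools import combinations


def remove_incoming_edges(edges, node):
    """Drop every edge pointing into `node` (sever it from its parents)."""
    return {(p, c) for (p, c) in edges if c != node}


def ancestors(edges, nodes):
    """Return `nodes` together with all of their ancestors."""
    found, frontier = set(nodes), list(nodes)
    while frontier:
        node = frontier.pop()
        for parent, child in edges:
            if child == node and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


def d_separated(edges, x, y, given=()):
    """True if x and y are d-separated given `given` in the DAG `edges`."""
    keep = ancestors(edges, {x, y, *given})
    sub = {(p, c) for (p, c) in edges if p in keep and c in keep}
    # Moralize: undirected copies of all edges, plus links between co-parents.
    undirected = {frozenset(edge) for edge in sub}
    for node in keep:
        parents = [p for (p, c) in sub if c == node]
        undirected |= {frozenset(pair) for pair in combinations(parents, 2)}
    # Search from x without passing through conditioned-on nodes.
    blocked, seen, frontier = set(given), {x}, [x]
    while frontier:
        node = frontier.pop()
        for edge in undirected:
            if node in edge:
                (other,) = edge - {node}
                if other not in seen and other not in blocked:
                    seen.add(other)
                    frontier.append(other)
    return y not in seen


def is_instrument(edges, x, y, z):
    """Check both IV conditions for instrument x, cause y, effect z."""
    connected_to_cause = not d_separated(edges, x, y)
    separated_from_effect = d_separated(remove_incoming_edges(edges, y), x, z)
    return connected_to_cause and separated_from_effect


# Hypothetical smoking graph with an assumed unobserved confounder U.
G = {("tax", "smoking"), ("smoking", "cancer"), ("U", "smoking"), ("U", "cancer")}

print(is_instrument(G, "tax", "smoking", "cancer"))  # True
# If the tax also affected cancer directly, it would no longer qualify.
print(is_instrument(G | {("tax", "cancer")}, "tax", "smoking", "cancer"))  # False
```

In the toy graph the tax passes because, once smoking is cut off from its parents, the tax has no remaining route to cancer. The confounder `U` fails the second condition, since its arrow into cancer survives the surgery.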