## Uncertainty and sensitivity analysis of GiveWell's top charity rankings

Arguably, we don’t care about the exact cost-effectiveness estimates of each of GiveWell’s top charities. Instead, we care about their relative values. By using distance metrics across these multidimensional outputs, we can perform uncertainty and sensitivity analysis to answer questions about:

- how uncertain we are about the overall relative values of the charities
- which input parameters this overall relative valuation is most sensitive to


In the last two posts, we performed uncertainty and sensitivity analyses on GiveWell’s charity cost-effectiveness estimates. Our outputs were, respectively:

- probability distributions describing our uncertainty about the value per dollar obtained for each charity and
- estimates of how sensitive each charity’s cost-effectiveness is to each of its input parameters

One problem with this is that we are not supposed to take the cost-effectiveness estimates literally. Arguably, the real purpose of GiveWell’s analysis is not to produce exact numbers but to assess the relative quality of each charity evaluated.

Another issue is that, by treating each cost-effectiveness estimate as independent, we underweight parameters that are shared across many models. For example, the moral weight that ought to be assigned to increasing consumption shows up in many models. If we consider all the charity-specific models together, this input becomes more important than any single model suggests.

### Metrics on rankings

We can solve both of these problems by abstracting away from particular values in the cost-effectiveness analysis and looking at the overall rankings returned. That is, we want to transform:

Charity | Value per $10,000 donated |
---|---|
GiveDirectly | 38 |
The END Fund | 222 |
Deworm the World | 738 |
Schistosomiasis Control Initiative | 378 |
Sightsavers | 394 |
Malaria Consortium | 326 |
Against Malaria Foundation | 247 |
Helen Keller International | 223 |

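into:

Rank | Charity |
---|---|
1 | Deworm the World |
2 | Sightsavers |
3 | Schistosomiasis Control Initiative |
4 | Malaria Consortium |
5 | Against Malaria Foundation |
6 | Helen Keller International |
7 | The END Fund |
8 | GiveDirectly |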

But how do we usefully express probabilities over rankings^{1} (rather than probabilities over simple cost-effectiveness numbers)? The approach we’ll follow below is to characterize a ranking produced by a run of the model by computing its distance from the reference ranking (i.e. the ranking implied by GiveWell’s current best estimates in the table above). Our output probability distribution will then express how far we expect to be from the reference ranking, which indicates how much we might learn about the ranking with more information on the inputs. For example, if the distribution is narrow and near 0, our uncertain input parameters mostly produce results similar to the reference ranking. If the distribution is wide and far from 0, our uncertain input parameters produce results that are highly uncertain and not necessarily similar to the reference ranking.
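As a minimal sketch of the first step, the reference ranking can be obtained by sorting the charities by their value per $10,000 donated. The values are those in the table above; the names `values_per_10k` and `to_ranking` are illustrative, not part of GiveWell's model.

```python
# Values per $10,000 donated, copied from the table above.
values_per_10k = {
    "GiveDirectly": 38,
    "The END Fund": 222,
    "Deworm the World": 738,
    "Schistosomiasis Control Initiative": 378,
    "Sightsavers": 394,
    "Malaria Consortium": 326,
    "Against Malaria Foundation": 247,
    "Helen Keller International": 223,
}


def to_ranking(values):
    """Return charity names ordered from most to least cost-effective."""
    return sorted(values, key=values.get, reverse=True)


reference_ranking = to_ranking(values_per_10k)

for rank, charity in enumerate(reference_ranking, start=1):
    print(rank, charity)
```

Each run of the model would yield its own dictionary of values, and hence its own ranking, which can then be compared against `reference_ranking` with a distance metric.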

#### Spearman’s footrule

What is this mysterious distance metric between rankings that enables the above approach? One such metric is called Spearman’s footrule distance. It’s defined as:

\[d_{fr}(u, v) = \sum_{c \in A} |\text{pos}(u,c) - \text{pos}(v, c)|\]

where:

- \(u\) and \(v\) are rankings,
- \(c\) varies over all the elements \(A\) of the rankings and
- \(\text{pos}(r, x)\) returns the integer position of item \(x\) in ranking \(r\).

In other words, the footrule distance between two rankings is the sum, over all items, of the absolute difference between each item's positions in the two rankings. (We also apply a normalization factor so that the distance ranges from 0 to 1, but omit that detail here.)

So the distance between A, B, C and A, B, C is 0; the (unnormalized) distance between A, B, C and C, B, A is 4; and the (unnormalized) distance between A, B, C and B, A, C is 2.
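The footrule definition translates directly into a few lines of Python. This is a sketch, not the code used for the analysis. The function names are illustrative, and the normalization shown (dividing by the maximum possible distance, \(\lfloor n^2/2 \rfloor\) for \(n\) items) is an assumption about the factor the text omits.

```python
def footrule_distance(u, v):
    """Unnormalized Spearman's footrule distance between two rankings.

    u and v are sequences containing the same items, each exactly once.
    """
    if len(u) != len(v) or set(u) != set(v):
        raise ValueError("rankings must contain the same items")
    pos_v = {item: i for i, item in enumerate(v)}
    # Sum of absolute position differences, item by item.
    return sum(abs(i - pos_v[item]) for i, item in enumerate(u))


def normalized_footrule_distance(u, v):
    """Footrule distance scaled to [0, 1] (assumed normalization)."""
    n = len(u)
    max_distance = (n * n) // 2  # attained when one ranking is the reverse of the other
    return footrule_distance(u, v) / max_distance if max_distance else 0.0


abc = ["A", "B", "C"]
print(footrule_distance(abc, ["A", "B", "C"]))  # 0
print(footrule_distance(abc, ["C", "B", "A"]))  # 4
print(footrule_distance(abc, ["B", "A", "C"]))  # 2
```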

#### Kendall’s tau

Another common distance metric between rankings is Kendall’s tau. It’s defined as:

\[d_{tau}(u, v) = \sum_{\{i,j\} \in P} \bar{K}_{i,j}(u, v)\]

where:

- \(u\) and \(v\) are again rankings,
- \(\{i, j\}\) ranges over the set \(P\) of unordered pairs of distinct items in \(u\) and \(v\) and
- \(\bar{K}_{i,j}(u, v) = 0\) if \(i\) and \(j\) are in the same order (concordant) in \(u\) and \(v\) and \(\bar{K}_{i,j}(u, v) = 1\) otherwise (discordant)

In other words, the Kendall tau distance looks at all possible pairs across items in the rankings and counts up the ones where the two rankings disagree on the ordering of these items. (There’s also a normalization factor that we’ve again omitted so that the distance ranges from 0 to 1.)

So the distance between A, B, C and A, B, C is 0; the (unnormalized) distance between A, B, C and C, B, A is 3; and the (unnormalized) distance between A, B, C and B, A, C is 1.
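The pair-counting definition can be sketched in Python as follows. The function names are illustrative, and the normalization (dividing by the number of pairs, \(n(n-1)/2\)) is an assumption about the omitted factor. This is a straightforward quadratic-time version, which is adequate for eight charities.

```python
from itertools import combinations


def kendall_tau_distance(u, v):
    """Unnormalized Kendall tau distance: the number of discordant pairs.

    u and v are sequences containing the same items, each exactly once.
    """
    if len(u) != len(v) or set(u) != set(v):
        raise ValueError("rankings must contain the same items")
    pos_u = {item: i for i, item in enumerate(u)}
    pos_v = {item: i for i, item in enumerate(v)}
    # A pair is discordant when the two rankings order it differently,
    # i.e. the position differences have opposite signs.
    return sum(
        1
        for i, j in combinations(u, 2)
        if (pos_u[i] - pos_u[j]) * (pos_v[i] - pos_v[j]) < 0
    )


def normalized_kendall_tau_distance(u, v):
    """Kendall tau distance scaled to [0, 1] (assumed normalization)."""
    n = len(u)
    n_pairs = n * (n - 1) // 2
    return kendall_tau_distance(u, v) / n_pairs if n_pairs else 0.0


abc = ["A", "B", "C"]
print(kendall_tau_distance(abc, ["A", "B", "C"]))  # 0
print(kendall_tau_distance(abc, ["C", "B", "A"]))  # 3
print(kendall_tau_distance(abc, ["B", "A", "C"]))  # 1
```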

#### Angular distance

One drawback of the above metrics is that they throw away information in going from the table with cost-effectiveness estimates to a simple ranking. What would be ideal is to keep that information and find some other distance metric that still emphasizes the relationship between the various numbers rather than their precise values.

Angular distance is a metric which satisfies these criteria. We can regard the table of charities and cost-effectiveness values as an 8-dimensional vector. When our output produces another vector of cost-effectiveness estimates (one for each charity), we can compare this to our reference vector by finding the angle between the two^{2}.
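A minimal sketch of the angle computation, using only the standard library, might look like this. The function name `angular_distance` is illustrative, the result is left in radians, and any further rescaling is not shown because the text does not specify one.

```python
import math


def angular_distance(u, v):
    """Angle in radians between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


# Reference vector: values per $10,000 donated, in the order of the table above.
reference = [38, 222, 738, 378, 394, 326, 247, 223]

# Doubling every estimate leaves the relative values unchanged,
# so the angle to the reference is (numerically) zero.
doubled = [2 * x for x in reference]
print(angular_distance(reference, doubled))
```

The design point is that the angle depends only on the direction of the vector, not its length, so a run that scales every charity's estimate by the same factor counts as identical to the reference, while a run that changes the charities' values relative to one another does not.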