• ## The reliability of moral judgments: A survey and systematic(ish) review

We’d like our moral judgments to be reliable—to be sensitive only to factors that we endorse as morally relevant. Experimental studies on a variety of putatively irrelevant factors—like the ordering of dilemmas and incidental disgust at time of evaluation—give some (but not strong, due to methodological issues and limited data) reason to believe that our moral judgments do in practice track these irrelevant factors. Theories about the origins and operations of our moral faculties give further reason to suspect that our moral judgments are not perfectly reliable. There are a variety of responses which try to rehabilitate our moral judgments—by denying the validity of the experimental studies, by blocking the inference to the people and situations of most concern, by accepting their limited reliability and shrugging, by accepting their limited reliability and working to overcome it—but it’s not yet clear whether any of them do or can succeed.

(This post is painfully long. Coping advice: Each subsection within Direct (empirical) evidence, within Indirect evidence, and within Responses is pretty independent, so feel free to dip in and out as desired. I’ve also put a list-formatted summary at the end of each of these sections boiling down each subsection to one or two sentences.)

### Intro

Dan is a student council representative at his school. This semester he is in charge of scheduling discussions about academic issues. He often picks topics that appeal to both professors and students in order to stimulate discussion.

Is Dan’s behavior morally acceptable? At first glance, you’d be inclined to say yes. And even on second and third glance, obviously, yes. Dan is a stand-up guy. But what if you’d been experimentally manipulated to feel disgust while reading the vignette? If we’re to believe Wheatley and Haidt (2005), there’s a one-third chance you’d judge Dan as morally suspect. ‘One subject justified his condemnation of Dan by writing “it just seems like he’s up to something.” Another wrote that Dan seemed like a “popularity seeking snob.”’

The possibility that moral judgments track irrelevant factors like incidental disgust at the moment of evaluation is (to me, at least) alarming. But now that you’ve been baited, we can move on to the boring, obligatory formalities.

Full post

• ## Uncertainty and sensitivity analysis of GiveWell's top charity rankings

Arguably, we don’t care about the exact cost-effectiveness estimates of each of GiveWell’s top charities. Instead, we care about their relative values. By using distance metrics across these multidimensional outputs, we can perform uncertainty and sensitivity analysis to answer questions about:

• how uncertain we are about the overall relative values of the charities
• which input parameters this overall relative valuation is most sensitive to

In the last two posts, we performed uncertainty and sensitivity analyses on GiveWell’s charity cost-effectiveness estimates. Our outputs were, respectively:

• probability distributions describing our uncertainty about the value per dollar obtained for each charity and
• estimates of how sensitive each charity’s cost-effectiveness is to each of its input parameters

One problem with this is that we are not supposed to take the cost-effectiveness estimates literally. Arguably, the real purpose of GiveWell’s analysis is not to produce exact numbers but to assess the relative quality of each charity evaluated.

Another issue is that by treating each cost-effectiveness estimate as independent we underweight parameters which are shared across many models. For example, the moral weight that ought to be assigned to increasing consumption shows up in many models. If we consider all the charity-specific models together, this input seems to become more important.

### Metrics on rankings

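We can solve both of these problems by abstracting away from particular values in the cost-effectiveness analysis and looking at the overall rankings returned. That is, we want to transform the table of cost-effectiveness estimates for each charity into a ranking of the charities, ordered from most to least cost-effective.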

But how do we usefully express probabilities over rankings (rather than probabilities over simple cost-effectiveness numbers)? The approach we’ll follow below is to characterize a ranking produced by a run of the model by computing its distance from the reference ranking (i.e. GiveWell’s current best estimate). Our output probability distribution will then express how far we expect to be from the reference ranking, and hence how much we might learn about the ranking with more information on the inputs. For example, if the distribution is narrow and near 0, our uncertain input parameters mostly produce results similar to the reference ranking. If the distribution is wide and far from 0, our uncertain input parameters produce results that are highly uncertain and not necessarily similar to the reference ranking.
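The approach can be sketched in a few lines of Python. Everything in this sketch is hypothetical: there are four made-up charities with made-up reference values. A stand-in `sample_estimates` perturbs each value with log-normal noise in place of a real model run. Each sampled table of estimates is turned into a ranking, and the sketch records that ranking's (unnormalized) Spearman's footrule distance from the reference ranking. The footrule is defined in the next section. A histogram of the resulting list is the kind of output distribution described above.

```python
import random


def ranking(estimates):
    """Order charity names from most to least cost-effective."""
    return sorted(estimates, key=estimates.get, reverse=True)


def footrule(u, v):
    """Unnormalized Spearman's footrule distance between two rankings."""
    pos_v = {item: i for i, item in enumerate(v)}
    return sum(abs(i - pos_v[item]) for i, item in enumerate(u))


# Hypothetical reference estimates (value per dollar); NOT GiveWell's figures.
REFERENCE = {"A": 10.0, "B": 8.0, "C": 5.0, "D": 2.0}


def sample_estimates(rng):
    """Hypothetical stand-in for one model run: multiplicative log-normal noise."""
    return {name: value * rng.lognormvariate(0.0, 0.5) for name, value in REFERENCE.items()}


def distance_distribution(n_runs=1000, seed=0):
    """Distance of each sampled ranking from the reference ranking."""
    rng = random.Random(seed)
    reference_ranking = ranking(REFERENCE)
    return [footrule(ranking(sample_estimates(rng)), reference_ranking) for _ in range(n_runs)]


distances = distance_distribution()
```

Swapping `footrule` for a different ranking distance changes only one line of `distance_distribution`.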

#### Spearman’s footrule

What is this mysterious distance metric between rankings that enables the above approach? One such metric is called Spearman’s footrule distance. It’s defined as:

$d_{fr}(u, v) = \sum_{c \in A} |\text{pos}(u,c) - \text{pos}(v, c)|$

where:

• $$u$$ and $$v$$ are rankings,
• $$c$$ varies over all the elements $$A$$ of the rankings and
• $$\text{pos}(r, x)$$ returns the integer position of item $$x$$ in ranking $$r$$.

In other words, the footrule distance between two rankings is the sum over all items of the (absolute) difference in positions for each item. (We also add a normalization factor so that the distance ranges from 0 to 1 but omit that trivia here.)

So the distance between A, B, C and A, B, C is 0; the (unnormalized) distance between A, B, C and C, B, A is 4; and the (unnormalized) distance between A, B, C and B, A, C is 2.
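The definition and the three examples above can be checked with a short Python sketch. The optional normalization divides by the largest footrule distance possible for $$n$$ items, $$\lfloor n^2/2 \rfloor$$. That choice is an assumption here, because the factor actually used is not spelled out.

```python
def footrule_distance(u, v, normalize=False):
    """Spearman's footrule: sum over items c of |pos(u, c) - pos(v, c)|."""
    if sorted(u) != sorted(v):
        raise ValueError("rankings must contain the same items")
    pos_u = {item: i for i, item in enumerate(u)}
    pos_v = {item: i for i, item in enumerate(v)}
    d = sum(abs(pos_u[c] - pos_v[c]) for c in pos_u)
    if normalize:
        # Assumed normalization: the maximum footrule distance over n items is floor(n^2 / 2).
        max_d = len(u) ** 2 // 2
        return d / max_d if max_d else 0.0
    return d


print(footrule_distance("ABC", "ABC"))  # 0
print(footrule_distance("ABC", "CBA"))  # 4
print(footrule_distance("ABC", "BAC"))  # 2
print(footrule_distance("ABC", "CBA", normalize=True))  # 1.0
```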

#### Kendall’s tau

Another common distance metric between rankings is Kendall’s tau. It’s defined as:

$d_{tau}(u, v) = \sum_{\{i,j\} \in P} \bar{K}_{i,j}(u, v)$

where:

• $$u$$ and $$v$$ are again rankings,
• $$\{i, j\}$$ ranges over the set $$P$$ of unordered pairs of distinct elements in $$u$$ and $$v$$, and
• $$\bar{K}_{i,j}(u, v) = 0$$ if $$i$$ and $$j$$ are in the same order (concordant) in $$u$$ and $$v$$ and $$\bar{K}_{i,j}(u, v) = 1$$ otherwise (discordant)

In other words, the Kendall tau distance looks at all possible pairs across items in the rankings and counts up the ones where the two rankings disagree on the ordering of these items. (There’s also a normalization factor that we’ve again omitted so that the distance ranges from 0 to 1.)

So the distance between A, B, C and A, B, C is 0; the (unnormalized) distance between A, B, C and C, B, A is 3; and the (unnormalized) distance between A, B, C and B, A, C is 1.
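A matching sketch for Kendall's tau distance counts the discordant pairs directly. Checking all $$n(n-1)/2$$ pairs is quadratic, but that is cheap for a handful of charities. Dividing by the number of pairs is an assumed normalization that maps the result onto $$[0, 1]$$.

```python
from itertools import combinations


def kendall_tau_distance(u, v, normalize=False):
    """Count unordered pairs {i, j} that u and v put in opposite orders (discordant pairs)."""
    if sorted(u) != sorted(v):
        raise ValueError("rankings must contain the same items")
    pos_u = {item: k for k, item in enumerate(u)}
    pos_v = {item: k for k, item in enumerate(v)}
    discordant = sum(
        1
        for i, j in combinations(pos_u, 2)
        # A pair is discordant when the position differences have opposite signs.
        if (pos_u[i] - pos_u[j]) * (pos_v[i] - pos_v[j]) < 0
    )
    if normalize:
        n_pairs = len(u) * (len(u) - 1) // 2
        return discordant / n_pairs if n_pairs else 0.0
    return discordant


print(kendall_tau_distance("ABC", "ABC"))  # 0
print(kendall_tau_distance("ABC", "CBA"))  # 3
print(kendall_tau_distance("ABC", "BAC"))  # 1
```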

#### Angular distance

One drawback of the above metrics is that they throw away information in going from the table with cost-effectiveness estimates to a simple ranking. What would be ideal is to keep that information and find some other distance metric that still emphasizes the relationship between the various numbers rather than their precise values.

Angular distance is a metric which satisfies these criteria. We can regard the table of charities and cost-effectiveness values as an 8-dimensional vector. When our output produces another vector of cost-effectiveness estimates (one for each charity), we can compare this to our reference vector by finding the angle between the two.
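A minimal sketch of the angle computation, $$\theta = \arccos\big(u \cdot v / (\lVert u \rVert \lVert v \rVert)\big)$$. The four-element vector below is made up for illustration and is not GiveWell's eight estimates. Only the direction of each vector matters, so doubling every estimate leaves the angle at (numerically) zero. This is the "relationships rather than precise values" property described above.

```python
import math


def angular_distance(u, v):
    """Angle in radians between two vectors of cost-effectiveness estimates."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_theta)


reference = [10.0, 8.0, 5.0, 2.0]  # hypothetical values, not GiveWell's
print(angular_distance(reference, [2 * x for x in reference]))  # ~0, up to floating-point rounding
print(angular_distance(reference, [2.0, 5.0, 8.0, 10.0]))  # reversed order gives a larger angle
```

Because cost-effectiveness estimates are non-negative, the angle between two such vectors lies in $$[0, \pi/2]$$. Dividing by $$\pi/2$$ is one way to map it onto $$[0, 1]$$, though that normalization is an assumption.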

Full post

• ## Sensitivity analysis of GiveWell's cost-effectiveness analysis

Visual (scatter plot) and delta moment-independent sensitivity analysis on GiveWell’s cost-effectiveness models show which input parameters the cost-effectiveness estimates are most sensitive to. Preliminary results (given our input uncertainty) show that some input parameters are much more influential on the final cost-effectiveness estimates for each charity than others.

Last time we introduced GiveWell’s cost-effectiveness analysis which uses a spreadsheet model to take point estimates of uncertain input parameters to point estimates of uncertain results. We adjusted this approach to take probability distributions on the input parameters and in exchange got probability distributions on the resulting cost-effectiveness estimates. But this machinery lets us do more. Now that we’ve completed an uncertainty analysis, we can move on to sensitivity analysis.

### Sensitivity analysis

The basic idea of sensitivity analysis is, when working with uncertain values, to see which input values most affect the output when they vary. For example, if you have the equation $$f(a, b) = 2^a + b$$ and each of $$a$$ and $$b$$ varies uniformly over the range from 5 to 10, $$f(a, b)$$ is much more sensitive to $$a$$ than to $$b$$. A sensitivity analysis is practically useful in that it can offer you guidance as to which parameters in your model it would be most useful to investigate further (i.e. to narrow their uncertainty).
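As a worked check of this example, suppose sensitivity is measured by each input's share of the output variance. That is one common global measure, chosen here for illustration; it is not necessarily the measure used in the analysis below. Since $$a$$ and $$b$$ are independent and $$f$$ is additive, $$\operatorname{Var}(f) = \operatorname{Var}(2^a) + \operatorname{Var}(b)$$, and:

$$
\begin{aligned}
\operatorname{Var}(b) &= \frac{(10-5)^2}{12} \approx 2.08 \\
\mathbb{E}[2^a] &= \frac{1}{5}\int_5^{10} 2^a\,da = \frac{2^{10}-2^{5}}{5\ln 2} \approx 286.2 \\
\mathbb{E}[4^a] &= \frac{4^{10}-4^{5}}{5\ln 4} \approx 1.511 \times 10^{5} \\
\operatorname{Var}(2^a) &= \mathbb{E}[4^a] - \mathbb{E}[2^a]^2 \approx 6.92 \times 10^{4} \\
\frac{\operatorname{Var}(2^a)}{\operatorname{Var}(f)} &= \frac{\operatorname{Var}(2^a)}{\operatorname{Var}(2^a) + \operatorname{Var}(b)} > 0.9999
\end{aligned}
$$

So essentially all of the variation in $$f$$ comes from $$a$$.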

### Visual sensitivity analysis

The first kind of sensitivity analysis we’ll run is just to look at scatter plots comparing each input parameter to the final cost-effectiveness estimates. We can imagine these scatter plots as the result of running the following procedure many times: sample a single value from the probability distribution for each input parameter and run the calculation on these values to determine a result value. If we repeat this procedure enough times, the collected samples start to approximate the true probability distributions of the inputs and the result.
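The sampling procedure can be sketched in Python. This is a minimal illustration, not the post's notebook. It reuses the toy function $$f(a, b) = 2^a + b$$ from the example above and assumes uniform input distributions. The helper names `sample_scatter_data` and `correlation` are hypothetical. Pearson correlation serves only as a crude numeric proxy for how much "directionality" a scatter plot shows. It is not the delta moment-independent measure.

```python
import math
import random


def f(a, b):
    """Toy model from the sensitivity-analysis example."""
    return 2 ** a + b


def sample_scatter_data(model, input_ranges, n_samples=10_000, seed=0):
    """Repeatedly sample every input, run the model, and record (input, output) pairs.

    Returns {input_name: [(input_value, output), ...]}, one scatter plot per input.
    All inputs vary together on every run, so this is a global analysis.
    """
    rng = random.Random(seed)
    points = {name: [] for name in input_ranges}
    for _ in range(n_samples):
        # Assumption: each input is uniform over its (low, high) range.
        inputs = {name: rng.uniform(low, high) for name, (low, high) in input_ranges.items()}
        output = model(**inputs)
        for name, value in inputs.items():
            points[name].append((value, output))
    return points


def correlation(pairs):
    """Pearson correlation of (x, y) pairs: a crude proxy for scatter-plot directionality."""
    xs, ys = zip(*pairs)
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


data = sample_scatter_data(f, {"a": (5, 10), "b": (5, 10)})
print(correlation(data["a"]), correlation(data["b"]))
```

The correlation for `a` comes out far larger than for `b`, matching the intuition above. With matplotlib installed, `plt.scatter(*zip(*data["a"]))` draws the scatter plot for $$a$$.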

(One nice feature of this sort of analysis is that we see how the output depends on a particular input even in the face of variations in all the other inputs—we don’t hold everything else constant. In other words, this is a global sensitivity analysis.)

(Caveat: We are again pretending that we are equally uncertain about each input parameter and the results reflect this limitation. To see the analysis results for different input uncertainties, edit and run the Jupyter notebook.)

#### Direct cash transfers

##### GiveDirectly

The scatter plots show that, given our choice of input uncertainty, the output is most sensitive (i.e. the scatter plot for these parameters shows the greatest directionality) to the input parameters:

Full post

A weblog