• Shilling for Anki

I like Anki.

The decline and rise of long-term memory

I used to scorn long-term memory. My brain is an exquisite organ for vanquishing conundra, thank you very much, not some library card catalog. I assume I assimilated this attitude from discussion like [1]:

But, after reading up on pedagogy in articles like the one we discussed earlier, “[I] no longer see longterm memory as a repository of isolated, unrelated facts that are occasionally stored and retrieved; instead, it is the central structure of human cognitive architecture.” (Sweller 2008)

Spaced repetition

Thus, I present my own entry in the burgeoning subgenre of spaced repetition software encomia.

Briefly, spaced repetition software optimally schedules flashcard review to radically boost retention of information in long-term memory. Anki is one such program. I’d heard of Anki in the past but only managed to acquire the habit of regular review more recently. I’ve performed a total of 4803 reviews of 1518 cards during review sessions covering 40 of the last 42 days. (Let it never be said that I do things in half measures.) This is probably too intense and more than I’d recommend for most people. But that information should serve to calibrate you as to how much experience I have with Anki.
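To make "schedules flashcard review" a little more concrete, here is a minimal sketch of the general idea: each successful recall pushes the next review further out, and a failed recall resets the card. The `Card` class, the `review` function, and the specific numbers (a 2.5 multiplier, a 0.2 penalty, a 1.3 floor) are illustrative assumptions, not Anki's actual algorithm.

```python
from dataclasses import dataclass


@dataclass
class Card:
    interval_days: float = 1.0  # gap until the next review
    ease: float = 2.5           # hypothetical growth multiplier for the gap


def review(card: Card, remembered: bool) -> Card:
    """Return the card's new schedule after one review."""
    if remembered:
        # Success: the next review moves further into the future.
        return Card(card.interval_days * card.ease, card.ease)
    # Failure: start over with a short interval and a smaller multiplier.
    return Card(1.0, max(1.3, card.ease - 0.2))


card = Card()
for outcome in [True, True, True, False, True]:
    card = review(card, outcome)
    print(round(card.interval_days, 2), round(card.ease, 2))
```

The point of the sketch is only the shape of the schedule: gaps grow multiplicatively while a card is remembered, so well-known cards quickly stop costing review time.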

The feel of a thing

The dominant feeling I now have is a mild frustration at the betrayal of present me by past me—so much of my prior reading and general intellectual development was wasted effort. If we each had anterograde amnesia and forgot 100% of what we’d read or otherwise experienced, we’d surely take a different approach to life. We wouldn’t just wander along learning and forgetting in an endless cycle. But the truth of our memories is not so far off!

• Assorted Links VII

“The meteorite itself was so massive that it didn’t notice any atmosphere whatsoever,” said Rebolledo. “It was traveling 20 to 40 kilometers per second, 10 kilometers — probably 14 kilometers — wide, pushing the atmosphere and building such incredible pressure that the ocean in front of it just went away.”

These numbers are precise without usefully conveying the scale of the calamity. What they mean is that a rock larger than Mount Everest hit planet Earth traveling twenty times faster than a bullet. This is so fast that it would have traversed the distance from the cruising altitude of a 747 to the ground in 0.3 seconds. The asteroid itself was so large that, even at the moment of impact, the top of it might have still towered more than a mile above the cruising altitude of a 747. In its nearly instantaneous descent, it compressed the air below it so violently that it briefly became several times hotter than the surface of the sun.

“The pressure of the atmosphere in front of the asteroid started excavating the crater before it even got there,” Rebolledo said. “Then when the meteorite touched ground zero, it was totally intact. It was so massive that the atmosphere didn’t even make a scratch on it.”

Unlike the typical Hollywood CGI depictions of asteroid impacts, where an extraterrestrial charcoal briquette gently smolders across the sky, in the Yucatan it would have been a pleasant day one second and the world was already over by the next. As the asteroid collided with the earth, in the sky above it where there should have been air, the rock had punched a hole of outer space vacuum in the atmosphere. As the heavens rushed in to close this hole, enormous volumes of earth were expelled into orbit and beyond — all within a second or two of impact.

“So there’s probably little bits of dinosaur bone up on the moon,” I asked.

“Yeah, probably.”

Survival in the first hours of the Cenozoic

Life confined to Earth’s surface would have perished well before incineration. After ignition temperature was reached, fires would not have spread from one area to another in the usual way. Rather, fires would have ignited nearly simultaneously at places having available fuel.

The shortest-lived child of Prohibition actually survived to adulthood. This was the change in drinking patterns that depressed the level of consumption compared with the pre-Prohibition years. Straitened family finances during the Depression of course kept the annual per capita consumption rate low, hovering around 1.5 US gallons. The true results of Prohibition’s success in socializing Americans in temperate habits became apparent during World War II, when the federal government turned a more cordial face toward the liquor industry than it had during World War I, and they became even more evident during the prosperous years that followed.[50] Although annual consumption rose, to about 2 gallons per capita in the 1950s and 2.4 gallons in the 1960s, it did not surpass the pre-Prohibition peak until the early 1970s.

In MUSE a distinction is made between present and past perfect (i.e., within the perfect aspect, tense is marked). Perfect means that the action is completed. AAVE has two additional markers for aspect which extend the perfect:

| | MUSE | AAVE |
|---|---|---|
| present perfect | I have walked | I have walked |
| past perfect | I had walked | I had walked |
| completive | n/a | I done walked |
| remote time | n/a | I been walked |

Study 1 (N = 228) examined 49 common variants (SNPs) within 10 candidate genes and identified a nominal association between a polymorphism (rs237889) of the oxytocin receptor gene (OXTR) and variation in deontological vs utilitarian moral judgment.

• The moral imperative and mortal peril of maximizing

The world as we understand it

A single death is a tragedy; a million deaths is a statistic. —Probably not Joseph Stalin

We humans are famously bad at finely-tuned and well-calibrated caring. In an early study on scope neglect, experimental subjects were willing to pay $80 to save 2,000 migrating birds from drowning in oil ponds and $78 to save 20,000 (Desvousges et al. 1992). Alas, our sentiments are not a precision instrument.

But suppose that our moods better mapped to the world as we understand it:

Whatever betrayal you feel watching Doubt, the betrayal would feel more than 300 times as sharp. 300 abusers represents just Pennsylvania. The full tally is unknown and probably unknowable.

Whatever loss you feel watching Up, you’d be struck by it again and again for each of the approximately 80,000 miscarriages per day across the world (Obstetricians, Gynecologists, and others 2002).

Whatever despondency you feel watching Winter’s Bone, your feelings would be magnified in depth and breadth by the knowledge that 836 million people in the world live on less than $1.25/day (UN 2015).

Whatever precarity you feel watching Grapes of Wrath, you’d feel it a shattering 2 billion times more intensely for the approximately 25 percent of the world population that live on small farms.

Whatever despair you feel watching The Skeleton Twins, you would feel that way for each of the estimated 334 million people in the world with depression (Organization and others 2017).

Whatever loneliness you feel watching Three Colors: Blue, you might feel that same feeling 2,500 times a day for each of the one quarter of US women who are widowed by age 65 (Berardo 1992).

Whatever shame you feel watching Tokyo Sonata, you’d feel it 192 million times over for the global unemployed in 2018 (International Labour Organization 2018). Then you’d remember that any period of unemployment’s negative effects last at least 10 years [louis2002].

Whatever horror you feel watching The Battle of Algiers, you’d feel it just the same for each and every one of the approximately 80,000 people who will die from battle in state-based conflicts this year (Roser 2018).

Whatever impotence you feel watching Killer of Sheep, you’d remember that there are over 900,000 black people in Los Angeles alone living under flawed structures and institutions.

Whatever grief you feel watching Amour, you’d have around 3.5 minutes to recover before the grief of another US death by stroke crashed over you.

Whatever hopelessness you feel watching Cool Hand Luke, it would echo and rebound magnified from the cells of more than 10 million detainees (Walmsley and others 2015).

Whatever suffocation you feel watching Requiem for a Dream, you’d feel it on behalf of the approximately 164 million people with substance use disorders (Ritchie and Roser 2018).

Whatever ache you feel watching Grave of the Fireflies, it would hollow you out each day along with the approximately 815 million people who are chronically undernourished (Organization 2014).

But, for good or for ill, we are bounded and parochial. We cannot comprehend in any but the most abstracted ways the daily ruin that nature visits on us, that we visit on ourselves and each other.

The world as we imagine it

[T]hose who would play this [utopian] game on the strength of their own private opinion … and would brave the frightful bloodshed and misery that would ensue if the attempt was resisted—must have a serene confidence in their own wisdom on the one hand and a recklessness of other people’s sufferings on the other, which Robespierre and St. Just […] scarcely came up to. (Mill 1879)

The tragedies of life on Earth are no recent revelation. The Epic of Gilgamesh—our earliest surviving great work of literature—is about the hero’s vain attempt to undo the great tragedy of his life.

For some, the knowledge that tragedy has been with us from the beginning inspires not acquiescence but determination that we might leave it behind before we reach the end. These are names that go down in history. Names like:

• A non-exhaustive list of putative problems with ignorant priors

The principle of indifference (a.k.a. the principle of insufficient reason) suggests that when considering a set of possibilities and there’s no known reason for granting special credence to one possibility, we ought to assign all possibilities the same credence (which, on the Bayesian point of view, is also a probability). For example, when someone asks what the result of a 6-sided die roll is, the principle of indifference recommends we assign a probability of 1/6 to each outcome. Slightly more interesting is that it *also* recommends assigning a probability of 1/6 to each outcome even when we’re told the die is weighted as long as we’re not told how it’s weighted.

There’s a definite intuitive plausibility and appeal to this rule. But it turns out there are a lot of difficulties when it comes to actually operationalizing it. Below, I list some of the problems that have been raised over the years. Some of these problems seem silly to me and will doubtless seem silly to you. Others strike me as important. I list them all here regardless and ignore any claimed solutions for the moment.

Coarsening

“What is the origin country of this unknown traveler? France, Ireland or Great Britain?”

Naive application of the principle of indifference (NAPI) suggests we assign probability 1/3 to each possibility.

The question can be rephrased: “What is the origin country of this unknown traveler? France, or the British Isles?”.

In this case, NAPI suggests we ought to assign a probability 1/2 to each possibility.

So, depending on the framing, we assign probability 1/2 or 1/3 to the same outcome—the traveler is from France.
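The framing dependence is easy to reproduce mechanically. The sketch below assumes a hypothetical helper, `napi`, that does nothing more than split probability evenly over whatever list of options it is handed.

```python
from fractions import Fraction


def napi(options):
    """Naively assign equal probability to each listed option."""
    p = Fraction(1, len(options))
    return {option: p for option in options}


fine = napi(["France", "Ireland", "Great Britain"])
coarse = napi(["France", "British Isles"])

print(fine["France"])    # 1/3
print(coarse["France"])  # 1/2
```

Nothing about the traveler changed between the two calls; only the partition of the possibilities did.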

Negation

“I’ve just pulled a colored ball from an urn containing an equal number of red, black and yellow balls. Which color is the ball? Red, black, or yellow?”

NAPI suggests we assign probability 1/3 to each possibility.

The question can be rephrased: “Which color is the ball? Red or not red?”.

In this case, NAPI suggests we ought to assign a probability 1/2 to each possibility.

So, depending on the framing, we assign probability 1/2 or 1/3 to the same outcome—the ball is red.

“I have an equilateral triangle inscribed in a circle. I’ve also chosen a chord in the circle randomly. What is the probability that the chord is longer than a side of the triangle?”

If we construct our random chords by choosing two random points on the circumference of the circle and construct a chord between them, we find that the probability of a long chord is 1/3.

If we construct our random chords by choosing a random radius, choosing a random point on that radius, and constructing the chord through that point perpendicular to the radius, we find that the probability of a long chord is 1/2.

If we construct our random chords by choosing a random point inside the circle and constructing a chord with that point as its midpoint, we find that the probability of a long chord is 1/4.

So depending on our framing, we assign probability of 1/4, 1/3 or 1/2 to the same proposition.
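The three answers can be checked with a quick Monte Carlo simulation. This is a sketch under stated assumptions: the circle has radius 1 (so the inscribed triangle has side √3), "random" means uniform in each construction, and the function names are my own.

```python
import math
import random

SIDE = math.sqrt(3)  # side of an equilateral triangle inscribed in a unit circle


def chord_from_endpoints(rng):
    """Chord between two uniformly random points on the circumference."""
    a = rng.uniform(0, 2 * math.pi)
    b = rng.uniform(0, 2 * math.pi)
    return 2 * abs(math.sin((a - b) / 2))


def chord_from_radius(rng):
    """Chord perpendicular to a radius at a uniformly random distance from the center."""
    d = rng.uniform(0, 1)
    return 2 * math.sqrt(max(0.0, 1 - d * d))


def chord_from_midpoint(rng):
    """Chord whose midpoint is uniformly distributed over the disk's area."""
    r = math.sqrt(rng.uniform(0, 1))  # the square root makes the point uniform by area
    return 2 * math.sqrt(max(0.0, 1 - r * r))


def long_chord_probability(sampler, trials=200_000, seed=0):
    """Estimate the probability that a sampled chord is longer than the triangle's side."""
    rng = random.Random(seed)
    hits = sum(sampler(rng) > SIDE for _ in range(trials))
    return hits / trials


for sampler in (chord_from_endpoints, chord_from_radius, chord_from_midpoint):
    print(sampler.__name__, long_chord_probability(sampler))
```

The three estimates should land near 1/3, 1/2 and 1/4 respectively, even though each function is a defensible reading of "a chord chosen randomly".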

• Ideal theory and decision theory

The ideal theory debate is actually applied decision theory. The tools and vocabulary of decision theory—at a minimum, the von Neumann-Morgenstern utility theorem, the concept of epistemic risk aversion, and the area of sequential decision theory—are useful in this new domain.

Ideal theory

If I may editorialize, the ideal theory debate is essentially about how to translate our understanding of justice into actions in the present. Reductively, one side (the idealists) advocates for always moving the world we inhabit closer to the ideally just world while the other side (the non-idealists) advocates for always moving the world we inhabit toward the best adjacent world.

What’s not usually at issue in the ideal theory debate is: our understanding of the status quo, our predictive models of the future, or our notion of justice. That’s not to say that there’s consensus on these issues—far from it. It’s just that discussion of these issues doesn’t fall under the heading of ‘ideal theory’. No one considers themselves to be waging that debate when they talk about currently existing inequality in Germany or what justice recommends with regard to positive and negative rights. By all this I merely mean to emphasize that the scope of the ideal theory debate is rather small—given all the presuppositions above, what algorithm do we employ to choose the next possible world we’ll inhabit?

Hopefully, by framing the ideal theory debate in the foregoing terms, I’ve predisposed you to my point of view: The subject matter of the ideal theory debate is also the subject matter of decision theory. That is, the ideal theory debate is really a debate about applied decision theory.
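As a toy illustration of the two positions read as algorithms, consider the sketch below. Everything in it is an assumption made for the example: possible worlds are laid out on a line, adjacency means neighboring indices, the justice scores are invented, and the non-idealist is allowed to stay put when no neighbor is better.

```python
# Hypothetical justice scores for a line of possible worlds (index = world).
JUSTICE = [3, 5, 4, 2, 6, 9]


def idealist_step(world):
    """Move one step toward the ideally just world, whatever the local cost."""
    ideal = max(range(len(JUSTICE)), key=JUSTICE.__getitem__)
    if world == ideal:
        return world
    return world + 1 if ideal > world else world - 1


def non_idealist_step(world):
    """Move to the most just world among the current one and its neighbors."""
    candidates = [w for w in (world - 1, world, world + 1) if 0 <= w < len(JUSTICE)]
    return max(candidates, key=JUSTICE.__getitem__)


def run(step, world, max_steps=10):
    """Apply a step rule repeatedly until it stops moving; return the path taken."""
    path = [world]
    for _ in range(max_steps):
        nxt = step(world)
        if nxt == world:
            break
        world = nxt
        path.append(world)
    return path


print(run(idealist_step, 0))      # [0, 1, 2, 3, 4, 5]
print(run(non_idealist_step, 0))  # [0, 1]
```

On this made-up landscape the idealist passes through worse worlds (scores 4 and 2) on the way to the best one, while the non-idealist halts at a local peak. Both rules take the same inputs (the status quo, a model of nearby worlds, a notion of justice) and differ only in how they pick the next world.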

Normative decision theory

Webster’s dictionary defines—*cough*—(Hansson 1994) says “decision theory is concerned with goal-directed behaviour in the presence of options”. We’ll try to make this description more comprehensive by appealing to Leonard Savage’s formalization. The hope is that by describing decision theory fully, we can see how the boundaries of the ideal theory debate line up with the boundaries of decision theory.

• Human cognitive architecture and learning

Problem-solving relies on working memory. Working memory is very limited except when it comes to information that’s also in long-term memory. Long-term memory is thus central to expertise. Committing things to long-term memory (i.e. learning) is best accomplished by the careful management of cognitive load.

Preface

The following post is basically a straightforward regurgitation of (part of) (Sweller 2008). That paper is very readable so there’s really no reason to read the rest of this post. With that out of the way, I liked this paper for two main reasons:

• It fairly radically changed my opinion on the value of long-term memory in ways that are practically important
• It provides a coherent theory which unifies many phenomena. A coherent theory is easier to remember and easier to apply in novel situations than a disparate collection of facts.

Working memory

Essentially all human problem-solving is about the manipulation of items in working memory. Alas, our working memory is tragically limited—traditionally, research suggests the upper limit on the number of ‘chunks’ in working memory is the “magical number seven”. (Interestingly, there’s some evidence that chimpanzees have superior working memory to humans. Video and paper). Despite this grievous limitation, experience suggests that humans do actually carry out impressive feats of problem-solving. How?

Long-term memory

The key is exploiting a ‘loophole’—“huge amounts of organized information can be transferred from long-term memory to working memory without overloading working memory” (Sweller 2008). Thus, we arrive at the central importance of long-term memory to human cognition. Contra the denigration of rote memorization, “[task-relevant long-term memory] is the only reliable difference that has been obtained differentiating novices and experts in problem-solving skill and is the only difference required to fully explain why an individual is an expert in solving particular classes of problems” (Sweller 2008). In other words, long-term memory is necessary and sufficient to explain expertise.

Chess board recall

We can illustrate these claims with the results of a classic study (De Groot 2014). Look at the next image for a few seconds, close your eyes, and try to recall the positions of pieces.

If you’re a chess amateur, this should have been quite hard (i.e. you probably misremembered the pieces). On the other hand, if you’re a chess expert, this was probably fairly straightforward.

• Assorted Links VI

Brabant says he had heard about Côté’s experience, so he sent in a sample of DNA from his French poodle, Mollie, to the same company CAPC uses for DNA testing.

It determined the dog had five per cent Native American ancestry: two per cent Oji-Cree, two per cent Saulteaux and one per cent Mississauga.

I tried to work out what deep learning was about. Most of the candidates were too sleep deprived to dissemble. Basic answer: every sexy project we do—flying quadcopters, getting another 0.1% on the MNIST—is basically one graduate student.

You work out the topology of the neural net. Then you find the weights. How? The answer: “graduate student descent”, a little pun to giggle over floppy croissants at the student cafe—in short, there’s no good answer, a human being sits there and twiddles things about.

Machine learning is an amazing accomplishment of engineering. But it’s not science. Not even close. It’s just 1990, scaled up. It has given us literally no more insight than we had twenty years ago.

“[T]he three most deadly types, the Aedes, Anopheles and Culex, are found almost all over the world and are responsible for around 17 per cent of infectious disease transmissions globally.”

From November 2017 to June this year, non-biting male Aedes aegypti mosquitoes sterilised with the natural bacteria Wolbachia were released in trial zones along the Cassowary Coast in North Queensland.

They mated with local female mosquitoes, resulting in eggs that did not hatch and a significant reduction of their population.

Which begs the question: why do admen and adwomen stay in their industry, when it’s generally viewed so negatively?

That moral stigma shows up in annual Gallup polls, where Americans are asked how they would rate “the honesty and ethical standards” of people in different fields. Year after year, advertising practitioners come in around the bottom of that list, right along with members of Congress, lobbyists, and car salespeople.

Through the first author’s field observations and interviews, we found that advertising practitioners justified the moral worth of their work through narratives that tied their work to some conception of the common good, emphasizing the good service they believe advertising can provide to society.

• An example of the lazy approach to AI safety

The lazy approach to AI safety suggests that we explicitly encode our moral uncertainty into artificial agents. Then agents can decide to undertake moral investigation via value of information calculations. We make the description of this approach more concrete by examining its application in a nearly trivial setting.

Examples often clarify. Let’s see an example of the lazy approach to AI safety in action.

The setting

Suppose The Professor has performed another bamboo miracle and built an AI agent on the island. Sadly, the castaways forgot the agent in their frantic final escape. So it’s just our agent, alone on an island in the Pacific.

As a man of taste and refinement, the Professor has followed the lazy approach to AI safety. As such, the agent’s utility function is quite simple: The utility of any state of affairs is exactly the moral good of that state of affairs according to whatever turns out to be the One True Moral Theory (OTMT)[1]. In symbols, $$u(x) = g(x)$$ where $$u : X \rightarrow \mathbb{R}$$ and $$g : X \rightarrow \mathbb{R}$$[2]. Here $$X$$ is the set of possible states of affairs, $$u$$ is the utility function, and $$g$$ evaluates the moral goodness of a state of affairs according to the OTMT.

For simplicity, we’ll suppose there are only two possible interventions the agent can make: Ze can harvest coconuts or harvest bamboo. Furthermore, we’ll fiat that there are only two possible moral theories in all the world: the coconut imperative and bamboocentrism. According to the coconut imperative, the goodness of a state of affairs is defined as $$g_c(b, c) = 0 \cdot b + 3 \cdot c$$ where $$c$$ is the total number of coconuts that have been harvested and $$b$$ is the total number of bamboo shoots that have been harvested. On the bamboocentric view of things, $$g_b(b, c) = 2 \cdot b + 0 \cdot c$$. (The fact that we only have moral theories which express goodness in terms of real numbers permits our earlier simplification of assuming that the OTMT takes this shape.)

Initial behavior

Before the Professor abandoned his child, he programmed the agent with a uniform prior over all possible ethical theories. That is, the agent thinks there’s a 50% chance bamboocentrism is true and a 50% chance the coconut imperative is the OTMT. Thus, in the absence of better information, the agent spends zir days harvesting coconuts (we assume the resources required to harvest a coconut are identical to the resources required to harvest a bamboo stalk). To be fully explicit:
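One way to spell out the comparison is a short Python sketch. It assumes the agent is choosing between harvesting a single coconut and harvesting a single bamboo stalk; the function and variable names are illustrative choices for this example.

```python
def g_coconut(b, c):
    """Goodness under the coconut imperative: only coconuts count."""
    return 0 * b + 3 * c


def g_bamboo(b, c):
    """Goodness under bamboocentrism: only bamboo counts."""
    return 2 * b + 0 * c


# Uniform prior over the two candidate moral theories.
PRIOR = {g_coconut: 0.5, g_bamboo: 0.5}


def expected_utility(b, c):
    """Expected moral goodness of a state, averaging over the agent's moral uncertainty."""
    return sum(p * g(b, c) for g, p in PRIOR.items())


harvest_bamboo = expected_utility(b=1, c=0)
harvest_coconut = expected_utility(b=0, c=1)

print(harvest_bamboo)   # 1.0
print(harvest_coconut)  # 1.5
```

With equal credence in each theory, a coconut is worth 0.5 · 3 = 1.5 in expectation and a bamboo stalk only 0.5 · 2 = 1, so harvesting coconuts is the better use of the same resources.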

• False dichotomies and the ideal theory debate

The dichotomy of ideal and non-ideal theory is a false one. For each supposedly unique feature of ideal theorizing, there is a scaled-down analogue in non-ideal theory. Furthermore, all the dilemmas in the debate can be fruitfully approached as problems in decision theory.

This post is deprecated in favor of Ideal theory and decision theory.

Ideal and non-ideal theory

We’ve already described ideal theory in previous posts, but we’ll give a short recap here for the sake of self-sufficiency. Ideal theory suggests that when making decisions about alternative social worlds (that is, about different political and economic institutions), we should have an ideally just society in mind. Non-idealists argue that this information is irrelevant; we only need to be able to perform pairwise comparisons. A popular metaphor in the area is that of mountain climbing. In the language of this metaphor, ideal theorists like John Rawls suggest that mountaineers orient themselves toward Everest while non-idealists like Amartya Sen suggest that knowledge of Everest is irrelevant when comparing the heights of Kilimanjaro and Denali.

Thesis

I contend this is a debate which can be dissolved. There is no necessary opposition between incrementalism and idealism. Instead, all of these perspectives can be ably unified under the framework of decision theory.

Dichotomy

Before I can make the argument that it’s a false dichotomy, I need to show that it’s a putative dichotomy. There’s little value in attacking straw men. Since I’ve just read (Gaus 2016), we’ll examine that in detail and expect that it’s representative of the larger discussion.

The boundary that Gaus draws is between worlds in the ‘neighborhood’ of the status quo and those outside it. If we restrict our attention to worlds in the neighborhood, we’re engaging in non-ideal theory, but if we speculate on distant worlds we’re doing ideal theory. What is this key neighborhood concept? In Gaus’s words: “A neighborhood delimits a set of nearby social worlds characterized by relatively similar justice-relevant social structures.”

So we’re already on firm ground for a claim of dichotomous thinking. On this view, the structure of the problem is dichotomous[1]. But Gaus also demonstrates the dichotomous view when describing the divergent implications of the ideal and non-ideal view:

[L]ocal optimization often points in a different direction than pursuit of the ideal. We then confront what I have called The Choice: should we turn our back on local optimization and move toward the ideal? [… O]ur judgments within our neighborhood have better warrant than judgments outside of it; if the ideal is outside our current neighborhood, then we are forgoing relatively clear gains in justice for an uncertain prospect that our realistic utopia lies in a different direction. Mill’s revolutionaries[2], certain of their own wisdom and judgment, were more than willing to commit society to the pursuit of their vision of the ideal; their hubris had terrible costs for many.

Similarities

Now, I’ll hope you agree ideal and non-ideal theory are framed as incompatible. On that assumption, I’ll begin to argue against the dichotomy.

Uncertainty all around

I do accept Gaus’s Neighborhood Constraint: our knowledge of distant social worlds is much less reliable than our knowledge of worlds similar to the status quo. Furthermore, I think we have non-trivial uncertainties about the workings and justice of worlds that are nearby. Importantly (though not, I think, crucially), I don’t see any obvious reason for discontinuities in the reliability of our knowledge. My intuition suggests it drops off smoothly with distance from the status quo.

