Moral uncertainty is inevitable. We accept this and encode our uncertainty as a weighted parliament of moral theories in an agent. We allow the agent to perform ethical investigation to refine these weights. Value of information calculations determine when to investigate. We claim several benefits arise from this approach.
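To make the setup concrete, here is a minimal Python sketch of one way a weighted parliament could evaluate actions. Everything in it is my own illustration: the toy theories, the scoring scale, and the choice of a simple credence-weighted sum as the aggregation rule (a real parliament might instead use a voting or bargaining scheme).

```python
# A minimal sketch of a "weighted parliament": each moral theory is a function
# that scores an action, and the agent's credence in each theory acts as its
# voting weight. Theory names and scores below are illustrative, not from the post.

def parliament_score(action, theories, credences):
    """Credence-weighted average of each theory's evaluation of an action."""
    total = sum(credences.values())
    return sum(credences[name] * score(action) for name, score in theories.items()) / total

def choose_action(actions, theories, credences):
    """Act on the option the parliament rates highest under current credences."""
    return max(actions, key=lambda a: parliament_score(a, theories, credences))

# Illustrative usage with toy theories and credences:
theories = {
    "utilitarian": lambda a: a["total_welfare"],
    "deontological": lambda a: 0.0 if a["violates_duty"] else 1.0,
}
credences = {"utilitarian": 0.6, "deontological": 0.4}
actions = [
    {"name": "A", "total_welfare": 0.9, "violates_duty": True},
    {"name": "B", "total_welfare": 0.7, "violates_duty": False},
]
best = choose_action(actions, theories, credences)  # -> action "B"
```

Refining the weights then amounts to updating the credences dictionary as ethical investigation turns up new considerations.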
According to most ethical theories, the actions of any AI agent we create have moral import. To accommodate this reality, we guide the agent with a utility function and (implicitly or explicitly) an ethical framework.
One of the key problems of AI alignment is that we are uncertain which ethical theory to encode in the agent, because we (philosophers, humans, society) have not settled on the correct theory ourselves. How can we expect our agent to act in accordance with our values when we don’t even know what our values are?
I propose that we wave the white flag of surrender in the battle to find final, certain answers to the hard problems of ethics. Instead, we should reify our uncertainty and our search procedures in agents we build.
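As one way to reify that search procedure, the sketch below (reusing parliament_score from the earlier snippet) compares the expected value of acting on current credences against the expected value of acting after an investigation that might shift them. The set of possible posterior credences, their probabilities, and the cost term are all assumptions of mine, not details from the post.

```python
# A hedged sketch of the value-of-information test that decides when ethical
# investigation is worth it. The "possible posteriors" are hypothetical credence
# distributions the agent thinks investigation might leave it with.

def value_of_best_action(actions, theories, credences):
    return max(parliament_score(a, theories, credences) for a in actions)

def value_of_information(actions, theories, credences, possible_posteriors):
    """Expected gain from investigating before acting.

    possible_posteriors: list of (probability, posterior_credences) pairs
    describing how the agent expects investigation might shift its weights.
    """
    value_now = value_of_best_action(actions, theories, credences)
    value_after = sum(
        p * value_of_best_action(actions, theories, posterior)
        for p, posterior in possible_posteriors
    )
    return value_after - value_now

def should_investigate(actions, theories, credences, possible_posteriors, cost):
    """Investigate only when the expected gain exceeds the cost of inquiry."""
    return value_of_information(actions, theories, credences, possible_posteriors) > cost
```

Under this framing, the agent never needs final answers: it acts on its current credences whenever further inquiry is not worth its cost.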