Tagged: machine learning

Taking social choice seriously: An alternative approach to reward modeling in RLHF
Unconditional conditioning: Removing sleeper agent behavior in a toy model
The Typed Transformer: Intro and architecture
Progress and preservation in IDA

Collectively Exhaustive

A weblog