Tagged machine learning

Taking social choice seriously: An alternative approach to reward modeling in RLHF

April 6, 2024

Unconditional conditioning: Removing sleeper agent behavior in a toy model

April 3, 2024

The Typed Transformer: Intro and architecture

April 1, 2024

Progress and preservation in IDA

December 3, 2019

Collectively Exhaustive

A weblog