Hacker News: Debugging misaligned completions with sparse-autoencoder latent attribution

1 point • gmays • about 21 hours ago • 0 comments