Hacker News

Debugging misaligned completions with sparse-autoencoder latent attribution