Hacker News

Benchmark that evaluates LLMs using 759 NYT Connections puzzles