Discussion about this post

Lydia Nottingham

curious about performance on ‘reasoning questions that might be in the training set’ vs ‘generalization to reasoning questions we’re pretty sure aren’t in the training set’—might go through these papers looking for how training-set-heavy their questions were later

Celeste 🌱

https://arxiv.org/pdf/2510.14901

shocked at the omission of this paper

