Thinking Deeper, Not Longer: Depth-Recurrent Transformers for Compositional Generalization [R]
📰 Reddit r/MachineLearning
Paper: https://arxiv.org/abs/2603.21676

I found this interesting as another iteration of the TRM approach:

- Shows decent OOD generalization in 2/3 tasks (but why does it fail at >2x? And why is unstructured text so much worse?)
- Explains why intermediate-step supervision can hurt generalization.

This makes
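For anyone unfamiliar with the depth-recurrent idea in the title: instead of stacking N distinct layers, you reuse one weight-tied block and choose how many times to iterate it at inference, so "depth" becomes a runtime knob. A minimal sketch below; the specific block (tanh + residual update) and dimensions are my own illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
# One shared block, reused at every depth step (weight tying across depth).
W = rng.normal(scale=0.1, size=(d, d))

def depth_recurrent_forward(x, n_steps):
    """Apply the same block n_steps times.

    Depth is a runtime argument, not a fixed stack of distinct layers,
    so harder inputs can simply get more iterations at test time.
    """
    h = x
    for _ in range(n_steps):
        h = np.tanh(h @ W) + h  # residual update keeps iterates stable
    return h

x = rng.normal(size=d)
shallow = depth_recurrent_forward(x, n_steps=4)
deep = depth_recurrent_forward(x, n_steps=16)  # "think deeper" on the same weights
```

The point of the sketch is just that `shallow` and `deep` come from the exact same parameters `W`; only the iteration count differs, which is what lets this family of models trade compute for depth without retraining.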