Detecting Distillation Data from Reasoning Models

📰 ArXiv cs.AI

arXiv:2510.04850v3. Abstract: Reasoning distillation has emerged as a prevailing paradigm for transferring reasoning capabilities from large reasoning models to small language models. Yet, reasoning distillation risks data contamination: benchmark data may inadvertently be included in the distillation data, thereby inflating model performance metrics. In this work, we formally define the distillation data detection task, which determines whether a given question is in

Published 11 May 2026