Detecting Distillation Data from Reasoning Models
📰 ArXiv cs.AI
arXiv:2510.04850v3 Announce Type: replace-cross
Abstract: Reasoning distillation has emerged as a prevailing paradigm for transferring reasoning capabilities from large reasoning models to small language models. Yet, reasoning distillation risks data contamination: benchmark data may inadvertently be included in the distillation data, thereby inflating model performance metrics. In this work, we formally define the distillation data detection task, which determines whether a given question is in