Red Teaming Large Reasoning Models

📰 arXiv cs.AI

arXiv:2512.00412v4 (replace-cross)

Abstract: Large Reasoning Models (LRMs) have emerged as a powerful advance in multi-step reasoning tasks, offering enhanced transparency and logical consistency through explicit chains of thought (CoT). However, these models introduce novel safety and reliability risks, such as CoT hijacking and prompt-induced inefficiencies, which existing evaluation methods do not fully capture. To address this gap, we propose RT-LRM, a unified benchmark…

Published 15 Apr 2026