Thinking Wrong in Silence: Backdoor Attacks on Continuous Latent Reasoning
📰 ArXiv cs.AI
Researchers introduce ThoughtSteer, a backdoor attack on continuous latent reasoning language models: perturbing a single input embedding vector is enough to hijack the model's latent trajectory and steer it toward an attacker-chosen answer.
Action Steps
- Identify an input-layer embedding vector to perturb
- Apply the ThoughtSteer perturbation to that single embedding
- Let multi-pass latent reasoning amplify the perturbation
- Steer the hijacked latent trajectory toward the attacker's chosen answer
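The amplification step above can be illustrated with a toy sketch. This is not the paper's method: the latent update here is a hand-picked scaled rotation (so a small perturbation provably grows by a fixed factor each pass), whereas a real model's latent dynamics are learned. All names (`latent_trajectory`, `trigger`, the 1.3x growth factor) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, passes = 8, 6                      # toy embedding size / number of reasoning passes

# Hypothetical latent update: a scaled rotation, chosen so that any
# perturbation grows by exactly 1.3x per pass. A stand-in for learned
# dynamics that happen to amplify the trigger direction.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
W = 1.3 * Q

def latent_trajectory(embedding, n_passes):
    """Iterate the latent state through multiple reasoning passes."""
    h = embedding.copy()
    traj = [h.copy()]
    for _ in range(n_passes):
        h = W @ h                     # one latent reasoning pass
        traj.append(h.copy())
    return traj

clean = rng.normal(size=d)
trigger = 0.01 * rng.normal(size=d)   # hypothetical single-vector perturbation

# Drift between the clean and perturbed trajectories at each pass
drifts = [np.linalg.norm(a - b)
          for a, b in zip(latent_trajectory(clean, passes),
                          latent_trajectory(clean + trigger, passes))]
print([round(x, 4) for x in drifts])  # drift grows ~1.3x per pass
```

The point of the sketch: because the perturbation lives in the continuous latent state rather than in visible tokens, each pass compounds it silently, with no textual trace for a reviewer to inspect.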
Who Needs to Know This
AI researchers and engineers working on language models should understand this new attack surface in order to build more robust models; security teams can use the same knowledge to identify and mitigate potential threats.
Key Insight
💡 Continuous latent reasoning language models are vulnerable to backdoor attacks that can hijack their latent trajectory without leaving an audit trail
Share This
💡 New backdoor attack on continuous latent reasoning language models: ThoughtSteer perturbs a single embedding vector to hijack the model's latent trajectory #AI #Security
DeepCamp AI