Squish and Release: Exposing Hidden Hallucinations by Making Them Surface as Safety Signals

📰 arXiv cs.AI

Squish and Release exposes hidden hallucinations in language models by making them surface as safety signals.

advanced Published 31 Mar 2026

Action Steps

Identify the order-gap hallucination issue in language models
Implement the Squish and Release architecture to expose hidden hallucinations
Analyze the activation space of the safety circuit to detect suppressed errors
Patch the activations to make the errors surface as safety signals
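
The last two steps can be illustrated with a generic activation-patching sketch. This is not the paper's architecture, which this summary does not describe in detail. It is a toy example under stated assumptions: `layer1`, `safety_readout`, `make_patch`, the weights, and the inputs are all hypothetical stand-ins chosen only to show the mechanics of replacing selected hidden activations and measuring how a readout changes.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def layer1(x: Sequence[float]) -> Vector:
    """Toy hidden layer (hypothetical weights): linear map followed by ReLU."""
    weights = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def safety_readout(h: Sequence[float]) -> float:
    """Toy linear probe (hypothetical): higher score = stronger safety signal."""
    probe = [0.0, 1.0, -1.0]
    return sum(p * hi for p, hi in zip(probe, h))


def run(x: Sequence[float],
        patch: Optional[Callable[[Vector], Vector]] = None) -> float:
    """Forward pass; an optional patch function may edit the hidden activations."""
    h = layer1(x)
    if patch is not None:
        h = patch(h)
    return safety_readout(h)


def make_patch(source_h: Sequence[float],
               indices: Sequence[int]) -> Callable[[Vector], Vector]:
    """Build a patch that copies source_h[i] into the hidden state for each i."""
    def patch(h: Vector) -> Vector:
        patched_h = list(h)  # copy so the original activations are untouched
        for i in indices:
            patched_h[i] = source_h[i]
        return patched_h
    return patch


# Assumed inputs: a reference run and a run suspected of hiding an error.
clean_input = [1.0, 0.0]
suspect_input = [0.0, 1.0]

clean_h = layer1(clean_input)                      # activations to copy from
baseline = run(suspect_input)                      # readout with no patch
patched = run(suspect_input, make_patch(clean_h, indices=[2]))

print(f"baseline={baseline}, patched={patched}")   # baseline=-1.5, patched=0.5
```

In this toy setup, overwriting hidden unit 2 with its value from the reference run raises the readout from -1.5 to 0.5, so that unit is the one holding the signal down. In a real model the same idea is usually implemented with forward hooks on a chosen layer, and the choice of layer, units, and reference run is an experimental decision.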

Who Needs to Know This

ML researchers and engineers benefit from this technique because it helps identify and mitigate errors in language models, which supports more reliable and trustworthy outputs.

Key Insight

💡 The Squish and Release technique can help detect and mitigate language-model errors that are otherwise invisible to output inspection.