Is Evaluation Awareness Just Format Sensitivity? Limitations of Probe-Based Evidence under Controlled Prompt Structure

📰 ArXiv cs.AI

This research asks whether "evaluation awareness" in large language models is merely format sensitivity, finding that probes track benchmark prompt format rather than the underlying evaluation context.

Published 23 Mar 2026
Action Steps
  1. Design a controlled 2x2 dataset that crosses prompt format with evaluation context to test probe sensitivity to each factor
  2. Use diagnostic rewrites to isolate the effect of prompt format on probe-based signals
  3. Analyze the results to determine whether probes track evaluation context or surface structure
  4. Consider the implications of the findings for the evaluation of large language models
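The 2x2 diagnostic above can be sketched in code. The snippet below is a minimal, illustrative simulation, not the paper's actual method: it uses synthetic "activations" in place of real model hidden states, and all names, dimensions, and signal strengths are assumptions. It trains a linear probe on data where format and context are confounded (as in typical benchmark prompts), then tests on a 2x2 split where they vary independently, to see which factor the probe actually tracks.

```python
# Hedged sketch of a 2x2 (format x context) probe diagnostic.
# Synthetic data stands in for model activations; the data-generating
# process below is an assumption for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 32  # toy activation dimensionality

# Two latent directions: one driven by prompt FORMAT (benchmark-style vs.
# chat-style), one driven by true CONTEXT (evaluation vs. deployment).
fmt_dir = rng.normal(size=d)
ctx_dir = rng.normal(size=d)

# Format carries a much stronger signal than context in this toy setup.
FORMAT_STRENGTH, CONTEXT_STRENGTH = 3.0, 0.3

def make_activations(fmt, ctx):
    """Synthetic activations as a noisy mix of format and context signals."""
    n = len(fmt)
    return (rng.normal(size=(n, d))
            + np.outer(fmt, fmt_dir) * FORMAT_STRENGTH
            + np.outer(ctx, ctx_dir) * CONTEXT_STRENGTH)

# Training data where format and context are confounded (fmt == ctx),
# as when every evaluation prompt is also benchmark-formatted.
n = 2000
ctx_train = rng.integers(0, 2, n)
X_train = make_activations(ctx_train, ctx_train)
probe = LogisticRegression(max_iter=1000).fit(X_train, ctx_train)

# Diagnostic 2x2 test set: format and context vary independently.
fmt_t = rng.integers(0, 2, n)
ctx_t = rng.integers(0, 2, n)
X_test = make_activations(fmt_t, ctx_t)

acc_ctx = probe.score(X_test, ctx_t)  # agreement with true context
acc_fmt = probe.score(X_test, fmt_t)  # agreement with prompt format
print(f"agreement with context: {acc_ctx:.2f}, with format: {acc_fmt:.2f}")
# If agreement with format >> context, the probe tracks surface structure.
```

Under these assumed signal strengths, the probe's predictions agree far more with format than with context, which is the failure mode the diagnostic is designed to expose.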
Who Needs to Know This

ML researchers and AI engineers benefit from understanding the limitations of probe-based evidence when evaluating large language models. These limitations inform the design of more robust evaluation methods.

Key Insight

💡 Probe-based evidence for evaluation awareness in large language models may be limited by format sensitivity

Share This
🤖 Probes in large language models may just be tracking format, not context 📊