Adversarial Concept Search: Predicting Compositional Errors From Feature Geometry

📰 arXiv cs.AI

Predict which concept combinations an LLM will fail on from the geometry of its internal features, and use those predictions to find challenging scenarios

Advanced | Published 15 Jun 2026

Action Steps

Use an LLM's representational geometry to predict compositional failures
Analyze feature geometry to identify potential interference between concepts
Apply adversarial concept search to generate challenging scenarios
Evaluate LLM performance on predicted failure scenarios
Refine LLM training data to mitigate compositional errors
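
The first three steps can be illustrated with a small sketch. This is not the paper's method, which the truncated abstract does not spell out; it assumes, purely for illustration, that each concept has a feature vector and that interference between two concepts can be approximated by the absolute cosine similarity of those vectors. The names `cosine`, `rank_concept_pairs`, and `toy_features` are hypothetical, and the vectors are made-up toy values rather than features extracted from a real model.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_concept_pairs(features):
    """Rank concept pairs by an assumed interference proxy.

    ASSUMPTION: interference is approximated by |cosine similarity| of the
    two concepts' feature vectors. The paper may define it differently.
    Returns (concept_a, concept_b, score) tuples, highest score first.
    """
    scored = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(features.items(), 2):
        scored.append((name_a, name_b, abs(cosine(vec_a, vec_b))))
    # Pairs with the most overlapping directions are the candidates to test first.
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored


# Toy, hand-written vectors (hypothetical; not taken from any real model).
toy_features = {
    "red": [1.0, 0.0, 0.0],
    "crimson": [0.9, 0.1, 0.0],
    "square": [0.0, 1.0, 0.0],
    "loud": [0.0, 0.0, 1.0],
}

ranking = rank_concept_pairs(toy_features)
for concept_a, concept_b, score in ranking:
    print(f"{concept_a} + {concept_b}: {score:.3f}")
```

Under this assumption, the highest-ranked pairs are the ones to turn into test prompts for the evaluation step. With real models, the vectors would come from whatever feature extraction the practitioner already uses, and the ranking would need to be checked against measured failure rates.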

Who Needs to Know This

ML researchers and engineers can use this technique to find likely failure cases and improve LLM robustness. Data scientists can apply it to analyze and mitigate compositional errors.

Key Insight

💡 Compositional errors in LLMs can be predicted by analyzing feature geometry, enabling targeted improvements

Full Article


Abstract:
arXiv:2606.13934v1 (announce type: new)

Humans cannot always intuit what scenarios are most challenging to LLMs. Hoping to capture challenging edge cases, developers either design problems to be difficult for humans or curate extensive benchmarks. What if we could instead anticipate which scenarios a model will fail on? In this paper, we use an LLM's representational geometry to predict which concept combinations it will fail on. We attribute this compositional failure to interference be
