Many-Tier Instruction Hierarchy in LLM Agents

📰 ArXiv cs.AI

arXiv:2604.09443v1 (cross-listed). Abstract: Large language model agents receive instructions from many sources (system messages, user prompts, tool outputs, and more), each carrying different levels of trust and authority. When these instructions conflict, models must reliably follow the highest-privilege instruction to remain safe and effective. The dominant paradigm, instruction hierarchy (IH), assumes a fixed, small set of privilege levels (typically fewer than five) defined by rigid role

Published 13 Apr 2026
