ClawSafety: "Safe" LLMs, Unsafe Agents

📰 ArXiv cs.AI

arXiv:2604.01438v2 Announce Type: replace Abstract: Personal AI agents like OpenClaw run with elevated privileges on users' local machines, where a single successful prompt injection can leak credentials, redirect financial transactions, or destroy files. This threat goes well beyond conventional text-level jailbreaks, yet existing safety evaluations fall short: most test models in isolated chat settings, rely on synthetic environments, and do not account for how the agent framework itself shape

Published 7 Apr 2026
Read full paper → ← Back to News