Reverse-Engineering Model Editing on Language Models

📰 arXiv cs.AI

arXiv:2602.10134v2. Abstract: Large language models (LLMs) are pretrained on corpora containing trillions of tokens and, therefore, inevitably memorize sensitive information. Locate-then-edit methods, as a mainstream paradigm of model editing, offer a promising solution by modifying model parameters without retraining. However, in this work, we reveal a critical vulnerability of this paradigm: the parameter updates inadvertently serve as a side channel, enabling attac

Published 19 May 2026