Alignment Whack-a-Mole: Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models
📰 arXiv cs.AI
Finetuning large language models can activate verbatim recall of copyrighted books, bypassing safety alignment strategies
Action Steps
- Finetuning can reactivate verbatim recall of copyrighted books, even in models that were aligned to refuse such output (a minimal probe sketch follows this list)
- Safety alignment strategies such as RLHF, system prompts, and output filters may not hold up once a model is finetuned
- Developers should re-audit their models' training data and finetuning pipelines to reduce the risk of copyright infringement
- Regulatory bodies should consider the implications of finetuning for copyright law and regulation
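The sketch below shows one way to probe a finetuned model for this behavior: prompt it with a prefix from a book and measure how much of the true continuation it reproduces verbatim. This is a minimal illustration, not the paper's methodology; the model checkpoint name, the probe passage, and the word-level match metric are all assumptions.

```python
# Minimal verbatim-recall probe, assuming a Hugging Face causal LM.
# "your-finetuned-model" is a hypothetical checkpoint name.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "your-finetuned-model"  # placeholder, not from the paper

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def verbatim_recall(prefix: str, reference: str, max_new_tokens: int = 50) -> float:
    """Greedy-decode a continuation of `prefix` and return the fraction of
    words that exactly match the true `reference` continuation."""
    inputs = tokenizer(prefix, return_tensors="pt")
    output = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Decode only the newly generated tokens, not the prompt.
    continuation = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    ref_words = reference.split()[:max_new_tokens]
    gen_words = continuation.split()[: len(ref_words)]
    matches = sum(g == r for g, r in zip(gen_words, ref_words))
    return matches / max(len(ref_words), 1)

# A score near 1.0 suggests the model is reproducing the passage
# verbatim rather than paraphrasing it.
```

Run against the same model before and after finetuning, a jump in this score on copyrighted passages would indicate that finetuning has reactivated memorized text despite alignment.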
Who Needs to Know This
AI engineers and researchers working on large language models, who must account for this risk when assessing copyright compliance and when designing safety measures that survive finetuning
Key Insight
💡 Finetuning can undo safety alignment and compromise a large language model's compliance with copyright law
Share This
🚨 Finetuning can bypass safety measures and reactivate verbatim recall of copyrighted books in LLMs 🚨
DeepCamp AI