Tiny weight edits improve LLM safety
📰 Dev.to · Papers Mache
Targeted tweaks to specific attention heads can slash jailbreak success rates by several‑fold (e.g.,...