We Poisoned an LLM’s Training Data. Here’s What Broke (and What Didn’t).
📰 Medium · Machine Learning
Learn how corrupting 25% of human-feedback data can compromise an LLM's safety guardrails, and what happens when the poisoning reaches 100%
Action Steps
- Corrupt 25% of the human-feedback data to test the LLM's robustness
- Poison 100% of the training data and observe how the model's behavior changes
- Analyze the results to identify vulnerabilities in the model's safety guardrails
- Apply data validation and verification to detect and prevent poisoning
- Evaluate the model on a separate, clean dataset to measure what the poisoning broke
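The first two steps above can be sketched as a label-flip attack on preference pairs: swapping the chosen and rejected responses for a fraction of the dataset inverts the feedback signal. This is a minimal illustration, not the article's actual pipeline; the `poison_preferences` helper and the toy `(chosen, rejected)` dataset are assumptions made for the example.

```python
import random

def poison_preferences(pairs, fraction, seed=0):
    """Flip the preference label on a random `fraction` of
    (chosen, rejected) human-feedback pairs — a simple label-flip attack."""
    rng = random.Random(seed)
    n_poison = int(len(pairs) * fraction)
    poisoned_idx = set(rng.sample(range(len(pairs)), n_poison))
    return [
        # Swapping chosen/rejected inverts the preference signal for that pair.
        (rejected, chosen) if i in poisoned_idx else (chosen, rejected)
        for i, (chosen, rejected) in enumerate(pairs)
    ]

# Toy dataset: each pair prefers the "safe" response over the "unsafe" one.
pairs = [(f"safe-{i}", f"unsafe-{i}") for i in range(100)]

poisoned_25 = poison_preferences(pairs, 0.25)  # 25% of labels flipped
poisoned_100 = poison_preferences(pairs, 1.0)  # every label flipped
```

The poisoned lists can then be fed to whatever preference-tuning step (e.g. reward modeling or DPO) the experiment uses, with the clean `pairs` held back for the final evaluation.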
Who Needs to Know This
ML engineers and researchers benefit from understanding how vulnerable LLMs are to data poisoning; data scientists and product managers should know the resulting risks to AI model safety
Key Insight
💡 Data poisoning can have significant effects on LLM safety, even with partial corruption
Share This
🚨 Corrupting just 25% of human feedback can silently strip an LLM's safety guardrails! 🤖
DeepCamp AI