HatePrototypes: Interpretable and Transferable Representations for Implicit and Explicit Hate Speech Detection
📰 ArXiv cs.AI
HatePrototypes detects implicit and explicit hate speech using interpretable and transferable representations
Action Steps
- Identify existing hate speech benchmarks and their limitations
- Develop new representations that capture implicit and indirect hate
- Fine-tune models using these representations to improve detection accuracy
- Evaluate and refine the models using transfer learning and interpretability metrics
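The steps above do not say what the representations look like. As a rough illustration only, the sketch below assumes a "prototype" is the mean embedding of each class and that a message is assigned to the class whose prototype it is closest to by cosine similarity. This is an assumption made for illustration, not the paper's confirmed method. The function names (`build_prototypes`, `classify`) and the toy two-dimensional embeddings are hypothetical; in practice the vectors would come from a fine-tuned encoder.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_prototypes(embeddings, labels):
    """Assumed definition: one prototype per class, the mean of its embeddings."""
    by_label = {}
    for vector, label in zip(embeddings, labels):
        by_label.setdefault(label, []).append(vector)
    return {label: mean_vector(vectors) for label, vectors in by_label.items()}


def classify(embedding, prototypes):
    """Return the label of the most similar prototype and all similarity scores."""
    scores = {label: cosine(embedding, proto) for label, proto in prototypes.items()}
    return max(scores, key=scores.get), scores


# Toy embeddings standing in for encoder outputs (hypothetical values).
train_embeddings = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_labels = ["hateful", "hateful", "benign", "benign"]

prototypes = build_prototypes(train_embeddings, train_labels)
label, scores = classify([0.7, 0.3], prototypes)
print(label, scores)
```

The per-class similarity scores are what make a scheme like this inspectable: each decision can be traced back to how close the input sits to each class prototype. Because building prototypes needs only a handful of labeled embeddings, the same idea can in principle be reapplied to a new benchmark without retraining the encoder.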
Who Needs to Know This
AI engineers and researchers can use this research to improve hate speech detection models, and product managers can apply the findings to strengthen content moderation systems
Key Insight
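💡 Existing benchmarks mainly cover explicit hate toward protected groups, so detecting implicit hate (demeaning comparisons, calls for exclusion or violence, subtle discriminatory language) requires representations that go beyond them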
Full Article
Abstract:
arXiv:2511.06391v3
Optimization of offensive content moderation models for different types of hateful messages is typically achieved through continued pre-training or fine-tuning on new hate speech benchmarks. However, existing benchmarks mainly address explicit hate toward protected groups and often overlook implicit or indirect hate, such as demeaning comparisons, calls for exclusion or violence, and subtle discriminatory language that still causes harm.
DeepCamp AI