The Phish, The Spam, and The Valid: Generating Feature-Rich Emails for Benchmarking LLMs

📰 ArXiv cs.AI

Researchers introduce PhishFuzzer, a framework that generates feature-rich emails for benchmarking LLMs. It produces 23,100 diverse email variants, each with a strict three-class label.

advanced Published 23 Mar 2026

Action Steps

Seed LLMs with real emails using PhishFuzzer
Generate diverse email variants along controlled entity and length dimensions
Annotate each email with a strict three-class label (Phishing, Spam, or Valid) and the attacker intent
Use the dataset to benchmark and train LLMs
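
The steps above might be organized as in the following minimal Python sketch. It is not PhishFuzzer's actual code or schema: the names EmailVariant and plan_variants, the field names, and the example entity and length values are all hypothetical, chosen only to illustrate crossing two controlled dimensions and attaching one of the three labels to each record.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import product
from typing import List, Optional


class Label(Enum):
    """The three classes named in the summary."""
    PHISHING = "Phishing"
    SPAM = "Spam"
    VALID = "Valid"


@dataclass
class EmailVariant:
    """One generated email record (hypothetical layout, not the paper's schema)."""
    seed_id: str                      # which real seed email this variant came from
    entity_mode: str                  # assumed value of the entity dimension
    length_mode: str                  # assumed value of the length dimension
    label: Label
    attacker_intent: Optional[str] = None
    urls: List[str] = field(default_factory=list)
    attachments: List[str] = field(default_factory=list)
    body: str = ""                    # would be filled in by the generating LLM


def plan_variants(seed_id, label, entity_modes, length_modes, attacker_intent=None):
    """Return one empty record per (entity, length) combination for a seed."""
    return [
        EmailVariant(seed_id, entity, length, label, attacker_intent)
        for entity, length in product(entity_modes, length_modes)
    ]


# Example with made-up dimension values: 2 entity modes x 3 length modes.
plan = plan_variants(
    "seed-001",
    Label.PHISHING,
    entity_modes=["original", "swapped"],
    length_modes=["short", "medium", "long"],
    attacker_intent="credential theft",
)
```

Here plan holds six placeholder records, one per combination. Planning the grid before calling any model keeps the label and the dimension values fixed per record, so they do not depend on what the model writes.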

Who Needs to Know This

AI engineers and researchers can use the PhishFuzzer dataset to benchmark LLMs, and data scientists can use it to train and test models.

Key Insight

💡 PhishFuzzer provides a dataset for benchmarking LLMs in which every email carries a strict three-class label along with full URL and attachment metadata.
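
As an illustration of how strict three-class labels can be used in a benchmark, the sketch below scores a model's predictions with macro-averaged F1. The summary does not say which metric the authors report, so the metric choice and the macro_f1 helper are assumptions made for this example.

```python
from collections import Counter

LABELS = ("Phishing", "Spam", "Valid")


def macro_f1(gold, pred, labels=LABELS):
    """Macro-averaged F1 over the given labels (unweighted mean of per-class F1)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1   # predicted class gains a false positive
            fn[g] += 1   # true class gains a false negative
    scores = []
    for lab in labels:
        denom = 2 * tp[lab] + fp[lab] + fn[lab]
        # A class with no true or predicted examples contributes 0.0.
        scores.append(2 * tp[lab] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy data, invented for illustration only.
gold = ["Phishing", "Spam", "Valid", "Phishing"]
pred = ["Phishing", "Spam", "Spam", "Valid"]
score = macro_f1(gold, pred)
```

Macro averaging weights the three classes equally, so a model cannot score well simply by favoring whichever class is most common.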