SAVe: Self-Supervised Audio-visual Deepfake Detection Exploiting Visual Artifacts and Audio-visual Misalignment

📰 ArXiv cs.AI

SAVe is a self-supervised audio-visual deepfake detection framework that learns from authentic videos alone, exploiting subtle visual artifacts and audio-visual misalignment to flag manipulated content

Published 27 Mar 2026
Action Steps
  1. Learn from authentic videos without relying on curated synthetic forgeries
  2. Exploit visual artifacts and audio-visual misalignment for deepfake detection
  3. Train a self-supervised model to detect inconsistencies between audio and visual modalities
  4. Evaluate the model on unseen manipulations to test its scalability and robustness
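The misalignment cue in steps 2–3 can be illustrated with a minimal sketch (not the paper's implementation): given per-frame audio and visual embeddings from some feature extractor, score how much the two streams disagree. The function name, embedding dimensions, and scoring rule below are illustrative assumptions.

```python
import numpy as np

def misalignment_score(audio_feats: np.ndarray, visual_feats: np.ndarray) -> float:
    """Mean (1 - cosine similarity) over time-aligned frame embeddings.

    audio_feats, visual_feats: arrays of shape (T, D), one embedding per frame.
    Higher scores suggest the audio and visual streams disagree, which a
    detector could treat as evidence of manipulation.
    """
    a = audio_feats / (np.linalg.norm(audio_feats, axis=1, keepdims=True) + 1e-8)
    v = visual_feats / (np.linalg.norm(visual_feats, axis=1, keepdims=True) + 1e-8)
    cos_sim = np.sum(a * v, axis=1)  # per-frame cosine similarity
    return float(np.mean(1.0 - cos_sim))

# Toy check: identical streams are near-perfectly aligned (score ~ 0),
# while an unrelated stream scores much higher.
rng = np.random.default_rng(0)
real = rng.normal(size=(16, 64))
aligned = misalignment_score(real, real)
mismatched = misalignment_score(real, rng.normal(size=(16, 64)))
```

In a self-supervised setup such as SAVe's, the extractors would be trained on authentic videos only, so genuine audio-visual pairs score low while forged pairs stand out.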
Who Needs to Know This

AI engineers and researchers working on deepfake detection and multimodal analysis can benefit from SAVe: it offers a scalable, robust approach that catches subtle visual artifacts and cross-modal inconsistencies without relying on curated synthetic forgeries

Key Insight

💡 Self-supervised learning can be effective for deepfake detection, reducing dependence on curated synthetic forgeries and improving scalability and robustness

Share This
💡 Detect deepfakes with SAVe, a self-supervised audio-visual framework that exploits visual artifacts and audio-visual misalignment
Read full paper →