Real-Time Voicemail Detection in Telephony Audio Using Temporal Speech Activity Features

📰 ArXiv cs.AI

arXiv:2604.09675v1 (announce type: cross)

Abstract: Outbound AI calling systems must distinguish voicemail greetings from live human answers in real time to avoid wasted agent interactions and dropped calls. We present a lightweight approach that extracts 15 temporal features from the speech activity pattern of a pre-trained neural voice activity detector (VAD), then classifies with a shallow tree-based ensemble. Across two evaluation sets totaling 764 telephony recordings, the system achieves a c
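To make the idea of temporal speech activity features concrete, here is a minimal Python sketch. It assumes the VAD emits one speech probability per fixed-length frame. The five features, the function names `speech_segments` and `temporal_features`, the 0.5 threshold, and the 30 ms frame length are all illustrative assumptions; the excerpt does not list the paper's 15 features or its settings.

```python
from typing import Dict, List, Tuple


def speech_segments(probs: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges (end exclusive) where prob >= threshold."""
    segments = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # a speech run begins
        elif p < threshold and start is not None:
            segments.append((start, i))  # the speech run ends
            start = None
    if start is not None:
        segments.append((start, len(probs)))  # run continues to end of audio
    return segments


def temporal_features(
    probs: List[float], frame_sec: float = 0.03, threshold: float = 0.5
) -> Dict[str, float]:
    """Hypothetical timing features computed from a VAD probability track."""
    segs = speech_segments(probs, threshold)
    total = len(probs) * frame_sec
    durations = [(end - start) * frame_sec for start, end in segs]
    # Pause = gap between the end of one segment and the start of the next.
    pauses = [
        (segs[i + 1][0] - segs[i][1]) * frame_sec for i in range(len(segs) - 1)
    ]
    return {
        "time_to_first_speech": segs[0][0] * frame_sec if segs else total,
        "num_segments": float(len(segs)),
        "speech_ratio": sum(durations) / total if total else 0.0,
        "longest_segment": max(durations, default=0.0),
        "mean_pause": sum(pauses) / len(pauses) if pauses else 0.0,
    }
```

A feature dictionary like this could be turned into a fixed-length vector and passed to any shallow tree-based classifier; which ensemble the authors use is not stated in the excerpt.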

Published 14 Apr 2026