CoLoRSMamba: Conditional LoRA-Steered Mamba for Supervised Multimodal Violence Detection
📰 ArXiv cs.AI
CoLoRSMamba is a multimodal architecture for supervised violence detection that combines video and audio modalities through a conditional LoRA-steered Mamba backbone.
Action Steps
- Fuse video and audio modalities through a directional video-to-audio conditioning architecture
- Use CLS-guided conditional LoRA to adapt AudioMamba's projection layers
- Apply channel-wise modulation vectors with stabilization gates to selectively emphasize relevant audio features
- Evaluate the performance of CoLoRSMamba on supervised multimodal violence detection tasks
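The steps above can be sketched as a conditioned low-rank update. The following is a minimal NumPy sketch, not the paper's implementation: all names (`conditional_lora`, `W_mod`, `w_gate`) and the exact placement of the modulation (here in the rank-space bottleneck) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_cls, rank = 16, 8, 4

# Frozen base projection (standing in for an AudioMamba projection layer)
W = rng.standard_normal((d_model, d_model)) * 0.1
# Low-rank LoRA factors
A = rng.standard_normal((rank, d_model)) * 0.1
B = rng.standard_normal((d_model, rank)) * 0.1
# Hypothetical conditioning weights: video CLS token -> modulation and gate
W_mod = rng.standard_normal((rank, d_cls)) * 0.1
w_gate = rng.standard_normal(d_cls) * 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conditional_lora(x, cls):
    """Frozen projection plus a CLS-steered, gated low-rank update."""
    m = sigmoid(W_mod @ cls)   # channel-wise modulation vector (assumed rank-space)
    g = sigmoid(w_gate @ cls)  # scalar stabilization gate in (0, 1)
    return W @ x + g * (B @ (m * (A @ x)))

x = rng.standard_normal(d_model)   # one audio feature vector
cls = rng.standard_normal(d_cls)   # video CLS token steering the adapter
y = conditional_lora(x, cls)
```

The gate bounds the adapter's contribution, so a near-zero gate falls back to the frozen audio pathway; this is one plausible reading of the "stabilization gates" described above.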
Who Needs to Know This
This research benefits AI engineers and researchers working on multimodal violence detection: it offers a parameter-efficient way to condition an audio model on video context, which teams can adapt when building violence detection systems that must weigh both modalities.
Key Insight
💡 Conditional LoRA-steered Mamba can effectively combine video and audio modalities for improved violence detection accuracy
Share This
💡 CoLoRSMamba: A novel multimodal architecture for violence detection using conditional LoRA-steered Mamba #AI #MultimodalLearning
DeepCamp AI