CoLoRSMamba: Conditional LoRA-Steered Mamba for Supervised Multimodal Violence Detection
📰 ArXiv cs.AI
arXiv:2604.03329v1 (announce type: cross)
Abstract: Violence detection benefits from audio, but real-world soundscapes can be noisy or only weakly related to the visible scene. We present CoLoRSMamba, a directional video-to-audio multimodal architecture that couples VideoMamba and AudioMamba through CLS-guided conditional LoRA. At each layer, the VideoMamba CLS token produces a channel-wise modulation vector and a stabilization gate that adapt the AudioMamba projections responsible for the selective st
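The CLS-guided conditional LoRA idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name, the shapes, the `tanh` and sigmoid nonlinearities, and the choice to apply the modulation over output channels are all assumptions made for illustration. It shows one audio projection whose low-rank update is scaled by a channel-wise vector and a scalar gate, both computed from a video CLS token.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conditional_lora_projection(x_audio, cls_video, W, A, B, W_mod, w_gate):
    """Hypothetical video-conditioned LoRA projection (illustrative only).

    x_audio:   (T, d_in)    audio token sequence
    cls_video: (d_cls,)     video CLS token for the same layer
    W:         (d_out, d_in) frozen base projection
    A, B:      (r, d_in), (d_out, r) low-rank LoRA factors
    W_mod:     (d_out, d_cls) maps the CLS token to a channel-wise vector
    w_gate:    (d_cls,)     maps the CLS token to a scalar gate
    """
    base = x_audio @ W.T                     # (T, d_out) frozen path
    low_rank = (x_audio @ A.T) @ B.T         # (T, d_out) LoRA update
    modulation = np.tanh(W_mod @ cls_video)  # (d_out,) assumed bounded modulation
    gate = sigmoid(w_gate @ cls_video)       # scalar in (0, 1), assumed gate form
    # Broadcasting scales each output channel of the update for every time step.
    return base + gate * modulation * low_rank

rng = np.random.default_rng(0)
T, d_in, d_out, d_cls, r = 5, 8, 8, 16, 2
x = rng.normal(size=(T, d_in))
cls_tok = rng.normal(size=d_cls)
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
B = rng.normal(size=(d_out, r))
W_mod = rng.normal(size=(d_out, d_cls))
w_gate = rng.normal(size=d_cls)

y = conditional_lora_projection(x, cls_tok, W, A, B, W_mod, w_gate)
# With B set to zero the update vanishes and only the frozen path remains.
y_zero = conditional_lora_projection(x, cls_tok, W, A, np.zeros_like(B), W_mod, w_gate)
```

In this sketch the conditioning flows only from video to audio, matching the directional coupling the abstract describes; the frozen weight `W` is never modified, so the audio branch falls back to its base behavior whenever the gated update is zero.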