SVD-Prune: Training-Free Token Pruning for Efficient Vision-Language Models
📰 ArXiv cs.AI
arXiv:2604.11530v1 (announce type: cross). Abstract: Vision-Language Models (VLMs) have revolutionized multimodal learning by jointly processing visual and textual information. Yet processing long sequences of vision tokens imposes high computational and memory demands. Many existing methods rely on local heuristics, such as attention scores or token norms. However, these criteria suffer from positional bias and information dispersion, limiting their ability t