Zamba2-VL Technical Report

📰 ArXiv cs.AI

arXiv:2606.00390v1

Abstract: We present Zamba2-VL, a suite of vision-language models built on Zamba2, a hybrid language-model architecture combining Mamba2 state-space layers with a small number of shared transformer blocks. Across a broad range of image understanding, reasoning, OCR, grounding, and counting benchmarks, Zamba2-VL is competitive with leading Transformer-based open-weight VLMs of comparable scale, including the Molmo2, Qwen3-VL, and InternVL3.5 families, and s

Published 2 Jun 2026
