PLaMo 2.1-VL Technical Report
arXiv cs.AI
arXiv:2604.19324v1 Announce Type: cross Abstract: We introduce PLaMo 2.1-VL, a lightweight Vision Language Model (VLM) for autonomous devices, available in 8B and 2B variants and designed for local and edge deployment with Japanese-language operation. Focusing on Visual Question Answering (VQA) and Visual Grounding as its core capabilities, we develop and evaluate the models for two real-world application scenarios: factory task analysis via tool recognition, and infrastructure anomaly detection.