PLaMo 2.1-VL Technical Report
arXiv cs.AI
arXiv:2604.19324v1 Announce Type: cross Abstract: We introduce PLaMo 2.1-VL, a lightweight Vision Language Model (VLM) for autonomous devices, available in 8B and 2B variants and designed for local and edge deployment with Japanese-language operation. Focusing on Visual Question Answering (VQA) and Visual Grounding as its core capabilities, we develop and evaluate the models for two real-world application scenarios: factory task analysis via tool recognition, and infrastructure anomaly detection.