Towards a Large Language-Vision Question Answering Model for MSTAR Automatic Target Recognition

📰 ArXiv cs.AI

arXiv:2605.10772v1 Announce Type: cross

Abstract: Large language-vision models (LLVMs), such as OpenAI's ChatGPT and GPT-4, have gained prominence as powerful tools for analyzing text and imagery. The merging of these data domains represents a significant paradigm shift with far-reaching implications for automatic target recognition (ATR). Recent transformer-based LLVM research has shown substantial improvements for geospatial perception tasks. Our study examines the application of LLVMs to remote

Published 12 May 2026