Adapting 2D Multi-Modal Large Language Model for 3D CT Image Analysis

📰 ArXiv cs.AI

arXiv:2604.10233v1 (announce type: cross)

Abstract: 3D medical image analysis is of great importance in disease diagnosis and treatment. Recently, multimodal large language models (MLLMs) have exhibited robust perceptual capacity, strong cross-modal alignment, and promising generalizability. Therefore, they have great potential to improve the performance of medical report generation (MRG) and medical visual question answering (MVQA), which serve as two important tasks in clinical scenarios. Howeve

Published 14 Apr 2026
