Ran Score: a LLM-based Evaluation Score for Radiology Report Generation

📰 ArXiv cs.AI

Ran Score is an LLM-based evaluation score for radiology report generation. It combines human expertise with large language models for finding extraction and report evaluation.

Advanced · Published 25 Mar 2026

Action Steps

Develop a clinician-guided framework for multi-label finding extraction from free-text chest X-ray reports
Combine human expertise with large language models to improve recognition of low-prevalence abnormalities and handling of clinically important language
Define a finding-level metric, Ran Score, for report evaluation
Use Ran Score to evaluate and refine radiology report generation models
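
The idea of a finding-level metric can be illustrated with a minimal sketch. This summary does not give the Ran Score formula, so the code below is not Ran Score itself: it assumes each report has already been reduced to a set of finding labels and scores the generated reports with per-finding F1 and their unweighted mean (macro-F1) as a stand-in. The label names in FINDINGS and the function finding_level_f1 are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical label set; the source does not list the findings it uses.
FINDINGS: List[str] = ["cardiomegaly", "pleural_effusion", "pneumothorax"]


def finding_level_f1(
    reference: List[Set[str]],
    generated: List[Set[str]],
    findings: List[str] = FINDINGS,
) -> Dict[str, float]:
    """Per-finding F1 plus their unweighted mean (macro-F1).

    Assumption: this is a generic stand-in, not the paper's Ran Score.
    """
    if len(reference) != len(generated):
        raise ValueError("reference and generated must have the same length")
    if not findings:
        raise ValueError("findings must not be empty")

    scores: Dict[str, float] = {}
    for finding in findings:
        tp = fp = fn = 0
        for ref, gen in zip(reference, generated):
            in_ref, in_gen = finding in ref, finding in gen
            tp += int(in_ref and in_gen)      # finding in both reports
            fp += int(in_gen and not in_ref)  # generated report adds it
            fn += int(in_ref and not in_gen)  # generated report misses it
        denom = 2 * tp + fp + fn
        # A finding that never appears in either set scores 0.0 here.
        scores[finding] = 2 * tp / denom if denom else 0.0

    scores["macro_f1"] = sum(scores[f] for f in findings) / len(findings)
    return scores


# Toy data: one label set per report, in matching order.
reference = [{"cardiomegaly"}, {"pleural_effusion", "pneumothorax"}, set()]
generated = [{"cardiomegaly"}, {"pleural_effusion"}, {"cardiomegaly"}]
scores = finding_level_f1(reference, generated)
print(scores)
```

Averaging per finding rather than per report gives a rare finding the same weight as a common one, which fits the emphasis on low-prevalence abnormalities. In the toy data, the missed pneumothorax pulls the macro average down even though most labels match.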

Who Needs to Know This

Radiologists and AI engineers who work together on report generation systems. Ran Score aims to make report generation and evaluation more accurate, which supports collaboration between clinicians and AI systems.

Key Insight

💡 Combining human expertise with large language models can improve radiology report generation and evaluation, particularly the recognition of low-prevalence abnormalities and the handling of clinically important language.