I Can't Believe TTA Is Not Better: When Test-Time Augmentation Hurts Medical Image Classification

📰 ArXiv cs.AI

arXiv:2604.09697v1 (Announce Type: cross)

Abstract: Test-time augmentation (TTA), aggregating predictions over multiple augmented copies of a test input, is widely assumed to improve classification accuracy, particularly in medical imaging where it is routinely deployed in production systems and competition solutions. We present a systematic empirical study challenging this assumption across three MedMNIST v2 benchmarks and four architectures spanning three orders of magnitude in parameter count (
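For readers unfamiliar with the mechanism the abstract describes, the following is a minimal sketch of TTA by probability averaging. It is not the paper's code: the flip-based augmentation set, the helper names (`tta_predict`, `toy_predict_proba`, `AUGMENTATIONS`), and the stand-in linear classifier are all assumptions made for illustration.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D vector of logits."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def tta_predict(predict_proba, image, augmentations):
    """Average class probabilities over augmented copies of one image."""
    probs = np.stack([predict_proba(aug(image)) for aug in augmentations])
    return probs.mean(axis=0)


# Hypothetical augmentation set: identity plus horizontal and vertical flips.
AUGMENTATIONS = [
    lambda x: x,
    lambda x: x[:, ::-1],
    lambda x: x[::-1, :],
]


def toy_predict_proba(image):
    """Stand-in for a trained classifier (assumption): a fixed linear map to 3 classes."""
    rng = np.random.default_rng(0)  # fixed seed, so the weights are identical on every call
    weights = rng.normal(size=(image.size, 3))
    return softmax(image.reshape(-1) @ weights)


if __name__ == "__main__":
    image = np.random.default_rng(1).random((28, 28))
    single_view = toy_predict_proba(image)
    averaged = tta_predict(toy_predict_proba, image, AUGMENTATIONS)
    print("single-view class:", int(single_view.argmax()))
    print("TTA class:", int(averaged.argmax()))
```

Comparing the single-view prediction with the averaged one over a labelled test set is the kind of measurement a study like this relies on; whether averaging helps depends on the model and the augmentations chosen.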

Published 14 Apr 2026
