Stable Audio 3

📰 ArXiv cs.AI

arXiv:2605.17991v1 Announce Type: cross

Abstract: Stable Audio 3 is a family of fast latent diffusion models (small, medium, large) for variable-length audio generation and editing. Because our models can generate several minutes of audio, variable-length generation is key to avoiding the cost of producing full-length outputs for short sounds. We also support inpainting, which enables targeted audio editing and the continuation of short recordings. Our latent diffusion models operate on top of a nov

Published 19 May 2026
