Capacity-aware inference: Automatic instance fallback for SageMaker AI endpoints

📰 AWS Machine Learning

Today, Amazon SageMaker AI introduces capacity-aware instance pools for new and existing inference endpoints. You define a prioritized list of instance types, and SageMaker AI automatically works through that list whenever capacity is constrained: at endpoint creation, during scale-out, and during scale-in. Your endpoint is provisioned on available AI Infrastructure without manual intervention. This capability is available for Single Model Endpoints, Inference Component-based endpoints, and Asynchronous Infere
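The summary does not show the configuration syntax, so what follows is only a conceptual sketch in plain Python of how prioritized fallback can fill a requested instance count. It is not the SageMaker AI API. The function `allocate_instances`, the instance types chosen, and the free-capacity numbers are all hypothetical. The sketch walks the list in priority order and takes as many instances of each type as are free until the target is met.

```python
from typing import Dict, List, Sequence, Tuple


def allocate_instances(
    priority_list: Sequence[str],
    free_capacity: Dict[str, int],
    target_count: int,
) -> List[Tuple[str, int]]:
    """Hypothetical prioritized-fallback allocator (not SageMaker's API).

    Walks instance types in priority order and takes as many of each as
    are currently free, until target_count instances are planned. If total
    free capacity is too small, the returned plan is partial.
    """
    plan: List[Tuple[str, int]] = []
    remaining = target_count
    for instance_type in priority_list:
        if remaining == 0:
            break
        # Take the preferred type first; fall back to the next one only
        # for whatever this type cannot supply.
        take = min(remaining, free_capacity.get(instance_type, 0))
        if take > 0:
            plan.append((instance_type, take))
            remaining -= take
    return plan


# Hypothetical priority list and capacity snapshot, for illustration only.
priorities = ["ml.g5.2xlarge", "ml.g6.2xlarge", "ml.g4dn.2xlarge"]
free = {"ml.g5.2xlarge": 1, "ml.g6.2xlarge": 2, "ml.g4dn.2xlarge": 5}

print(allocate_instances(priorities, free, target_count=4))
# [('ml.g5.2xlarge', 1), ('ml.g6.2xlarge', 2), ('ml.g4dn.2xlarge', 1)]
```

In this toy model the same function covers both creation (starting from zero instances) and scale-out (requesting only the additional instances needed). How SageMaker AI actually orders or rebalances instances, including during scale-in, is described in the full article rather than here.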

Published 4 May 2026
