Continuous batching from first principles

📰 Hugging Face Blog

Continuous batching improves LLM throughput, and it can be derived step by step from attention mechanisms and KV caching

advanced Published 25 Nov 2025
Action Steps
  1. Understand attention mechanisms in LLMs
  2. Learn about KV caching and its role in optimizing LLM performance
  3. Derive continuous batching by optimizing for throughput
  4. Apply continuous batching to improve the efficiency of LLMs
Who Needs to Know This

Machine learning engineers and researchers can use continuous batching to make their LLMs more efficient, and software engineers can apply it to optimize how these models are deployed.

Key Insight

💡 Starting from attention mechanisms and KV caching, continuous batching can be derived by optimizing for throughput

Full Article

# Continuous batching from first principles

Published November 25, 2025

By [Rémi Ouazan Reboul](https://huggingface.co/ror), [Arthur Zucker](https://huggingface.co/ArthurZ), and [Luc Georges](https://huggingface.co/mcpotato)

* [Attention](https://huggingface.co/blog/continuous_batching#attention "Attention")

* [KV-cache](https://huggingface.co/blog/continuous_batching#kv-cache "KV-cache")

* [Chunked prefill](https://huggingface.co/blog/continuous_batching#chunked-prefill "Chunked prefill")

* [Continuous batching](https://huggingface.co/blog/continuous_batching#continuous-batching-1 "Continuous batching")

* [Conclusion](https://huggingface.co/blog/continuous_batching#conclusion "Conclusion")

![Title card](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/continuous_batching/banner.png)

_TL;DR: in this blog post, starting from attention mechanisms and KV caching, we derive continuous batching by optimizing for throughput._

If you've ever used Qwen, Claude, or any other AI chatbot, you've probably noticed something: it takes a while for the first word of the response to appear, and then words appear one by one on your screen at a (hopefully) steady, fast pace. That's because, at heart, all LLMs are just fancy next-token predictors. An LLM first processes your entire prompt to produce one new token. Then it keeps adding tokens one by one, each time re
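To make the token-by-token loop described above concrete, here is a minimal Python sketch. It rests on loud assumptions: `toy_next_token` is a hypothetical stand-in for a model's forward pass (a small arithmetic function, not an LLM and not part of any library), and `generate` is an illustrative name rather than a real API. The sketch only shows the shape of the loop: the whole prompt is read to produce the first new token, and each later step appends one token and predicts again.

```python
from typing import Callable, List, Optional


def toy_next_token(tokens: List[int], vocab_size: int = 50) -> int:
    """Hypothetical stand-in for a model forward pass.

    Maps the full token sequence to a single next token id.
    A real LLM would compute a probability distribution here instead.
    """
    return (sum(tokens) * 31 + len(tokens)) % vocab_size


def generate(
    prompt: List[int],
    next_token: Callable[[List[int]], int],
    max_new_tokens: int,
    eos_id: Optional[int] = None,
) -> List[int]:
    """Produce up to max_new_tokens tokens, one per loop iteration."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # Iteration 1 reads the entire prompt to get the first new token.
        # Later iterations read the prompt plus everything generated so far.
        new_token = next_token(tokens)
        tokens.append(new_token)
        # Stop early if the (assumed) end-of-sequence id is produced.
        if eos_id is not None and new_token == eos_id:
            break
    # Return only the newly generated tokens.
    return tokens[len(prompt):]


if __name__ == "__main__":
    print(generate([1, 2, 3], toy_next_token, max_new_tokens=4))
```

Note that this naive loop hands the full sequence to `next_token` at every step, so the amount of input it reads grows with each generated token.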