Optimization story: Bloom inference
📰 Hugging Face Blog
Optimizing Bloom inference with PyTorch, tensor parallelism (TP), and custom kernels
Action Steps
- Porting the code to JAX/Flax to run on TPUs (explored)
- Exploring compiled approaches like ONNX/TRT
- Trying DeepSpeed
- Final route: writing more efficient pure PyTorch with tensor parallelism (TP), one custom kernel, and torch.jit.script
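The article's final route relies on tensor parallelism (TP). Below is a rough, hypothetical sketch of one common TP scheme, Megatron-style column splitting of a single linear layer, simulated in plain Python. It is not the article's code. Each simulated "rank" holds a contiguous slice of the weight's output columns and computes its partial product independently. The slices are then concatenated; on real GPUs that step would be a collective such as an all-gather. The function names are illustrative only.

```python
# Hypothetical illustration: column-parallel tensor parallelism (TP)
# simulated with plain Python lists (no GPUs, no torch.distributed).

def matmul(x, w):
    """Multiply x (n x k) by w (k x m) using nested lists."""
    return [
        [sum(x[i][p] * w[p][j] for p in range(len(w))) for j in range(len(w[0]))]
        for i in range(len(x))
    ]

def shard_columns(w, world_size):
    """Split w's output columns into `world_size` equal, contiguous shards."""
    m = len(w[0])
    assert m % world_size == 0, "output dim must divide evenly across ranks"
    step = m // world_size
    return [[row[r * step:(r + 1) * step] for row in w] for r in range(world_size)]

def column_parallel_linear(x, w, world_size):
    """Each simulated rank computes x @ w_shard on its own; concatenating the
    partial outputs along the last dim stands in for an all-gather."""
    partials = [matmul(x, shard) for shard in shard_columns(w, world_size)]
    return [[v for p in partials for v in p[i]] for i in range(len(x))]

x = [[1, 2], [3, 4]]
w = [[1, 0, 2, -1], [0, 1, 1, 3]]
print(column_parallel_linear(x, w, 2))  # [[1, 2, 4, 5], [3, 4, 10, 9]]
```

The split costs no accuracy: every output column depends only on its own weight column, so each rank's result is exact. What TP adds is memory savings per device plus one communication step to reassemble the output.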
Who Needs to Know This
AI engineers and researchers can use this story to optimize their own models, while product managers can understand the trade-offs involved in deploying large language models.
Key Insight
💡 The final route combined pure PyTorch with tensor parallelism (TP), one custom kernel, and torch.jit.script, after JAX/Flax on TPUs, ONNX/TRT, and DeepSpeed were explored
Full Article
Published: October 12, 2022
# Optimization story: Bloom inference
By [Nicolas Patry](https://huggingface.co/Narsil) (Narsil)
* [Creating BLOOM](https://huggingface.co/blog/bloom-inference-optimization#creating-bloom "Creating BLOOM")
* [Porting to transformers](https://huggingface.co/blog/bloom-inference-optimization#porting-to-transformers "Porting to transformers")
* [First inference (PP + Accelerate)](https://huggingface.co/blog/bloom-inference-optimization#first-inference-pp--accelerate "First inference (PP + Accelerate)")
* [Starting point](https://huggingface.co/blog/bloom-inference-optimization#starting-point "Starting point")
* [Exploring many routes](https://huggingface.co/blog/bloom-inference-optimization#exploring-many-routes "Exploring many routes")
* [Porting the code to JAX/Flax to run on TPUs](https://huggingface.co/blog/bloom-inference-optimization#porting-the-code-the-jaxflax-to-run-on-tpus "Porting the code to JAX/Flax to run on TPUs")
* [Using ONNX/TRT or other compiled approaches](https://huggingface.co/blog/bloom-inference-optimization#using-onnxtrt-or-other-compiled-approaches "Using ONNX/TRT or other compiled approaches")
* [DeepSpeed](https://huggingface.co/blog/bloom-inference-optimization#deepspeed "DeepSpeed")
* [Webserver ideas](https://huggingface.co/blog/bloom-inference-optimization#webserver-ideas "Webserver ideas")
* [Pure PyTorch](https://huggingface.co/blog/bloom-inference-optimization#pure-pytorch "Pure PyTorch")
* [Final route: PyTorch + TP + 1 custom kernel + torch.jit.script](https://huggingface.co/blog/bloom-inference-optimization#final-route-pytorch--tp--1-custom-kernel--torchjitscript "Final route: PyTorch + TP + 1 custom kernel + torch.jit.script")
* [Writing more efficient PyTorch](https://huggingface.co/blog/bloom-inference-optimization#writing-more-efficient-pytorch "Writing more efficient PyTorch")
* [Supporting TP](https://huggingface.co/blog/bloom-i