Optimization story: Bloom inference

📰 Hugging Face Blog

Optimizing Bloom inference with PyTorch, TPUs, and custom kernels

Level: advanced. Published 12 Oct 2022
Action Steps
  1. Porting code to JAX/Flax for TPU compatibility
  2. Exploring compiled approaches like ONNX/TRT
  3. Using DeepSpeed for optimization
  4. Writing custom PyTorch kernels for efficiency
Who Needs to Know This

AI engineers and researchers can use this story to optimize their own models, while product managers can use it to understand the trade-offs involved in deploying large language models.

Key Insight

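💡 The final route combines PyTorch with tensor parallelism (TP), one custom kernel, and torch.jit.script; porting to JAX/Flax for TPUs, ONNX/TRT, and DeepSpeed were routes explored along the way.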

Full Article

# Optimization story: Bloom inference

Published October 12, 2022

[Nicolas Patry (Narsil)](https://huggingface.co/Narsil)

* [Creating BLOOM](https://huggingface.co/blog/bloom-inference-optimization#creating-bloom)
* [Porting to transformers](https://huggingface.co/blog/bloom-inference-optimization#porting-to-transformers)
* [First inference (PP + Accelerate)](https://huggingface.co/blog/bloom-inference-optimization#first-inference-pp--accelerate)
* [Starting point](https://huggingface.co/blog/bloom-inference-optimization#starting-point)
* [Exploring many routes](https://huggingface.co/blog/bloom-inference-optimization#exploring-many-routes)
  * [Porting the code to JAX/Flax to run on TPUs](https://huggingface.co/blog/bloom-inference-optimization#porting-the-code-the-jaxflax-to-run-on-tpus)
  * [Using ONNX/TRT or other compiled approaches](https://huggingface.co/blog/bloom-inference-optimization#using-onnxtrt-or-other-compiled-approaches)
  * [DeepSpeed](https://huggingface.co/blog/bloom-inference-optimization#deepspeed)
  * [Webserver ideas](https://huggingface.co/blog/bloom-inference-optimization#webserver-ideas)
  * [Pure PyTorch](https://huggingface.co/blog/bloom-inference-optimization#pure-pytorch)
* [Final route: PyTorch + TP + 1 custom kernel + torch.jit.script](https://huggingface.co/blog/bloom-inference-optimization#final-route-pytorch--tp--1-custom-kernel--torchjitscript)
  * [Writing more efficient PyTorch](https://huggingface.co/blog/bloom-inference-optimization#writing-more-efficient-pytorch)
  * [Supporting TP](https://huggingface.co/blog/bloom-i