Faster TensorFlow models in Hugging Face Transformers

📰 Hugging Face Blog

Hugging Face improves TensorFlow models' computational performance and integrates with TensorFlow Serving for faster inference

intermediate Published 26 Jan 2021
Action Steps
  1. Use the faster TensorFlow implementations of models such as BERT and RoBERTa in Transformers
  2. Deploy these models with TensorFlow Serving to benefit from the performance gains at inference time
  3. Benchmark latency under fixed conditions, e.g. the NVIDIA V100 GPU and sequence length of 128 used in the article
Who Needs to Know This

Machine learning engineers and data scientists can use these improvements to deploy faster, more robust models, and developers can use TensorFlow Serving for efficient model deployment.

Key Insight

💡 Hugging Face's improvements to TensorFlow models and integration with TensorFlow Serving enable faster and more robust model deployment

Full Article

# Faster TensorFlow models in Hugging Face Transformers

Published January 26, 2021

By [Julien Plu](https://huggingface.co/jplu)

[![Image 3: Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/10_tf_serving.ipynb)

* [Computational Performance](https://huggingface.co/blog/tf-serving#computational-performance)
* [TensorFlow Serving](https://huggingface.co/blog/tf-serving#tensorflow-serving)
  * [What is TensorFlow Serving?](https://huggingface.co/blog/tf-serving#what-is-tensorflow-serving)
  * [What is a SavedModel?](https://huggingface.co/blog/tf-serving#what-is-a-savedmodel)
  * [How to install TensorFlow Serving?](https://huggingface.co/blog/tf-serving#how-to-install-tensorflow-serving)
  * [How to create a SavedModel?](https://huggingface.co/blog/tf-serving#how-to-create-a-savedmodel)
  * [How to deploy and use a SavedModel?](https://huggingface.co/blog/tf-serving#how-to-deploy-and-use-a-savedmodel)
    * [Step 1](https://huggingface.co/blog/tf-serving#step-1)
    * [Step 2](https://huggingface.co/blog/tf-serving#step-2)
    * [Step 3](https://huggingface.co/blog/tf-serving#step-3)
* [Conclusion](https://huggingface.co/blog/tf-serving#conclusion)

In the last few months, the Hugging Face team has been working hard on making the TensorFlow models in Transformers more robust and faster. The recent improvements focus mainly on two aspects:
1. Computational performance: BERT, RoBERTa, ELECTRA and MPNet have been reworked to compute much faster. The speedup applies across execution settings: graph and eager mode, TensorFlow Serving, and CPU, GPU and TPU devices.
2. TensorFlow Serving: each of these TensorFlow models can be deployed with TensorFlow Serving to benefit from these performance gains at inference time.
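
As a rough sketch of how a model deployed this way might be queried, the snippet below builds a request for TensorFlow Serving's REST predict endpoint using only the Python standard library. It assumes that the server listens on its default REST port 8501 and that the model was registered under the hypothetical name `bert`. It also assumes that the SavedModel signature takes `input_ids` and `attention_mask`. The token IDs are illustrative placeholders, not real tokenizer output.

```python
import json
import urllib.request

# Hypothetical values: the model name, host/port, and token IDs below are
# placeholders for illustration, not taken from the article.
MODEL_NAME = "bert"
SERVING_URL = f"http://localhost:8501/v1/models/{MODEL_NAME}:predict"


def build_predict_request(input_ids, attention_mask, url=SERVING_URL):
    """Build (but do not send) a TensorFlow Serving REST predict request.

    Uses the columnar "inputs" format, where each key maps to a whole batch.
    The input names are assumed to match the SavedModel's serving signature.
    """
    payload = {
        "inputs": {
            "input_ids": input_ids,
            "attention_mask": attention_mask,
        }
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A batch containing one already-tokenized sequence (illustrative IDs only).
ids = [[101, 7592, 2088, 102]]
mask = [[1, 1, 1, 1]]
request = build_predict_request(ids, mask)
print(request.full_url)
print(request.data.decode("utf-8"))
# Sending it against a running server would be urllib.request.urlopen(request);
# with the "inputs" format, the JSON response holds an "outputs" field.
```

The columnar `inputs` format is used here. TensorFlow Serving also accepts a row-oriented `instances` list, in which case the response carries `predictions` instead of `outputs`.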

## Computational Performance

To demonstrate the computational performance improvements, we ran a thorough benchmark comparing BERT served with TensorFlow Serving, using the Transformers v4.2.0 implementation versus the official implementation from [Google](https://github.com/tensorflow/models/tree/master/official/nlp/bert). The benchmark was run on an NVIDIA V100 GPU with a sequence length of 128 (times are in milliseconds):

| Batch size | Google implementation (ms) | v4.2.0 implementation (ms) | Relative difference (Google vs. v4.2.0) |
| :---: | :---: | :---: | :---: |
| 1 | 6.7 | 6.26 | 6.79% |
| 2 |
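
The table does not state how the relative difference is defined. The 6.79% in the batch-size-1 row is consistent with the symmetric relative difference |a − b| / ((a + b) / 2): 0.44 / 6.48 ≈ 6.79%. Dividing by either time alone would instead give about 7.03% or 6.57%. The sketch below encodes that reading, which is an inference rather than something the article states, together with a simple median-latency timer. The warm-up count, the number of runs, and the use of the median are assumptions, not the article's documented methodology.

```python
import statistics
import time


def median_latency_ms(fn, warmup=5, runs=50):
    """Return the median wall-clock latency of fn() in milliseconds."""
    # Warm-up calls absorb one-off costs (e.g. graph tracing) so they
    # do not skew the measured samples.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def relative_difference_pct(a, b):
    """Symmetric relative difference: |a - b| over the mean of a and b, in percent."""
    return abs(a - b) / ((a + b) / 2) * 100.0


# Batch size 1 from the table: 6.7 ms (Google) vs. 6.26 ms (v4.2.0).
print(f"{relative_difference_pct(6.7, 6.26):.2f}%")  # → 6.79%

# Stand-in workload; a real benchmark would run one model batch inside fn.
latency = median_latency_ms(lambda: sum(range(10_000)))
print(f"median latency: {latency:.3f} ms")
```

When timing GPU work, `fn` should wait until results are ready (for example by converting outputs to NumPy), since GPU kernels can run asynchronously.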