Accelerated Inference with Optimum and Transformers Pipelines
📰 Hugging Face Blog
Accelerate inference with Optimum and Transformers pipelines to reduce latency and improve throughput
Action Steps
- Install Optimum with ONNX Runtime support (`pip install optimum[onnxruntime]`)
- Convert a Hugging Face Transformers model to ONNX for inference
- Use the ORTOptimizer to optimize the model
- Use the ORTQuantizer to apply dynamic quantization
- Run accelerated inference using Transformers pipelines
- Evaluate the optimized model's accuracy and latency against the baseline
Who Needs to Know This
Data scientists and machine learning engineers can use this tutorial to optimize their models for faster inference, while software engineers can apply the same techniques to improve the performance of their applications
Key Insight
💡 Optimum provides a range of tools to optimize and accelerate inference for Transformers models, including quantization and optimization techniques
Share This
🚀 Accelerate inference with Optimum and Transformers pipelines! 🤖
DeepCamp AI