Accelerated Inference with Optimum and Transformers Pipelines

📰 Hugging Face Blog

Accelerate inference for Transformers models with Optimum while keeping the familiar pipelines API

Level: intermediate · Published 10 May 2022
Action Steps
  1. Install Optimum with ONNX Runtime support
  2. Convert a Hugging Face Transformers model to ONNX for inference
  3. Use the ORTOptimizer to optimize the model
  4. Use the ORTQuantizer to apply dynamic quantization
  5. Run accelerated inference using Transformers pipelines
  6. Evaluate the performance and speed
Who Needs to Know This

Data scientists and machine learning engineers can use this tutorial to optimize their models for faster inference, and software engineers can apply the same techniques to improve the performance of their applications.

Key Insight

💡 Optimum provides a range of tools to accelerate inference for Transformers models, including graph optimization (ORTOptimizer) and quantization (ORTQuantizer).
