Deploy a Hugging Face Transformers Model from the Model Hub to Amazon SageMaker

HuggingFace · Intermediate · 🧠 Large Language Models · 4y ago

Key Takeaways

This video shows how to deploy Hugging Face Transformers models from the Model Hub to Amazon SageMaker for inference, using the HuggingFaceModel class and the SageMaker Python SDK.

Full Transcript

Hi everyone, my name is Philipp. I'm a machine learning engineer at Hugging Face and also the tech lead for our partnership with AWS. In the next few minutes I will show you how to deploy Hugging Face Transformers models from the Hugging Face Hub to Amazon SageMaker for inference.

Amazon SageMaker is a fully managed machine learning service that lets anyone build, train, and deploy machine learning models quickly. Together with AWS, we have built an inference-optimized solution for deploying Transformers models as SageMaker endpoints. Let's get started.

We are going to use a SageMaker notebook instance to deploy our model. Using a notebook instance is not a requirement: you can also use SageMaker Studio or deploy a model from your local machine.

The first step is to choose a model from the Hugging Face Hub. The Hub offers over 10,000 different models fine-tuned on NLP, vision, and speech tasks. In this example we want a model fine-tuned on question answering, and we also want it to be quick, so we select the distilbert-base-cased-distilled-squad model. We copy the model ID and go back to our notebook instance.

To deploy a model from the Hub to SageMaker, we need to create a HuggingFaceModel class. This model class contains all the configuration for our Hub model, such as the model ID and our task, in this case question answering. After that, we can create our HuggingFaceModel and run the .deploy() method. The .deploy() method creates our SageMaker endpoint, and we can pass in the initial instance count, that is, how many instances we want our endpoint to run on, and the instance type. In this case I went with an m5 instance, which is a CPU-based instance. The deploy call now creates our SageMaker endpoint, which takes around 5 minutes.

The endpoint has been successfully deployed to SageMaker and can now be used for inference. The benefit of using the SageMaker Python SDK is that the deploy method automatically returns a predictor, which we can use to send requests to our endpoint with the .predict() method. For that, we need to define our inputs. Since this is a question-answering model, we have our context, "My name is Philipp and I live in Nuremberg. This model is used with SageMaker for inference," and our question, "What is used for inference?" Our request executed successfully and the model predicted the correct answer: what is used for inference? It's SageMaker.

After you are done testing your endpoint, you can run predictor.delete_endpoint(). This cleans up everything on SageMaker and makes sure everything is deleted properly. If you want to learn more about Hugging Face on Amazon SageMaker, check out huggingface.co.
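The deployment step described in the transcript can be sketched roughly as follows. This is a minimal sketch, not the exact notebook from the video: the helper names (deploy_hub_model, endpoint_settings), the container versions, and the exact instance size "ml.m5.xlarge" are assumptions, and the call needs AWS credentials plus an IAM role with SageMaker permissions.

```python
# Sketch: Hub model -> SageMaker endpoint, following the steps in the video.
# Assumed (not stated in the video): container versions, the exact m5 size,
# and the helper names used here.

HUB_ENV = {
    "HF_MODEL_ID": "distilbert-base-cased-distilled-squad",  # model id copied from the Hub
    "HF_TASK": "question-answering",  # pipeline task the endpoint serves
}


def endpoint_settings(instance_count=1, instance_type="ml.m5.xlarge"):
    """Keyword arguments passed to HuggingFaceModel.deploy()."""
    return {
        "initial_instance_count": instance_count,  # how many instances back the endpoint
        "instance_type": instance_type,            # m5 is a CPU-based instance family
    }


def deploy_hub_model(role, env=None, **settings):
    """Create a HuggingFaceModel from Hub settings, deploy it, return the predictor."""
    # Imported lazily so this file can be read and tested without the SDK installed.
    from sagemaker.huggingface import HuggingFaceModel

    model = HuggingFaceModel(
        env=dict(env or HUB_ENV),
        role=role,                    # IAM role ARN with SageMaker permissions
        transformers_version="4.26",  # assumed; use a combination your SDK version supports
        pytorch_version="1.13",       # assumed
        py_version="py39",            # assumed
    )
    # deploy() provisions the endpoint (a few minutes) and returns a predictor object.
    return model.deploy(**endpoint_settings(**settings))


# Usage inside a SageMaker notebook (commented out because it creates billable resources):
# import sagemaker
# predictor = deploy_hub_model(role=sagemaker.get_execution_role())
```

Keeping the Hub settings in a plain dictionary makes it easy to swap in a different model ID and task without touching the deployment code.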

Original Description

To deploy a model directly from the Hugging Face Model Hub to Amazon SageMaker, we need to define two environment variables when creating the HuggingFaceModel:
- HF_MODEL_ID: the model ID, which is automatically loaded from huggingface.co/models when creating the SageMaker endpoint. The 🤗 Hub provides over 10,000 models, all available through this environment variable.
- HF_TASK: the task for the 🤗 Transformers pipeline that is used. A full list of tasks can be found in the 🤗 Transformers pipelines documentation.

https://huggingface.co/blog/deploy-hugging-face-models-easily-with-amazon-sagemaker
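As an illustration of the two environment variables, the sketch below builds the dictionary that would be passed as the env argument of HuggingFaceModel. The helper name hub_env and the example model ID and task are assumptions chosen for illustration; only the two variable names come from the description.

```python
def hub_env(model_id: str, task: str) -> dict:
    """Return the environment variables that point an endpoint at a Hub model.

    hub_env is a hypothetical helper; HF_MODEL_ID and HF_TASK are the
    variable names given in the description above.
    """
    model_id, task = model_id.strip(), task.strip()
    if not model_id:
        raise ValueError("model_id must be a non-empty Hub model id")
    if not task:
        raise ValueError("task must be a non-empty pipeline task name")
    return {"HF_MODEL_ID": model_id, "HF_TASK": task}


# Example values (assumed): a question-answering model from the Hub.
env = hub_env("distilbert-base-cased-distilled-squad", "question-answering")
print(env)
```

Validating the two values up front surfaces an empty model ID or task before any endpoint is created, rather than minutes into a deployment.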

This video demonstrates how to deploy a Hugging Face Transformers model from the Model Hub to Amazon SageMaker for inference. The process involves creating a HuggingFaceModel class, defining environment variables, and using the SageMaker SDK to deploy the model.

Key Takeaways

Choose a model from the Hugging Face Model Hub
Create a HuggingFaceModel class
Define environment variables (HF_MODEL_ID and HF_TASK)
Optionally work from a SageMaker notebook instance (SageMaker Studio or a local machine also work)
Deploy the model using the SageMaker SDK
Test the endpoint with a sample input
Clean up resources using predictor.delete_endpoint()
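The last two steps, testing the endpoint and cleaning up, might look like the sketch below. The helper names (qa_payload, ask_then_cleanup) and the FakePredictor stand-in are hypothetical and exist only so the example runs without AWS; with a real deployment, the predictor returned by deploy() would be passed in instead.

```python
def qa_payload(question: str, context: str) -> dict:
    """Request body for a question-answering endpoint."""
    return {"inputs": {"question": question, "context": context}}


def ask_then_cleanup(predictor, question: str, context: str):
    """Send one request, then delete the endpoint even if the request fails."""
    try:
        return predictor.predict(qa_payload(question, context))
    finally:
        # Endpoints keep running (and billing) until deleted.
        predictor.delete_endpoint()


class FakePredictor:
    """Offline stand-in for a SageMaker predictor (illustration only)."""

    def __init__(self):
        self.deleted = False
        self.last_request = None

    def predict(self, data):
        self.last_request = data
        return {"answer": "(stubbed answer)"}

    def delete_endpoint(self):
        self.deleted = True


fake = FakePredictor()
result = ask_then_cleanup(
    fake,
    question="What is used for inference?",
    context="My name is Philipp and I live in Nuremberg. "
            "This model is used with SageMaker for inference.",
)
print(result, fake.deleted)
```

Wrapping the request in try/finally is a design choice for short-lived test endpoints; for a long-running production endpoint, cleanup would be a separate, deliberate step.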

💡 The HuggingFaceModel class and SageMaker SDK provide a streamlined way to deploy Transformers models from the Model Hub to Amazon SageMaker for inference.
