Fine-tuning with TensorFlow

HuggingFace · Beginner · 🧬 Deep Learning · 5y ago

Skills: Fine-tuning LLMs (90%), LLM Engineering (70%)

Key Takeaways

This video shows how to fine-tune a pre-trained Transformers model in TensorFlow using Keras: loading the model with a sequence classification head, compiling it with a loss function and an optimizer, and training it with fit.

Full Transcript

In this video, we're going to see how to load and fine-tune a pre-trained model. It's very quick, and if you've watched our pipeline videos, which I'll link below, the process is very similar. This time, though, we're going to be using transfer learning and doing some training ourselves, rather than just loading a model and using it as is, like we did in the pipeline videos. If you want to learn more about transfer learning, you can head to the "What is Transfer Learning?" video, and I'll link that below as well. But for now, let's look at this code.

To start, we pick which model we want to use. In this case, we're going to use the famous classic, BERT. But what does this line here mean, this monstrosity, TFAutoModelForSequenceClassification? Well, the TF stands for TensorFlow, and the rest means: take a language model and stick a sequence classification head onto it if it doesn't have one already. So what we're going to do here is load BERT, which is a general-purpose language model that doesn't have a sequence classification head. We're going to use the from_pretrained method, and that method ensures that all our weights come from the pre-trained model, so they're not randomly initialized, with the exception of the new sequence classification head we're going to add. This method needs to know two things. Firstly, it needs to know the name of the model you want to load, and secondly, it needs to know how many classes your problem has. If you want to follow along with the data from our Datasets videos, which I'll link below, then you'll have two classes, positive and negative, and thus num_labels=2.

But what about this compile thing? If you're familiar with Keras, you've probably seen this already, but if not, this is one of the core methods in Keras: you always need to compile your model before you train it. Compile needs to know two things. Firstly, the loss function, which is basically what we are trying to optimize. Here we import the SparseCategoricalCrossentropy loss function. That's a mouthful if you've never encountered it before, but it's the standard loss function for any neural network that's doing a classification task. It basically encourages the network to output large values, so large probabilities, for the right class, and low values, or low probabilities, for the wrong classes.

Note that you can specify the loss function as a string, like we do with the optimizer here, but there's a very common pitfall. By default, the loss assumes the output is probabilities from a softmax layer, but what our model has actually output is the values before the softmax. These are often called the logits; you saw these before in the video about pipelines. If you get this wrong, your model won't train, and it'll be very annoying to figure out why. In fact, I'm going to go so far as to say that if you remember absolutely nothing else from this video, remember to always check whether your model is outputting logits or probabilities, and make sure your loss is set up to match. This is going to save you a lot of debugging headaches in your career that would otherwise be very difficult to track down.

Leaving that aside, the second thing compile needs to know is the optimizer you want. In our case, we're going to use Adam, which is sort of the standard optimizer for deep learning these days. The one thing you might want to change is the learning rate, and to do that we'll need to import the actual optimizer rather than just calling it by string, much like we did with the loss. But we can talk about that in another video, and I'll link that below. For now, let's just try training the model.

So how do you train the model? Well, if you've used Keras before, this will all be very familiar to you, but if not, let's look at what we're doing here. fit is pretty much the central method for Keras models: it tells the model to break the input into batches and then train on it. The first input is tokenized text. You'll almost always be getting this from a tokenizer, and if you want to learn more about that process and what exactly these inputs look like, please check out our videos on tokenizers. Again, there'll be links for those below.

So those are our inputs, but then the second argument is our labels, and this is really straightforward. This is just a one-dimensional NumPy or TensorFlow array of integers that correspond to the classes for our examples, and that's it. If you're following along with the data from our Datasets video, there'll only be two classes, so this will just be a vector of zeros and ones, but you can have many more classes than that for your own problems. Once we have our inputs and our labels, we do the same thing with the validation data: we pass the validation inputs and the validation labels in a tuple. Then, if we want to, we can specify details like the batch size for training, and then you just pass the whole thing to model.fit and you let it rip.

If everything works out, you should see a little training progress bar as your loss goes down. While that's running, you sit back, you call your boss, and you tell them you're a senior NLP machine learning engineer now and you're going to want a salary review next quarter. I'm kidding a bit, but this is really all it takes to apply the power of a massive pre-trained language model to your NLP problem.

But could we do better than this? Are there any changes we could make? There certainly are. With a few more advanced Keras features, like a tuned, scheduled learning rate, we could get an even lower loss and therefore an even more accurate model. And when fit finishes, what do we do with our model once it's trained? I'm going to cover these topics and more in the subsequent videos, and again, I'll link those below.
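
The load, compile, and fit steps described in the transcript might be sketched roughly as follows. This is a minimal sketch rather than the exact code shown in the video: the checkpoint name `bert-base-cased`, the helper name `fine_tune`, and the default batch size and epoch count are assumptions made for illustration. It also presumes an installed `transformers` release that still ships the TensorFlow model classes, which may not be true of newer releases. The imports sit inside the function so that defining it does not require the libraries or trigger any download.

```python
# Assumed values for illustration; the transcript names BERT but no exact checkpoint.
CHECKPOINT = "bert-base-cased"
NUM_LABELS = 2  # positive / negative, as in the transcript


def fine_tune(train_inputs, train_labels, val_inputs, val_labels,
              batch_size=8, epochs=1):
    """Hypothetical helper: load BERT with a new classification head and train it.

    train_inputs / val_inputs: tokenizer output as a dict of arrays
    train_labels / val_labels: 1-D integer arrays of class ids
    """
    # Imported lazily: both packages are heavy and must be installed separately.
    from tensorflow.keras.losses import SparseCategoricalCrossentropy
    from transformers import TFAutoModelForSequenceClassification

    # Body weights come from the pre-trained checkpoint; the head is newly initialized.
    model = TFAutoModelForSequenceClassification.from_pretrained(
        CHECKPOINT, num_labels=NUM_LABELS
    )

    # The model outputs logits, so the loss is told to expect logits.
    model.compile(
        optimizer="adam",  # selected by string, which uses the default learning rate
        loss=SparseCategoricalCrossentropy(from_logits=True),
    )

    # Validation inputs and labels are passed together as a tuple.
    model.fit(
        train_inputs,
        train_labels,
        validation_data=(val_inputs, val_labels),
        batch_size=batch_size,
        epochs=epochs,
    )
    return model
```

Calling `fine_tune(...)` with tokenized inputs and integer labels would download the checkpoint and start training. Selecting the optimizer by string keeps the sketch close to the transcript; changing the learning rate would mean constructing the optimizer object instead.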

Original Description

This is the old version of the Fine-Tuning with TensorFlow video; you should watch https://youtu.be/AUozVp78dhk instead.

Let's fine-tune a Transformers model in TensorFlow, using Keras.

This video is part of the Hugging Face course: http://huggingface.co/course
Open in Colab to run the code samples: https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/videos/tensorflow_finetuning.ipynb

Related videos:
- What is transfer learning: https://youtu.be/BqqfQnyjmgg
- What is inside the pipeline function: https://youtu.be/wVN12smEvqg
- The tokenization pipeline: https://youtu.be/Yffk5aydLzg
- Datasets overview: https://youtu.be/W_gMJF0xomE
- Introduction to Keras: https://youtu.be/rnTGBy2ax1c
- Learning Rate Scheduling in TensorFlow: https://youtu.be/eKv4rRcCNX0
- Prediction and metrics: https://youtu.be/nx10eh4CoOs

Have a question? Check out the forums: https://discuss.huggingface.co/c/course/20
Subscribe to our newsletter: https://huggingface.curated.co/

This video teaches how to fine-tune a pre-trained Transformers model in TensorFlow using Keras, covering key concepts such as sequence classification, loss functions, and optimizers.

Key Takeaways

Load a pre-trained model using TFAutoModelForSequenceClassification.from_pretrained
Compile the model with a loss function and optimizer
Train the model using the fit method
Evaluate the model's performance on a validation set

💡 Always check whether your model is outputting logits or probabilities and make sure your loss is set up to match that, as this can save a lot of debugging headaches.
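
To see why the mismatch matters, here is a small pure-Python illustration (not taken from the video) of sparse categorical cross-entropy computed two ways for a single example. The function names and the sample logits are made up for illustration. In Keras, the corresponding switch is the `from_logits` argument of `SparseCategoricalCrossentropy`, which defaults to `False`; Keras handles out-of-range inputs in its own way, so its actual numbers will differ from this toy version, but the loss is likewise computed on the wrong quantity.

```python
import math


def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_probs(probs, label):
    """Loss when the inputs are already probabilities: -log p[label]."""
    return -math.log(probs[label])


def cross_entropy_from_logits(logits, label):
    """Loss when the inputs are logits: apply softmax first."""
    return cross_entropy_from_probs(softmax(logits), label)


# Hypothetical model output for one example with two classes.
logits = [2.0, 0.5]
label = 0

correct = cross_entropy_from_logits(logits, label)

# The pitfall: treating logits as if they were probabilities.
# Here -log(2.0) is negative, which a valid cross-entropy can never be.
wrong = cross_entropy_from_probs(logits, label)

print(f"loss from logits (correct): {correct:.4f}")
print(f"logits misread as probabilities: {wrong:.4f}")
```

The two numbers disagree even in sign, which is one way to notice that a loss and a model output are not set up to match.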
