Fine-tuning with TensorFlow

HuggingFace · Beginner ·🧬 Deep Learning ·5y ago

Key Takeaways

This video shows how to fine-tune a pre-trained Transformers model in TensorFlow using Keras: loading the model with a sequence classification head, compiling it with a loss function and an optimizer, and training it with fit.

Full Transcript

So in this video we're going to see how to load and fine-tune a pre-trained model. It's very quick, and if you've watched our pipeline videos, which I'll link below, the process is very similar. This time, though, we're going to be using transfer learning and doing some training ourselves, rather than just loading a model and using it as is, like we did in the pipeline videos. If you want to learn more about transfer learning, you can head to the "What is transfer learning?" video, and I'll link that below as well.

But for now, let's look at this code. To start, we pick which model we want to use. In this case we're going to use the famous classic, BERT. But what does this line here, this monstrosity, TFAutoModelForSequenceClassification, mean? Well, the TF stands for TensorFlow, and the rest means: take a language model and stick a sequence classification head onto it if it doesn't have one already. So what we're going to do here is load BERT, which is a general-purpose language model that doesn't have a sequence classification head. We're going to use the from_pretrained method, and that method ensures that all our weights come from the pre-trained model, so they're not randomly initialized, with the exception of the new sequence classification head we're going to add. This method needs to know two things. Firstly, it needs to know the name of the model you want to load, and secondly, it needs to know how many classes your problem has. If you want to follow along with the data from our datasets videos, which I'll link below, then you'll have two classes, positive and negative, and thus num_labels=2.

But what about this compile thing? If you're familiar with Keras you've probably seen this already, but if not, this is one of the core methods in Keras: you always need to compile your model before you train it. Compile needs to know two things. Firstly, the loss function, which is basically: what are we trying to optimize? Here we import the sparse categorical cross-entropy loss function. That's a mouthful if you've never encountered it before, but it's the standard loss function for any neural network that's doing a classification task. It basically encourages the network to output large values, so large probabilities, for the right class, and low values, so low probabilities, for the wrong classes.

Note that you can specify the loss function as a string, like we do with the optimizer here, but there's a very common pitfall. By default, the loss assumes the output is probabilities from a softmax layer, but what our model actually outputs is the values before the softmax. These are often called the logits; you saw these before in the video about pipelines. If you get this wrong, your model won't train, and it'll be very annoying to figure out why. In fact, I'm going to go so far as to say that if you remember absolutely nothing else from this video, remember to always check whether your model is outputting logits or probabilities, and make sure your loss is set up to match that. This is going to save you a lot of debugging headaches in your career that would otherwise be very difficult to track down.

Leaving that aside, the second thing compile needs to know is the optimizer you want. In our case we're going to use Adam, which is sort of the standard optimizer for deep learning these days. The one thing you might want to change is the learning rate, and to do that we'll need to import the actual optimizer rather than just calling it by string, much like we did with the loss. But we can talk about that in another video, and I'll link that below.

For now, let's just try training the model. So how do you train the model? Well, if you've used Keras before this will all be very familiar to you, but if not, let's look at what we're doing here. fit is pretty much the central method for Keras models: it tells the model to break the input into batches and then train on it. The first input is tokenized text. You'll almost always be getting this from a tokenizer, and if you want to learn more about that process and what exactly these inputs look like, please check out our videos on tokenizers; again, there'll be links for those below. So those are our inputs. The second argument is our labels, and this is really straightforward: it's just a one-dimensional NumPy or TensorFlow array of integers, and they correspond to the classes for our examples. That's it. If you're following along with the data from our datasets video there'll only be two classes, so this will just be a vector of zeros and ones, but you can have many more classes than that for your own problems.

Once we have our inputs and our labels, we do the same thing with the validation data: we pass the validation inputs and the validation labels in a tuple. Then, if we want to, we can specify details like the batch size for training, and then you just pass the whole thing to model.fit and you let it rip. If everything works out, you should see a little training progress bar as your loss goes down. While that's running, you sit back, you call your boss, and you tell them you're a senior NLP machine learning engineer now and you're going to want a salary review next quarter.

I'm kidding a bit, but this is really all it takes to apply the power of a massive pre-trained language model to your NLP problem. But could we do better than this? Are there any changes we could make? There certainly are. With a few more advanced Keras features, like a tuned, scheduled learning rate, we could get an even lower loss and therefore an even more accurate model. And when fit finishes, what do we do with our model once it's trained? These are all topics I'm going to cover, and more, in the next videos, and again I'm going to link those subsequent videos below.
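To make the logits-versus-probabilities pitfall from the transcript concrete, here is a minimal NumPy sketch. It is a simplified illustration and not Keras's actual implementation: the helper names (softmax, sparse_categorical_crossentropy), the from_logits flag as modelled here, and the example logits are all assumptions chosen for the demonstration.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1 per row."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def sparse_categorical_crossentropy(labels, outputs, from_logits):
    """Mean negative log-probability of the correct class.

    labels:  1-D array of integer class ids.
    outputs: 2-D array, either logits or probabilities.
    from_logits: set True when `outputs` are raw logits (simplified
                 stand-in for the Keras flag of the same name).
    """
    probs = softmax(outputs) if from_logits else outputs
    probs = np.clip(probs, 1e-7, 1.0)  # avoid log(0) and log of values > 1
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked).mean())


# Hypothetical raw model outputs for two examples and two classes.
logits = np.array([[2.0, -1.0], [-0.5, 1.5]])
labels = np.array([0, 1])

# Loss configured to match the model output: logits go through softmax first.
correct = sparse_categorical_crossentropy(labels, logits, from_logits=True)

# Loss configured for probabilities but fed logits: the number is meaningless.
mismatched = sparse_categorical_crossentropy(labels, logits, from_logits=False)

print("matched loss:   ", correct)
print("mismatched loss:", mismatched)
```

In this toy case the mismatched loss collapses to zero because the logits are clipped as if they were probabilities, so the loss no longer reflects how confident the model really is. That is the kind of silent failure the transcript warns about.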

Original Description

This is the old version of the Fine-Tuning with TensorFlow video; you should watch https://youtu.be/AUozVp78dhk instead.

Let's fine-tune a Transformers model in TensorFlow, using Keras.

This video is part of the Hugging Face course: http://huggingface.co/course
Open in Colab to run the code samples: https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/videos/tensorflow_finetuning.ipynb

Related videos:
- What is transfer learning: https://youtu.be/BqqfQnyjmgg
- What is inside the pipeline function: https://youtu.be/wVN12smEvqg
- The tokenization pipeline: https://youtu.be/Yffk5aydLzg
- Datasets overview: https://youtu.be/W_gMJF0xomE
- Introduction to Keras: https://youtu.be/rnTGBy2ax1c
- Learning Rate Scheduling in TensorFlow: https://youtu.be/eKv4rRcCNX0
- Prediction and metrics: https://youtu.be/nx10eh4CoOs

Have a question? Check out the forums: https://discuss.huggingface.co/c/course/20
Subscribe to our newsletter: https://huggingface.curated.co/


This video teaches how to fine-tune a pre-trained Transformers model in TensorFlow using Keras, covering key concepts such as sequence classification, loss functions, and optimizers.

Key Takeaways
  1. Load a pre-trained model using TFAutoModelForSequenceClassification.from_pretrained, passing the model name and num_labels
  2. Compile the model with a loss function and optimizer
  3. Train the model using the fit method
  4. Pass validation inputs and labels to fit as a tuple to monitor performance on a validation set
💡 Always check whether your model is outputting logits or probabilities and make sure your loss is set up to match that, as this can save a lot of debugging headaches.
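
The steps above can be sketched roughly as follows. This is a hedged outline, not the exact code shown in the video: the function name fine_tune, its parameters, and the default checkpoint "bert-base-cased" are assumptions for illustration. It also assumes a transformers release that still ships the TensorFlow model classes and a compatible Keras installation; loading the checkpoint requires network access.

```python
def fine_tune(train_texts, train_labels, val_texts, val_labels,
              checkpoint="bert-base-cased", num_labels=2,
              batch_size=16, epochs=1):
    """Fine-tune a pre-trained model for sequence classification with Keras.

    All argument names and defaults are illustrative assumptions.
    """
    # Imported lazily so the sketch can be read and imported without the
    # heavy dependencies installed.
    import numpy as np
    import tensorflow as tf
    from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    def encode(texts):
        # Tokenized text as a dict of NumPy arrays, the first input to fit.
        return dict(tokenizer(list(texts), padding=True, truncation=True,
                              return_tensors="np"))

    # Step 1: pre-trained weights plus a new, randomly initialized
    # classification head with `num_labels` outputs.
    model = TFAutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=num_labels)

    # Step 2: the model outputs logits, so the loss must be told so.
    loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    # "adam" by string uses the default learning rate; a lower, scheduled
    # rate is usually worth trying, as the transcript notes.
    model.compile(optimizer="adam", loss=loss)

    # Steps 3 and 4: train, passing validation data as an (inputs, labels) tuple.
    model.fit(encode(train_texts), np.array(train_labels),
              validation_data=(encode(val_texts), np.array(val_labels)),
              batch_size=batch_size, epochs=epochs)
    return model
```

A call might look like fine_tune(train_sentences, train_labels, val_sentences, val_labels), where the labels are one-dimensional sequences of zeros and ones for the two-class case described in the transcript.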
