Fine-tuning with TensorFlow
Key Takeaways
This video shows how to fine-tune a pre-trained Transformers model in TensorFlow using Keras: loading a model with a sequence classification head, compiling it with a loss function and an optimizer, and training it with fit.
Full Transcript
In this video we're going to see how to load and fine-tune a pre-trained model. It's very quick, and if you've watched our pipeline videos, which I'll link below, the process is very similar. This time, though, we're going to be using transfer learning and doing some training ourselves, rather than just loading a model and using it as is, like we did in the pipeline videos. If you want to learn more about transfer learning, you can head to the "What is transfer learning?" video, which I'll link below as well. But for now, let's look at this code.
To start, we pick which model we want to use; in this case, we're going to use the famous classic BERT. But what does this line here, this monstrosity, TFAutoModelForSequenceClassification, mean? Well, the TF stands for TensorFlow, and the rest means: take a language model and stick a sequence classification head onto it if it doesn't have one already. So what we're going to do here is load BERT, a general-purpose language model that doesn't have a sequence classification head. We're going to use the from_pretrained method, which ensures that all our weights come from the pre-trained model, so they're not randomly initialized, with the exception of the new sequence classification head we're going to add. This method needs to know two things: firstly, the name of the model you want to load, and secondly, how many classes your problem has. If you want to follow along with the data from our datasets videos, which I'll link below, then you'll have two classes, positive and negative, and thus num_labels equals two.
But what about this compile thing? If you're familiar with Keras, you've probably seen this already, but if not, this is one of the core methods in Keras: you always need to compile your model before you train it. Compile needs to know two things. Firstly, the loss function, which is basically what we're trying to optimize. Here we import the sparse categorical cross-entropy loss function. That's a mouthful if you've never encountered it before, but it's the standard loss function for any neural network that's doing a classification task. It basically encourages the network to output large values, so large probabilities, for the right class, and low values, or low probabilities, for the wrong classes.
Note that you can specify the loss function as a string, like we do with the optimizer here, but there's a very common pitfall. By default, the loss assumes the output is probabilities from a softmax layer, but what our model has actually output is the values before the softmax. These are often called the logits; you saw these before in the video about pipelines. If you get this wrong, your model won't train, and it'll be very annoying to figure out why. In fact, I'm going to go so far as to say that if you remember absolutely nothing else from this video, remember to always check whether your model is outputting logits or probabilities, and make sure your loss is set up to match. This is going to save you a lot of debugging headaches in your career that would otherwise be very difficult to track down.
Leaving that aside, the second thing compile needs to know is the optimizer you want. In our case, we're going to use Adam, which is sort of the standard optimizer for deep learning these days. The one thing you might want to change is the learning rate, and to do that we'll need to import the actual optimizer rather than just calling it by string, much like we did with the loss. But we can talk about that in another video, which I'll link below. For now, let's just try training the model.
So how do you train the model? If you've used Keras before, this will all be very familiar, but if not, let's look at what we're doing here. The fit method is pretty much the central method for Keras models: it tells the model to break the input into batches and then train on it. The first input is tokenized text. You'll almost always be getting this from a tokenizer, and if you want to learn more about that process and what exactly these inputs look like, please check out our videos on tokenizers; again, there'll be links for those below. So those are our inputs. The second argument is our labels, and this is really straightforward: it's just a one-dimensional NumPy or TensorFlow array of integers that correspond to the classes for our examples, and that's it. If you're following along with the data from our datasets video, there'll only be two classes, so this will just be a vector of zeros and ones, but you can have many more classes than that for your own problems.
Once we have our inputs and our labels, we do the same thing with the validation data: we pass the validation inputs and the validation labels in a tuple. Then, if we want to, we can specify details like the batch size for training, and then you just pass the whole thing to model.fit and let it rip. If everything works out, you should see a little training progress bar as your loss goes down. While that's running, you sit back, you call your boss, and you tell them you're a senior NLP machine learning engineer now and you're going to want a salary review next quarter. I'm kidding a bit, but this is really all it takes to apply the power of a massive pre-trained language model to your NLP problem.
But could we do better than this? Are there any changes we could make? There certainly are. With a few more advanced Keras features, like a tuned, scheduled learning rate, we could get an even lower loss and therefore an even more accurate model. And when fit finishes, what do we do with our model once it's trained? I'm going to cover these topics and more in subsequent videos, which I'll link below.
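
The load, compile, and fit steps described in the transcript, along with the logits pitfall, can be sketched roughly as follows. This is a minimal illustration, not the code shown in the video: the function name fine_tune_sketch, the checkpoint name "bert-base-cased", the batch size, and the example logits are all assumptions made here, and the Keras part assumes a TensorFlow-enabled install of the transformers library. The pure-Python softmax and cross-entropy helpers are only there to show why raw logits cannot be treated as probabilities; they are not how Keras computes the loss internally.

```python
import math


def softmax(logits):
    """Turn raw model outputs (logits) into probabilities that sum to 1."""
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_cross_entropy_from_logits(logits, label):
    """Cross-entropy for one example whose true class is the integer `label`."""
    return -math.log(softmax(logits)[label])


def fine_tune_sketch(train_texts, train_labels, val_texts, val_labels,
                     checkpoint="bert-base-cased", num_labels=2, batch_size=16):
    """Load, compile, and fit, following the steps described in the transcript.

    Assumptions: TensorFlow and a TensorFlow-enabled `transformers` install;
    the checkpoint name and batch size are illustrative defaults.
    """
    # Imported lazily so the helpers above can run without TensorFlow installed.
    import numpy as np
    import tensorflow as tf
    from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # Pre-trained weights are reused; only the new classification head is random.
    model = TFAutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=num_labels
    )

    def encode(texts):
        # Keras accepts a dict of arrays keyed by input name.
        return dict(tokenizer(list(texts), padding=True, truncation=True,
                              return_tensors="np"))

    # The model outputs logits, so the loss has to be told that explicitly.
    loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    model.compile(optimizer="adam", loss=loss)
    model.fit(
        encode(train_texts), np.array(train_labels),
        validation_data=(encode(val_texts), np.array(val_labels)),
        batch_size=batch_size,
    )
    return model


# Hypothetical raw outputs for one example in a two-class problem.
logits = [2.0, -1.0]
probs = softmax(logits)
loss_if_label_0 = sparse_cross_entropy_from_logits(logits, 0)
loss_if_label_1 = sparse_cross_entropy_from_logits(logits, 1)
```

Here the logits include a negative value and do not sum to one, so they are not probabilities until softmax is applied; the loss is small when the label matches the larger logit and large when it does not. Passing the loss as a plain string would fall back to the default that expects probabilities, which is the mismatch the transcript warns about.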
Original Description
This is the old version of the Fine-Tuning with TensorFlow video; you should watch https://youtu.be/AUozVp78dhk instead.
Let's fine-tune a Transformers model in TensorFlow, using Keras.
This video is part of the Hugging Face course: http://huggingface.co/course
Open in Colab to run the code samples:
https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/videos/tensorflow_finetuning.ipynb
Related videos:
- What is transfer learning: https://youtu.be/BqqfQnyjmgg
- What is inside the pipeline function: https://youtu.be/wVN12smEvqg
- The tokenization pipeline: https://youtu.be/Yffk5aydLzg
- Datasets overview: https://youtu.be/W_gMJF0xomE
- Introduction to Keras: https://youtu.be/rnTGBy2ax1c
- Learning Rate Scheduling in TensorFlow: https://youtu.be/eKv4rRcCNX0
- Prediction and metrics: https://youtu.be/nx10eh4CoOs
Have a question? Check out the forums: https://discuss.huggingface.co/c/course/20
Subscribe to our newsletter: https://huggingface.curated.co/