Learning rate scheduling with TensorFlow

HuggingFace · Beginner · 🧬 Deep Learning · 5y ago

Key Takeaways

This video demonstrates how to schedule the learning rate using TensorFlow and Keras, improving the consistency of model training.

Full Transcript

In our other videos, we talked about the basics of fine-tuning a language model with TensorFlow (and as always, when I refer to videos, I'll link them below). But still, can we do better? Here's the code from our model fine-tuning video. While it works, we could definitely tweak a couple of things. By far the most important thing is the learning rate. In this video we'll talk about how to change it, which will make your training much more consistently successful.

In fact, there are two things we want to change about the default learning rate for Adam. The first is that it's way too high for our models. By default, Adam uses a learning rate of 10 to the minus 3 (1e-3), which is very high for training transformers. We're going to start at 5 times 10 to the minus 5 (5e-5), which is 20 times lower than the default. Secondly, we don't just want a constant learning rate. We can get even better performance if we decay the learning rate down to a tiny value, or even to zero, over the course of training. That's what this PolynomialDecay schedule thing is doing. That name might be intimidating, especially if you only vaguely remember what a polynomial is from maths class, so I'll show you what that decay looks like in a second.

But first, we need to tell the scheduler how long training is going to be, so that it decays at the right speed, and that's what this code here is doing. We're computing how many mini-batches the model is going to see over the entire training run. To compute that, we take the size of the training set and divide it by the batch size, which gives us the number of batches per epoch, and then we multiply that by the number of epochs to get the total number of batches the model will see over the whole training run. Once we know how many batches (how many training steps) we're taking, we just pass all of that information to the scheduler and we're ready to go.

So what does the polynomial decay schedule look like? With default options, it's actually just a linear schedule. It starts at our initial value, which is 5 times 10 to the minus 5, or 5e-5, and then it decays at a constant rate until it hits zero right at the very end of training. So why do they call it polynomial and not linear? Well, if you tweak the options, you can get a higher-order, truly polynomial decay schedule, but there's no need to do that right now. By default you get a linear schedule, and if you were aware that a linear function is a special case of a polynomial, you can feel proud.

So, that aside, how do we actually use the scheduler? This is easy: we just pass it to Adam. You'll notice that the first time we compiled the model, we just passed the string "adam". Keras recognizes the names of common optimizers and loss functions if you pass them as strings, so if you only want the default settings, doing it that way saves time and avoids imports. But we're professional machine learners now, with our very own learning rate schedule, so we have to do things properly. First we import the optimizer, then we initialize it with our scheduler in the learning_rate argument, and then we compile the model using our new optimizer and whatever loss function you want. We'll leave the loss unchanged: it will be sparse categorical cross-entropy if you're following along from the fine-tuning video, but it can be anything else that you're using yourself.

So now we have a high-performance model ready to go. All that remains is to fit the model, just like we did before. Remember, because we've compiled the model with the new optimizer and the new learning rate, we don't actually need to change anything about the fit call at all. We just call fit here, exactly the same command we used before, or that you've seen in other videos, but now we get a beautiful training run with a good initial learning rate and a nice, smooth learning rate decay, and you will get much better performance as a result.
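
To make the step count and the shape of the schedule concrete, here is a small pure-Python sketch. The training-set size, batch size and epoch count are illustrative values, not from the video, and polynomial_decay is a hypothetical helper that re-implements the formula the TensorFlow documentation gives for PolynomialDecay with cycle=False. Its defaults (end_lr=0.0, power=1.0) are chosen here to match the decay to zero described above. It also evaluates power=2.0 to show what a higher-order schedule does.

```python
# Illustrative values only; they are not taken from the video.
NUM_EXAMPLES = 1_000
BATCH_SIZE = 8
NUM_EPOCHS = 3
INITIAL_LR = 5e-5        # starting rate used in the video
ADAM_DEFAULT_LR = 1e-3   # Keras' default learning rate for Adam


def num_training_steps(num_examples: int, batch_size: int, num_epochs: int) -> int:
    """Total mini-batches seen in training: batches per epoch times epochs."""
    # Floor division ignores a final partial batch, so this can undercount by
    # up to one step per epoch; the schedule then just holds its end value.
    batches_per_epoch = num_examples // batch_size
    return batches_per_epoch * num_epochs


def polynomial_decay(step: int, initial_lr: float, decay_steps: int,
                     end_lr: float = 0.0, power: float = 1.0) -> float:
    """Hypothetical helper: the PolynomialDecay formula (cycle=False) in plain Python.

    power=1.0 gives a straight line from initial_lr down to end_lr.
    """
    step = min(step, decay_steps)  # past decay_steps the rate stays at end_lr
    return (initial_lr - end_lr) * (1 - step / decay_steps) ** power + end_lr


total_steps = num_training_steps(NUM_EXAMPLES, BATCH_SIZE, NUM_EPOCHS)  # 125 * 3 = 375

for step in (0, 150, total_steps):
    linear = polynomial_decay(step, INITIAL_LR, total_steps)
    quadratic = polynomial_decay(step, INITIAL_LR, total_steps, power=2.0)
    print(f"step {step:>3}: linear={linear:.2e}  power=2.0: {quadratic:.2e}")

ratio = ADAM_DEFAULT_LR / INITIAL_LR
print(f"Adam's default rate is {ratio:.0f}x the video's starting rate")
```

With power=1.0 the rate falls by the same amount at every step. With power=2.0 it drops faster early in training and flattens out near the end, which is the "truly polynomial" variant mentioned above.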

Original Description

This is the old version of the Learning rate scheduling with TensorFlow video; you should watch https://youtu.be/cpzq6ESSM5c instead.
How to schedule the learning rate using TensorFlow and Keras.
This video is part of the Hugging Face course: http://huggingface.co/course
Open in Colab to run the code samples: https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/videos/tf_lr_scheduling.ipynb
Related videos:
- Introduction to Keras: https://youtu.be/rnTGBy2ax1c
- Fine-tuning with TensorFlow: https://youtu.be/alq1l8Lv9GA
- Prediction and metrics: https://youtu.be/nx10eh4CoOs
Have a question? Check out the forums: https://discuss.huggingface.co/c/course/20
Subscribe to our newsletter: https://huggingface.curated.co/

This video teaches how to schedule the learning rate using TensorFlow and Keras. Lowering Adam's starting learning rate from its default of 1e-3 to 5e-5 and decaying it linearly to zero over the course of training gives better performance and more consistent results.

Key Takeaways
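  1. Import the necessary libraries and load the model (as in the fine-tuning video)
  2. Compute the total number of training steps (batches per epoch × number of epochs)
  3. Define the polynomial decay schedule (linear from 5e-5 down to zero)
  4. Initialize the Adam optimizer with the schedule as its learning rate
  5. Compile the model with the new optimizer instead of the string "adam"
  6. Fit the model exactly as before; the fit call itself does not change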
💡 Starting from a lower learning rate (5e-5 instead of Adam's default 1e-3) and decaying it with a polynomial decay schedule, which is linear by default, makes training more consistent and improves model performance.
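
The six steps above can be sketched end to end in TensorFlow/Keras as follows. This is a minimal sketch under stated assumptions: a tiny Sequential model and random data are hypothetical stand-ins for the pretrained Transformers model and tokenized dataset from the fine-tuning video, and the classes are reached through tf.keras attributes rather than by importing Adam directly as the video describes. end_learning_rate is passed explicitly because its Keras default is 1e-4, not zero.

```python
import numpy as np
import tensorflow as tf

# Step 1 (stand-in): hypothetical replacements for the tokenized dataset and
# pretrained model from the fine-tuning video: 64 random examples and a
# one-layer classifier.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(64, 16)).astype("float32")
y_train = rng.integers(0, 2, size=(64,))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(16,)),
    tf.keras.layers.Dense(2),  # raw logits for two classes, no softmax
])

batch_size = 8
num_epochs = 3

# Step 2: total training steps = batches per epoch * number of epochs.
num_train_steps = (len(x_train) // batch_size) * num_epochs

# Step 3: decay linearly (power defaults to 1.0) from 5e-5 down to zero.
# end_learning_rate is set explicitly because its Keras default is 1e-4.
lr_scheduler = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=5e-5,
    end_learning_rate=0.0,
    decay_steps=num_train_steps,
)

# Step 4: the schedule goes where a fixed learning rate would normally go.
opt = tf.keras.optimizers.Adam(learning_rate=lr_scheduler)

# Step 5: compile with the optimizer object instead of the string "adam".
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=opt, loss=loss)

# Step 6: the fit call is the same as it would be without a schedule.
history = model.fit(x_train, y_train, batch_size=batch_size,
                    epochs=num_epochs, verbose=0)

print("steps taken:", int(opt.iterations.numpy()))
print("final learning rate:", float(lr_scheduler(num_train_steps)))
```

Because the schedule is attached to the optimizer, fit() needs no extra arguments: the optimizer evaluates the schedule at its own step counter, which advances once per batch.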
