Learning rate scheduling with TensorFlow

HuggingFace · Beginner ·🧬 Deep Learning ·5y ago


Key Takeaways

This video demonstrates how to schedule the learning rate using TensorFlow and Keras, improving the consistency of model training.

Full Transcript

In our other videos, we talked about the basics of fine-tuning a language model with TensorFlow (and as always, when I refer to videos, I'll link them below). Still, can we do better? Here's the code from our model fine-tuning video, and while it works, we could definitely tweak a couple of things. By far the most important thing is the learning rate. In this video, we'll talk about how to change it, which will make your training much more consistently successful.

In fact, there are two things we want to change about the default learning rate for Adam. The first is that it's way too high for our models. By default, Adam uses a learning rate of 10 to the minus 3 (1e-3), which is very high for training transformers. We're going to start at 5 by 10 to the minus 5 (5e-5), which is 20 times lower than the default. Secondly, we don't just want a constant learning rate: we can get even better performance if we decay the learning rate down to a tiny value, or even to zero, over the course of training. That's what this PolynomialDecay schedule thing is doing. That name might be intimidating, especially if you only vaguely remember what a polynomial is from maths class, so I'll show you what that decay looks like in a second.

But first, we need to tell the scheduler how long training is going to be, so that it decays at the right speed, and that's what this code here is doing. We're computing how many mini-batches the model is going to see over the entire training run. To compute that, we take the size of the training set and divide it by the batch size, which gives us the number of batches per epoch, and then we multiply that by the number of epochs to get the total number of batches it's going to see over the whole training run. Once we know how many training steps we're taking, we just pass all of that information to the scheduler and we're ready to go.

So what does the polynomial decay schedule look like? With default options, it's actually just a linear schedule, so it looks like this: it starts at our initial value, which is 5 by 10 to the minus 5, or 5e-5, and then it decays down at a constant rate until it hits zero right at the very end of training. So why do they call it polynomial and not linear? Well, if you tweak the options, you can get a higher-order, truly polynomial decay schedule, but there's no need to do that right now. By default you get a linear schedule, and if you were aware that a linear function is a special case of a polynomial, you can feel proud.

That aside, how do we actually use the scheduler? This is easy: we just pass it to Adam. You'll notice that the first time we compiled the model, we just passed the string "adam". Keras recognizes the names of common optimizers and loss functions if you pass them as strings, so it saves time and avoids imports to do it that way if you only want the default settings. But we're professional machine learners now, with our very own learning rate schedule, so we have to do things properly. The first thing we do is import the optimizer. Then we initialize it with our scheduler in the learning_rate argument, and then we compile the model using our new optimizer and whatever loss function you want. We'll leave that unchanged: it will be sparse categorical cross-entropy if you're following along from the fine-tuning video, but it can be anything else that you're using yourself.

So now we have a high-performance model ready to go. All that remains is to fit the model, just like we did before. And remember, because we've compiled the model with the new optimizer and the new learning rate, we don't need to change anything about the fit call at all. We just call fit here, exactly the same command we used before or that you've seen in other videos. But now we get a beautiful training run, with a good initial learning rate and a solid learning rate decay, and you will get much better performance as a result.
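The step count and the shape of the decay described in the transcript can be sketched in plain Python, with no TensorFlow required. This is an illustrative sketch, not the code shown in the video: the function names `count_training_steps` and `polynomial_decay` are hypothetical, the dataset size of 3668 examples, batch size of 8, and 3 epochs are made-up values, and the floor division assumes a partial final batch is not counted. The decay formula follows the one documented for Keras's `PolynomialDecay` (without cycling), where `power=1.0` gives the linear case.

```python
def count_training_steps(num_examples: int, batch_size: int, num_epochs: int) -> int:
    """Total number of mini-batches the model sees over the whole run."""
    # Assumption: a partial final batch is not counted (floor division).
    batches_per_epoch = num_examples // batch_size
    return batches_per_epoch * num_epochs


def polynomial_decay(step, initial_lr, decay_steps, end_lr=0.0, power=1.0):
    """Learning rate at a given step; power=1.0 is a straight line."""
    step = min(step, decay_steps)          # hold at end_lr once decay is done
    remaining = 1.0 - step / decay_steps   # fraction of training still ahead
    return (initial_lr - end_lr) * remaining ** power + end_lr


# Hypothetical run: 3668 examples, batch size 8, 3 epochs.
num_train_steps = count_training_steps(num_examples=3668, batch_size=8, num_epochs=3)

# Linear decay from 5e-5 down to zero at the last step.
for step in (0, num_train_steps // 2, num_train_steps):
    print(step, polynomial_decay(step, initial_lr=5e-5, decay_steps=num_train_steps))
```

Passing `power=2.0` instead bends the line into a curve, which is the "truly polynomial" variant mentioned in the transcript.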

Original Description

This is the old version of the Learning Rate Scheduling with TensorFlow video; you should watch https://youtu.be/cpzq6ESSM5c instead.

How to schedule the learning rate using TensorFlow and Keras. This video is part of the Hugging Face course: http://huggingface.co/course

Open in Colab to run the code samples: https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/videos/tf_lr_scheduling.ipynb

Related videos:
- Introduction to Keras: https://youtu.be/rnTGBy2ax1c
- Fine-tuning with TensorFlow: https://youtu.be/alq1l8Lv9GA
- Prediction and metrics: https://youtu.be/nx10eh4CoOs

Have a question? Check out the forums: https://discuss.huggingface.co/c/course/20

Subscribe to our newsletter: https://huggingface.curated.co/

This video shows how to schedule the learning rate with TensorFlow and Keras: start Adam at 5e-5 instead of its default of 1e-3, and decay the rate to zero over the course of training. According to the video, this makes fine-tuning more consistently successful and improves performance.

Key Takeaways

Import necessary libraries and load the model
Compute the number of mini-batches for training
Define the polynomial decay schedule
Initialize the Adam optimizer with the learning rate schedule
Compile the model with the new optimizer
Fit the model using the new optimizer
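The steps listed above can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the notebook from the video: a tiny `Dense` model and random data stand in for the fine-tuned transformer and its tokenized dataset so that the example runs on its own, and the sizes (64 examples, batch size 8, 3 epochs) are made up. One detail to be aware of: in the Keras versions I know of, `PolynomialDecay` defaults to `end_learning_rate=0.0001`, which is higher than our starting value, so the sketch sets `end_learning_rate=0.0` explicitly to get the decay to zero described in the video.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.optimizers.schedules import PolynomialDecay

# Stand-in data (assumption): 64 random examples, 4 features, 2 classes.
features = np.random.rand(64, 4).astype("float32")
labels = np.random.randint(0, 2, size=(64,))

# Stand-in model (assumption): replace with your transformer model.
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(2)])

# Total number of mini-batches over the whole training run.
batch_size = 8
num_epochs = 3
num_train_steps = (len(features) // batch_size) * num_epochs

# Linear decay (default power=1.0) from 5e-5 down to zero.
lr_scheduler = PolynomialDecay(
    initial_learning_rate=5e-5,
    end_learning_rate=0.0,
    decay_steps=num_train_steps,
)

# Pass the schedule to Adam instead of a fixed learning rate.
optimizer = Adam(learning_rate=lr_scheduler)

# Compile with the new optimizer; the loss is unchanged from a plain setup.
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer, loss=loss)

# The fit call itself does not change.
model.fit(features, labels, batch_size=batch_size, epochs=num_epochs, verbose=0)
```

Because the schedule lives inside the optimizer, nothing in the `fit` call needs to know about it; only `decay_steps` has to match the real length of training.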

💡 Starting Adam at a learning rate of 5e-5 and decaying it linearly to zero with a polynomial decay schedule makes training more consistent and improves performance.
