11. Transfer Learning for Domain-Specific Image Classification with Small Datasets

Weights & Biases · Beginner · 🧬 Deep Learning · 7y ago

Key Takeaways

The video demonstrates transfer learning for domain-specific image classification using ResNet 50 and TensorFlow, allowing for accurate models with small datasets by leveraging pre-trained weights from ImageNet. It covers the process of importing pre-trained weights, loading and preprocessing data, and fine-tuning a subset of ResNet layers for improved accuracy.

Full Transcript

Hello everybody, my name is Chris, and today we're going to talk about transfer learning. I think transfer learning is really cool because it lets you take a small dataset and actually create a really accurate model. We're going to do this by leveraging very large networks that were trained for many hours or even days on much larger datasets than ours, and actually transfer that knowledge into our own network, made specifically for our classification problem.

Today we're going to be working with the Freiburg Groceries dataset, which is a small dataset of about four thousand images of various grocery products, and we want to train our classifier to tell us what type of grocery products those are. So let's take a look at the data.

First we're going to import our Keras model layers so we can build a Keras model, and we're also importing ResNet50, which is an image classification network, actually out of Microsoft Research, that's very large, and we're going to be able to transfer knowledge from it into our own network.

Next we need to load our training data. Here we're splitting our training data into a train and a test set, as well as extracting the class names from this utility library called groceries. Let's take a look at one of those images. Looks like, there you go, a jar of pickles. Hopefully we can train our machine to tell us that. Let's look at what the other classes in the dataset look like: we've got beans, cake, pasta, and my favorite, vinegar. All right, let's see how the data is actually distributed. As you can see, some of the classes don't have nearly as many examples as others. Hopefully transfer learning can help to compensate for this.

Before we can train our model, we need to convert our categories, which are going to be numbers between 0 and 25 [with 25 classes, the indices run from 0 through 24], into one-hot encoded vectors, so we're calling to_categorical on our labels. Now, just to see how we can perform on this dataset with a very simple perceptron, let's go ahead, normalize our data, and then just create a single-layer perceptron model. We're going to use categorical cross-entropy for our loss because this is a multi-class classification problem, our good old friend the Adam optimizer, and we also want to view accuracy so we can have a better metric to comprehend what's going on. Lastly, we're calling wandb.init so we can visualize our metrics. Let's go ahead and train this model.

Okay, so it looks like we aren't doing so well. Our validation accuracy is point zero four percent. This is very troubling. Our accuracy on the training data is even lower. I look at this and I feel ill. There has to be a better way.

Keras makes it really easy to leverage the research community's progress in computer vision models. Here we're going to import ResNet50 and actually download the pre-trained weights from training on ImageNet, which is an image dataset with millions of images that takes many days to train on. So with this one line we're pulling in cutting-edge computer vision research. Let's go ahead and take a look at a model summary to see what this network looks like.

Oh man, so many layers. ResNet50 is much more complicated than our simple perceptron. You can see things like batch normalization, many different convolutions, and then even this funny Add layer. What ResNet does is it actually branches off and takes features from earlier in the network and adds them back in at later layers. This helps the network train better and allows researchers to make an even deeper network, which gives it more expressibility and accuracy. I can just keep on scrolling.

To see what this network can actually do, let's run it on a picture of an elephant, because why not. Here we're loading in our elephant and changing its size to 224 pixels by 224 pixels, because the network expects that size. Then we're expanding the dimensions, because we need to include our batch dimension, and we call this really important function, preprocess_input. When they trained ResNet, the researchers used a very specific way of preprocessing the images, and we're going to use their exact same logic on our own data so that we can have high-accuracy results coming out of the model. Lastly, we just call predict, and we're using this nice helper method, decode_predictions, which is going to translate the various indices in the last layer and tell us exactly what category it's predicting.

Look at that: the network output tusker with 49 percent accuracy, an Indian elephant with thirty-four percent accuracy, and there's a slight chance we're looking at an African elephant. Now, I personally probably wouldn't be able to tell you the difference between these three kinds of elephants, but a network this powerful is actually able to do it with a high degree of accuracy.

But we don't want this network to tell us the categories that it was trained on. We want it to tell us our categories for our grocery dataset, so let's look at a way that we can actually do that. First, let's take our grocery dataset and preprocess it exactly the same way that the ResNet authors did. Now we can actually go into the ResNet model and pull out specific layers that we want to use. In this case we're going to pull out the second-to-last layer, which is called the average pool layer, and now we can create a new model with the same input as our ResNet model, but now, instead of outputting a thousand categories, we're going to output this layer as our final output. Let's take a look and see what this model actually looks like. Still a massive model, but now, instead of a thousand categories at the bottom, we have a 2048-length vector, which is going to contain what we hope to be the most important features from our dataset.

So now we can take our preprocessed grocery dataset, run it through this new model that we've created, and actually extract the features. We're going to transform our images into 2048-length vectors of numbers that we can use to train a new model on, and we hope that ResNet has created features that are going to be much easier to learn from than our original image data. We're going to do the same for our test data, and then finally we can create a new model, which is a simple perceptron again, with 25 categories for our dataset, using the same loss and optimizer as we did earlier. Let's go ahead and fit it and see if we can get better accuracy than our first try.

Look at that: right off the bat we're getting into 80% validation accuracy. You might also notice that we have a bit of an overfitting problem, but there are additional techniques we can use to ensure that the network generalizes well across our dataset, and we can fix this issue.

Just extracting the features is great because it makes our model train really fast. A disadvantage is that if we actually deploy this model, we're going to have to deploy two models side by side, always put our input imagery through all of ResNet, and then separately pass that output into the next model. Keras makes it really easy for us to make a single model where the output of ResNet can go directly into our perceptron. We do this here by creating a new model: we add our ResNet layers and our new final dense layer, then we set all of the layers in the ResNet network to trainable equals false. When we're training this network, we don't want any of the layers in ResNet to train; instead, we're just going to tune the weights in our final dense layer. Now you can see there are 23 million parameters in this network, but only 51,000 of them are trainable.

Now if we run training, you'll see that it takes a lot longer to train. This is because for every batch we're passing that data all the way through the ResNet network and doing all of those convolutions and different arithmetic, so it's taking much longer as opposed to using the cached output features that we had used before. The advantage is that now you have a single model that you can continue to retrain as your dataset grows or as you change labels in your dataset, and it's much easier to deploy. But you can see we're getting essentially the same accuracy as we were getting by just extracting the features and then training this last layer.

All right, so there's one more technique we can use with transfer learning that will actually give us even more accuracy. This is known as fine-tuning. Instead of just training the layers that we added at the end of the network, we can take a subset of the layers in the ResNet network and allow them to train as well. The reasoning behind this is that the way these networks tend to learn is that the layers much higher up [earlier in the network] tend to extract much more higher-level features, things that would be shared in common amongst all the classes in your dataset, whereas the layers lower [later] in the network tend to be much more specific and are looking at shapes and different edges that are going to be very specific to your classes, or in this case the classes that ResNet was trained on.

So we can take these final layers and fine-tune them, enabling them to change their weights so that they are better suited for our classes, while still enabling the very generic layers at the top of the network to pass down the most meaningful information for our new classifier. To do this, we set ResNet to be trainable now, and we go into the network and, in this case, just say the final 11 layers, out of the hundred or so that are in ResNet, we're going to allow to train, whereas the first layers are all not going to be trainable.

One thing to note when you're fine-tuning is that because the weights have been trained on a very large dataset and are going to be very specific to the ResNet dataset [ImageNet], when we start to fine-tune those weights and move them, we're likely going to want to do that much more slowly than we would in a normal network. So this is a case where, instead of just setting optimizer equal to Adam, you would want to actually instantiate a new instance of the optimizer and slow down the learning rate. We really want to move those weights in the last layers little by little, and this can really help prevent overfitting on our dataset, which can be easy given it's so small and we have so many parameters.

And look at that: with only a few lines we were able to leverage cutting-edge models to actually get 72 percent accuracy on our Freiburg Groceries dataset. Remember, we started at less than 1% accuracy. I'd say that's a pretty good day at the office.
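
The fine-tuning step described at the end of the transcript could be sketched roughly as follows. This is a minimal illustration, not the code shown in the video: the function name `build_finetune_model`, the learning rate of `1e-5`, and the use of `include_top=False` with `pooling="avg"` plus a new softmax layer are assumptions. Only the general idea, unfreezing the final 11 layers and instantiating Adam explicitly with a reduced learning rate, comes from the talk.

```python
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50


def build_finetune_model(num_classes=25, n_unfrozen=11,
                         weights="imagenet", learning_rate=1e-5):
    """Hypothetical helper: ResNet50 with only its last layers trainable."""
    # Assumption: drop the ImageNet classifier and end with average pooling.
    base = ResNet50(include_top=False, weights=weights, pooling="avg",
                    input_shape=(224, 224, 3))

    # Make the base trainable, then freeze everything except the last
    # n_unfrozen layers (11 in the talk).
    base.trainable = True
    for layer in base.layers[:-n_unfrozen]:
        layer.trainable = False

    inputs = tf.keras.Input(shape=(224, 224, 3))
    features = base(inputs)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(features)
    model = tf.keras.Model(inputs, outputs)

    # Instantiate the optimizer explicitly so the learning rate can be lowered.
    # The value 1e-5 is an illustrative guess, not a number from the video.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model, base
```

Passing `weights=None` builds the same architecture without downloading the ImageNet weights, which can be handy for checking which layers ended up trainable before starting a real run.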

Original Description

Transfer learning lets you take a small dataset and produce an accurate model. This method uses large networks that were trained for a long time on huge datasets, transferring that knowledge into our own network. Follow along with Lukas using the Python scripts here: https://github.com/lukas/ml-class/tree/master/videos/transfer-learning

This is part of a long, free series of tutorials teaching engineers to do deep learning. Leave questions below, and check out more of our class videos:
Class Videos: http://wandb.com/classes
Weights & Biases: http://wandb.com


This video teaches how to apply transfer learning to domain-specific image classification tasks using ResNet 50 and TensorFlow, allowing for accurate models with small datasets. It covers the process of importing pre-trained weights, loading and preprocessing data, and fine-tuning a subset of ResNet layers for improved accuracy. By following along, viewers can learn how to leverage pre-trained models for their own image classification tasks.
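
The feature-extraction workflow summarized above might look something like the sketch below. It is written under several assumptions: the helper names (`build_feature_extractor`, `extract_features`, `build_head`) are hypothetical, images are assumed to be already resized to 224 by 224 RGB, and `include_top=False` with `pooling="avg"` is used as a shortcut that yields the same 2048-length average-pool output the video obtains by pulling out the second-to-last layer.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input


def build_feature_extractor(weights="imagenet"):
    # include_top=False removes the 1000-way ImageNet classifier;
    # pooling="avg" ends the network with global average pooling,
    # so every image is reduced to one 2048-length vector.
    return ResNet50(include_top=False, weights=weights, pooling="avg",
                    input_shape=(224, 224, 3))


def extract_features(extractor, images):
    # Copy to float32 first because preprocess_input may modify
    # a NumPy array in place.
    x = preprocess_input(np.array(images, dtype="float32"))
    return extractor.predict(x, verbose=0)


def build_head(num_classes=25, feature_dim=2048):
    # A single dense softmax layer trained on the cached features.
    inputs = tf.keras.Input(shape=(feature_dim,))
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(inputs)
    head = tf.keras.Model(inputs, outputs)
    head.compile(optimizer="adam", loss="categorical_crossentropy",
                 metrics=["accuracy"])
    return head
```

In use, one would call `extract_features` once for the training images and once for the test images, then fit the head on the resulting arrays together with the one-hot labels. Because the expensive ResNet pass happens only once, each training epoch of the head is cheap.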

Key Takeaways
  1. Import ResNet 50 and pre-trained weights from ImageNet
  2. Load and split training data into train and test sets
  3. Pre-process images using ResNet's logic
  4. Extract features from ResNet's output using an average pooling layer
  5. Create a new model with ResNet's output as input and a final dense layer for classification
  6. Fine-tune a subset of ResNet layers for improved accuracy
💡 Transfer learning allows for accurate models with small datasets by leveraging the pre-trained weights of large networks, such as ResNet50 trained on ImageNet
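
Step 5 in the list above, a single model with a frozen ResNet base and a new dense layer, can be sketched as below. The names `build_frozen_model` and `count_params` are hypothetical, and passing `training=False` to the base (which keeps batch normalization in inference mode) is a common Keras practice assumed here rather than something stated in the video.

```python
import math

import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50


def build_frozen_model(num_classes=25, weights="imagenet"):
    base = ResNet50(include_top=False, weights=weights, pooling="avg",
                    input_shape=(224, 224, 3))
    base.trainable = False  # freeze every ResNet layer

    inputs = tf.keras.Input(shape=(224, 224, 3))
    features = base(inputs, training=False)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(features)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def count_params(weights):
    # Total number of scalar values across a list of weight tensors.
    return sum(math.prod(tuple(w.shape)) for w in weights)
```

With 25 classes, the only trainable weights are those of the dense layer: 2048 × 25 weights plus 25 biases, or 51,225 parameters, which is consistent with the roughly 51,000 trainable parameters quoted in the transcript. Comparing `count_params(model.trainable_weights)` with `count_params(model.weights)` shows the split between trainable and total parameters.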
