12. One-shot learning for teaching neural networks to classify objects never seen before

Weights & Biases · Beginner · 🧬 Deep Learning · 7y ago

Key Takeaways

This video demonstrates one-shot learning with Keras/TensorFlow. A neural network learns to classify objects it has never seen before because the task is reframed as deciding whether a pair of objects is the same or different. The video uses the MNIST dataset. A Siamese network with a custom Euclidean distance function lifts pair accuracy from barely above chance to over 70% in the first epoch.

Full Transcript

All the examples we think about when we talk about machine learning, from hot dog/not hot dog to predicting the stock market to understanding speech, are about classifying things you've already seen. When we look at the MNIST data set, we're talking about labeling the digits 0 through 9. We're not talking about recognizing other digits we've never seen before. But in the real world, it's actually common to want to classify something you've literally never seen, and humans do this really well. When I see a spatula for the first time, I recognize it as a new object. Maybe I don't know its name, but I recognize that it's a thing, and if I see it again I can say, "oh, that's the thing I saw before." So how do we make computers do this kind of classification? It's sometimes called one-shot or zero-shot learning, or few-shot learning if you have a couple of examples. Making it work has really been a challenge for machine learning in general.

One approach that I think is really exciting, and that I want to talk about today, is a reframing of the problem. Instead of looking at one object, asking "what is this object?", and training on that, I want to train on pairs of objects, where the question is "are these two objects the same?" What's so cool about that is this: if I build a classifier of pairs of things instead of one thing at a time, I can look at something new and compare it with one example of a thing, maybe a canonical example. Then I can ask "is it that thing?" So my classifier can potentially generalize beyond the things it has seen in the training data to anything it might see out there. This technique is really generalizable. I'm going to do it on images, but the same approach can be
used on video, on audio, and on tons of other kinds of data. So let's get to it.

Let's walk through an example of how we're going to do this. I'll use the MNIST data set first, because you're probably familiar with it from previous videos and it's really fast to run experiments on. First we have the requisite lots of imports. Then we load the data exactly as we've done in previous videos: x_train holds the training images, y_train the training labels, x_test the test images, and y_test the test labels. Then we normalize, just like in a lot of other videos, by dividing the values by 255 so that our pixels are between 0 and 1 instead of between 0 and 255.

But now we're going to do something new and call a function I wrote called make_pairs. make_pairs takes in input data and labels and builds a new kind of data set made of pairs of images. Each pair's label is 1 if the two images belong to the same category and 0 if they belong to different categories. The code randomly walks through the digits and picks another digit that matches, adding the label "same." Then it picks one that doesn't match and adds the label "not the same." What comes out is a data set where half the pairs show the same thing and half show different things. The code at the bottom creates a new variable called pairs_train, which holds the pairs of images. It also creates labels_train, which is 0 for not the same and 1 for the same. We have to load the data first, then we can run this, and as usual I always recommend
taking a little peek at the data. Let's look at pairs_train[4, 0]. It turns out that's a 4; it's just an accident that the fourth example is a four. If we look at pairs_train[4, 1], that's a different-looking 4, so we'd expect labels_train[4] to be 1, meaning the two images show the same digit. Let's print labels_train[4], and yep, they're the same. We could also look at the 400th example of pairs_train. Here the first image is a 1, and the other image in the pair is another 1: slightly different handwriting, but the same digit.

So we've transformed our data. Now what are we going to do with it? Naively, one thing we could do is pass each image into a separate dense network and concatenate the outputs. Then a final dense layer predicts whether the images are the same or different. That's what we'll do here. Our first Sequential model is just a Flatten and then a Dense layer. So this is the perceptron you might be used to, but we use a ReLU activation function because it's an intermediate piece. Then we have the exact same layers again, but with a different set of weights. Now here's a new layer you might not have seen before, but it's super useful: Concatenate. It takes the outputs of two layers and puts them together into a single set of activations. It has no parameters; it just combines the two. The final layer, which I'm calling dense_layer, takes the output of the merge layer and outputs a single number. Hopefully that number will be 1 if the images are the same and 0 if they're not. We use a sigmoid activation function because it's a binary classification. Then we use the Keras functional API to define the model, because it's actually not a sequential model: we have two
inputs and we're combining them. It's not the simple kind of sequential model we might be used to, so we use a more flexible way of defining it. Then we compile the model. We use binary cross-entropy because we're doing a single binary classification, along with the standard Adam optimizer, and we output the accuracy. Let's take a quick look at what this model looks like before we run it. There are about 100,000 parameters in the dense layer for image 1 and about 100,000 in the dense layer for image 2. Each of those fully connected layers outputs 128 numbers, and we combine those into 256 numbers. Then a single perceptron takes those 256 inputs and produces one output at the bottom of the network. So in total it's about 200,000 parameters.

Now we call fit with three arguments: the first image of every pair in pairs_train, which is one set of input images; the second image of every pair, which is the other set; and labels_train. Again, labels_train is 0 if the images don't match and 1 if they do. Let's set it to 10 epochs and let the model train. This architecture does work, barely. Every step improves the accuracy, but only by about 0.5%, and it starts at 50% accuracy. So it's better than random, which is better than a lot of the networks I've made in my life, and we're onto something. But it seems pretty clear we need to make this work better. It's unclear how well this approach will ever work, and it's not what people typically do when they want to do one-shot learning. What they really do is share weights across the model. Sharing weights across layers is actually pretty common in more advanced architectures, but we haven't done it yet, so it's a good
thing to know. It's also really effective in this case, and it's one of the things we have to do to make this work well. The intuition is that the model we run on the first input image and the model we run on the second input image really should be the same model. Both images are drawn from the same overall set of images, so the transform we apply to one image should be the same transform we apply to the other. To share weights across the model, we have to use more of Keras's functional API, and I think this gets a little confusing. When we define a layer in the functional API, it just sets up the specification for the layer. It doesn't attach the layer to an input until we call that layer on something.

So we set up a model. We say the input is this Input, and the model applies this Flatten step and then this Dense step. But we haven't attached this model to any real input yet. What we do is attach it to two different inputs, input 1 and input 2. I call the model attached to input 1 dense1 and the model attached to input 2 dense2. So we have the same model, with the same weights, applied to two different inputs. Then we use the same Concatenate layer as before to combine them and add the same dense layer as before. That layer outputs a single number through a sigmoid activation: 1 if we think the two images show the same digit and 0 if they show two different digits. We compile the model the same way as before, and then we can take a quick look at it, and
we can see that this model has about half the parameters of the previous model, because we're sharing them. Before, we had two layers with about 100,000 parameters each. Now we have only one set of about 100,000 parameters, which gets called twice. We can run this model too, and spoiler alert: it works a little better than the last one, but not a lot better. That's because there's one more fancy optimization we need to add. With it, we'll have the typical setup of what's called a Siamese network. The Siamese network is actually an old concept that people talked about in the 1990s. It has had renewed interest in various forms as people have gotten more excited about deep learning, and about one-shot learning problems specifically.

First we do from keras import backend as K. This harkens back to the time when Keras typically had multiple backends. These days it's almost always TensorFlow, so I just think of K as giving me TensorFlow operations I can run here. Now I define a function whose inputs are going to be TensorFlow tensors. Inside it, I can call K-dot-something for the operations I need; here I'm using K.sum, K.square, K.sqrt, and K.maximum. All this does is take the square root of the sum of the squares of the differences between the two inputs, which is the Euclidean distance. It's really just measuring how different the outputs of my two networks are. That's how we're going to use it: we feed in the outputs of each network and then compare them using the Euclidean distance. So we define this nice little Euclidean distance function. Then we add a new layer called a Lambda layer, which, as the name implies, wraps a kind of lambda function,
and we pass in our Euclidean distance function. So now we're actually building our own custom TensorFlow operation as a layer. This means the network no longer has to figure out what to do with the outputs of the two networks we've defined. Instead, it builds in the knowledge that what we want is for those outputs to be similar. The more similar the outputs of my two networks are, the more likely the model thinks the two inputs are the same, in this case the same digit. So let's compile this network and look at it just like we did before. It's very similar to the previous model, but we don't have that last dense layer to figure things out. When we run this network, we see a marked improvement: in the first epoch we're already seeing accuracy above 70%. So by taking out some of the learned complexity and pushing it into the code, we've made a much more effective Siamese network.

The real reason to do this isn't the MNIST data set. It seems unlikely that you'd want to generalize to some other digit you haven't seen before. But there are lots of cases where you would, and one is handwriting, where you might see characters you haven't seen before. A super cool data set for that, which is really fun and a lot like MNIST, is the Omniglot data set. I've left in a little bit of code to load the Omniglot data set, which has lots and lots of different characters from many different alphabets. A fun next step would be to run this exact same architecture on Omniglot. See whether you can build a system that recognizes characters in one alphabet and generalizes to other alphabets, because that is really magical and powerful and
really shows off why one-shot learning can be so effective, and especially why Siamese networks work so well for this application.
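The transcript walks through three architectures:

- two separate Dense branches joined by Concatenate,
- the same model with one shared branch,
- the Siamese version with a Euclidean-distance Lambda layer.

They can be sketched as one builder function. This is a minimal tf.keras sketch under stated assumptions, not the code from the video's repository. The names build_pair_model, make_branch, contrastive_loss, and distance_accuracy are hypothetical. The transcript doesn't say which loss trains the distance model, so the sketch assumes a contrastive loss with margin 1.0 and a 0.5 distance threshold for accuracy. The distance uses plain TensorFlow ops rather than the K.sum/K.square/K.sqrt/K.maximum calls from the video, because the keras.backend math functions are not available in every Keras version.

```python
import tensorflow as tf

EPS = 1e-7  # same value as the default Keras backend epsilon


def euclidean_distance(vects):
    """Row-wise Euclidean distance between two batches of embeddings, shape (batch, 1)."""
    x, y = vects
    sum_square = tf.reduce_sum(tf.square(x - y), axis=1, keepdims=True)
    # Clamp away from zero so sqrt (and its gradient) stays finite.
    return tf.sqrt(tf.maximum(sum_square, EPS))


def contrastive_loss(y_true, y_pred, margin=1.0):
    """Assumed loss: pull same pairs (label 1) together, push different pairs (0) past margin."""
    y_true = tf.reshape(tf.cast(y_true, y_pred.dtype), tf.shape(y_pred))
    same = y_true * tf.square(y_pred)
    different = (1.0 - y_true) * tf.square(tf.maximum(margin - y_pred, 0.0))
    return tf.reduce_mean(same + different)


def distance_accuracy(y_true, y_pred):
    """Call a pair 'same' when its distance is below 0.5 (an assumed threshold)."""
    y_true = tf.reshape(tf.cast(y_true, y_pred.dtype), tf.shape(y_pred))
    predicted_same = tf.cast(y_pred < 0.5, y_pred.dtype)
    return tf.reduce_mean(tf.cast(tf.equal(y_true, predicted_same), y_pred.dtype))


def make_branch(input_shape):
    """Flatten -> Dense(128, ReLU): the per-image network described in the video."""
    inp = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Flatten()(inp)
    out = tf.keras.layers.Dense(128, activation="relu")(x)
    return tf.keras.Model(inp, out)


def build_pair_model(shared=True, use_distance=True, input_shape=(28, 28)):
    input_a = tf.keras.Input(shape=input_shape)
    input_b = tf.keras.Input(shape=input_shape)
    branch_a = make_branch(input_shape)
    # Calling the same branch object on both inputs is what shares the weights.
    branch_b = branch_a if shared else make_branch(input_shape)
    dense_a, dense_b = branch_a(input_a), branch_b(input_b)

    if use_distance:
        # Siamese head: no trainable weights, just the distance between embeddings.
        output = tf.keras.layers.Lambda(euclidean_distance)([dense_a, dense_b])
        loss, metric = contrastive_loss, distance_accuracy
    else:
        # Baseline head: 128 + 128 concatenated activations -> one sigmoid unit.
        merged = tf.keras.layers.Concatenate()([dense_a, dense_b])
        output = tf.keras.layers.Dense(1, activation="sigmoid")(merged)
        loss, metric = "binary_crossentropy", "accuracy"

    model = tf.keras.Model([input_a, input_b], output)
    model.compile(loss=loss, optimizer="adam", metrics=[metric])
    return model
```

The parameter counts match the transcript's estimates:

- Each Flatten → Dense(128) branch has 784 × 128 + 128 = 100,480 weights.
- The concatenated head adds 256 + 1 = 257 weights.
- The unshared model therefore has 201,217 parameters and the shared one 100,737.
- The distance model has exactly 100,480, because its Lambda layer has no weights.

Training would look like model.fit([first_images, second_images], labels, epochs=10), with labels of 1 for same pairs and 0 for different pairs.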

Original Description

Using Keras/TensorFlow for one-shot learning, we’ll classify objects we’ve never seen before. We’ll train on pairs of objects and identify “are these two things the same?” This is a highly generalizable technique that unlocks cool new applications of deep learning. Follow along with Lukas using the Python scripts here: https://github.com/lukas/ml-class/tree/master/videos/one-shot This is part of a long, free series of tutorials teaching engineers to do deep learning. Leave questions below, and check out more of our class videos: Class Videos: https://wandb.ai/site/tutorials Weights & Biases: https://wandb.ai/site
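The transcript describes how the pairs data set is built. For each example there is one matching pair labeled 1 and one non-matching pair labeled 0, so the data set is half "same" and half "different". This can be sketched in NumPy. It is an illustrative reimplementation, not the make_pairs function from the linked repository; the rng parameter and the exact sampling scheme are assumptions.

```python
import numpy as np


def make_pairs(x, y, rng=None):
    """Turn (images, labels) into a balanced data set of image pairs.

    For every example i, emit one positive pair (i with a random example of
    the same class, target 1) and one negative pair (i with a random example
    of a different class, target 0). Requires at least two classes.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Indices of the examples belonging to each class.
    class_indices = {label: np.flatnonzero(y == label) for label in np.unique(y)}
    classes = list(class_indices)

    pairs, targets = [], []
    for i, label in enumerate(y):
        # Positive pair (the partner may occasionally be example i itself).
        j = rng.choice(class_indices[label])
        pairs.append([x[i], x[j]])
        targets.append(1)
        # Negative pair: pick a different class, then a random example from it.
        other_classes = [c for c in classes if c != label]
        other = other_classes[rng.integers(len(other_classes))]
        k = rng.choice(class_indices[other])
        pairs.append([x[i], x[k]])
        targets.append(0)
    return np.array(pairs), np.array(targets)
```

With MNIST this would be called as pairs_train, labels_train = make_pairs(x_train, y_train). It emits two pairs per example, so 60,000 training images become 120,000 pairs of 28 × 28 images, roughly 750 MB as float32.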



Key Takeaways
  1. Load the MNIST data into training and test sets
  2. Normalize pixel values to the range 0 to 1 by dividing by 255
  3. Create pairs of images, labeled 1 for the same digit and 0 for different digits
  4. Define a custom Euclidean distance function using Keras backend (TensorFlow) operations
  5. Build a Siamese network with a shared branch and a Lambda layer that applies the Euclidean distance function
  6. Compile the network and train it
  7. As a next step, run the same architecture on the Omniglot data set to recognize characters in one alphabet and generalize to other alphabets
💡 One-shot learning can be achieved with a Siamese network that embeds both inputs with shared weights and compares them with a custom Euclidean distance function. In the video, this lifts pair accuracy above 70% in the first epoch, whereas the naive two-branch model barely beats chance.
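The one-shot step itself compares a new input against a single canonical example of each class, possibly classes never seen in training. It can be sketched as a nearest-neighbor lookup in embedding space. In this sketch, embed is a hypothetical stand-in for the trained shared branch of a Siamese network (any function mapping a batch of images to a batch of vectors). one_shot_classify is an illustrative name, not something from the video's code.

```python
import numpy as np


def one_shot_classify(embed, query, support_images, support_labels):
    """Return the label of the support example closest to `query`, plus all distances.

    embed:          hypothetical embedding function, batch of images -> (n, d) array
    query:          a single image (NumPy array)
    support_images: one canonical example per class, stacked along axis 0
    support_labels: array of class labels aligned with support_images
    """
    support_vecs = np.asarray(embed(support_images))  # (n_classes, d)
    query_vec = np.asarray(embed(query[np.newaxis]))  # (1, d)
    # Euclidean distance from the query embedding to each support embedding.
    distances = np.sqrt(np.sum((support_vecs - query_vec) ** 2, axis=1))
    best = int(np.argmin(distances))
    return support_labels[best], distances
```

Only distances are compared, so recognizing a new class means adding one support example; nothing is retrained. For a quick sanity check, embed can be a plain flattening function such as lambda batch: batch.reshape(len(batch), -1).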
