12. One-shot learning for teaching neural networks to classify objects never seen before

Weights & Biases · Beginner · 🧬 Deep Learning · 7y ago

Skills: ML Maths Basics 80% · Supervised Learning 70% · ML Pipelines 70% · Unsupervised Learning 60%

Key Takeaways

This video demonstrates one-shot learning using Keras/TensorFlow, where a neural network is trained to classify objects it has never seen before by reframing the problem as classifying pairs of objects as same or different. The video uses the MNIST dataset and moves from a baseline that is barely better than 50% accuracy to above 70% accuracy in the first epoch, using a Siamese network with a custom Euclidean distance function.
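The pair reframing can be sketched in a few lines. This is a minimal illustration of the idea described in the video (one matching and one non-matching partner per sample, so half the pairs are labeled 1), not the author's actual make_pairs code. The argument names, the seed parameter, and the toy string "images" are assumptions made for the example.

```python
import random
from collections import defaultdict


def make_pairs(samples, labels, seed=0):
    """Return (pairs, pair_labels): one matching and one non-matching pair per sample."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)
    classes = list(indices_by_class)

    pairs, pair_labels = [], []
    for index, label in enumerate(labels):
        # Positive pair: a random sample from the same class (may be the sample itself).
        match = rng.choice(indices_by_class[label])
        pairs.append((samples[index], samples[match]))
        pair_labels.append(1)

        # Negative pair: a random sample from a randomly chosen other class.
        other_class = rng.choice([c for c in classes if c != label])
        mismatch = rng.choice(indices_by_class[other_class])
        pairs.append((samples[index], samples[mismatch]))
        pair_labels.append(0)
    return pairs, pair_labels


# Toy data standing in for images: the strings are hypothetical placeholders.
samples = ["four_a", "four_b", "one_a", "one_b", "seven_a", "seven_b"]
labels = [4, 4, 1, 1, 7, 7]
pairs, pair_labels = make_pairs(samples, labels)
print(len(pairs), sum(pair_labels))  # 12 6
```

Because every sample contributes exactly one "same" pair and one "different" pair, a model that always guesses one label scores 50%, which is the baseline any trained network has to beat.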

Full Transcript

All the examples that we think about when we talk about machine learning, from hot dog / not hot dog to predicting the stock market to understanding speech, are really about classifying things you've already seen. When we look at the MNIST dataset, we're talking about labeling the digits 0 through 9, but we're not talking about recognizing other digits that we've maybe never seen before. In the real world, it's actually common that you want to classify something you've literally never seen, and humans can do this really well. When I see a spatula for the first time, I recognize it as a new object. Maybe I don't know the name for it, but I recognize that it's a thing, and if I see it again I can recognize that it's that thing I saw before.

So how do we make computers do this kind of classification? This is sometimes called one-shot or zero-shot learning, or few-shot learning if you have a couple of examples, and it's really been a challenge for machine learning in general to make this work. One of the approaches that I think is really exciting, and that I want to talk about today, is a kind of reframing of the problem. Instead of looking at one object, asking what that object is, and training on that, what I want to do is train on pairs of objects, where the question is: are these two objects the same? What's so cool about that is that if, instead of building a classifier of one thing at a time, I build a classifier of pairs of things, I can look at something new, look at one example (maybe a canonical example) of that thing, and ask: is it that thing? So my classifier can potentially generalize not just to the things it has seen in the training data, but maybe to anything it might see out there. This technique is really generalizable. I'm going to do it on images, but the same approach can be
used on video, it can be used on audio, it can be used on tons and tons of different examples. So let's get to it.

All right, let's walk through an example of how we're going to do this. I'm going to do this on the MNIST dataset at first, just because it's a dataset you're probably familiar with from previous videos and it's really fast to run experiments on. First we have the requisite lots of imports, and then we load the data exactly the same way we've done in previous videos: X_train is the training images, y_train is the labels on the training data, X_test is the images in the test data, and y_test is the labels for the test data. Then we normalize, just like we've done in a lot of other videos, where we divide the values by 255 so that our pixels are between 0 and 1 instead of between 0 and 255.

But now we're going to do something new. We're going to call this function that I wrote called make_pairs. What make_pairs does is take in input data and labels and make a new kind of dataset, where the dataset is actually pairs of images, and the label is 1 if the two images correspond to the same category of thing and 0 if they correspond to different categories. So I wrote a little bit of code here that randomly walks through digits, picks another digit that matches and adds a label of "the same thing," and then finds two that don't match and adds a label of "not the same thing." What comes out of this is a dataset where half the pairs are the same thing and half the pairs are different things. This function at the bottom creates a new variable called pairs_train, which is going to be the pairs of images, and labels_train, which is going to be either 0 for not the same thing or 1 for the same thing. So we can run this here. We've got to load the data first, then we can run this guy, and as usual I always recommend
taking a little peek at the data. So why don't we look at pairs_train[4, 0] here. It turns out that's a number four; that's just an accident, the pair at index 4 happens to be a four. Then if we look at pairs_train[4, 1], that's a different-looking four. So we would expect labels_train[4] to be 1, meaning they're the same digit. Why don't we just print that out: labels_train[4], and yep, they're the same digit. We could look at maybe the 400th example of pairs_train. Here that's a one, and if we look at the adjacent one, it's another one, a little bit different writing but the same thing.

So we've transformed our data, and now what are we going to do with it? Naively, one thing we could do is pass each image into a separate dense network, concatenate those, and have a final dense layer predict same image or different image. So that's what we're going to do here. Our first sequential model is just a flatten and then a dense layer, so this is just the perceptron that you might be used to, but we're going to use a ReLU activation function because it's an intermediate piece. Then we have the exact same layer again, but with a different set of weights. And now here's a new layer you might not have seen before, but it's super useful: it's called Concatenate. What that does is take the outputs of two layers and put them together into a single set of activations. It has no parameters; it just combines the two. Then the final layer, which I'm calling the dense layer, takes as input the output of the merge layer and outputs a single number, and hopefully that's going to be 1 if the images are the same and 0 if they're not. We use a sigmoid activation function because it's a binary classification, and we use the Keras functional definition to define this, because it's actually not a sequential model, right? Because we have two
inputs and then we're combining them, it's not the simple sequential model we might be used to, so we use a more complicated way of defining it. Then we compile the model. We use binary cross-entropy because we're doing a single binary classification, we use our standard Adam optimizer, and we output the accuracy.

Let's take a quick look at what this model looks like before we run it. You can see that we have about 100,000 parameters in the dense layer that corresponds to image 1 and about 100,000 parameters in the one that corresponds to image 2. Each of those fully connected layers outputs 128 numbers, we combine those into 256 numbers, and then we have a single perceptron with 256 inputs and one output at the bottom of our network. In total it's about 200,000 parameters. Now we can call fit, and we call it on the first images from pairs_train, which is one set of input images, the second images from pairs_train, which is the other set of input images, and labels_train, which is again the binary number: 0 if the images don't match and 1 if they do. Let's set that to, say, 10 epochs and let our model train.

So this architecture does work, barely. You can see that in every step it's improving the accuracy, but only by about 0.5%, and it's starting at 50% accuracy. So it's better than random, which is better than a lot of the networks that I've made in my life, and we're kind of onto something good, but it seems pretty clear that we need to make this work better.

What we've done so far doesn't work super well, and it's unclear how well it will ever work. It's also not typically what people do when they encounter a situation where they want to do one-shot learning. What they really do is share weights across the model. Sharing weights across layers is actually pretty common in more advanced architectures, but we haven't done it yet, so it's a good
thing to know, and it's really effective in this case. It's one of the things we have to do to make this really work well. The intuition is that the model we're running on the first input image and the model we're running on the second input image should really be the same model, because the images are drawn from the same overall set of images, so the transform that you want to do on one image should really be the same transform that we do on the other image.

In order to share weights across the model, we have to use more of Keras's functional model definition. I think this gets a little confusing, because when we define a layer in the functional definition, it just sets up the specification for the layer, and it doesn't actually attach it to any input until we call that layer, once specified, like a function. So we set up a model: we say that the input is going to be this input, and what the model does is this flatten step and then the dense step. But we haven't attached this model to any real input yet. What we're going to do is attach it to two different inputs, input 1 and input 2. I call the model that's attached to input 1 dense 1 and the model attached to input 2 dense 2. So we have what look like two separate models, but it's the same model attached to two different inputs. We can take those and use the same Concatenate layer that we used before to combine them, and then add the same dense layer that we had before, which outputs a single number through a sigmoid activation. That number is again going to be 1 if we think the two images correspond to the same number and 0 if they're two different numbers. We can compile the model in the same way we did before, and then we can take a quick look at it and
we can see that this model should have about half the number of parameters of the previous model, because we're sharing those parameters. Whereas before we had two layers, each with about 100,000 parameters, now we only have one set of about 100,000 parameters. Two different layers are getting called, but they share those parameters. So we can run this model too, and, spoiler alert, it works a little bit better than the last thing we did, but not a lot better, because there's one more fancy optimization that we need to add. Then we'll have the typical setup of what's called a Siamese network, which is actually an old concept. It was talked about in the 90s, but I feel like it's had renewed interest in various forms as people have gotten more and more excited about deep learning, and about this kind of one-shot learning problem specifically.

So there's a new import: from keras import backend as K. This harkens back to the time when Keras typically had multiple backends. These days it's almost always TensorFlow, so I just think of any TensorFlow operation as something I can run here. Now I define a function that takes in inputs, which are going to be TensorFlow tensors, and then I can call K. followed by any TensorFlow operation that I can find. Here I'm using sum, square, square root, and maximum. Really all this is doing is taking the square root of the sum of the squares of the differences between the two inputs. It's pretty simple; they call it Euclidean distance, and it's really just how different the outputs of my two networks are. That's what we're going to do with it: we feed in the outputs of each network and compare them using the Euclidean distance. So we define this nice little Euclidean distance function, and then we add a new layer called a Lambda layer, which implies kind of a lambda function
and we pass in our Euclidean distance function. So now we're building our own custom TensorFlow operations as a layer. What this does is let the network, instead of trying to figure out what it should do with the outputs of the two networks we've defined, just know that what I really want is for the outputs of these two networks to be similar. The more similar the outputs of my two networks are, the more likely the model thinks the two inputs correspond to the same number.

So let's run this network. We can compile it and look at it just like we did before, and we see that it's very similar to the previous model, but we don't have that last big dense layer to figure things out. When we run this network, we see a marked improvement: in the first epoch we're already seeing accuracy above 70%. So by taking out some of the complexity, and again just pushing the complexity into the code, we've made a much more effective Siamese network.

The real reason to do this is not the MNIST dataset. It seems unlikely that you'd want to generalize to some other digit that you haven't seen before. But there are lots of cases where you would want to do that, and one case is handwriting, where you might see characters that you haven't seen before. A super cool dataset to do this on, one that's really fun and a lot like MNIST, is the Omniglot dataset. I've left in a little bit of code to load the Omniglot dataset, which loads lots and lots of different characters from lots and lots of different languages. I think a fun next step would be to run this exact same architecture on the Omniglot dataset and see if you can recognize characters, and see if you can build a system that can recognize characters in one alphabet and generalize to other alphabets, because that is really magical and powerful and
really shows off why one-shot learning can be effective, and especially why Siamese networks work so well for this application.
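The final architecture from the transcript (one shared flatten-plus-dense encoder applied to both inputs, followed by a Lambda layer computing Euclidean distance) might look roughly like the sketch below. It is an illustration under assumptions, not the code from the video: it uses tf.* operations directly rather than the Keras backend module K, the function and variable names are hypothetical, and the 1e-7 clamp value is an assumed choice for the "maximum" step. How the distance is turned into a training loss is not spelled out in the transcript, so the sketch stops at the distance output.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model


def euclidean_distance(tensors):
    """Euclidean distance between two batches of embeddings, shape (batch, 1)."""
    a, b = tensors
    sum_square = tf.reduce_sum(tf.square(a - b), axis=1, keepdims=True)
    # Clamp before the square root so the gradient stays finite at zero distance.
    return tf.sqrt(tf.maximum(sum_square, 1e-7))


def build_siamese(input_shape=(28, 28), embedding_dim=128):
    # One encoder, defined once, so both branches share the same weights.
    encoder_input = layers.Input(shape=input_shape)
    hidden = layers.Flatten()(encoder_input)
    hidden = layers.Dense(embedding_dim, activation="relu")(hidden)
    encoder = Model(encoder_input, hidden, name="shared_encoder")

    input_1 = layers.Input(shape=input_shape)
    input_2 = layers.Input(shape=input_shape)
    # Calling the same encoder twice attaches it to two different inputs.
    embedding_1 = encoder(input_1)
    embedding_2 = encoder(input_2)

    distance = layers.Lambda(euclidean_distance)([embedding_1, embedding_2])
    return Model(inputs=[input_1, input_2], outputs=distance)


model = build_siamese()
print(model.count_params())  # 784 * 128 + 128 = 100480 shared parameters

# Feeding the same batch to both inputs gives one (near-zero) distance per pair.
batch = np.random.rand(3, 28, 28).astype("float32")
print(model.predict([batch, batch], verbose=0).shape)  # (3, 1)
```

The parameter count matches the "about 100,000" figure in the transcript: the encoder is counted once even though it is called twice, and the distance layer adds no trainable parameters.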

Original Description

Using Keras/TensorFlow for one-shot learning, we’ll classify objects we’ve never seen before. We’ll train on pairs of objects and identify “are these two things the same?” This is a highly generalizable technique that unlocks cool new applications of deep learning. Follow along with Lukas using the Python scripts here: https://github.com/lukas/ml-class/tree/master/videos/one-shot This is part of a long, free series of tutorials teaching engineers to do deep learning. Leave questions below, and check out more of our class videos: Class Videos: https://wandb.ai/site/tutorials Weights & Biases: https://wandb.ai/site

This video teaches one-shot learning using Keras/TensorFlow, where a neural network is trained to classify objects it has never seen before. The video demonstrates how to reframe the problem as pairs of objects to classify and achieve a significant improvement in accuracy using a Siamese network with a custom Euclidean distance function.

Key Takeaways

Load data into Train and Test sets
Normalize pixel values between 0 and 1
Create pairs of images and labels for training
Define a custom Euclidean distance function as a TensorFlow operation
Build a Siamese network with a lambda layer that uses the Euclidean distance function
Compile the network and train it
Load the Omniglot data set to recognize characters in one alphabet and generalize to other alphabets

💡 One-shot learning can be achieved using a Siamese network with a custom Euclidean distance function, which can significantly improve accuracy in classifying objects that have never been seen before.
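Once a network produces embeddings whose distance reflects similarity, classifying a never-before-seen category reduces to comparing a query against one canonical example per category, as the transcript suggests. The sketch below shows that comparison step only, in plain Python. The three-number embeddings, the labels, and the function names are hypothetical; a real model would supply the embeddings.

```python
import math


def euclidean_distance(a, b, epsilon=1e-7):
    """Square root of the summed squared differences, clamped away from zero."""
    sum_square = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(max(sum_square, epsilon))


def classify_one_shot(query, support):
    """Return the label whose single reference embedding is closest to the query."""
    return min(support, key=lambda label: euclidean_distance(query, support[label]))


# Hypothetical 3-dimensional embeddings; a real network might output 128 numbers.
support = {
    "spatula": [0.9, 0.1, 0.0],
    "whisk": [0.1, 0.8, 0.3],
}
print(classify_one_shot([0.8, 0.2, 0.1], support))  # spatula
```

Adding a new category here means adding one entry to the support dictionary, with no retraining, which is the practical appeal of the approach.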
