Real-world image classification using convolutional neural networks | Machine Learning Foundations

Google for Developers · Intermediate ·🧬 Deep Learning ·6y ago


Key Takeaways

Builds real-world image classification models using convolutional neural networks with TensorFlow

Full Transcript

Hi, and welcome back to Machine Learning Foundations for Google developers. I'm Laurence Moroney, your host, and today we're going to look at how to use convolutional neural networks to classify complex features. In the last video, you took what you had learned about CNNs and saw how to improve the fashion classifier that you had created much earlier on. You had an exercise to apply that learning to handwriting, so before we get started, let's look at the answer to that exercise.

The last exercise had you use MNIST and try to build a convolutional neural network that could classify it. For bonus points, you could use the callback that you learned about earlier so that, once training reached a certain accuracy, you would cancel it. And here's the solution. It's very similar to the Fashion MNIST one, and don't forget that you have to reshape your training images and your test images to add an extra dimension so that your input shape can be 28 by 28 by 1. For fun, I've used just a single convolutional layer, and hopefully it'll run a little bit faster. Let's see what happens if I try to train it for 100 epochs.

An important thing to note when you're doing this with convolutional neural networks is to make sure that your runtime type is actually GPU before you start running. So I'm going to do that, and now I'm going to run the code. It will take a minute to allocate a GPU, but there it goes. It's connected, and now the code is running. It starts by downloading the data. Let's see how many epochs we actually need. I've set it to train for 100, but our callback is going to stop training once it reaches 99.8% accuracy. And here we can see that after 90 epochs it hit 99.81% accuracy, so the callback was hit and the training was canceled. So, here's the source code. You can take a look at it for yourself. I've given the URL in the slides, and have fun playing with it. That wasn't so bad, was it?
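The stopping rule described in the exercise answer can be sketched in plain Python. This is a minimal illustration, not the exercise's solution code: the function name `epochs_until_threshold` and the accuracy values are hypothetical. In Keras, a check like this would typically live in a callback's `on_epoch_end` method, which sets `model.stop_training = True`.

```python
def epochs_until_threshold(accuracy_per_epoch, threshold=0.998):
    """Return the 1-based epoch at which training would be cancelled.

    Returns None if the threshold is never reached.
    """
    for epoch, accuracy in enumerate(accuracy_per_epoch, start=1):
        if accuracy >= threshold:
            return epoch
    return None


# Hypothetical per-epoch training accuracies, not taken from the video's run.
history = [0.95, 0.985, 0.993, 0.9981, 0.999]

print(epochs_until_threshold(history))  # 4
```

Asking for 100 epochs only sets an upper bound. With a rule like this in place, training ends at the first epoch that meets the target.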
You were able to improve the performance of your computer vision model simply by using convolutions to spot features in the image and then matching those features to labels, instead of just looking at raw pixels and hoping for the best. That technique should, in theory, work for images that are far more complex than the fashion ones. In the fashion images there was just one subject, and it was centered and zoomed in. So in this video, we'll see if we can apply the technique to other images like these.

This is a data set of horses and humans, with a number of pictures of different horses in different positions as well as very diverse humans, also in different poses. For example, consider the horse in the top middle. You can only see three of its legs. In the one at the bottom, you can only see two. The men and women are also in different poses, and some have parts of their body obscured, like the woman in the red dress, who is cut off at the knees. As you can hopefully see, this is a far more challenging problem than we had with Fashion MNIST and handwritten digits, where every item was posed similarly.

The first thing we need to do before we start coding our network is to have an easy way to label these images, so we can tell the computer which ones are horses and which ones are humans. One way to do this that TensorFlow supports, to make your life easier, is to use subdirectories. Say I have a master directory of images that I subdivide into training and validation images. The training directory contains subdirectories called horses and humans, each containing the appropriate images, with horses in the horses directory and humans in the humans directory. Similarly, the validation directory contains horses and humans in subfolders. I now have a fully labeled set of images for training and testing. With TensorFlow, I can pass these directories to something called a generator, and it will auto-label the images based on the directory name.
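To make the directory-based labeling concrete, here is a small standard-library sketch of the idea. It is not the Keras generator itself: the helper `label_images` and the file names are hypothetical, and it assumes class indices are assigned in alphabetical order of the subdirectory names, which is how the Keras generator orders them.

```python
import tempfile
from pathlib import Path


def label_images(parent_dir):
    """Derive (file name, class index) pairs from the subdirectory names."""
    parent = Path(parent_dir)
    class_names = sorted(p.name for p in parent.iterdir() if p.is_dir())
    class_index = {name: i for i, name in enumerate(class_names)}
    labeled = []
    for name in class_names:
        for image_path in sorted((parent / name).iterdir()):
            if image_path.is_file():
                labeled.append((image_path.name, class_index[name]))
    return class_index, labeled


# Build a tiny stand-in for the training directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    train_dir = Path(tmp) / "training"
    for class_name, file_name in [
        ("horses", "horse01.png"),
        ("humans", "human01.png"),
        ("humans", "human02.png"),
    ]:
        (train_dir / class_name).mkdir(parents=True, exist_ok=True)
        (train_dir / class_name / file_name).touch()
    class_index, labeled = label_images(train_dir)

print(class_index)  # {'horses': 0, 'humans': 1}
print(labeled)      # [('horse01.png', 0), ('human01.png', 1), ('human02.png', 1)]
```

Note that the function is pointed at the parent directory. The subdirectory names supply the labels, so no separate label file is needed.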
This labeling is achieved using an image data generator in the Keras libraries. You import it like this. You can then use it to apply some transforms to the images, such as normalizing them. This is a very simple normalization where I just divide each channel by 255. There are better ways of doing that, but I'll keep it simple for now.

You then create a generator for the images by flowing them out of the directory, which you do by calling the flow_from_directory method. You specify the directory that contains the labeled subdirectories. In this case we're training, so this will be the training directory that contains the horses and humans subdirectories. It's a common error to point at one of those subdirectories instead, so make sure to use the parent directory.

You need to specify the size of the images that the generator will provide to the model a little later. Remember when we did the fashion images, they were all 28 by 28. With real-world images, you aren't guaranteed to have them all the same size. So when flowing them from the directory, as well as rescaling, it's a good idea to resize them, too. You can specify the batch size for training. In this case they'll be taken from the directory 128 at a time in order to be fed into the neural network. And finally, there's the class mode. Keep an eye on this, as it's an easy source of bugs. If you only have two classes, like we do here, keep this as binary. If you have more classes, it should be categorical.

For your validation data set, you do exactly the same, except that you create a validation generator and point it at the validation directory. These two generators now provide the images that your model can use for training and validation. So now it's time to define a model architecture that can use these, and later we'll see how you can use them when you're fitting your images to your labels. Here's the code for a simple CNN that can classify the horses and humans images.
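The two generator settings that most often cause trouble, rescaling and class mode, can be illustrated without TensorFlow. The sketch below is plain Python under stated assumptions: `rescale` and `encode_label` are hypothetical helpers that mimic what the generator is described as doing, namely dividing each channel value by 255 and emitting either a single 0/1 label (binary) or a one-hot vector (categorical).

```python
def rescale(pixel_values):
    """Normalize 8-bit channel values into the range 0.0 to 1.0."""
    return [value / 255.0 for value in pixel_values]


def encode_label(class_index, num_classes, class_mode):
    """Encode a class index the way the chosen class mode expects it."""
    if class_mode == "binary":
        if num_classes != 2:
            raise ValueError("binary mode only makes sense with two classes")
        return float(class_index)  # one number: 0.0 or 1.0
    if class_mode == "categorical":
        # One slot per class, with a 1.0 in the slot of the true class.
        return [1.0 if i == class_index else 0.0 for i in range(num_classes)]
    raise ValueError(f"unsupported class mode: {class_mode}")


print(rescale([0, 255]))                    # [0.0, 1.0]
print(encode_label(1, 2, "binary"))         # 1.0
print(encode_label(2, 3, "categorical"))    # [0.0, 0.0, 1.0]
```

The label shape has to agree with the model's output layer: a single label per image pairs with one sigmoid neuron, while a one-hot vector pairs with one output neuron per class.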
Now, this should look a little familiar by now. First are a few stacked convolutional layers, where every convolutional layer is followed by a max pooling one, as we saw in the last video. The number of convolutions in each layer is purely arbitrary. You can experiment to get the best results. I've done it here by increasing the number of filters as the image size decreases. Remember that the pooling layers quarter the size of your image, so with the smaller images I'm trying more filters, but you're free to experiment.

Remember the input shape, though. That's super important. Here we're telling the initial layer to expect to be fed data in a 300 by 300 by 3 format. So each image is 300 by 300 pixels, and there are three bytes per pixel, one for each color channel. This needs to match the size that you specified in the generator a little earlier.

Finally, there's your output layer. The number of neurons should match the number of classes that you have. There is one exception. With a binary classifier like this one, you can get away with only one neuron and a sigmoid activation function. This pushes the value towards zero for one class and towards one for the other class.

If we look at our model architecture, we'll see something like this, and the journey of the image through the layers is apparent. It starts at 300 by 300, loses a pixel border to become 298 by 298, gets halved in each dimension by the pooling to become 149 by 149, loses another pixel border, and so on. Overall, this network will need to learn about 40 million parameters, so training it might take a little while.

When you compile your model, you specify a loss function and an optimizer. In this case, we're going to use a loss function called binary cross-entropy, which is a common one for binary classification. We'll also use an optimizer called RMSprop, which is able to accept a parameter called the learning rate. Start with 0.001 like this, and you can tweak it as you go.
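The size arithmetic and the parameter count can be checked by hand. The sketch below assumes 3 by 3 convolutions without padding and 2 by 2 pooling, which is what the quoted sizes (300, 298, 149) imply. The filter counts (16, 32, 64) and the 512-unit dense layer are assumptions not stated in the transcript. They are chosen here because they reproduce the "about 40 million" figure.

```python
def conv_output(size, kernel=3):
    """An unpadded convolution trims (kernel - 1) pixels from each dimension."""
    return size - (kernel - 1)


def pool_output(size, pool=2):
    """Max pooling divides each dimension by the pool size, rounding down."""
    return size // pool


def conv_params(in_channels, filters, kernel=3):
    """Weights for every filter plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """One weight per input per unit plus one bias per unit."""
    return inputs * units + units


size, channels = 300, 3   # input shape from the transcript: 300 x 300 x 3
total = 0
sizes = []
for filters in (16, 32, 64):          # assumed filter counts
    total += conv_params(channels, filters)
    size = conv_output(size)
    sizes.append(size)
    size = pool_output(size)
    sizes.append(size)
    channels = filters

flat = size * size * channels          # values fed to the dense layers
total += dense_params(flat, 512)       # assumed 512-unit hidden layer
total += dense_params(512, 1)          # single sigmoid output neuron

print(sizes)   # [298, 149, 147, 73, 71, 35]
print(flat)    # 78400
print(total)   # 40165409
```

Almost all of the parameters sit in the first dense layer (78,400 inputs times 512 units), which is why reducing the image size before flattening has such a large effect on the total.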
And we'll capture the accuracy metrics while we're training. The learning rate parameter defines how the mathematical functions in the optimizer can learn using something called gradient descent. It's a little bit beyond this video to go into that in detail, but if you want to learn more, there's a great video from Andrew Ng at this URL.

Now it's time to do the training. You're familiar with model.fit, which we saw previously, where we specified the data and the labels. When using a generator, all you have to do is specify the generator, and it will infer the labels from the provided subdirectories. You can see that here. My training data, with both the images and the labels, is provided by the training generator. I'll just train it for a short time, 15 epochs, and you'll see how well this model can perform. The validation data will come from the validation generator and, in the same way as the training data, it will include both the images and the labels. The verbose parameter just tells TensorFlow how much detail to report per epoch.

If you remember back to when we specified the training generator and defined batch sizes to flow the images from the directory, you can also specify the number of batches for training, and you'll find that training will be drastically faster. It takes a bit of experimentation to get the right combination, but when I didn't specify any, using Colab, even with a GPU, it was taking many minutes per epoch. However, when I specified the number of steps to use when training, it massively increased performance, so that each epoch was taking less than 10 seconds. So watch out for this while you're training. Here's the code with the steps set. The rule of thumb here is to take the number of items in the data set and divide it by the batch size. If you remember, earlier we had a batch size of 128, and there are a little over 1,024 items in the data set, so I chose eight steps.
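The rule of thumb for the number of steps is simple integer arithmetic. Here is a minimal sketch, where the helper name `steps_per_epoch` and the `drop_remainder` flag are illustrative choices. The count of 1,027 training images is the figure the lab reports later in the transcript.

```python
import math


def steps_per_epoch(num_items, batch_size, drop_remainder=True):
    """Number of batches drawn from the generator in one epoch."""
    if drop_remainder:
        # Only full batches: a few leftover images are skipped each epoch.
        return num_items // batch_size
    # Include a final, smaller batch so every image is seen.
    return math.ceil(num_items / batch_size)


print(steps_per_epoch(1027, 128))                        # 8
print(steps_per_epoch(1027, 128, drop_remainder=False))  # 9
```

Eight steps of 128 cover 1,024 images, which matches the choice described in the video. Rounding up instead adds a ninth step for the remaining three images.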
After 15 epochs, this model should reach close to 100% accuracy on the training set and about 85% on the validation set. This is called overfitting, and it's a common error in neural networks that can lead you to a false sense of security. What has happened is that the network got to be really, really good, almost perfect, at classifying data that it has already seen, which is the training data, but it's not quite as good at understanding data that it hasn't previously seen, such as the images in the validation data set. It's a little bit like this: if the only shoes you'd ever seen in your life were hiking boots, then you might not recognize a high heel as a shoe. You've overfit yourself into thinking that all shoes look like hiking boots. There are techniques to avoid this in neural networks, and we'll look at some of them soon, but before that, let's explore how to use the network that you've just created to classify images.

Here's the code to upload an image to Colab and have it use the model to predict the contents of the image. This works in Colab only, and you import the files library to use it. You get a list of uploaded files by calling files.upload like this. Then, for each file that's uploaded, you have to convert it to be 300 by 300, turn it into an array, and call expand_dims to ensure that it's a three-dimensional array with the third element being the color depth that we saw when we were defining the model. Once you have your image or images in a list, you can then call model.predict to get the results back. If the value is greater than 0.5, it's a human. Otherwise, it's a horse.

Now that we've gone through everything, let's take a look at a hands-on lab where you can build this for yourself. I'll run through it first, and then I'll provide a URL where you can try it out for yourself. Let's start looking at using convolutions with complex images. In this case, we're going to use the horses or humans data set that we spoke about during the lessons.
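The decision rule applied to the model's output can be sketched as follows. This is an illustration under stated assumptions rather than the lab's code: `classify` is a hypothetical helper, the raw scores are made up, and the mapping of values above 0.5 to "human" follows the transcript.

```python
import math


def sigmoid(x):
    """Squash any real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def classify(score, threshold=0.5):
    """Map a sigmoid output to a class name using the transcript's rule."""
    return "human" if score > threshold else "horse"


# Made-up raw values feeding the single output neuron, for illustration only.
for raw in (2.0, -2.0, 0.0):
    score = sigmoid(raw)
    print(f"{score:.3f} -> {classify(score)}")
```

A score of exactly 0.5 falls on the "horse" side here because the rule uses a strict greater-than comparison. Values close to 0.5 indicate that the network is not confident either way.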
This download will give you the horses or humans training set, and this one will give you the validation set. They're separate zips, so you can download them and have a play with them. The following code then uses Python's operating system libraries to unzip them and put the contents into our required directories. Here, I'm going to specify what the training directory is for horses and humans, and what the validation directory is for horses and humans.

Taking a look at /tmp/horse2human/horses, we can actually look at the files. Colab is quite nice for this. You can look at the file system, and we can see that we have a horse2human directory with horses and humans, and a validation horse2human directory with horses and humans. If we list them, we can start seeing some of the files that are in there. Within the training horse directory, we have files like horse487, horse211, and so on. Within the training human directory, we have images like human1201 and human1518. We can see the total numbers that we have. There are 500 training horse images and 527 training human images.

We can plot some of these to see what they look like. Here is a batch of eight horses and eight humans. I've designed this data set to have humans of various skin colors, hair lengths, dress, poses, all that type of thing, and you can begin to see some of the diversity there, though these two images are very similar. It's a similar thing with the horses: different colors, different shades of horse, different poses, different backgrounds, that kind of thing. The idea is to extract as many features as possible from these, and it's the features that tell us what makes a horse a horse and what makes a human a human.

Now, I'm going to take a look at building my model. I'm going to run this code and import TensorFlow. I'll print out the TensorFlow version. We see we're on 2.2 release candidate 3. If you're watching this later, you'll see a later version.
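The unzip-and-count step can be sketched with the standard library alone. The archive built below is a tiny stand-in with empty placeholder files, not the real data set, and `extract_and_count` is a hypothetical helper name.

```python
import os
import tempfile
import zipfile


def extract_and_count(zip_path, target_dir):
    """Unzip an archive and count the files in each class subdirectory."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    counts = {}
    for name in sorted(os.listdir(target_dir)):
        class_dir = os.path.join(target_dir, name)
        if os.path.isdir(class_dir):
            counts[name] = len(os.listdir(class_dir))
    return counts


with tempfile.TemporaryDirectory() as tmp:
    # Create a small stand-in archive with one folder per class.
    zip_path = os.path.join(tmp, "training.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        for member in ("horses/horse01.png", "horses/horse02.png", "humans/human01.png"):
            archive.writestr(member, b"")
    counts = extract_and_count(zip_path, os.path.join(tmp, "training"))

print(counts)  # {'horses': 2, 'humans': 1}
```

Counting the files per class right after extraction is a quick sanity check that the archive unpacked into the layout the generators expect.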
I'm going to create my model. I can call model.summary, so we can see our neural network and the journey of the image through the network. I'm going to use RMSprop, root mean square propagation, as my optimizer and set the learning rate to 0.001. Now I need to create my generators, which are going to flow the images from the directories that we specified earlier on. We can see that my training generator found 1,027 images belonging to two classes, and my validation generator found 256 images belonging to two classes. So now all we have to do is call model.fit. I'm just going to run it for 15 epochs to see what it looks like.

At the end of 15 epochs, we can see that we actually reached 100% accuracy on the training set and about 78% accuracy on the validation set. What's happening here is something called overfitting, where the network has become too specialized in spotting what's in the training set and is not general enough to spot other things. If you look at the validation set, the loss is actually increasing per epoch, quite sharply in fact, and that's a great sign that your network is overfitting. In later lessons, we'll talk about techniques to avoid overfitting like this. It can lead you to a false sense of security that your network is 100% accurate when, according to the validation set, it's more like 80% accurate.

But let's try it anyway and see what happens. One of the things that's kind of interesting about this data set is that, if you look at the images, they're all actually CGI. But because they're CGI images of horses and humans, they still have horse and human features. As a result, we should be able to try the model against real images. If I run this piece of code, it's going to ask me to choose some files. So I'll pick files, and I have some pictures on my desktop. For example, this one here.
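The two warning signs mentioned here, a wide gap between training and validation accuracy and a validation loss that keeps rising, can be checked from the per-epoch history. The sketch below uses made-up histories whose final accuracies match the figures quoted above (100% and 78%). `overfitting_report` is a hypothetical helper, not part of TensorFlow.

```python
def overfitting_report(train_acc, val_acc, val_loss):
    """Summarize simple overfitting signals from per-epoch histories."""
    best_index = min(range(len(val_loss)), key=lambda i: val_loss[i])
    after_best = val_loss[best_index:]
    rising = len(after_best) > 1 and all(
        later > earlier for earlier, later in zip(after_best, after_best[1:])
    )
    return {
        "accuracy_gap": round(train_acc[-1] - val_acc[-1], 2),
        "best_val_loss_epoch": best_index + 1,   # 1-based epoch number
        "val_loss_rising_since_best": rising,
    }


# Hypothetical histories, shortened to five epochs for illustration.
train_acc = [0.80, 0.92, 0.97, 0.99, 1.00]
val_acc = [0.75, 0.82, 0.80, 0.79, 0.78]
val_loss = [0.60, 0.55, 0.90, 1.40, 2.10]

report = overfitting_report(train_acc, val_acc, val_loss)
print(report)
# {'accuracy_gap': 0.22, 'best_val_loss_epoch': 2, 'val_loss_rising_since_best': True}
```

In this made-up run the validation loss bottoms out at epoch 2 and climbs afterwards, so the later epochs improve training accuracy without helping on unseen images.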
From the file name, you can tell it's a horse. The code uploads it, classifies it, and actually classifies it correctly as a horse. Another thing that's pretty cool is that you can upload multiple images and have them all classified. So I'm going to pick all of these. The first three are horses, with horse in the name. The next three are humans. Two of them are female and one of them is male. Let's take a look at what we get.

Here are all the images being uploaded, and now we can see the results. Horse is a horse, horse is a horse, horse is a horse, which is great. This image, the "beautiful" JPEG, is a horse, so the network has miscategorized a human as a horse. And these two are correctly categorized as human. So we can take a look at what's in that image, the beautiful 1274. I got these images from pixabay.com. This image, for example, ended up being classified as a horse. Now, you and I can see that it clearly isn't a horse, it's a human. But what about the features? What is it about this image that made the network think it was a horse? This is an example of the overfitting that was actually happening. It may be her long hair. It's one of those things that you'll try to fix as you work through your images and through convolutional neural networks. But for now, it's done a pretty good job.

Give it a try for yourself. Maybe change the model architecture. Later on, you'll learn about something called image augmentation that may be able to help with fixing this. Also, as you work through the lab, you'll see these intermediate representations that you can have a play with, so you can see what the image looked like as it was traveling through the network. The code will pick one at random and show it to you. Here's one of a man, for example, and we can see features beginning to be extracted from it. Have fun with it, and I'd love to see what you come up with.
Now, you can try it for yourself if you like. Pause the video and then go to this URL. When you're done, come back and then you can try the exercise. As always, I'll have the answer to the exercise in the next video, so don't forget to hit that subscribe button, and I'll see you soon.

Original Description

Machine Learning Foundations is a free training course where you’ll learn the fundamentals of building machine learned models using TensorFlow. In Episode 5, Google Developers can learn how to use convolutional neural networks to classify complex features and build machine learning models with TensorFlow. This tutorial explores real world image classification, with a hands-on example to tackle a more challenging computer vision problem, classifying images of horses and humans!

Exercise 3 answer → https://goo.gle/3dml4e3
Example: Classifying complex images → https://goo.gle/2YLupZ7
Exercise 4 → https://goo.gle/2WbPo5E

TensorFlow is Google’s end-to-end open source machine learning platform.
For more videos about TensorFlow, subscribe to the TF YouTube channel → https://goo.gle/TensorFlow
Machine Learning Foundations playlist → https://goo.gle/ml-foundations
Subscribe to Google Developers → https://goo.gle/developers
