Loading in your own data - Deep Learning basics with Python, TensorFlow and Keras p.2

sentdex · Beginner ·🧬 Deep Learning ·7y ago

Key Takeaways

The video demonstrates how to load and preprocess a custom dataset, specifically the Dogs vs Cats dataset from Microsoft, using Python, OpenCV, and NumPy, in preparation for image classification with TensorFlow and Keras. It covers reading images in grayscale, resizing, labeling, shuffling, reshaping into a feature array, and saving the result; the convolutional neural network itself is left for the next tutorial.

Full Transcript

What is going on everybody, and welcome to part two of our deep learning with Python, TensorFlow and Keras tutorial. In this tutorial, what we're going to be talking about is how to load in an outside dataset. The outside dataset we're going to use is this cats and dogs dataset from Microsoft. It was initially for a Kaggle challenge, and the idea is to take pictures of cats and dogs and then identify them by feeding them through a neural network and having the neural network say whether that's a cat or a dog.

So go ahead and download that dataset, and once you have it, let me pull up an example here. What you should see is like this: you should get two directories (these other two things I've added in), Cat and Dog, and in here you should have some images of cats and dogs, in this case a bunch of dogs. Each one has like 12,500 samples, so you should have plenty of examples to teach a model what's a cat and what's a dog. So go ahead and download those, extract those, and then we'll come over here and get to work.

First of all, we're going to import numpy as np, we're going to import matplotlib.pyplot as plt, we're going to import os, and we're going to import cv2. If you don't have numpy, pip install numpy; if you don't have matplotlib, pip install matplotlib; and if you don't have cv2, you will need to do a pip install opencv-python. Once we have those: basically I'm going to use matplotlib just to show the image, we're going to use os to iterate through directories and join paths, cv2 to do some image operations, and numpy to do various array operations.

The first thing we're going to do is specify a data directory, DATADIR. My data is located in my X files, under Datasets and PetImages. Then we're going to have CATEGORIES. The two categories we've got to deal with here are Dog and Cat. I can't leave those quotes different; in fact we should probably just double-quote them, since we double-quoted the other ones.

Now what we want to do is iterate through all of our examples of dog and cat. The way we're going to do that is: for category in CATEGORIES, path is os.path.join of DATADIR and whatever that category is. Basically this just gets us the path to the cats or dogs directory. Then we say for img in os.listdir of that path, so it'll just be a bunch of these images, which are basically named by just a number. Once we have that, we can iterate through all of those images. We can convert them immediately to an array with cv2.imread, and we read those with os.path.join, joining path and img; that's the full path to that image. Then we're going to convert it to grayscale: we read in that image and then we say cv2.IMREAD_GRAYSCALE.

We're going to convert it to grayscale because, one, RGB data is three times the size of grayscale data, and I just don't think color is that essential in this specific task. In a lot of identifying tasks it is, but for the difference between a cat and a dog: yes, they have a lot of color, but is color the differentiating factor between cats and dogs? I don't think so. If it was dog versus some reptile or something like that, then yeah, you'd probably want to bring in color.

Okay, so once we've done that, the next thing we can do is graph this and look at what we're dealing with, just to make sure it's what we expect: plt.imshow of img_array, with cmap equals gray. Again, the only reason I'm using matplotlib here is because I don't know how to display inline in a Jupyter notebook with cv2. I'm sure there's a way; if somebody knows it, go ahead and leave it below. Then I'm just going to throw a break here, and a break, just so we can look at this picture real quick. As you can see, it's a grayscale image of a dog, no surprise there; it's kind of what we expected.

Also, this is what our data looks like: img_array. Okay, just a bunch of numbers. In this case you can see it's just a bunch of pixel values in this 2D array. What if we didn't convert it to grayscale, though, and just did this? Blue dog. Now it's no longer 2D, because these are actual RGB values, actually probably BGR, which I believe is how cv2 reads things in; that's why the dog is blue in the photo. Anyway, we are going to keep grayscale, though, and just take a note of how that
changes your actual data. For example, we could say img_array.shape there, and it's a 398 by 500, which brings us to the next problem we have. As you can tell, all these dogs are different-shaped photos: some are landscape, some are portrait, some are square, and we really need everything to be normalized, if at all possible. There are ways to make classifications on variable-sized images, but in the interest of keeping things as simple as possible, we'd like to make everything the same shape. So that's what we're going to do next.

Now we have to decide on a shape. For example, what if we say IMG_SIZE equals 50, so every image is a 50 by 50? The way we would do that is new_array equals cv2.resize: we resize img_array to IMG_SIZE by IMG_SIZE, and that's our new array. We really should look at it, so plt.imshow of new_array, then cmap, for the color map, equals gray, and then plt.show.

I can still tell that's a dog, but eventually we can't, right? If we make it a 10 by 10, I can't tell that's a dog; I don't think anybody can. If we go with 20, still probably not, though you might be able to get away with it. At 30 it's still pretty hard, but now you can start to see the forearm, or I guess the wrist, the area after the wrist, the hand of a dog, I guess, I don't know. Anyway, that's usually longer in dogs, so I could make this classification at this point. At 50 I'm pretty comfortable, but you do have to be careful, because this dog takes up quite a bit of the image, whereas some of these might not. For example, I'm sure I can find one eventually; this one's pretty small, and he blends into the bench quite a bit. Most of these are pretty well done, actually, but sometimes the animal will be a smaller part of the image, and it'll be a little harder when we
make it a small image. Look at this camo dog. Anyway, keep that in mind. I guess I'll stick with 50; we'll see how that goes.

Once you decide on the size that you want to go with, let's go ahead and create our training dataset. I'm going to say training_data equals an empty list, and then I'm going to start this function that I'm going to call create_training_data. I'm just going to say pass for now to give myself some space here. What we want to do is iterate through everything and build the dataset, so I'm just going to take this loop, copy it, come down here, paste it in, and tab it over.

Now, the next thing: kind of like in our MNIST dataset, we've got the features as numbers, but our label, our classification, is not yet a number. We can't map things to a string "dog" or a string "cat"; we need to map things to some sort of numerical value. So we're going to say zero is a dog and one is a cat, and the way we can do that is to just get the index of dog and cat in that list and make that index the classification. It's arbitrary: it doesn't matter which one is the one and which one is the zero, you just have to convert to a number. So I'm going to say class_num equals CATEGORIES.index of whatever that category is: zero for a dog, one for a cat.

Then we're going to iterate over the images. We don't need to show them, and we don't need to break anymore. All I want to do now is resize with this new array, so we'll come down here and perform that resizing operation, and I think that's about it. Now we just want to append this to our training_data list above: training_data.append, and we want to append our new_array and whatever that classification is, so class_num.

We want to do that for everything. The other thing we should probably do is encase this in a try/except. I've been through this dataset before, and some of the images are broken. So: except Exception as e, and I'm just going to pass. I already know that there are some that are broken. Normally you would throw the exception so you could read it and figure out what's going wrong, but I'm going to go ahead and just pass. You'll get an OSError and some other warning information and all that fun stuff, but I'll just pass for now. Then we call create_training_data, run that, and whenever that's done, let's print the len of training_data.

Now, a couple of things we should talk about. First is the balance of your training data. It's really important that your training data, especially in the case of classification, is properly balanced. In the case of a binary choice like we have, cats and dogs, you want to make sure you have 50/50: just as many cats as dogs. Sometimes you can have different numbers, and then when you train the model you can inform the model and say, hey, these are our class weights. There are weights that you can pass, and the way this works is that it will weight the loss a little differently, in an attempt to handle your imbalanced dataset. But if at all possible, you definitely want to balance the dataset instead. Let's say you had a dataset that was 75% dog and 25% cat. You feed that through the neural network, and the neural network is going to learn really quickly to just always predict dog, and it'll be 75% right. Then when it tries to learn from there, it's going to have a really, really hard time. So if you balance it so
it's a perfect 50/50, you'll be better off.

The next thing you want to do is shuffle the data. We've got the training data, but as you can see, the first thing we did was iterate over the categories and go from there, so everything is all dogs and then all cats. If you feed that to the neural network, it's going to learn, okay, only predict dog, and then it switches over to cats and it's like, oh, I'm wrong, I'm wrong, I'm wrong, okay, cat, cat, everything's a cat. Then it just keeps going back and forth, and it's probably not going to learn very well that way either. So we definitely want to shuffle the data. We can import random and then do a random.shuffle of training_data. Since training_data is a list, which is mutable, it's already shuffled at this point.

For example, we could now say for sample in training_data, and check that our labels are correct by printing sample[1], which will be the label. (Oh, we went through them all; let's just do up to 10 here.) sample[0] would be the actual image array itself. I don't want to run the whole thing all over again... well, we'll just wait for that, that's fine.

Now that it's shuffled, let's take this data and pack it into the variables that we're going to use right before we feed it into our neural network. That's going to be an empty list for X and an empty list for y. In general, capital X is your feature set and lowercase y is your labels. Sometimes you'll see train X, test y and all these kinds of things, and we could split those up, but we can actually just specify a validation set, so you don't have to split them up. Sometimes you may do that anyway, but you don't really have to; you can use built-in methods to properly do an out-of-sample test.

Okay, so now we're going to say for features, label in training_data: X.append features, y.append label. So we build those out into lists. And we really can't pass a list to the neural network, especially for the features. I wish we could; I think we should be able to, especially if we're using a high-level API like Keras. It's kind of silly that we've still got to do this; hopefully one day this will change. But we need to convert both things. Well, actually, y can stay a list, but X has to be, first of all, a numpy array. So let's go ahead and do that: X equals np.array of X. But we also need to reshape it to be -1 by the shape of each image. The -1 stands for how many features (that is, samples) we have; you can say -1 and that's kind of a catch-all, it's any number. Then we know the shape of the data to be IMG_SIZE by IMG_SIZE. You could also pull those dimensions from X itself, but anyway, we'll just go with this. So IMG_SIZE by IMG_SIZE, and then by one. This one is because it is grayscale.

Just as a bit of a homework challenge for you guys, I encourage you to attempt to make a convnet work with color images rather than grayscale, so see if you can convert what we're going to build here from a grayscale convolutional neural network to color. As an aside, the hardest thing to remember to do might be to change that one to a three, because then it would be three values. Again, I think this reshape is kind of silly. I wish somebody, hopefully from Keras eventually, would make this not be a requirement. We should not have to deal with that shape; it should just intelligently shape it, because we always have to do the same step, and it's silly.

Okay, so once we have our data... oh. X: when did we convert it? First of all, when did we define an X? When did it get converted? I'm just tripping out, I guess, because I never ran this. When did X become a numpy array before this was run? Did we ever use an X variable? Or maybe it's because I'm redoing this and X already did have a... yeah, the first recording got screwed up, that's what happened. Man, that was killing me. So X already existed; I should have restarted the kernel. Anyway, moving along.

The last thing we want to do is save these. You don't want to do all this crunching and reshaping every single time, because chances are you're going to be tweaking your model. The reality of neural networks is that you don't have the answer right away, or you're not working on a problem that's as easy as the one we're about to be working on. It's not going to be as simple as, oh, just throw this model together, use exactly these features, this many nodes, this many layers. You're not going to know; you're going to be tweaking, so you don't want to have to rebuild your dataset every time. So once we've created our training data, we're going to import pickle. You don't have to use pickle; we could use np.save or whatever, just somehow save your data so you don't have to redo it every time. So pickle_out equals open of X.pickle, with wb; then pickle.dump, we want to dump X to pickle_out; and then pickle_out.close. Then we want to do the same thing for the y. Later, if we want to read it back in, you can do something like this: pickle_in equals open of X.pickle, with rb (really we kind of did an extra step there, but anyway), and X equals pickle.load of pickle_in. Then X[1], for example, is our feature, so that would be our image, and y[1] would be the label for X[1].

Anyway, that's all for now. In the next tutorial, we're going to take this dataset that we've compiled and feed it through our convolutional neural network, after going over some of the basics of convnets and all that. If you've got questions, comments, concerns, or something you think could be done better, feel free to leave them below. Otherwise, I will see you guys in the next tutorial, where we are going to feed it through a neural network and hopefully get our correct classifications. Till next time.
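
The closing steps of the transcript (checking balance, shuffling, splitting into X and y, reshaping, and pickling) can be sketched roughly as follows. This is a minimal sketch rather than the video's exact code: synthetic random arrays stand in for the real images, the temporary output directory and the `with` blocks are assumptions made for illustration, and converting y to an array is an extra step (the transcript says y can stay a list).

```python
import pickle
import random
import tempfile
from collections import Counter
from pathlib import Path

import numpy as np

IMG_SIZE = 50  # the size settled on in the transcript

# Assumption: synthetic stand-in for training_data, a list of [image, label]
# pairs ordered all-dogs-then-all-cats, like the category loop produces.
rng = np.random.default_rng(0)
training_data = [
    [rng.integers(0, 256, size=(IMG_SIZE, IMG_SIZE), dtype=np.uint8), label]
    for label in (0, 1)
    for _ in range(20)
]

# Check class balance before training.
print(Counter(label for _, label in training_data))

# random.shuffle reorders the list in place, so nothing is reassigned.
random.shuffle(training_data)

X, y = [], []
for features, label in training_data:
    X.append(features)
    y.append(label)

# -1 lets NumPy infer the number of samples; the trailing 1 is the single
# grayscale channel (it would be 3 for color images).
X = np.array(X).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
y = np.array(y)  # optional here; an assumption for convenience

# Save once so the preprocessing does not have to be repeated.
out_dir = Path(tempfile.mkdtemp())  # hypothetical location
with open(out_dir / "X.pickle", "wb") as f:
    pickle.dump(X, f)
with open(out_dir / "y.pickle", "wb") as f:
    pickle.dump(y, f)

# Read the features back in later.
with open(out_dir / "X.pickle", "rb") as f:
    X_loaded = pickle.load(f)
```

Using `with` closes each file automatically, which replaces the explicit pickle_out.close() call described in the transcript.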

Original Description

Welcome to a tutorial where we'll be discussing how to load in our own outside datasets, which comes with all sorts of challenges! First, we need a dataset. Let's grab the Dogs vs Cats dataset from Microsoft: https://www.microsoft.com/en-us/download/confirmation.aspx?id=54765

Text tutorials and sample code: https://pythonprogramming.net/loading-custom-data-deep-learning-python-tensorflow-keras/
Discord: https://discord.gg/sentdex
Support the content: https://pythonprogramming.net/support-donate/
Twitter: https://twitter.com/sentdex
Facebook: https://www.facebook.com/pythonprogramming.net/
Twitch: https://www.twitch.tv/sentdex
G+: https://plus.google.com/+sentdex

This video teaches how to load and preprocess a custom dataset for image classification tasks using Python, TensorFlow, and Keras. It covers the data preparation steps that come before defining a convolutional neural network.

Key Takeaways
  1. Import the necessary libraries (numpy, matplotlib, os, cv2)
  2. Specify the data directory and the categories
  3. Iterate through categories and images
  4. Read images as grayscale
  5. Display an image with matplotlib to check it
  6. Resize images to 50x50
  7. Map each category to a numeric label
  8. Build the training dataset, skipping broken images
  9. Balance and shuffle the data
  10. Reshape the features into a NumPy array and save with pickle
💡 Using convolutional neural networks can significantly improve the performance of image classification tasks, especially when combined with proper data preprocessing techniques.
