0. What is machine learning?
Key Takeaways
This video introduces the basics of machine learning, covering topics such as the difference between AI, machine learning, and deep learning, and explaining key concepts like linear regression, overfitting, and loss functions, using tools like scikit-learn and Kaggle.
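As a rough illustration of the linear regression idea summarized above, here is a minimal sketch in plain Python. The five age and weight pairs are made up for illustration (the transcript does not give the video's actual chick data, so the prediction will not match the 170 grams quoted in the video), and `fit_line` and `predict` are hypothetical helper names, not functions from any library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (squared error)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at a new input value."""
    return slope * x + intercept


# Made-up training data: five chicks, age in days and weight in grams.
ages = [2, 6, 10, 14, 20]
weights = [50, 75, 105, 140, 185]

chick_slope, chick_intercept = fit_line(ages, weights)
weight_at_18_days = predict(18, chick_slope, chick_intercept)
print(f"Predicted weight at 18 days: {weight_at_18_days:.1f} g")
```

The model here is just two numbers, a slope and an intercept, which is the sense in which the video calls a model "just an equation."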
Full Transcript
Today we're gonna talk about the big picture: what is machine learning, what is deep learning, how does it really work, and where can we apply it? Unlike some of the other videos that we're doing here, this isn't just for engineers. This is really for anyone who wants to get a deeper understanding of how machine learning actually works. There are tons of videos out there that talk about various aspects of machine learning, but the gap that I want to fill is really showing people where they can apply machine learning, because it applies to so many things, but it doesn't apply to every single thing on the planet. I think you really need to have a sense of how it actually works behind the scenes, because if you just think of it as magic, it's really unclear when you should be thinking "ah, that's a machine learning problem" and when "that's not a machine learning problem."

We have to acknowledge there's an incredible amount of hype around machine learning right now, and one of the manifestations of that, maybe the reason you're watching this video, is that the people who understand how to do machine learning are being paid huge salaries right now. Startups are being acquired left and right, not for the technology but just for their machine learning and deep learning expertise. Even McKinsey thinks this is going to be a huge market, not tomorrow but in the near future. So I think it's important to back up and ask ourselves what machine learning can really do right now, today.

The best resource for all the applications is a blog written by Shivon Zilis, where she covered this in depth, but I want to go through a range of the applications that are out there. First of all, the TSA now says that deep learning can find weapons on passengers better than human agents. Deep learning can count your cells, and it can look for cancer in a biopsy. It can find endangered animals in aerial photos, it can automatically detect weeds on farms from tractors, and it can help
you build crazy robots that impress your friends. All these different examples come from different industries and involve incredibly different types of inputs and outputs, so you might be really surprised to learn that machine learning, the whole discipline, actually has an incredibly restrictive API or data type that it needs for inputs and outputs. Actually getting things like audio and images and text into the format that machine learning takes, and interpreting the very restrictive format that it outputs for your application, is a huge piece of machine learning that no one really talks about, and it's what we're gonna talk about for the rest of this video.

First, let's get our definitions straight, because there's a lot of confusion. Deep learning is a type of machine learning, maybe the most exciting type of machine learning right now. Machine learning is a discipline of artificial intelligence, probably the most exciting field in artificial intelligence right now. So all of the use cases that I gave are actually machine learning problems, and what we care about today is machine learning. Most AI departments right now focus on machine learning because it's the part of AI that's really working. I think of machine learning as statistics applied to AI.

So here's the canonical machine learning problem: we have a picture of a cat, and we want to do some black magic and somehow classify our picture as a cat. How does that work? To answer that question, let's back up a second and talk about the canonical statistics problem. There are many data sets that I could have used, but for some reason I used a data set of baby chickens, where I have their ages and weights. In machine learning these examples would be called training data. Imagine we're a farmer and we want to predict, from the data that we've collected, how much we would expect a baby chicken that's 18 days old to weigh. Here we're gonna build a model to fit our training data to answer that
question. You may not have had exactly this problem before, but you may have done something like this. You can actually do it in Excel, and it's called linear regression. If you've ever made a trendline through your data, it's probably using linear regression. We can plot these points: these are the ages and the weights of the chickens, and this is the training data that we use to build a model. This trend line, this linear regression, actually makes predictions for any age, so we can look at 18 days on the x-axis and see that the line is at 170 on the y-axis. So our model is predicting that a baby chicken will weigh 170 grams when it's 18 days old.

Now that's linear regression, but we can do fancier things too, even with this tiny data set. In this case I fitted an exponential curve, and it makes a slightly different regression. You might ask yourself: do you think this line models the data better? OK, now here's another valid regression I did, where I fit a more complicated equation. This line doesn't look to me like it models the data very well, but it goes through every point we have, meaning that it models the training data perfectly. What's happening here is something called overfitting. My complicated line went through all the points perfectly, but it won't generalize as well to new points that we haven't seen before, and as models become more complicated they tend to overfit. We don't usually worry too much about overfitting in a statistics 101 class with linear regression, because it's such a simple model that it's hard for it to overfit, but as our equations get more complicated and our data gets more complicated, overfitting becomes more and more of an issue.

The graph on the left is modeling the data in a simple way, but it's probably missing some of the pattern in the data. The graph on the right is actually touching all the dots, meaning that it's perfectly fitting the training data, but it's probably overfitting the training data. The
graph in the middle gets closer to the training data points, but not as close as the graph on the right, and maybe models the data better than the graph on the left.

As we collect more and more data in the world, we're able to build more and more complicated models. Deep learning really describes a trend towards extremely complicated equations with potentially millions of parameters and millions of data points. So how complicated should we make our models, and how should we constrain them to keep them from overfitting? That's what machine learning research is really all about. These single graphs might seem like toy problems, but predicting one variable from one other variable can get really complicated on its own. If you could predict where this graph of the stock market is going better than anyone else, you could make a lot of money.

Another fundamental question in deep learning is: what are we actually optimizing? These two lines both try to go as close as possible to all the points in the training data, but they actually have a different definition of "close." One of the lines here is the line with the smallest sum of the vertical distances from the line to all the points, while the other is optimizing the smallest sum of the squared vertical distances. It might seem like a small difference, but clearly these graphs look very different. If you just do a default regression, you're usually optimizing the square of the distance, which you may also know as the squared error. Can you tell which line is optimizing which metric? If you want, you could pause the video and think about it, because I'm about to tell you.

OK, you're back. I'm sure you thought deeply about which is optimizing squared error and which is optimizing absolute error, and you probably concluded that the left is optimizing the squared error and the right is optimizing the absolute error. When you optimize the squared error, the outliers actually affect the line a lot more, pulling it away from the majority of
points. To me the left doesn't look like as good a fit as the graph on the right, but if you model this in Excel or any normal stats program, it'll probably default to the graph on the left, which actually might not be what you really want. So which model is really better? It depends on what you're doing and what's happening downstream from your model.

In all the graphs we've looked at so far we only have one input and one output, but usually we have more than one input. So here we have not just the age of the chicken; we add the type of diet that it was exposed to, encoded as a number. Now from those two variables we want to predict weight. This is something we can still do in Excel: we have input training data and output, and we can use linear regression, but it's harder to visualize what's going on. This actually makes it easier to overfit, and we're getting closer to what's traditionally thought of as machine learning and deep learning.

Back to our cat classification problem: what does this have to do with what we've been talking about so far? Here's the statistics API, and it's very strict. We have an input that has to be a fixed-length list of numbers. In our first example it was a single number; in our multivariable regression example it was two numbers, chick age and chick diet. Our model also outputs a fixed-length list of numbers; so far all of our models have output only one number. The way we generate our statistical models is by feeding in a set of examples. In machine learning this is usually called training data, and the examples always have the fixed inputs and outputs. In the case of the chicks, we fed in five lists of age and weight and built a linear regression model.

It turns out, and a lot of people are surprised by this, that the machine learning API is identical to the statistics API. We usually have more than one input, and we often have more than one output, but the inputs and outputs still have to be fixed-length lists of numbers, and behind the scenes we're still just
generating a model from the training data. Just like with linear regression, the model is just an equation, but in machine learning we think of it as a very complicated equation. Training is just searching for the best model according to some metric or loss function, and often that loss function is just squared error, the same as we normally use for linear regression.

So what are some of the machine learning techniques besides linear regression, and why would we pick one over the other? It depends a lot on what kind of overfitting we're worried about, how much training data we have, and how many input and output variables there are. A very popular and useful Python library called scikit-learn, which you may have used, actually built a fantastic flow chart that summarizes five years of grad school and helps you pick the best possible model based on these aspects of your training data. Another way to think about which model to use is to look at what other people are doing. Kaggle, a super cool data science platform, ran a survey of its machine learning practitioners and asked them which techniques they use in their day-to-day jobs. You've probably heard lately that neural networks are becoming popular, but good old logistic regression, which is just a modification of the linear regression we were talking about earlier, is still really the most commonly used technique.

Another popular machine learning technique you may have heard of is called decision trees. One thing I like about this algorithm is that it's really easy to explain. What it does is pick one of the input variables and choose a threshold: if the variable is above the threshold, go left, and if the variable is below the threshold, go right. Then at the leaves of this tree it predicts a specific number for the output. A popular and useful variant of
the decision tree algorithm is called decision forests, or random forests. When we use decision forests we actually build up hundreds or thousands or tens of thousands of decision trees, we let each of those trees make a prediction, and then we aggregate the predictions in some way.

Neural networks are another type of model that has recently become very popular and has had a lot of breakthroughs, so we're gonna go really deep on them in this video series. When we talk about deep learning we're usually talking about neural networks. But one thing I really want to demystify here is that, despite the evocative name, neural networks are really just an equation like anything else, and the inputs and the outputs are just like those of all the other machine learning and statistical techniques that we've been talking about.

OK, so how do we get this cat problem into this machine learning API that we keep talking about? This picture of a cat is not a fixed array of numbers, and the output is definitely not a fixed array of numbers. So first we have to turn the cat image into a fixed-length list of numbers, and we can do this by taking the red, green, and blue values from each pixel and putting them into a long list. Then we have to make our network output something we can interpret as a label like "cat." One way to do this is to set up our network to output a number for each type of image we might see. Here we have a network that's outputting four numbers: a cat score, a fish score, a dog score, and an "other" score, and we're gonna interpret a one in the cat score to mean that the image is a cat. Now we have a machine learning problem in the machine learning API, so behind the scenes we can build a neural network, we can build a linear regression, we can build a decision tree, or anything else we might want to solve this machine learning problem.

Actually, though, there's one more important step, which is that we need to find more images of cats. This is called training data collection, and it's often
the most important and usually overlooked step. These models are just mathematical equations. They have no common sense built into them; all they can do is find patterns in the numbers. So, for example, if all the cats in our training data look the same, no machine learning model will be able to figure out what a cat actually is. We also need to find examples of anything else that we want to classify. Training data is actually so important to machine learning, and so important to me personally, that over a decade ago I started a company called Figure Eight that helps companies collect training data. If you need more training data, and if you're doing machine learning you probably need more training data, you could check out Figure Eight and use it, or you could try one of its vastly inferior competitors.

It might seem trivial to turn images into a fixed array of numbers just by using the bitmap values, but what about something like speech? What if we want to build a mini Alexa that classifies sounds as "hello" or "goodbye"? Feel free to pause the video and ponder how we would do this. It turns out there's no real consensus on the best way to turn audio into numbers, but one trendy way to do this now is to just use the waveform of the sound as a list of numbers. There's one problem with this, which is that the sounds will all be different lengths, and all the arrays in our data have to be the same length. One simple, obvious way to deal with this is to truncate the sounds to a fixed length of time, or to assume that the sounds are completely quiet once the utterance is complete. There are actually several other common ways to do this transformation, and it turns out that the transformation itself, from the data into this very constrained API of machine learning, is often the most critical choice in building a machine learning model.

What if we don't have audio or video? What if we have text? Oftentimes companies want to classify tweets about them as being positive or negative
about their brand. I actually have a video later on that goes into detail about exactly how to do this and build models, but for now let's just think about how we transform that text into numbers. Again, amazingly, there's no real consensus on how to do this transformation. One very common approach is to make a list of all the words in the English language, or whatever languages your text is in, and count the number of times each word occurs in your document. You end up with a list of lots of zeros, but it actually fits our criteria: it's always the same length and it's always full of numbers.

Here's a harder one, and this is common in self-driving cars. We want to look at every single pixel in an image and classify what object each pixel corresponds to. So, for example, we can't just say there's a road in the image; we have to say which parts of the image are the road and which parts of the image are a sidewalk. Here's an example image, and here's an example output. How does this work? Once again there's more than one way to do it, and this video will probably soon be out of date, but the most common way to do it right now is to literally treat the input numbers and the output numbers as arrays of the same length. In this case the output numbers are labels for what's in each pixel.

Here's an even trickier one: bounding boxes. We want to put boxes around the things in the image that we care about. There could be any number of things that we care about, but remember, our output has to be a fixed length. One way to do it, and there are other good ways, is to generate a candidate list of possible boxes and then run a classifier that looks at the pixels in the image and the candidate box itself and classifies not only what's in the box but whether or not that box is a good one that should be considered a bounding box. A downside of this method is that you have to consider potentially thousands or millions of classifications per
image. You may need to look at that last part a few times to really get it; I know that I had to. But the key takeaway here is that framing the machine learning problem really, really matters. For example, with object recognition, the way we framed the problem earlier, you have no chance of recognizing an object that you've never seen in your training data. So how would you ever recognize an object that you haven't seen before? People can do this. One way to possibly recognize an object that you haven't had in your training data is to frame the problem not as recognizing a single object but as recognizing whether two objects are the same. So now our input is actually two objects, and the classification task is: are these two objects the same thing or not the same thing? This is called a pairwise classifier, and it can sometimes classify objects it's never seen before, like the eggbeater in this diagram.

Voice recognition, identifying endangered animals in aerial photography, building crazy face-recognizing drones: what do these applications all have in common? Why do we think of them as machine learning applications? It's because we're able to fit them into this very constrained, specific API that's common to all machine learning and deep learning problems. So if you're thinking, "OK, is my problem suitable for machine learning or deep learning?", what you should be asking yourself is: can I turn it into this kind of problem, where I have a fixed-length list of numbers as input and a fixed-length list of numbers as output, and can I collect training data, or examples, to show my model to build my machine learning system? If the answer to those questions is yes, then you really do have a machine learning problem.

Hopefully that's got you excited enough about all the applications of machine learning that you want to watch further videos that explain how to build these models and how to deploy them. We're gonna keep creating these videos, so you
should probably subscribe so that you're the first to know when a new video comes out.
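The transcript's central point, that every input and output must become a fixed-length list of numbers, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the five-word vocabulary, the 2x2 image, and the score values are all hypothetical, and real systems use much larger vocabularies and images.

```python
from collections import Counter


def text_to_counts(text, vocabulary):
    """Bag of words: one count per vocabulary word, always len(vocabulary) long."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def image_to_list(pixels):
    """Flatten rows of (red, green, blue) pixels into one long list of numbers."""
    return [channel for row in pixels for pixel in row for channel in pixel]


def audio_to_fixed_length(samples, length):
    """Truncate a waveform, or pad it with silence (zeros), to exactly `length` samples."""
    clipped = list(samples[:length])
    return clipped + [0.0] * (length - len(clipped))


def scores_to_label(scores, labels):
    """Interpret a fixed-length output by picking the label with the highest score."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best_index]


# Hypothetical toy inputs, chosen only to show the shapes involved.
vocabulary = ["great", "terrible", "love", "hate", "service"]
tweet_vector = text_to_counts("Love the service love it", vocabulary)

tiny_image = [[(255, 0, 0), (0, 255, 0)],
              [(0, 0, 255), (10, 20, 30)]]
image_vector = image_to_list(tiny_image)

short_sound = audio_to_fixed_length([0.1, -0.2, 0.3], 5)
long_sound = audio_to_fixed_length([0.1] * 8, 5)

# Made-up model output: cat, fish, dog, and other scores.
label = scores_to_label([0.9, 0.02, 0.05, 0.03], ["cat", "fish", "dog", "other"])
```

Whatever the original data was, each helper returns a list whose length depends only on the chosen vocabulary, image size, or clip length, which is what lets one model handle every example.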
Original Description
What is the difference between AI, Machine Learning and Deep Learning? Why does machine learning matter? What can it do, and perhaps more importantly, what can’t it do? Get started by looking at your first machine learning model and learning about multivariate linear regression, overfitting, loss functions and the machine learning API.
Shivon Zilis's blog: http://www.shivonzilis.com/
See all classes: http://wandb.com/classes
Weights & Biases: http://wandb.com
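The loss function comparison from the video (squared error versus absolute error) can be sketched with a deliberately simplified model that predicts a single constant instead of a line. The weights below are made up, with one outlier added on purpose. For a constant prediction, the mean minimizes the sum of squared errors and the median minimizes the sum of absolute errors, so the outlier pulls the squared-error answer much further.

```python
from statistics import mean, median


def squared_error(prediction, values):
    """Sum of squared differences between one prediction and every value."""
    return sum((value - prediction) ** 2 for value in values)


def absolute_error(prediction, values):
    """Sum of absolute differences between one prediction and every value."""
    return sum(abs(value - prediction) for value in values)


# Made-up chick weights in grams; the last value is a deliberate outlier.
weights = [100, 102, 98, 101, 99, 200]

best_for_squared = mean(weights)     # minimizes squared_error
best_for_absolute = median(weights)  # minimizes absolute_error

print("Best constant under squared error: ", best_for_squared)
print("Best constant under absolute error:", best_for_absolute)
```

The same effect is what the transcript describes for the two fitted lines: the squared-error fit moves toward the outliers, while the absolute-error fit stays with the majority of points.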