[MINI] Backpropagation

Data Skeptic · Advanced · 📐 ML Fundamentals · 9y ago

Skills: Supervised Learning 80% · ML Maths Basics 70%

Key Takeaways

The video explains backpropagation, a common algorithm for training neural networks, through a map-navigation game, and discusses its application in supervised learning: how the error is calculated, how the weights are adjusted to reduce it, and which techniques help prevent overfitting.
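
To make "calculating the error" concrete, here is a minimal sketch of the forward pass and the sum-of-squared-differences loss described in the episode. It assumes a single hidden layer, a sigmoid activation, no bias terms, and hypothetical function names (`forward`, `sum_squared_error`); none of these details come from the episode itself.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Squashing activation (an assumption; the episode does not name one)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x: List[float], w_hidden: List[List[float]], w_out: List[float]) -> float:
    """Forward pass through one hidden layer to a single output.

    w_hidden[j][i] is the weight on the link from input i to hidden neuron j;
    w_out[j] is the weight on the link from hidden neuron j to the output.
    Biases are omitted to keep the sketch short.
    """
    # Each hidden neuron multiplies every input by its link weight, sums, then squashes.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(weights, x))) for weights in w_hidden]
    # The output neuron does the same with the hidden values (no squashing here).
    return sum(w * h for w, h in zip(w_out, hidden))


def sum_squared_error(targets: List[float], outputs: List[float]) -> float:
    """Sum over outputs of (target - calculated output) squared."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))


if __name__ == "__main__":
    y = forward([1.0, 0.0], [[0.0, 0.0], [0.0, 0.0]], [1.0, 1.0])
    print(y, sum_squared_error([0.0], [y]))  # prints: 1.0 1.0
```

A perfect prediction gives zero loss, just as a map centered exactly on the destination has zero distance left to cover.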

Full Transcript

[Music] Data Skeptic is the official podcast of dataskeptic.com, bringing you stories, interviews, and mini episodes on topics in data science, machine learning, statistics, and artificial [Music] intelligence.

I don't know if I have an official list of top 10 algorithms, but surely the backpropagation algorithm belongs on there, because it's very important: this is how you train a neural network. I came up with a little game that'll help with some of the intuition. Why don't you tell me what I've placed in front of you?

I have two phones in front of me. They're both in the Maps app. Well, one of them looks like the desert, and one of them has a river that says Yellowstone River.

So, effectively random locations. Let's just say we're going to do something like the backpropagation algorithm. Not exactly, but it's a game; it's going to show you the basic concept. What I want you to do is to move these maps around in very specific steps. In one step you can pinch-zoom, by a lot or a little, your choice; or you could scroll, you know, just move it a lot or a little; or you could fling it, you know, where it shoots across the map. Let's go in different rounds, maybe even count the rounds. I want you to focus one of the maps on our current home here in Los Angeles, and the other map on your former home, where you did some growing up (not all of your growing up, but a lot of it), in North Carolina.

Okay. So I guess I'll just start with our current home. Okay, I'll just hit the pinpoint button.

No, that's cheating. I want you to use the options I said: pinch-zoom, fling, or just scroll.

Oh, okay. Then I will pinch and zoom out.

All right, that's a single move. Just do one pinch.

I can only do one?

Yeah. I mean, you can do it as big or as small as you want. A fling or a pinch-zoom, no combinations.

But I'm just going to do it quickly, so can I zoom out?

Yeah, yeah, that's fine. Now, did you zoom out by a lot or a little?

I think it was a lot.

Well, first of all, can you tell where it is?

It says Fallon.

Nope. Go ahead and do the other one. Make an action there.

My other phone?

The other phone, yeah.

Which I'm trying to go to...

You get to pick which phone shows which place, but by the end of this (and we're on round one) you have to have your current and previous home on the maps.

Okay, I'll just do the zoom out, too.

Yep.

Now I see Seattle there.

How far would you say each of these maps is from its goal state, in terms of miles, roughly speaking?

Well, Seattle from North Carolina is probably at least 2,000.

So how far from Los Angeles?

Seattle is closer to LA.

So now let's go to round two, and same thing: you can do a pinch, a fling, or a move.

Phone one, I will zoom out again. It says it's an Indian reserve.

All right, well, that narrows it down a little bit. It's probably not helpful to you; I don't know where all the reserves are. All right, let's go to the other phone.

I'm going to fling it south as much as I can, cuz LA is south of Seattle. Okay, oops.

A big fling off into the ocean.

I'm in the ocean.

Okay, round three. Let's do it again.

Okay, with phone one I will pinch-zoom out again.

Now you went a little too far.

I see the entire continent.

Yeah, that's okay.

Eastward, cuz I do not know where I am. Could be in Mexico; I'm not sure, though.

All right, now let's reassess again. Approximately how close are these two phones to their destination?

Well, I'm on the right continent.

All right, yeah, that's a start.

So I'd say I'm near...

Let's go on to round four.

Phone one, I'll zoom in. All right, so phone one is North Carolina.

All right, so you did one big zoom. You went from the whole continent down to what? North Carolina?

The East Coast.

Okay, so you got a couple of states, though. So you did a big zoom. How come you did a big zoom?

Each of these movements is by a lot because I'm off by a lot. Zooming in.

All right, you zoomed in from the East Coast down to approximately what, now?

The Triangle area.

All right, so you got the whole Triangle. So that's progress; you're closer, right?

Okay. The other phone, I will swipe north and west, cuz I believe I'm in Texas.

All right, where did you end up?

I'm not sure, but it says Lubbock.

Oh yeah, Lubbock, Texas. I just sent somebody a t-shirt in Lubbock, Texas.

Phone number one, I will zoom in again.

So you see the whole Triangle; now you're going to do a Durham full-fingered zoom?

Yep.

You're getting closer, right? How many moves do you think you have left till you get to your home?

And... I don't actually remember where my home was, so I'm not actually sure.

That part of the analogy breaks down, because in most cases in backpropagation, and in most neural networks, you know the goal state, so you can measure the difference between what you have and where you want to be.

I think I remember. I will pinch-zoom in to the area. I believe it's there.

All right, now I noticed you didn't pinch-zoom the entire screen. You held back a little bit. How come?

I was waiting for it to load, to see what road it was. Okay, phone number two, I will now zoom in to LA. Looks like I can see Bakersfield and LA.

All right, you're getting closer. Keep going.

Okay, phone number one. Oh, I think I can see our house's street, so I will zoom in on that street. Okay, well, I'm basically on the street of where we live, so I'm not sure how close I should get.

Oh, get it exact. Get as zoomed in as you can, where you focus the screen on your old home.

Going to phone number two, I have Irvine and San Bernardino.

All right, getting close. Big zoom there. Going back to the North Carolina one, all right, what's going on there? It looks like you might be down to the maximum zoom on phone number one.

I mean, that's the house, so I'll center it in my next move.

All right, in your next move you'll center it.

Zooming... I can see the highways that are very close to us. Yeah, it's only maybe 50 feet to the left.

What would you estimate was the distance on your earliest move?

Thousands of miles.

Yeah, probably. All right, keep going. You basically solved one phone.

Okay, I'm only going to do phone number two now. Okay, I can see our neighborhood streets. I'll zoom in again. I'll slide it down. I'll zoom in again, centering it. Two blocks away, so I will zoom in again. I can see our house. Okay, I was there. My house is...

You pretty much nailed it. Here's the analogy I want to draw from this little ordeal I put you through. First of all, why two phones? Well, in a neural network you have a ton of weights.

What do you mean by weights?

So a neural network is composed of a bunch of neurons, and they each have links from one another, so they depend on previous neurons, and each of those links is weighted for how important it is.

Exactly, yeah. So what are the weights on this map?

The map isn't directly a neural network task. The emphasis I want to make here is on how much you change it, cuz that's what backpropagation is all about. In backpropagation you have this neural network, and initially you make all the weights random. Then you give it some input. The input could be an image or a document or just about anything you want. Then you pass it through the network: for each neuron, you look at all of its input neurons, and then you multiply the value by the weight, and you come up with some number, and you propagate that throughout the whole network until you get to the output. Then you compare what you calculated to what the actual value should have been. That's what's called the loss function, and there are lots of different kinds of loss functions. Generally you do the sum of the squares of the differences: you take your target minus what you calculated, which is your output, and you square that difference.

We're going to take a quick break from our show to talk about our sponsor for this week, Periscope Data. I use a lot of different tools on a day-to-day basis, yet I find Slack is really the centerpiece of communication between myself and my collaborators. We keep our business-related Slack chatter private, just like we keep many of our Periscope Data dashboards private. Security can sometimes be an inconvenience, forcing me to adopt slow processes with screenshotting and uploading, all kinds of craziness like that. That's why I'm excited to tell you about my experience with Periscope Data's Slack integration. If I'm in a discussion where a picture truly is worth a thousand words, I can cut and paste the URL of the Periscope Data dashboard and drop that directly into Slack. Because I have their integration installed, Slack knows how to handle that URL and also takes care of the security. It retrieves our dashboard and publishes it naturally right into the Slack channel, allowing all the team to see the data from our dashboarding tool of choice embedded directly in our communication tool of choice. If you'd like to try that out for yourself, head on over to periscopedata.com/skeptics to start a free trial. Once again, that's periscopedata.com/skeptics.

Let's talk about the loss function in terms of this little game here. If you centered your map precisely on the location you wanted, that would be zero loss, right? Because there is no difference between where you wanted it to be and where it is; they're the same place. But if the map on your screen doesn't show the destination you're looking for, then you can take the distance from what you're showing to where you want to be, and that's the error.

Yeah, but how does this relate to neural networks?

Let's say we have a neural network that wants to detect if there's a bird in a photo. All of the inputs are the pixel intensity values from an image, and the output is just a simple yes or no. If you initialize it randomly and you give it a picture that either does or doesn't have a bird in it, it's going to do a really bad job, right? Because why would random weights come up with anything useful? But you say, well, this answer should have been a yes, but it wasn't. So now you want to improve all those weights until you get weights that are actually good at making predictions. So how do you do that improvement? Well, you have to do some calculus here. Did you take calculus?

Yeah.

Do you remember the chain rule?

No.

All right, well, you don't have to remember it for this mini episode. That is the real key to understanding backpropagation, and to actually passing a test on the subject, but just to get the high-level understanding of what it is: do you remember what a derivative is?

I don't remember. I think it had, like, the shape of a violin? Like f(x)?

Uh, that's the way you write it. The derivative is the rate at which something is changing. If you think of being in a car, let's say you're at a starting line on a racetrack. As you start to drive, you have a distance from the start; that's your position. The rate of change is your velocity. The rate of the rate of change is acceleration. See how that works? That's the second derivative. But all we really need here is the first derivative. You can calculate the error for the total neural network for a given instance: how close was it to the correct answer? You want to look at all the weights in the system, and you say, well, if this weight changed just a little bit, how much does that affect the error? So let's say there's a neuron in there that's not really doing anything; it's not involved in the actual calculation. If you change it, you don't end up changing the error very much. But let's say you find one neuron that is misbehaving very badly, and when you change its weight, maybe make it much higher or much lower, suddenly the error goes down. Well, that's a weight you want to improve, right?

What if it's just influential?

What do you mean?

It's only sensitive because it matters.

That sounds profoundly correct, but I have no idea what it means. Can you walk me through that?

I don't have anything else to say. It was just... I thought... What if the weight was correct?

That should just never happen. If the weight is correct, in other words, if it's at a good value...

Yeah.

...then changing it could only hurt the overall error, right? So if it's a reliable weight that's very influential, then you don't want to change how it does things. So when you update that weight, you maybe change it by only a tiny amount, or not at all. But if there's another neuron somewhere that's doing a really bad job, like it's hurting the overall calculation, then changing it by some amount should reduce the overall error of the system. So then you go through each weight and... it's kind of like, imagine if every weight was a little knob and your job was to make a light as bright as possible, and you just go twist every knob by a little bit. You just fiddle with it, and if it helps, you'd be like, oh, I need to turn this one up a lot. But if it's, you know, already pretty good, you maybe only fine-tune it a tiny amount. It's kind of like the other analogy I thought of. Do you remember those games we seemed to have when we were kids, where it's a 2D surface that's like a maze, and you have a marble, and then you have these two knobs? One turns the board, you know, side to side; the other one turns it, kind of, left to right.

Mm-hmm.

And you have to navigate the ball through the whole maze...

Mm-hmm.

...and make it not fall in the holes that are there. Did you ever... ball maze? Ball maze. Did you ever have one of those growing up?

No.

Really?

Only the things that came in a Happy Meal.

Really? I figured... I thought everyone had one of those games at some point. How big was it?

Usually they were like 12 by 12.

That is big. Mine were like the size of our cell phones, smaller. You know what, there were some, like, party favor ones that had a ball bearing inside, now that I think of it.

Oh, okay. So those games, similar idea: if moving the knob a little bit makes the system overall a little bit better, you would move it a little bit, but if it needs to go a lot, you'd shift it a lot. So the whole key to backpropagation is going through every weight and finding the gradient with respect to the overall error. So it's kind of like saying, hey, let's think about just this one neuron for a second: if I change it, does it seem to make this better? Now, backpropagation can get stuck in local minima. That's probably a topic for another day, but I should just mention it. They're likely to get you to a good answer, but they struggle in plateaus and stuff like that. So this is, to the best of my knowledge, the most popular way of training a neural network. It goes in two phases, just to review a little bit. First is the forward propagation step: you give it an input, push it through your network, and calculate your output based on the current weights you have. Then you compare that to the output you'd like to get, and probably at the start it's really far off. So then you go through and you start looking mathematically at each weight, and you say, well, what is the local gradient? If I change it a little bit, how does that affect the error? And you move it in the direction that reduces the error, and you keep doing that for every neuron, and then for a new example, and you just keep iterating until you converge on a nice solution.

Do you overcorrect?

Yeah, actually, you can, because you're only measuring the gradient locally. It might seem like, oh, for where we are now, I want to fling the map way across the country, or fling it really far. But then when you see that it's gotten you close but maybe you overshot it, you have to go back and fling a little less.

Kind of like golfing that way, too, right?

I guess. I mean, I'm not a golfer, but I assume on your first swing you just want to get maximum distance. Maybe you end up a little bit past the tee, and then you've got to progressively make shots that go smaller distances and become more refined. And that's what backpropagation is doing; it's doing that with gradient descent.

And you can overfit, like any model?

Yeah. Pretty much every machine learning algorithm in the world can overfit; backpropagation is no exception. Although there are some very specific techniques developed primarily for this algorithm to help prevent it from overfitting, so we'll talk about some of those. Actually, we did; we talked about dropout once before, but there are a couple of other clever tricks like that, too. But good instincts, yeah. Overfitting can be a major issue when you have a whole bunch of neurons.

Too much tweaking.

Yeah, you over-tweaked it. That can definitely be the case.

Tweak, tweak, tweak.

Anyway, thanks as always for joining me, Linda.

Thank you.

And until next time, I want to remind everyone to keep thinking skeptically of and with data. Good night, Linda.

Good night.

Data Skeptic is a listener-supported program. To support the show, visit dataskeptic.com and click on the membership [Music] tab.
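
The episode describes the weight update as asking, for each weight, "if this weight changed just a little bit, how much does that affect the error?", then turning steep knobs a lot and nearly flat ones a little, with the risk of overshooting. The sketch below acts that out literally with a finite-difference (nudge-and-measure) gradient on a hypothetical two-weight linear model; the data and the names `loss`, `numerical_gradient`, and `train` are assumptions for illustration. This is not backpropagation itself: backpropagation obtains the same slopes analytically via the chain rule in one backward pass, instead of re-evaluating the loss twice per weight.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (inputs, target)


def loss(weights: Sequence[float], data: Sequence[Example]) -> float:
    """Sum of squared errors of a purely linear model y = sum_i w_i * x_i."""
    return sum((t - sum(w * xi for w, xi in zip(weights, x))) ** 2 for x, t in data)


def numerical_gradient(weights: Sequence[float], data: Sequence[Example],
                       eps: float = 1e-6) -> List[float]:
    """Nudge each weight up and down by eps and measure how much the error moves."""
    grads = []
    for i in range(len(weights)):
        up = list(weights)
        up[i] += eps
        down = list(weights)
        down[i] -= eps
        grads.append((loss(up, data) - loss(down, data)) / (2 * eps))
    return grads


def train(weights: Sequence[float], data: Sequence[Example],
          learning_rate: float, steps: int) -> List[float]:
    """Gradient descent: each weight moves against its slope, scaled by learning_rate,
    so steep (badly set) weights change a lot and nearly flat ones barely move."""
    w = list(weights)
    for _ in range(steps):
        g = numerical_gradient(w, data)
        w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
    return w


if __name__ == "__main__":
    data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)]  # toy data; best weights are [2, -1]
    print(numerical_gradient([0.0, 0.0], data))  # roughly [-4.0, 2.0]
    print(train([0.0, 0.0], data, learning_rate=0.05, steps=200))  # close to [2, -1]
    print(loss(train([0.0, 0.0], data, learning_rate=1.1, steps=20), data))  # grows
```

With `learning_rate=0.05` the weights settle near `[2, -1]`. At 0.9 the very first step overshoots the target (the "fling a little less next time" case) yet the weights still converge, while at 1.1 each step overshoots by more than the last and the error grows.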

Original Description

Backpropagation is a common algorithm for training a neural network. It works by computing the gradient of the overall error with respect to each weight, and using stochastic gradient descent to iteratively fine-tune the weights of the network. In this episode, we compare this concept to finding a location on a map, marble maze games, and golf.
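
The description names two pieces: the gradient of the error with respect to every weight, and stochastic gradient descent on top of it. A minimal sketch of both, assuming one hidden layer of sigmoid neurons, a linear output, per-example squared error, no separate biases (a constant 1.0 input stands in for one), and hypothetical names throughout. "Stochastic" here simply means updating after every example in a fixed order; real training loops usually shuffle and batch.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, w2):
    """Hidden layer of sigmoid neurons feeding one linear output neuron.
    W1[j][i]: weight from input i to hidden neuron j; w2[j]: hidden j -> output."""
    h = [sigmoid(sum(wji * xi for wji, xi in zip(row, x))) for row in W1]
    y = sum(w2j * hj for w2j, hj in zip(w2, h))
    return h, y


def backprop(x, target, W1, w2):
    """Loss L = (target - y)^2 for one example, plus dL/dw for every weight."""
    h, y = forward(x, W1, w2)
    loss = (target - y) ** 2
    dL_dy = -2.0 * (target - y)
    # Output weights: y = sum_j w2[j] * h[j], so dy/dw2[j] = h[j].
    grad_w2 = [dL_dy * hj for hj in h]
    # Hidden weights, by the chain rule:
    # dL/dW1[j][i] = dL/dy * dy/dh[j] * dh[j]/dz[j] * dz[j]/dW1[j][i]
    #              = dL/dy * w2[j]    * h[j](1 - h[j]) * x[i]
    grad_W1 = []
    for j, hj in enumerate(h):
        delta_j = dL_dy * w2[j] * hj * (1.0 - hj)
        grad_W1.append([delta_j * xi for xi in x])
    return loss, grad_W1, grad_w2


def sgd_epoch(data, W1, w2, lr):
    """One pass of stochastic gradient descent, updating after each example.
    Mutates W1 and w2 in place; returns the summed loss seen during the pass."""
    total = 0.0
    for x, t in data:
        loss, g_W1, g_w2 = backprop(x, t, W1, w2)  # gradients use pre-update weights
        total += loss
        for j in range(len(w2)):
            w2[j] -= lr * g_w2[j]
            for i in range(len(x)):
                W1[j][i] -= lr * g_W1[j][i]
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data (an assumption): the second input is a constant 1.0 acting as a bias.
    data = [([0.0, 1.0], 0.0), ([1.0, 1.0], 1.0), ([2.0, 1.0], 0.0)]
    W1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
    w2 = [rng.uniform(-1, 1) for _ in range(3)]
    for epoch in range(2000):
        total = sgd_epoch(data, W1, w2, lr=0.1)
        if epoch % 500 == 0:
            print(epoch, round(total, 4))  # should generally shrink as training runs
```

A quick way to trust a hand-written backward pass is a gradient check: nudge one weight by a tiny amount, recompute the loss, and compare that slope with the value `backprop` returned.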

Playlist

Uploads from Data Skeptic · Data Skeptic · 13 of 60

Data Skeptic book giveaway contest winner selection

OpenHouse - Front end and API overview

OpenHouse Crawling with AWS Lambda

[MINI] Logistic Regression on Audio Data

Data Provenance and Reproducibility with Pachyderm

[MINI] Primer on Deep Learning

Big Data Tools and Trends

[MINI] Automated Feature Engineering

The Data Refuge Project

[MINI] The Perceptron

[MINI] Feed Forward Neural Networks

Data Science at Patreon

[MINI] Backpropagation

[MINI] Generative Adversarial Networks

[MINI] AdaBoost

[MINI] The Bootstrap

[MINI] Gini Coefficients

[MINI] Random Forest

[MINI] Heteroskedasticity

Urban Congestion

[MINI] The CAP Theorem

Unstructured Data for Finance

Detecting Terrorists with Facial Recognition?

Predictive Models on Random Data

[MINI] F1 Score

Machine Learning on Images with Noisy Human-centric Labels

The Library Problem

Stealing Models from the Cloud

Data Science at eHarmony

Multiple Comparisons and Conversion Optimization

Election Predictions

[MINI] Calculating Feature Importance

MS Connect Conference

The Police Data and the Data Driven Justice Initiatives

Studying Competition and Gender Through Chess

[MINI] Goodhart's Law

Trusting Machine Learning Models with LIME

Predictive Policing

Mutli-Agent Diverse Generative Adversarial Networks

[MINI] Convolutional Neural Networks

Unsupervised Depth Perception

[MINI] Max-pooling

Activation Functions

[MINI] The Vanishing Gradient

Estimating Sheep Pain with Facial Recognition

[MINI] Conditional Independence

MINI: Bayesian Belief Networks

Project Common Voice

[MINI] Recurrent Neural Networks

This video teaches backpropagation, a key algorithm for training neural networks: how the network's error is calculated, how the weights are adjusted to reduce it, and which techniques help prevent overfitting. It gives a foundational understanding of the concept and of its role in supervised learning.

Key Takeaways

Use a map analogy to understand backpropagation
Calculate the total error of the neural network for a given instance
Adjust the weights to minimize the error
Go through each weight and find the gradient of the overall error with respect to that weight
Apply the chain rule to understand backpropagation
Use techniques like Dropout to prevent overfitting
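
The takeaways name Dropout, which the episode only mentions in passing. A minimal sketch of the common "inverted dropout" variant, applied to a plain list of hidden-layer activations; the function name and signature are hypothetical, and real frameworks implement this inside their layer APIs rather than as a free function.

```python
import random
from typing import List, Optional, Sequence


def dropout(activations: Sequence[float], p_drop: float, training: bool,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout on one layer's activations.

    Training: each activation is zeroed with probability p_drop and survivors are
    scaled by 1 / (1 - p_drop), so each unit's expected value is unchanged.
    Inference: activations pass through untouched.
    """
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0.0:
        return list(activations)
    if rng is None:
        rng = random.Random()
    keep = 1.0 - p_drop
    # rng.random() < keep happens with probability `keep`: that unit survives.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


if __name__ == "__main__":
    hidden = [0.2, 0.9, 0.5, 0.7]
    print(dropout(hidden, p_drop=0.5, training=True, rng=random.Random(42)))
    print(dropout(hidden, p_drop=0.5, training=False))  # unchanged at inference
```

Because a different random subset of neurons is silenced on every training example, no single neuron can be relied on too heavily, which is one way to curb the "too much tweaking" the episode warns about.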

💡 Backpropagation is a powerful algorithm for training neural networks, but gradient-based training can get stuck in local minima or slow down on plateaus, and it can overfit if not properly regularized.
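
To see what "stuck in a local minimum" means, here is a toy sketch of plain gradient descent on a made-up one-weight error curve, f(w) = w^4 - 3w^2 + w, chosen (as an assumption) because it has two valleys; it is not a real network's loss. Strictly, backpropagation only computes gradients; getting stuck is a property of descending a non-convex error surface with them.

```python
def f(w: float) -> float:
    """Hypothetical one-weight 'error surface' with two valleys of different depth."""
    return w ** 4 - 3 * w ** 2 + w


def df(w: float) -> float:
    """Its derivative, i.e. the slope gradient descent follows."""
    return 4 * w ** 3 - 6 * w + 1


def gradient_descent(w: float, lr: float = 0.01, steps: int = 2000) -> float:
    """Repeatedly step downhill; stops making progress wherever the slope is zero."""
    for _ in range(steps):
        w -= lr * df(w)
    return w


if __name__ == "__main__":
    for start in (-2.0, 2.0):
        w = gradient_descent(start)
        print(f"start={start:+.1f} -> w={w:+.4f}, error={f(w):.4f}")
```

Started at -2.0, the weight settles near w ≈ -1.30, the deeper valley. Started at 2.0, it stops near w ≈ 1.13, a shallower valley with a higher error, because the slope there is zero and plain gradient descent has no reason to leave.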

More on: Supervised Learning

Auto Machine Learning (AutoML) Using AutoGluon

Coding the SARIMA Model : Time Series Talk

Code With Me : Logistic Regression (from scratch) !

Machine Learning Tutorial Python - 8 Logistic Regression (Multiclass Classification)

Predicting the Winning Team with Machine Learning

Air Quality Index Prediction in Python | Machine Learning Projects | GeeksforGeeks

Related AI Lessons

How to Learn a Hard Technical Skill Without Burning Out

Learn how to acquire hard technical skills without burnout by creating a sustainable learning plan

Dev.to · Anas Kalthoum | FreeBrain

After interviewing over 100 ML Candidates. Last Week Someone Walked In and Made Me Take Notes.

Learn what makes a standout ML candidate after interviewing over 100 applicants

Medium · Machine Learning

How AI Learns with Less Labeled Data

Discover how AI can learn with less labeled data, a crucial aspect of machine learning beyond model selection

Medium · Machine Learning

Mastering TypeScript — Understanding the TypeScript Compiler (tsc) from Scratch — Lesson 2

Learn the basics of the TypeScript compiler to write better JavaScript code

Medium · JavaScript

Learn Deep Learning by Hand (Beginner's Guide - Part 1)