Neural Network from Scratch - Machine Learning Python
Key Takeaways
The video demonstrates a from-scratch implementation of a neural network in Python using only the NumPy library, covering Kaiming weight initialization, forward propagation with ReLU and softmax, a regularized cross-entropy cost, backpropagation, and gradient-descent parameter updates. The network has three layers: an input layer, one hidden layer with 25 nodes, and an output layer with one node per class.
Full Transcript
[Music]

What's going on guys, hope you're doing absolutely amazing. In this video we're gonna code a neural network from scratch. To really understand the theory behind neural networks you need some math, and I've made all the derivations for what we're implementing today in a blog post, as well as three previous videos going through all the theory, so if you want to learn more about that, check out those videos. In this video we're just gonna focus on the implementation, and I will bring up the relevant parts as we go through it, so you might be able to follow even if you haven't watched those, especially if you have some math background.

So what we're gonna do is import numpy as np, and then from utils import create_dataset and plot_contour. These are just two helper functions I've created; they're on my GitHub if you want to read through them, but it's really just to generate some data and make it nice to look at.

Then we're gonna do class NeuralNetwork, and we're gonna have an init where we just send in X and y, so our dataset. Let me also write the other functions that we're gonna have, just to have some skeleton code. We're gonna have one function to initialize weights; in particular we're gonna use Kaiming initialization, which takes the nodes in the previous layer and the nodes in the next layer. It's a special type of initialization. You can also just use plain normally distributed weights, but this one has been shown to work better.

Then we're gonna do forward_prop, where we send in X and parameters. All of this is gonna make sense as we actually write it; this is just for the skeleton code. Then we're gonna do compute_cost, with y, probs and parameters. Then we're gonna do back_prop, and for the backprop we're gonna send in cache, parameters and y. I'm not expecting you to know why we send in those; it's gonna make sense when we use them. And the last thing we're gonna do is update_parameters, so this is gonna be the gradient descent step. Then we're gonna have a main function that takes in X and y, and we're gonna run it for some number of iterations, let's say 10,000. That's gonna call all of these functions right here: initialize the weights, forward propagation of course, then compute the cost (that's sort of the end of the forward prop, but we do it in a separate function), then backprop and the gradient descent step.

We're also gonna run the neural network from here: X, y is create_dataset, let's say with three hundred points and three classes. That's just the util function that I created previously.

All right, so let's start with the init function. We take as input X and y, and we're gonna use X.shape. The structure the data needs to have for this to work is the training examples along the rows, so we're gonna use self.m for the number of training examples, the number of rows that we have, and self.n for the number of features. So m training examples and n features. Then we're gonna do self.lambd, that's for the regularization part, and self.learning_rate, which we're just gonna set to 0.1.

Then we're gonna define the size of the neural network. What we're gonna implement here is a three-layer neural network, and that's also what the derivations in the blog post are based on. I might do an entirely generalized neural network in the future; let me know if you'd be interested to see how that looks, so you would just say what nodes you want and how many layers and it would automatically do it for you. Anyways, in the first hidden layer we're gonna use twenty-five nodes, and the last one is going to be the number of classes that we have. One way to do that is to use the length of np.unique of y, which checks how many unique values there are in y. You can check out how that works; it's really nothing complicated.

Then we're gonna do the Kaiming initialization. W will be np.random.randn, and the shape is gonna be l0, l1, so that's the number of nodes it's coming from and then the number of nodes it's going to in the next layer, so the previous layer and then the next layer. Then we're going to multiply by np.sqrt of 2 divided by l0; that's just how Kaiming initialization is defined. Then b is gonna be np.zeros of 1 comma l1, and we're just going to return those.

Then we're gonna go to the forward prop. parameters here is gonna be a dictionary, and we're gonna store all the necessary weights and biases in it. So W2 is parameters of W2, W1 will be parameters of W1, and b2 and b1 will be the same but with b1 and b2 instead. W1 holds the weights that go from the input layer to hidden layer 1, and W2 takes the input from layer 1 to layer 2, so the 2 here represents the layer that it's going to.

For the forward prop, we're gonna initialize a0 to be X, just to make it nice. We're gonna set z1 to be np.dot of a0 and W1, plus b1; that's just the linear part. Then a1 is gonna be the ReLU applied, so np.maximum of 0 comma z1. And then z2 will be np.dot of a1 and W2, plus b2. Note why we wrote a0 here: it's the exact same form as we have here, we just go up one layer. So this is the input layer, this is the result from the first layer, and this is the result from the second layer, so it's a total of three layers.

Then, as the last step, we're gonna use softmax, because we're now on the last layer, the output layer. So scores is z2, then for the softmax we're gonna do exp_scores, the exp of the scores, and finally probs is gonna be exp_scores divided by np.sum of exp_scores, with axis=1 and keepdims=True. What we're doing here is taking the exp of all of the scores. Remember, scores is going to have shape number of training examples that we send in (perhaps a batch, or the entire training data) by the number of classes. So we're taking the exponential element-wise, and then we're just normalizing so that we can say each node represents the probability of that particular class; essentially, the sum over all of the classes will be one.

So we're now done with the forward prop, and we're gonna store some things that we're going to use in the backprop: in cache, a0 is X, probs is probs, and a1 is just a1. Then we're gonna return cache and probs.

All right, fair enough. The next thing we're gonna do is compute the cost. First we're just going to take W2 from parameters of W2 and W1 from parameters of W1. Then we're gonna compute the loss in two parts: a regularization part, and the loss from the actual data. So we're now gonna have data_loss, which is np.sum of minus np.log of probs, so this is the cross-entropy loss if you're familiar with that, and then we index with np.arange of self.m comma y. So we go over all of the training examples, but we're only taking the log of the one that was supposed to be the correct class; that's how cross-entropy is defined. Then we divide by self.m, and that's the data loss.

Then the regularization loss is going to be 0.5 times the regularization parameter self.lambd times np.sum of W1 times W1, plus 0.5 times self.lambd times np.sum of W2 times W2. The total cost will then be the data loss plus the regularization loss, and we're just gonna return the total cost.

All right, now for, I guess, the hardest part. I'm just gonna copy these in: we unpack our parameters, and then I'm gonna copy this as well, where we unpack from the forward prop. There are some computations from the forward propagation that we're gonna need when doing the backprop, and we're just gonna take them out from the cache that we send in here.

We're gonna start with dz2, which is the derivative of the cost with respect to z2; I'm just writing it like this for more compact notation. We're gonna set this to probs. Remember, we're now starting at the end, at what we computed as our last step in the forward prop. In the backward pass we start from that and go one step at a time backwards until we get to dW1, dW2, db1 and db2. What we want from this are those four, dW1, dW2, db1 and db2, because those belong to the parameters that impact our model's predictions. Doing the derivations, as I've done in other videos and in my blog post, it ends up pretty nice: the gradient of the loss with respect to the scores, dz2, gets a minus-equals 1; it stays equal to probs for all training examples, with 1 subtracted only at the correct node, the correct class. Then we divide everything by self.m.

Now that we have computed that, we can backprop to dW2 and db2. I can show a graph of how this looks and where we're currently at in the graph. dW2 is going to be np.dot of a1 transpose, the a1 that we brought from the cache, and dz2, and then we add self.lambd times W2. So that's dW2. db2 will be np.sum of dz2 with axis=0 and keepdims=True.

Now that we've done that, we're gonna go to the next layer, in this case the only hidden layer; we're gonna call that dz1. You can view it as a graph where we're just moving one step at a time backwards. It's going to be np.dot of dz2 and W2 transpose. Again, all these derivations are in my blog and I'm gonna show them as well to perhaps make it more clear. Then we take dz1 times a1 greater than 0. This is just the ReLU: if it's greater than 0 it's gonna be 1, so this factor is just 1 or 0.

Then we're ready to backprop to dW1, which is gonna be pretty much exactly like dW2, so we can copy that one: we change this to dW1, a1 to a0, dz2 to dz1, and W2 to W1. And db1 is gonna be the same as db2 except we have dz1 here.

All right, so now we've computed all of the gradients; we wanted all of those and that's exactly what we have now. Then we have grads, with dW1 set to dW1, dW2 to dW2, db1 to db1 and db2 to db2, and we return grads. Awesome. This is the hardest part of the entire neural network.

What we're gonna do now in update_parameters is, I guess, pretty easy. First I'm gonna copy this in: we get all of the parameters and we get all of the gradients. And remember, the gradient step is just W2 minus-equals the learning rate, so self.learning_rate, times dW2, and then the same thing for W1 with dW1, and similarly for b2 and b1. Those are the now-updated values, and we're gonna store all of them in parameters, in a dictionary: W1 is W1, W2 is W2, b1 is b1, b2 is b2. Then we return parameters.

So now we've done all of the functions that lead up to the actual main function. Here, first of all, we need to run the initialization. W1, b1 is self.init_kaiming_weights (I guess that's what we called it, init_kaiming_weights, yeah), and we send in the number of features, because W1 takes the input layer to the first hidden layer: so the input layer is self.n and the next one is self.h1. Then W2 and b2 will be the same thing, except we now send in self.h1 and self.h2; h2 in this case is the number of classes, that's the output layer. Then we pack everything (let's bring this down so it's more centered) into parameters: W1 is W1, W2 is W2, b1 is b1, b2 is b2. Great.

Now that we have all of that, we're gonna do for iteration in range of num_iter, and let's do plus 1 just for a nice plot. So we're gonna run it for a specific number of iterations, in this case 10,000; I guess since we added one it's gonna run ten thousand and one, but since we're gonna report the cost every so often, it's nice to have one right at the end. It's not important though.

Then we do the forward prop, and this should be easy now, right? We've already made the functions, so all we need to do is: cache, probs is the output from forward_prop, and we send in X and the parameters. Then we calculate the cost, which will just be self.compute_cost; it needs y, probs and parameters. Then, every so often, so if iteration modulo 2500 is zero, we print "At iteration", iteration, "we have a loss of", cost. After computing the cost we do the backprop, so the gradients are self.back_prop of cache, parameters and y. Then we update the parameters, so parameters is gonna be self.update_parameters of parameters and grads. And that's sort of it. At the end of the main function we want to return parameters.

So hopefully there are no errors now. First of all, let's just run this and see the plot. OK, right here, this is how the data looks; it's three classes. Now, first we're going to do y is y.astype of int. I think it's needed for, let's see, this part right here: when we're indexing like this in a matrix, the index needs to be integer and not float, so that's why we're doing astype int here. Then NN is NeuralNetwork; we just initialize our class and send in X and y. Then trained_parameters is NN.main of X and y, and then we call plot_contour of X, y and NN. This is not really important, it's just for the nice plot; what it does is show the decision boundary that we obtained.

Let's run this so we can see. Right here, at iteration 0, 2500 and so on, we see the loss is decreasing, and if we look at the plot now we can see that this is the final decision boundary that we obtained, which looks pretty nice. It sort of makes sense what it's doing: it's finding the pattern in the data.

So that was a really quick walkthrough of how to code a neural network; hopefully it was clear. If you have any questions, leave them in a comment and I'll do my best to answer them. Neural networks are hard, and that's why I've done three previous videos and a separate blog post just for the theory of this. Thanks for watching the video and hope to see you in the next one.

[Music]
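The steps walked through in the transcript (Kaiming initialization, forward prop, cost, backprop, parameter update, training loop) can be condensed into a rough NumPy sketch like the one below. It is a reconstruction from the spoken description, not the author's file: it uses plain functions instead of the NeuralNetwork class, the value lambd=1e-3 is an assumption (the video does not state it), np.random.default_rng replaces np.random.randn only to make the run reproducible, the iteration count is shortened, and the three-blob toy data is a hypothetical stand-in for create_dataset from utils, which is not shown in the transcript.

```python
import numpy as np


def init_kaiming_weights(l0, l1, rng):
    """Kaiming initialization: normal weights scaled by sqrt(2 / l0), zero bias row."""
    W = rng.standard_normal((l0, l1)) * np.sqrt(2.0 / l0)
    b = np.zeros((1, l1))
    return W, b


def forward_prop(X, params):
    """Linear -> ReLU -> linear -> softmax; the cache holds what back_prop needs."""
    a0 = X
    z1 = np.dot(a0, params["W1"]) + params["b1"]
    a1 = np.maximum(0, z1)
    z2 = np.dot(a1, params["W2"]) + params["b2"]
    exp_scores = np.exp(z2)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    cache = {"a0": a0, "a1": a1, "probs": probs}
    return cache, probs


def compute_cost(y, probs, params, lambd):
    """Cross-entropy over the correct-class probabilities plus L2 regularization."""
    m = y.shape[0]
    data_loss = np.sum(-np.log(probs[np.arange(m), y])) / m
    reg_loss = 0.5 * lambd * (np.sum(params["W1"] ** 2) + np.sum(params["W2"] ** 2))
    return data_loss + reg_loss


def back_prop(cache, params, y, lambd):
    """Gradients of the cost with respect to W1, b1, W2, b2."""
    m = y.shape[0]
    a0, a1 = cache["a0"], cache["a1"]
    dz2 = cache["probs"].copy()  # copy so the cached probabilities stay intact
    dz2[np.arange(m), y] -= 1    # subtract 1 only at the correct class
    dz2 /= m
    dW2 = np.dot(a1.T, dz2) + lambd * params["W2"]
    db2 = np.sum(dz2, axis=0, keepdims=True)
    dz1 = np.dot(dz2, params["W2"].T) * (a1 > 0)  # ReLU derivative is 1 or 0
    dW1 = np.dot(a0.T, dz1) + lambd * params["W1"]
    db1 = np.sum(dz1, axis=0, keepdims=True)
    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}


def update_parameters(params, grads, learning_rate):
    """One gradient descent step on every parameter."""
    return {name: params[name] - learning_rate * grads[name] for name in params}


def train(X, y, h1=25, learning_rate=0.1, lambd=1e-3, num_iter=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, h2 = X.shape[1], len(np.unique(y))
    W1, b1 = init_kaiming_weights(n, h1, rng)
    W2, b2 = init_kaiming_weights(h1, h2, rng)
    params = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
    costs = []
    for _ in range(num_iter + 1):
        cache, probs = forward_prop(X, params)
        costs.append(compute_cost(y, probs, params, lambd))
        grads = back_prop(cache, params, y, lambd)
        params = update_parameters(params, grads, learning_rate)
    return params, costs


# Hypothetical toy data: three well-separated 2-D blobs with integer labels.
data_rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.8], [-0.7, -0.4], [0.7, -0.4]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + 0.1 * data_rng.standard_normal((150, 2))

params, costs = train(X, y)
_, probs = forward_prop(X, params)
accuracy = np.mean(np.argmax(probs, axis=1) == y)
print(f"first cost {costs[0]:.3f}, last cost {costs[-1]:.3f}, accuracy {accuracy:.2f}")
```

Two details differ from what is said in the video and are worth noting. back_prop copies probs before modifying it, so the cache is not changed in place, and the labels are created as integers from the start, which is the same requirement the transcript handles with y.astype(int).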
Original Description
From scratch implementation of neural network in Python using only the numpy library. There's a lot of mathematics behind Neural Networks particularly for back-prop and I've made previous videos going through the mathematics for NN and a blog post to enable us to focus on the implementation in this video.
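For orientation, the back-prop quantities computed in the video can be summarized as the equations below. This is a summary reconstructed from the code described in the transcript, not taken from the blog post. The notation is an assumption: P is the softmax output, Y is a one-hot encoding of the labels, A_0 = X, A_1 is the ReLU output, λ is the regularization strength (self.lambd), α is the learning rate, and m is the number of training examples.

```latex
\begin{aligned}
J &= -\frac{1}{m}\sum_{i=1}^{m}\log P_{i,\,y_i}
     + \frac{\lambda}{2}\left(\lVert W_1\rVert_F^2 + \lVert W_2\rVert_F^2\right),
  \qquad P = \operatorname{softmax}(Z_2) \\
dZ_2 &= \frac{1}{m}\,(P - Y) \\
dW_2 &= A_1^{\top}\, dZ_2 + \lambda W_2, \qquad db_2 = \sum_{i=1}^{m} (dZ_2)_{i,:} \\
dZ_1 &= \left(dZ_2\, W_2^{\top}\right) \odot \mathbb{1}[A_1 > 0] \\
dW_1 &= A_0^{\top}\, dZ_1 + \lambda W_1, \qquad db_1 = \sum_{i=1}^{m} (dZ_1)_{i,:} \\
W_k &\leftarrow W_k - \alpha\, dW_k, \qquad b_k \leftarrow b_k - \alpha\, db_k \quad (k = 1, 2)
\end{aligned}
```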
❤️ Support the channel ❤️
https://www.youtube.com/channel/UCkzW5JSFwvKRjXABI-UTAkQ/join
Paid Courses I recommend for learning (affiliate links, no extra cost for you):
⭐ Machine Learning Specialization https://bit.ly/3hjTBBt
⭐ Deep Learning Specialization https://bit.ly/3YcUkoI
📘 MLOps Specialization http://bit.ly/3wibaWy
📘 GAN Specialization https://bit.ly/3FmnZDl
📘 NLP Specialization http://bit.ly/3GXoQuP
✨ Free Resources that are great:
NLP: https://web.stanford.edu/class/cs224n/
CV: http://cs231n.stanford.edu/
Deployment: https://fullstackdeeplearning.com/
FastAI: https://www.fast.ai/
💻 My Deep Learning Setup and Recording Setup:
https://www.amazon.com/shop/aladdinpersson
GitHub Repository:
https://github.com/aladdinpersson/Machine-Learning-Collection
✅ One-Time Donations:
Paypal: https://bit.ly/3buoRYH
▶️ You Can Connect with me on:
Twitter - https://twitter.com/aladdinpersson
LinkedIn - https://www.linkedin.com/in/aladdin-persson-a95384153/
Github - https://github.com/aladdinpersson