Pytorch RNN example (Recurrent Neural Network)
Key Takeaways
This video demonstrates how to implement a simple Recurrent Neural Network (RNN) using PyTorch, including defining the RNN architecture, initializing parameters, and training the model on the MNIST dataset. The video also explores the use of Gated Recurrent Units (GRU) and Long Short-Term Memory (LSTM) networks for sequence classification.
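As a rough illustration of how an MNIST batch can be treated as a sequence, the sketch below pushes a random stand-in tensor (not the real dataset) through `nn.RNN`. The batch size of 64, the input size of 28, the hidden size of 256 and the two layers follow the values mentioned in the transcript. The variable names are chosen here for illustration only.

```python
import torch
import torch.nn as nn

# Stand-in for one MNIST batch: N x 1 x 28 x 28 (random values, not real digits).
batch = torch.randn(64, 1, 28, 28)

# Drop the channel axis so each image becomes 28 time steps of 28 features.
seq = batch.squeeze(1)  # N x 28 x 28

# batch_first=True means the input is laid out as (batch, time, features).
rnn = nn.RNN(input_size=28, hidden_size=256, num_layers=2, batch_first=True)

# Initial hidden state: num_layers x N x hidden_size.
h0 = torch.zeros(2, seq.size(0), 256)

out, hn = rnn(seq, h0)
print(out.shape)  # torch.Size([64, 28, 256]): one hidden vector per time step
print(hn.shape)   # torch.Size([2, 64, 256]): final hidden state of each layer
```

Without the `squeeze(1)` call the input would be four-dimensional, which `nn.RNN` does not accept.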
Full Transcript
Ladies and gentlemen, welcome back for another PyTorch video. In this video I want to show you how to code a simple RNN, as well as how to code a GRU or LSTM, in PyTorch. All I have here is the code for a fully connected neural network that we coded in a previous video, but I'll recap it quickly in case you haven't watched that one. We have a very simple fully connected net that's training on the MNIST dataset: we're loading MNIST, we're initializing the network and the optimizer, we have a training loop, and at the end we're checking the accuracy of our model. That's really all we have. I didn't want to repeat the code for all of that, so check out the previous video if you want to see more. In this video I'll just focus on the RNN.

First of all, we want to change our hyperparameters, and we can remove this right here, since we're going to create an RNN. When we load the MNIST dataset, the shape is going to be 64, or rather the batch size, let's say N, by 1 by 28 by 28. We can view this as 28 time steps, where each time step has 28 features. That's how we can view the RNN working in this case. I also want to add that normally you wouldn't use an RNN for images, but we just want to learn how to create RNNs, so we can use that. The input size should be 28 and the sequence length is 28, so we're taking one row of the image at a time, and that's what we're sending in to the RNN at each time step. Then we're going to have a number of layers in our RNN, let's say two, and let's say we want the hidden size to be 256 nodes. The learning rate is still the same, and then, yeah, we can
still have, let's say, the number of epochs as two. That's really all we want for hyperparameters, and you're going to see why we need those.

So let's do class RNN, inheriting from nn.Module. We're going to have our init function, and what we're going to send in is the input size, the hidden size, the number of layers, and also the number of classes. The first thing we do is call super(RNN, self).__init__(). What we're starting with now is a very basic RNN, and then we'll take it to the GRU and the LSTM. First, self.hidden_size is going to be hidden_size and self.num_layers is going to be num_layers. Then we define self.rnn, which will be nn.RNN. The input size is going to be input_size, which is the number of features for each time step. We don't have to say explicitly how many time steps we want; the RNN will work for any sequence length that we send in, and in this case it will be 28 time steps. Then we pass hidden_size, the number of nodes in the hidden state at each time step, and lastly the number of layers for the RNN.

One additional argument we're going to pass is batch_first=True. Since the dataset that we load, the MNIST dataset, has the batch as the first axis, we need to say batch_first=True. You can read more in the PyTorch documentation about how the input is expected to be shaped, but if we write batch_first=True, as we do in this case, the input needs to have the batch size first, then the time steps, and then the features: N by time by features. That's what we're going to send in in this case. We're also going to have a fully connected layer at the end, so we're going to do nn.Linear, and what we're going to pass here is the
hidden size times the sequence length, and then the number of classes. As I said, we have 28 time steps, and we're going to concatenate the hidden states from all of them, and that's what we send into the linear layer, so it uses information from every hidden state. You could also just take the very last hidden state, and I'm going to show you at the end of this video how to do that as well, but let's start with this one. Now we're done with the initialization: that's the RNN and that's the linear layer.

Then we define forward(self, x). We need to initialize the hidden state first, so we do h0 = torch.zeros. The hidden state needs to have the number of layers first, self.num_layers, then x.size(0), which is how many examples we send in per mini-batch, and then self.hidden_size, and we just do .to(device). Then the forward prop: we call self.rnn and send in x and the hidden state h0. What we get back is out, and the other output is just the hidden state, but since we're not going to store the hidden state, as every example has its own hidden state, we ignore that output. Then we do out = out.reshape: we keep the batch as the first axis and concatenate everything else, so this would be 28, the sequence length, times the hidden size, which is 256. Then we do out = self.fc(out), so we pass it through the linear layer, and return out. I think that should be it. Where we create the model we need to use RNN here, and we need to send in all of these things, so
let's just change it to this: we send in the input size, the hidden size, the number of layers, and the number of classes, and we defined those here in the hyperparameters. The rest of the code should not change, so we should be able to run this now. And we can't, so let's see what's wrong: "input must have 3 dimensions, got 2". Yeah, I know what's wrong here. As I said, the MNIST dataset has shape N by 1 by 28 by 28, but the RNN expects N by 28 by 28. So what we have to do is .squeeze(1), which removes the 1 for that particular axis, axis 1, and hopefully it should work now. We also can't have this here; it's from the previous fully connected network, so it needs to be removed. I don't think there should be anything else, so I'm going to let it run and get back to you when it's done.

All right, it's done training, and we get 97.5% accuracy on the training set and 97.28% on the test set, which is actually quite good, given that we trained for just two epochs and it's just a basic RNN. One thing I forgot to mention is that we need to do the same .squeeze(1) when we check the accuracy, but that's just a detail.

Now let's see if we can improve on this result by changing to a GRU instead. We can use nn.GRU instead of the basic nn.RNN, and we really don't have to change anything else. We can just rename it to self.gru, and that should be all we have to change. I'll rerun this and we'll see what we get. After letting it train, we see that we got a little bit of an improvement: 98.41% on the training set and 98.10% on the test set.

Now let's change this to an LSTM instead. What we need to do then is use nn.
LSTM, and let's call it self.lstm. Now we actually need a separate cell state, so we create c0 with torch.zeros, self.num_layers and so on, because, if you remember, the LSTM has both a hidden state and a cell state. That's not the case for a GRU or a basic RNN, but for an LSTM we need to define a separate one, in the same way as the hidden state. Then, when we call self.lstm, we send in (h0, c0), the hidden state and the cell state, as a tuple in the second argument. That's really all we need to change, so I'm going to run this again and see what we get.

All right, we get results similar to the GRU. In this case the GRU is actually outperforming the LSTM. I guess in practice you most commonly see the LSTM performing better, but really they are comparable, and neither one is clearly better than the other. I think using an LSTM is a good default choice.

Now, I said that we're using information from every hidden state, but perhaps just using the last hidden state is okay, because the last hidden state has information from all of the previous ones. For that, we can just remove this, since we don't need to concatenate all of the hidden states; we're just taking the last one. Then we remove this reshape, and we index out so that it takes all training examples in the mini-batch at the same time, then just the last hidden state, and then all features. That's really all we need to change for it to take a specific hidden state, in this case the last one. Of course, just thinking about it, we're losing information by doing this, so the result is probably going to be worse, but perhaps in a few cases
just taking the most relevant information and training longer on it is better than taking all the information. So let's see what we get.

All right, it seems that I lied. I'm not really sure why it's getting better, but it seems to perform better now when just using the last hidden state. I really think that's just a matter of training longer, but that doesn't matter that much. Anyway, that's how you would use just the last hidden state, and that's all for RNNs, GRUs and LSTMs. In the next video I'll show how to do a bidirectional LSTM. If you have any questions, leave them below. Thank you so much for watching, and I hope to see you in the next video.
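The model walked through in the transcript can be sketched roughly as follows. This is a reconstruction from the spoken description, not the author's exact file. The `cell` and `use_last_only` arguments are additions made here so that one class can show the RNN, GRU and LSTM variants and both ways of feeding the linear layer. `num_classes = 10` is an assumption based on MNIST having ten digit classes.

```python
import torch
import torch.nn as nn

# Hyperparameters as stated in the video.
input_size = 28        # features per time step (one image row)
sequence_length = 28   # number of time steps (rows per image)
num_layers = 2
hidden_size = 256
num_classes = 10       # assumption: the ten MNIST digit classes


class RNN(nn.Module):
    # `cell` and `use_last_only` are hypothetical switches added for this sketch.
    def __init__(self, input_size, hidden_size, num_layers, num_classes,
                 cell="rnn", use_last_only=False):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.cell = cell
        self.use_last_only = use_last_only
        layer = {"rnn": nn.RNN, "gru": nn.GRU, "lstm": nn.LSTM}[cell]
        # batch_first=True: input is (batch, time steps, features).
        self.rnn = layer(input_size, hidden_size, num_layers, batch_first=True)
        # Either one hidden state, or all of them concatenated.
        fc_in = hidden_size if use_last_only else hidden_size * sequence_length
        self.fc = nn.Linear(fc_in, num_classes)

    def forward(self, x):
        # Initial hidden state: (num_layers, batch, hidden_size).
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size,
                         device=x.device)
        if self.cell == "lstm":
            # The LSTM also needs a cell state, passed together as a tuple.
            c0 = torch.zeros_like(h0)
            out, _ = self.rnn(x, (h0, c0))
        else:
            out, _ = self.rnn(x, h0)
        if self.use_last_only:
            out = out[:, -1, :]                  # last time step only
        else:
            out = out.reshape(out.shape[0], -1)  # concatenate all time steps
        return self.fc(out)


# Usage with a random stand-in batch shaped like MNIST (N x 1 x 28 x 28).
x = torch.randn(64, 1, 28, 28).squeeze(1)        # N x 28 x 28
model = RNN(input_size, hidden_size, num_layers, num_classes, cell="lstm")
scores = model(x)                                # N x num_classes
```

The same `squeeze(1)` would be applied to the data in both the training loop and the accuracy check, as noted in the transcript.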
Original Description
In this video we go through how to code a simple rnn, gru and lstm example. Focus is on the architecture itself rather than the data etc. and we use the simple MNIST dataset for this example.
✨ Free Resources that are great:
NLP: https://web.stanford.edu/class/cs224n/
CV: http://cs231n.stanford.edu/
Deployment: https://fullstackdeeplearning.com/
FastAI: https://www.fast.ai/
GitHub Repository:
https://github.com/aladdinpersson/Machine-Learning-Collection