Pytorch RNN example (Recurrent Neural Network)

Aladdin Persson · Beginner ·🧬 Deep Learning ·6y ago

Key Takeaways

This video demonstrates how to implement a simple Recurrent Neural Network (RNN) using PyTorch, including defining the RNN architecture, initializing parameters, and training the model on the MNIST dataset. The video also explores the use of Gated Recurrent Units (GRU) and Long Short-Term Memory (LSTM) networks for sequence classification.

Full Transcript

Ladies and gentlemen, welcome back for another PyTorch video. In this video I want to show you how to code a simple RNN, as well as how to code a GRU or LSTM, in PyTorch. All I have here is the code for a fully connected neural network that we coded in a previous video, but I'll recap it quickly in case you haven't watched that one. It's a very simple fully connected net training on the MNIST dataset: we load MNIST, initialize the network and the optimizer, run a training loop, and at the end check the accuracy of the model. That's really all we have. I didn't want to repeat the code for all of that, so check out the previous video if you want to see more. In this video I'll just focus on the RNN.

First of all, we can remove this right here, since we're going to create an RNN, and then we want to change our hyperparameters. When we load the MNIST dataset, the shape is going to be the batch size, let's say N, by 1 by 28 by 28. We can view this as having 28 time steps, where each time step has 28 features. That's how we can view the RNN working in this case. I also want to add that normally you wouldn't use an RNN for images, but we just want to learn how to create RNNs, so we can use that. The input size should be 28 and the sequence length is 28, so we're taking one row of the image at a time, and that's what we send into the RNN at each time step. Then we're going to have a number of layers in our RNN, let's say two, and let's say we want the hidden size to be 256 nodes. The learning rate is still the same, and we can still have, let's say, the number of epochs be two. That's really all we want for hyperparameters, and you're going to see why we need those.

So let's do class RNN, inheriting from nn.Module. We're going to have our init function, and what we send in here is the input size, the hidden size, the number of layers, and the number of classes. The first thing we do is call super(RNN, self).__init__(). What we're going to start with now is just a very basic RNN, and then we'll take it to the GRU and the LSTM. First, self.hidden_size is going to be hidden_size, and self.num_layers is going to be num_layers. Then we define self.rnn, which will be nn.RNN. The input size is going to be input_size, which is the number of features for each time step. We don't have to explicitly say how long the sequence is; the RNN will work for any number of time steps that we send in, and in this case it will be 28. Then we pass hidden_size, which is the number of nodes in the hidden state at each time step, and lastly the number of layers for the RNN. One additional argument we're going to set is batch_first=True. Since the dataset we load, MNIST, has the batch as the first axis, we need to say batch_first=True. You can read more in the PyTorch documentation about how the input is expected to look, but if we write batch_first=True, as we do in this case, the input needs to have the batch size first, then the time steps, then the features: N by time by features. That's what we're going to send in here.

We're also going to have a fully connected layer at the end, so we do nn.Linear, with hidden_size times sequence_length as the input size and the number of classes as the output size. As I said, we have 28 time steps, and we're going to concatenate the hidden states from all of them, and that's what we send into the linear layer, so it uses information from every hidden state. You could also just take the very last hidden state, and I'm going to show you at the end of this video how to do that as well, but let's start with this one.

Now we're done with the initialization: that's the RNN and that's the linear layer. Then we define forward(self, x). We need to initialize the hidden state first, so we do h0 = torch.zeros, where the shape is self.num_layers first, then x.size(0), which is how many examples we send in per mini-batch, then self.hidden_size, and then we do .to(device). For the forward prop, we call self.rnn and send in x and the hidden state h0. The first output is out, and the second output is the hidden state, but since we're not going to store the hidden state, because every example has its own hidden state, we just ignore that output. Then we do out = out.reshape, keeping the batch as the first axis and concatenating everything else, so the second dimension is the sequence length, 28, times the hidden size, which is 256. Then we do out = self.fc(out), so we just pass it through the linear layer, and then return out. I think that should be it. Where we initialize the model we need to use RNN, and we need to send in the input size, the hidden size, the number of layers, and the number of classes, which we defined in the hyperparameters.

The rest of the code should not change, so we should be able to run this now. And we can't, so let's see what's wrong: "input must have 3 dimensions, got 2". I know what's wrong here. As I said, the MNIST dataset has N by 1 by 28 by 28, but the RNN expects N by 28 by 28. So what we have to do is .squeeze(1), which removes the 1 for that particular axis, axis one. We also can't have this reshape from the previous fully connected network, so that needs to be removed. I don't think there should be anything else, so I'm going to let it run and I'll get back to you when it's done.

All right, it's done training, and we get 97.5% accuracy on the training set and 97.28% on the test set, which is actually quite good: we trained for just two epochs and it's just a basic RNN. One thing I forgot to mention is that we need to do the same .squeeze(1) in the check accuracy function, but that's just a detail. Now let's see if we can improve on this result by changing this to a GRU instead. We can use nn.GRU instead of the basic nn.RNN, and we really don't have to change anything else except the name, so we rename it to self.gru. I'll rerun this and we'll see what we get. After letting it train, we see a little bit of an improvement: we got 98.41% on the training set and 98.10% on the test set.

Now let's change this to an LSTM instead. We need to use nn.LSTM, and let's call it self.lstm. What we need to do now is have a separate cell state, so we create c0 with torch.zeros in the same way, starting with self.num_layers. If you remember, the LSTM has both a hidden state and a cell state. That's not the case for a GRU or a basic RNN, but for an LSTM we need to define a separate one, in the same way as the hidden state. Then in the call to self.lstm we send in (h0, c0), so the hidden state and the cell state as a tuple in the second argument. That's really all we need to change, so I'm going to run this again and see what we get. All right, we get results similar to the GRU. In this case the GRU is actually outperforming the LSTM. I guess in practice you most commonly see the LSTM performing better, but really they are comparable, and neither is strictly better than the other. I think using an LSTM is a good default choice.

I said that right now we're using information from every hidden state, but perhaps just using the last hidden state is okay, because the last hidden state has information from all of the previous ones. For that, we can remove the multiplication by the sequence length in the linear layer, since we don't need to concatenate all of the hidden states; we're just taking the last one. We also remove the reshape, and instead we index out so that it takes all examples in the mini-batch, only the last time step, and all features, that is out[:, -1, :]. That's really all we need to change for it to use a specific hidden state, in this case the last one. Of course, just thinking about it, we're losing information by doing this, so the result is probably going to be worse, but perhaps in a few cases taking the most relevant information and training longer on that is better than taking all the information. Let's see what we get.

All right, it seems that I lied. I'm not really sure why it's getting better, but it seems that it's performing better now when using just the last hidden state. I really just think that's a matter of training longer, but that doesn't matter that much. Anyway, that's how you would use just the last hidden state, and that's all for RNNs, GRUs and LSTMs. In the next video I'll show how to do a bidirectional LSTM. If you have any questions, leave them below. Thank you so much for watching, and I hope to see you in the next video.
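The model described in the transcript can be sketched roughly as follows. This is a reconstruction from the spoken description rather than the author's exact file: `num_classes = 10` is an assumption (the ten MNIST digit classes), the random tensor stands in for a real MNIST batch so that no download is needed, and the hidden state is created directly on the input's device instead of calling `.to(device)` afterwards.

```python
import torch
import torch.nn as nn

# Hyperparameters as stated in the video
input_size = 28        # features per time step (one image row)
sequence_length = 28   # number of time steps (image rows)
num_layers = 2
hidden_size = 256
num_classes = 10       # assumption: the ten MNIST digit classes

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        # batch_first=True means inputs are (batch, time, features)
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        # The linear layer sees the hidden states of all time steps, concatenated
        self.fc = nn.Linear(hidden_size * sequence_length, num_classes)

    def forward(self, x):
        # Initial hidden state: (num_layers, batch, hidden_size), all zeros
        h0 = torch.zeros(
            self.num_layers, x.size(0), self.hidden_size, device=x.device
        )
        out, _ = self.rnn(x, h0)             # out: (batch, 28, 256)
        out = out.reshape(out.shape[0], -1)  # (batch, 28 * 256)
        return self.fc(out)                  # (batch, num_classes)


model = RNN(input_size, hidden_size, num_layers, num_classes).to(device)

# Stand-in for one MNIST batch of shape (N, 1, 28, 28)
images = torch.randn(64, 1, 28, 28, device=device)
scores = model(images.squeeze(1))  # squeeze(1) drops the channel axis
print(scores.shape)                # torch.Size([64, 10])
```

Passing the unsqueezed `(64, 1, 28, 28)` tensor to the model would fail, because `nn.RNN` with `batch_first=True` expects a three-dimensional `(batch, time, features)` input.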

Original Description

In this video we go through how to code a simple rnn, gru and lstm example. Focus is on the architecture itself rather than the data etc. and we use the simple MNIST dataset for this example.
GitHub Repository: https://github.com/aladdinpersson/Machine-Learning-Collection


This video teaches how to implement a simple RNN using PyTorch and explore the use of GRU and LSTM networks for sequence classification. The video covers defining the RNN architecture, initializing parameters, and training the model on the MNIST dataset.
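As a rough illustration of the training and accuracy-check parts mentioned above, here is a minimal sketch. Several things in it are assumptions and are not confirmed by the transcript: the Adam optimizer, the cross-entropy loss, the learning rate of 1e-3, the small hidden size, and the `SmallRNN` and `check_accuracy` names. A random batch stands in for MNIST so the example runs without downloading data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # only to make this sketch repeatable


class SmallRNN(nn.Module):
    """Hypothetical compact classifier: RNN over image rows, then a linear layer."""

    def __init__(self, input_size=28, seq_len=28, hidden_size=32, num_classes=10):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden_size * seq_len, num_classes)

    def forward(self, x):
        out, _ = self.rnn(x)  # the initial hidden state defaults to zeros
        return self.fc(out.reshape(x.size(0), -1))


def check_accuracy(model, data, targets):
    """Fraction of correct predictions on one batch."""
    model.eval()
    with torch.no_grad():
        scores = model(data.squeeze(1))  # same squeeze as in training
        predictions = scores.argmax(dim=1)
    model.train()
    return (predictions == targets).float().mean().item()


model = SmallRNN()
criterion = nn.CrossEntropyLoss()                          # assumed loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer

# Stand-in for one MNIST mini-batch: images (N, 1, 28, 28), labels in [0, 10)
data = torch.randn(64, 1, 28, 28)
targets = torch.randint(0, 10, (64,))

losses = []
for step in range(5):
    scores = model(data.squeeze(1))  # (N, 1, 28, 28) -> (N, 28, 28)
    loss = criterion(scores, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

accuracy = check_accuracy(model, data, targets)
```

With a real `DataLoader` over MNIST, the inner body would run once per batch and the outer loop once per epoch; the video trains for two epochs.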

Key Takeaways
  1. Create an RNN
  2. Define the RNN parameters
  3. Initialize the RNN
  4. Initialize the hidden state with zeros
  5. Send the input and hidden state to the RNN
  6. Reshape the output to (batch size, sequence length × hidden size)
  7. Pass the output through a fully connected layer
  8. Train the model for two epochs
  9. Change the RNN to a GRU and retrain
  10. Change the GRU to an LSTM, adding an initial cell state, and retrain
💡 In the video's run, using only the last hidden state instead of concatenating all hidden states performed better, although the author was unsure why and suspected it was a matter of training longer
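The GRU and LSTM swaps and the last-hidden-state variant can be sketched as below. The class names `GRUAllSteps` and `LSTMLastStep` are made up for this illustration, `num_classes=10` is an assumption, and the defaults otherwise follow the hyperparameters stated in the video.

```python
import torch
import torch.nn as nn


class GRUAllSteps(nn.Module):
    """GRU variant: same as the basic RNN, only the recurrent layer changes."""

    def __init__(self, input_size=28, seq_len=28, hidden_size=256,
                 num_layers=2, num_classes=10):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.gru = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size * seq_len, num_classes)

    def forward(self, x):
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size,
                         device=x.device)
        out, _ = self.gru(x, h0)
        return self.fc(out.reshape(out.shape[0], -1))


class LSTMLastStep(nn.Module):
    """LSTM variant that classifies from the last time step only."""

    def __init__(self, input_size=28, hidden_size=256, num_layers=2,
                 num_classes=10):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        # No "* sequence_length": only one hidden state reaches the linear layer
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        shape = (self.num_layers, x.size(0), self.hidden_size)
        h0 = torch.zeros(shape, device=x.device)  # hidden state
        c0 = torch.zeros(shape, device=x.device)  # cell state, LSTM only
        out, _ = self.lstm(x, (h0, c0))           # states passed as a tuple
        return self.fc(out[:, -1, :])             # all examples, last step, all features


x = torch.randn(8, 28, 28)  # (batch, time, features), already squeezed
gru_model = GRUAllSteps()
lstm_model = LSTMLastStep()
gru_scores = gru_model(x)
lstm_scores = lstm_model(x)
```

For a unidirectional network, `out[:, -1, :]` is the top layer's hidden state at the final time step, so the linear layer's input size drops from `hidden_size * sequence_length` to `hidden_size`.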
