Pytorch RNN example (Recurrent Neural Network)

Aladdin Persson · Beginner ·🧬 Deep Learning ·6y ago

Key Takeaways

This video demonstrates how to implement a simple Recurrent Neural Network (RNN) using PyTorch, including defining the RNN architecture, initializing parameters, and training the model on the MNIST dataset. The video also explores the use of Gated Recurrent Units (GRU) and Long Short-Term Memory (LSTM) networks for sequence classification.
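To feed MNIST images to a recurrent network, the video treats each 28×28 image as 28 time steps of 28 features. The snippet below is a minimal sketch of that view. It uses a random tensor as a stand-in for a real batch (an assumption made so the example runs without downloading data), with the batch size of 64 mentioned in the transcript.

```python
import torch

# Stand-in for one MNIST batch: (batch, channels, height, width).
# Random values are used here instead of real images.
batch = torch.randn(64, 1, 28, 28)

# Drop the channel axis so each image becomes a sequence of
# 28 time steps (rows) with 28 features (pixels per row).
sequences = batch.squeeze(1)

print(batch.shape)      # torch.Size([64, 1, 28, 28])
print(sequences.shape)  # torch.Size([64, 28, 28])
```

The same squeeze has to be applied wherever data is passed to the model, including the accuracy check.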

Full Transcript

Ladies and gentlemen, welcome back for another PyTorch video. In this video I want to show you how to code a simple RNN, as well as how to code a GRU or LSTM in PyTorch. All I have here is the code for a fully connected neural network that we coded in a previous video, but I'll just recap it quickly if you haven't watched that one. All that we have is a very, very simple fully connected net that's training on the MNIST dataset. We're loading MNIST, we're initializing the network and the optimizer, we have a training loop, and in the end we're checking the accuracy to see how good our model is. That's really all that we have. I didn't want to repeat the code for all of that, so check out the previous video if you want to see more. In this video I'll just focus on the RNN.

First of all, we can remove the fully connected network right here, since we're going to create an RNN, and then we want to change our hyperparameters. When we load the MNIST dataset, the shape is going to be the batch size, which is 64 here, so let's say N by 1 by 28 by 28. We can view this as having 28 time steps, where each time step has 28 features. That's how we can view the RNN working in this case. I also want to add that normally you wouldn't use an RNN for images, but we just want to learn how to create RNNs, so we can use that. The input size should be 28, and the sequence length is 28, so we're taking one row of the image at a time, and that's what we're sending in to the RNN at each time step. Then we're going to have a number of layers in our RNN, let's say two, and let's say we want the hidden size to be 256 nodes. The learning rate stays the same, and let's say the number of epochs is two. That's really all that we want for hyperparameters, and you're going to see why we need those.

So let's do class RNN, inheriting from nn.Module. We're going to have our init function, and what we send in is the input size, the hidden size, the number of layers, and the number of classes. The first thing we do is call super(RNN, self).__init__(). We're going to start with a very basic RNN, and then we'll take it to the GRU and the LSTM. First, self.hidden_size is going to be hidden_size, and self.num_layers is going to be num_layers. Then we define self.rnn, which will be nn.RNN. The input size is going to be input_size, which is the number of features for each time step. We don't have to explicitly say how many time steps we want: the RNN will work for any sequence length that we send in, and in this case it will be 28. Then we pass the hidden size, which is the number of nodes in the hidden state at each time step, and lastly the number of layers for the RNN.

One additional argument that we're going to use is batch_first=True. Since the MNIST dataset that we load has the batch as the first axis, we need to say batch_first=True. You can read more in the PyTorch documentation about how the input is expected to be shaped, but if we write batch_first=True as we do in this case, the input needs to have the batch size first, then the time steps, and then the features. That's what we're going to send in.

We're also going to have a fully connected layer at the end, so we do nn.Linear with hidden_size times sequence_length as the input and num_classes as the output. As I said, we have 28 time steps, and what we're going to do is concatenate the hidden states from all of those time steps and send that into the linear layer, so it's going to use information from every hidden state. You could also just take the very last hidden state, and I'm going to show you at the end of this video how to do that as well, but let's start with this one.

Now we're done with the initialization: that's the RNN and that's the linear layer. Then we define forward, taking self and x. We need to initialize the hidden state first, so we do h0 = torch.zeros, and the hidden state needs to have the number of layers first, self.num_layers, then x.size(0), which is how many examples we send in per mini-batch, and then self.hidden_size. Then we just do .to(device). For the forward prop we call self.rnn and send in x and the hidden state. It returns out and also the hidden state, but since every example has its own hidden state and we're not going to store it, we just ignore that second output. Then we do out = out.reshape, keeping the batch as the first axis and concatenating everything else, so the second dimension is the sequence length, 28, times the hidden size, which is 256. Then we do out = self.fc(out), so we pass it through the linear layer, and return out. I think that should be it.

Where we initialize the model, we need to use RNN and send in the input size, the hidden size, the number of layers, and the number of classes, which we defined in the hyperparameters. The rest of the code should not change, so we should be able to run this now. And we can't, so let's see what's wrong: input must have three dimensions, got two. I know what's wrong here. As I said, the MNIST data has the shape N by 1 by 28 by 28, but the RNN expects N by 28 by 28. So what we have to do is call .squeeze(1), which removes the dimension of size one at axis 1. We also can't keep the reshape that flattens the data, which is left over from the previous fully connected network, so that needs to be removed. I don't think there should be anything else now, so I'm going to let it run and I'll get back to you when it's done.

All right, it's done training, and we get about 97.5% accuracy on the training set and 97.28% on the test set, which is actually quite good. We just trained it for two epochs, and it's just a basic RNN. One thing I forgot to mention is that we need to do the same thing, .squeeze(1), in the check accuracy function, but that's just a detail.

Now let's see if we can improve on this result by changing this to a GRU. What we can do is use nn.GRU instead of a basic nn.RNN. We really don't have to change anything else, except that we can rename the attribute to self.gru. I'll rerun this and we'll see what we get. After letting it train, we see a little bit of an improvement: we got 98.41% on the training set and 98.10% on the test set.

Now let's change this to an LSTM instead. We need to use nn.LSTM, and let's call it self.lstm. What we also need now is a separate cell state, so we do c0 = torch.zeros with self.num_layers and so on, because if you remember, the LSTM has both a hidden state and a cell state. That's not the case for a GRU or a basic RNN, but for an LSTM we need to define a separate one, in the same way as the hidden state. Then when we call self.lstm we send in (h0, c0), so the hidden state and the cell state as a tuple in the second argument. That's really all we need to change, so I'm going to run this again and see what we get.

All right, we get results similar to the GRU. In this case the GRU is actually outperforming the LSTM. I guess in practice you most commonly see the LSTM performing better, but really they are comparable, and neither of them is strictly better than the other. I think using an LSTM is a good default choice.

I said that we're using information from every hidden state, but perhaps just using the last hidden state is okay, because the last hidden state has information from all of the previous ones. What we can do for that is remove the multiplication by the sequence length in the linear layer, since we no longer need to concatenate all of the hidden states and we're just taking the last one. Then we remove the reshape, and instead we index out so that it takes all examples in the mini-batch, then just the last hidden state, and then all features. That's really all we need to change for it to take a specific hidden state, in this case the last one. Of course, just thinking about it, we're losing information by doing this, so the result is probably going to be worse, but perhaps in some cases taking the most relevant information and training longer on that is better than taking all of the information. So let's see what we get.

All right, it seems that I lied. I'm not really sure why it's getting better, but it seems that it's performing better now when just using the last hidden state. I really just think that's a matter of training longer, but that doesn't matter that much. Anyway, that's how you would use just the last hidden state, and that's all for RNNs, GRUs, and LSTMs. In the next video I'll show how to do a bidirectional LSTM. If you have any questions, leave them below. Thank you so much for watching, and I hope to see you in the next video.
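The swap from a basic RNN to a GRU and then to an LSTM described in the transcript comes down to which module is constructed and what is passed as the initial state. The following is a minimal sketch, not the video's exact code: it calls the three modules directly on a random stand-in batch, using the sizes from the transcript (28 features, hidden size 256, 2 layers).

```python
import torch
import torch.nn as nn

x = torch.randn(64, 28, 28)     # (batch, time, features), random stand-in
h0 = torch.zeros(2, 64, 256)    # (num_layers, batch, hidden_size)
c0 = torch.zeros(2, 64, 256)    # cell state, needed only by the LSTM

rnn = nn.RNN(28, 256, 2, batch_first=True)
gru = nn.GRU(28, 256, 2, batch_first=True)
lstm = nn.LSTM(28, 256, 2, batch_first=True)

out_rnn, _ = rnn(x, h0)
out_gru, _ = gru(x, h0)           # same call signature as nn.RNN
out_lstm, _ = lstm(x, (h0, c0))   # hidden and cell state passed as a tuple

for out in (out_rnn, out_gru, out_lstm):
    print(out.shape)              # torch.Size([64, 28, 256]) each time
```

All three return an output of the same shape, which is why the rest of the model does not need to change when the recurrent layer is swapped.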

Original Description

In this video we go through how to code a simple rnn, gru and lstm example. Focus is on the architecture itself rather than the data etc. and we use the simple MNIST dataset for this example. ❤️ Support the channel ❤️ https://www.youtube.com/channel/UCkzW5JSFwvKRjXABI-UTAkQ/join Paid Courses I recommend for learning (affiliate links, no extra cost for you): ⭐ Machine Learning Specialization https://bit.ly/3hjTBBt ⭐ Deep Learning Specialization https://bit.ly/3YcUkoI 📘 MLOps Specialization http://bit.ly/3wibaWy 📘 GAN Specialization https://bit.ly/3FmnZDl 📘 NLP Specialization http://bit.ly/3GXoQuP ✨ Free Resources that are great: NLP: https://web.stanford.edu/class/cs224n/ CV: http://cs231n.stanford.edu/ Deployment: https://fullstackdeeplearning.com/ FastAI: https://www.fast.ai/ 💻 My Deep Learning Setup and Recording Setup: https://www.amazon.com/shop/aladdinpersson GitHub Repository: https://github.com/aladdinpersson/Machine-Learning-Collection ✅ One-Time Donations: Paypal: https://bit.ly/3buoRYH ▶️ You Can Connect with me on: Twitter - https://twitter.com/aladdinpersson LinkedIn - https://www.linkedin.com/in/aladdin-persson-a95384153/ Github - https://github.com/aladdinpersson

This video teaches how to implement a simple RNN using PyTorch and explore the use of GRU and LSTM networks for sequence classification. The video covers defining the RNN architecture, initializing parameters, and training the model on the MNIST dataset.

Key Takeaways

Create an RNN
Define the RNN hyperparameters (input size, sequence length, number of layers, hidden size)
Initialize the RNN
Initialize the hidden state with zeros
Send the input and hidden state to the RNN
Reshape the output to (batch size, sequence length × hidden size)
Pass the output through a fully connected layer
Train the model for two epochs
Change the RNN to a GRU and retrain
Change the RNN to an LSTM and retrain
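
The model-building steps above can be sketched as follows. This is an illustrative reconstruction from the transcript rather than the author's exact code: num_classes = 10 is an assumption (the ten MNIST digit classes), the hidden state is placed on the input's device instead of calling .to(device), the input is a random stand-in for an MNIST batch, and the training loop is omitted.

```python
import torch
import torch.nn as nn

# Hyperparameters as stated in the transcript.
input_size = 28       # features per time step (one image row)
sequence_length = 28  # time steps (rows per image)
num_layers = 2
hidden_size = 256
num_classes = 10      # assumption: ten MNIST digit classes


class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        # batch_first=True: input is (batch, time, features).
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        # The classifier sees every time step's hidden state, concatenated.
        self.fc = nn.Linear(hidden_size * sequence_length, num_classes)

    def forward(self, x):
        # Initial hidden state: (num_layers, batch, hidden_size), on x's device.
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size,
                         device=x.device)
        out, _ = self.rnn(x, h0)             # out: (batch, time, hidden_size)
        out = out.reshape(out.shape[0], -1)  # (batch, time * hidden_size)
        return self.fc(out)


model = RNN(input_size, hidden_size, num_layers, num_classes)
x = torch.randn(64, 1, 28, 28).squeeze(1)  # random stand-in for an MNIST batch
scores = model(x)
print(scores.shape)  # torch.Size([64, 10])
```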

💡 Using the last hidden state instead of concatenating all hidden states can improve the performance of the model
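
A minimal sketch of the last-hidden-state variant, assuming a GRU, a random stand-in batch, and 10 output classes (none of which are fixed by the tip itself). The linear layer takes hidden_size inputs instead of hidden_size times the sequence length, and the output is indexed rather than reshaped.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_size, num_classes = 256, 10  # 10 classes is an assumption (MNIST digits)

gru = nn.GRU(input_size=28, hidden_size=hidden_size, num_layers=2,
             batch_first=True)
fc = nn.Linear(hidden_size, num_classes)  # no longer hidden_size * sequence_length

x = torch.randn(64, 28, 28)  # random stand-in for a squeezed MNIST batch
out, h_n = gru(x)            # initial hidden state defaults to zeros when omitted

last = out[:, -1, :]         # all examples, last time step, all features
scores = fc(last)

print(last.shape)    # torch.Size([64, 256])
print(scores.shape)  # torch.Size([64, 10])
```

For a unidirectional network like this one, the last time step of out holds the same values as h_n[-1], the final hidden state of the top layer, so either can be fed to the classifier.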
