6. Recurrent Neural Networks [RNNs]

Weights & Biases · Beginner · 🧬 Deep Learning · 7y ago

Skills: Supervised Learning 90% · ML Maths Basics 80% · ML Pipelines 80%

Key Takeaways

This video tutorial covers Recurrent Neural Networks (RNNs) and their application to time series data, including simple RNNs, LSTMs, and GRUs, using tools like Python, Keras, and NumPy.

Full Transcript

In this video we're going to talk about recurrent neural networks, assuming no background in recurrent neural networks, and we're going to talk about handling time series data, because that's a great application of recurrent neural networks. In subsequent videos we're going to show how to take these RNNs, especially LSTMs, and apply them to text data. If you haven't watched the earlier videos on perceptrons or multi-layer perceptrons, you may want to go back and watch those first.

So far, if you've been following along in this series, you know enough to teach my little friend here to recognize images. But the real world happens over time, so today we're going to introduce recurrent neural networks, and along with that we're going to introduce time series and handling data that changes over time. (I keep saying "recursive neural." Eric, I think it's "recurrent.")

So when we talk about time series data, there actually is a pretty famous data set, maybe not quite as famous as MNIST, but a really common data set to start with, and it's a super exciting data set: it's airline sales in the U.S.
from 1945 to 1965. So the goal here is to look at airline sales in the past and predict airline sales in the future. Here's a graph of that data set, and you can see that there are two patterns happening at the same time: first, airline sales overall are rising, but there's also a strong cyclic seasonality component. So how do we use neural networks to model this data?

If you remember, I'm constantly talking about the API of neural networks, and I mentioned over and over in my first couple of videos on perceptrons that all the data you input into your neural network has to be a fixed size. But with time series it feels different, right? You have these variable sizes of data. So how do we turn that into a data set, with a training set and a test set, that we can actually input into our algorithm?

There's one super common, super simple way to do it, and that's called the sliding window approach. The way this works is we take a window of some fixed size, maybe 10 elements, and we go across. We take the first 10 elements and put them in the first row of our training data, and from those first 10 elements we want to output the 11th element, which we call our label for that data. Then we slide over by one: we take elements 2 through 11 as our inputs, and from elements 2 through 11 we try to predict the 12th record in our data set. We keep sliding that window across, and we've created a data set in exactly the same form as the data set that we input into our perceptron in the very first video in this series. So this is a super common and super simple approach, and it can be extremely effective.

So should we go to the code? Here we are. We need to go into the ml-class videos directory, then into the time-series directory, and then open up perceptron.py. You'll recognize a lot of this code at the top: lines 1 through 10 are the standard boilerplate, and then lines 12
and 13 import a library that I use to plot our predictions. On line 19 I set a parameter called look-back, which is the size of the window that we're using, so we're taking 20 inputs and predicting one output into the future. Then we have a function here, load data, that loads in the data from a CSV, international-airline-passengers.csv. You can easily add your own CSV file if you want to do time series prediction on some other data.

Line 34 defines another function, called create data set, and this does the thing I was talking about: it takes an array of data and turns it into that matrix where you have inputs of size look-back and an output of size one. As usual, data X is the inputs and data Y is the outputs.

Line 42 calls the load data function and puts the airline data into the data variable. This is a single-dimension output that corresponds to the airline sales at each month. And if you remember from our perceptron video, all of these things work better if you normalize the data to be between 0 and 1, so we do that in lines 45, 46, and 47 by subtracting out the min value and then dividing by the difference between the max and the min.

Lines 50, 51, and 52 split the data into a test and a train data set. Unlike with MNIST data or other non-time-series data sets, I think it's better to split it into chunks: rather than randomly taking values, we take the first 70% of our data, which is the past, and use that for training, and then we take the final 30% of the data and use that as the validation set. This is because what we'd really like to do from here is take our model and predict data into the future.

Lines 54 and 55 do that transformation I was talking about, where we take the single-dimension data and turn it into a matrix, where train X is the inputs and train Y is the output that we want, and here
test X is the input on the test or validation data set and test Y is the output that we're hoping for. Lines 57 and 58 add a new dimension to the data. This is similar to what we had to do to use convolutions on our data in a previous video; they basically just add an empty dimension to our matrix of data.

So here we are at line 61, and we're finally going to create our neural network architecture. It says RNN, but actually this isn't really a recurrent neural network; this is the exact same perceptron that you would have seen in the first video. Just like in the first video, we add a flatten layer that takes our multi-dimensional data and smashes it down into a single-dimensional input, and then line 63 adds a single perceptron that outputs a single value based on the input data. In this case our activation function is linear, but we could certainly change it to something else.

Line 64 compiles the model. Here we use our Adam optimizer as usual, but we don't use categorical cross-entropy as our loss. You might want to stop and think about why. Do you remember why? Well, we don't use categorical cross-entropy for our loss because we're predicting a scalar value; we're not predicting a category. So in this case, because we're predicting a number, we use mean squared error, but you could try something else, like mean absolute error, if you wanted to.

Our final line, 65, does the fit. Here train X as usual is the input data, and train Y is the labels we want to predict, or the future value of this time series. We're going to train it for a thousand epochs, because we have a pretty small data set, and we're going to use a batch size of 10. We also pass in our validation data, test X and test Y, and then we have some functions to plot what's happening as the system runs.

So you can go back into your terminal and run python perceptron.py, and you can watch it run. It's super, super fast because it's a
perceptron on a small amount of input data. Check this out: here the blue line is the training data, the orange line is the test data that we held out, and the green line is the prediction that our system is making. You can see that our predictions here are pretty reasonable. The way we do these predictions is we keep feeding the predicted output back in as input to the next prediction, so if we get off, these things can go haywire. But in this case our little perceptron, with a small window of past data, is doing a pretty reasonable job of forecasting airline sales. We're going to make this thing more complicated, but this is not a bad result on a data set like this, and you could really use this in production on certain types of time series data.

OK, but you didn't come to this video to learn about using perceptrons on time series data. I'm sure that what you care about is recurrent neural networks. But you might stop and think: why do we use recurrent neural networks at all? What's better about a recurrent neural network than a dense perceptron? What is it missing?

I think what these perceptrons are missing is the element of time. If you scrambled the inputs from 1 to 20 in the look-back that we fed into this perceptron, it wouldn't make any difference in the accuracy of the algorithm; if we scrambled the past, it has no effect on the prediction. So we're putting a lot of work onto our perceptron: we're making it learn the causality of time, in a way. There are lots of parameters, and on the small data set it works OK, but it struggles on bigger, more complicated data sets. Anytime you can put some knowledge that you have about the world into the architecture of your model, it's generally going to make the model do better, especially when the data sets get more complicated.

Recurrent neural networks generally take the same kind of input as the dense neural network, so they take in a set of data over time in
numbers or vectors of numbers, and they output a single number or a list of numbers over time. But here's the difference: they keep a state that they pass through to themselves. This is a diagram of a simple recurrent neural network: it's basically taking a state from the previous neural network and outputting something. What happens inside these recurrent neural networks differs depending on the type that you use. In Keras you'll have a simple recurrent neural network, you'll have an LSTM, and you'll also have a GRU; those tend to be the most common ones that you'll see in the wild.

So let's start with the simple recurrent neural network. How does that work? Our simple recurrent neural network takes in an input (in this case it's a single number, but it could be a larger-dimensional thing), and it also passes through some state (in this case also a single number, but it could be larger later). So now it has two inputs: one from the outside and one from its previous self. It takes those two numbers and does the exact same calculation as a perceptron, a weighted sum with an activation function, and it outputs a single number. It then takes its output and passes it into the next recurrent neural network, and now that network takes in a 4 and also takes in the output from the previous one, and it does the exact same calculation with the same weights. It does that ten times or twenty times, or as long as our window is, and at the end it outputs a number, and we take that output to be its prediction of the next value.

We can do all the same things we did with a perceptron or a CNN, where we do backpropagation (in this case it's called backpropagation through time) and we find the best set of parameters to make this output prediction exactly what we want it to be. So let's see how this looks in the code.

That was a lot to take in at once, and you might have missed a little of that, but it's actually very easy
for us to swap in a simple RNN for the perceptron that we had. You can open up rnn.py and you'll see that there's only one small change. Before, we had flatten and a dense layer; I added SimpleRNN, and then I have this one number here. What that one number means is that its output, and also the thing that is passing from cell to cell, is a single-dimensional thing. That's important because our output dimension is a scalar; it's a one-dimensional number at each time step.

We can run this RNN by typing python rnn.py, and I also saved this run to save us time. Here our blue line is the training data of this airline time series, the orange line is the actual data, and the green line is our prediction with the RNN. You can see that this prediction is a lot, lot worse; it's not predicting anything like what the data shows us. So here's a great case where we can look at the loss and the validation loss, and we can see that those are both improving, so we can let this thing run for a while. But what happens if we run this over time is that it never learns to fit our data.

So here's what other videos don't show you: this is a non-working recurrent neural network. What I'm going to show you here, which I think is really important, is how to debug this problem and how to fix it.

One thing I like to do when I'm dealing with a broken neural network is run it on really, really simple data. So how could we make this airline data even simpler? Well, one way is to use synthetic data. I have a little program, make-sine.py, where I just output a sine wave. I just want to see: can this neural network model an actual sine wave? If you go in and change this load data call to take in a parameter "sin" (s-i-n), now our model is trying to model a sine wave. This seems like it should be almost the easiest time series data to model. So let's make that change and let's run our
program. Whoa. You can see here that this neural network is not modeling the sine wave at all. In fact, it's predicting negative 1 for all these values of the sine wave, which it never even sees in the data. So first of all, how is this thing even predicting a negative one? How is that even a possible prediction that this thing can make?

It reminds me that I forgot to tell you something about these neural networks, which is that they typically have a new activation function, and we'll get into why that is. This activation function is called a hyperbolic tangent. I don't know how many people remember hyperbolic tangent from maybe their trigonometry class; I'm not sure I had thought about it much until I saw it appear in a neural network. The important thing to know about hyperbolic tangent is that it's basically like a sigmoid function, but instead of going from 0 to 1 it goes from negative 1 to 1. So a really, really negative number will output a negative 1, and a really, really positive number will output a positive 1. LSTMs and GRUs use the hyperbolic tangent activation function like crazy, so Keras makes hyperbolic tangent the default activation function even though this is a SimpleRNN and it probably doesn't really need that.

But now the big issue here, the reason that this RNN can't learn such a simple thing, is that it's only passing across a single parameter. Passing just one number from point in time to point in time is not enough to learn even a pattern as simple as a sine wave. So what we need to do is let it pass through more state, more than just one single number, and we can do that by changing the one to a higher number. Let's do that in the code. Here on line 62, where we see SimpleRNN(1), let's try changing that to a 5, so now it's going to pass across five numbers instead of one, and maybe that can encode the state of affairs. Ah, so we get
an error. This is what the other videos don't show you: these errors. I'm going to debug this with you, but you might want to stop the video and think about why we got this error, because this is a really common error to get when dealing with neural networks of any type. It's a dimension error: it was expecting the SimpleRNN output to have shape 5, but it got an array with shape 1. So what's going on?

Well, we're outputting a five-dimensional thing. If you look at this diagram, a SimpleRNN outputs the same thing that it's sending to the next cell in the recurrent neural network, so it's outputting a five-dimensional thing, and we can't use that. So how do we turn this five-dimensional output into a single-dimensional output? One way to do it is to just add a dense layer at the very end, and this is super common. What this is going to do is take the five numbers that this network outputs and add a final perceptron that does a weighted sum and takes it down to a single output.

We can do that here by adding a single line, model.add(Dense(1)), that adds our perceptron at the end. By default it'll use a linear activation function, which will let it output any number, but we could add a different activation function here if we wanted to. In this case our data is normalized to be between zero and one, so I think a sigmoid probably makes sense. So let's say activation equals sigmoid.

Awesome, let's run this network. You can see that this network is already doing a much better job of modeling the sine wave. At first it seems to get it, but it kind of dampens over time, so it doesn't swing as much as the sine wave swings, and this
sort of shows you how modeling time series is tougher than modeling other things, because errors that you make at the beginning of predicting the future then feed into your model and cause further errors as you predict out further. So this seems to start OK but then gets bad. But after 100 epochs it's starting to really model the nature of the sine wave, so I think we can stop and say that this network architecture is working much better than the previous one that we had, and we can try it back on the airline data.

Now that we've debugged our model, let's go back to the original data set by just removing the "sin" here, and let's try running it. Cool. You can see that this is starting to model the airline data much better, and over time it gets better and better and captures more and more of the way that the airline sales data set really looks.

I know you came here for an LSTM, and we're going to go deeper in the next video into how LSTMs really work, but I want to show you, just as a taste, how easy it is to switch this SimpleRNN to an LSTM. If we go back into the code, into rnn.py, all we need to do is change this layer, which says SimpleRNN, to LSTM. We go into our terminal, type python rnn.py, and we're running our first LSTM.

One thing you'll notice about this LSTM right away is that it runs a lot slower than the SimpleRNN, and it runs much, much slower than the perceptron. That's because there's a lot going on here, and that power is going to be really important in some of the subsequent things that we talk about, especially when you run them on text data, because if you think about it, one way to look at text data is as just a really complicated type of time series data.

Today we saw how to take a recurrent neural network and use it inside Keras on time series data, and we even saw a little bit about how to debug a recurrent neural network, and at the very end we
swapped in an LSTM for our simple recurrent neural network, which we're going to use quite a bit in the future. We also learned how to take data and turn it into the format that you need in order to apply any type of time series algorithm.
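
A minimal sketch of the data preparation described in the transcript: min-max normalization, a chronological 70/30 split, and the sliding window. The names min_max_normalize and create_dataset, the window size of 3, and the twelve sample values are assumptions for illustration; the code in the course repository may differ.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]: subtract the min, divide by (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def create_dataset(series, look_back):
    """Turn a 1-D series into (window, next value) pairs with a sliding window."""
    inputs, labels = [], []
    for start in range(len(series) - look_back):
        inputs.append(series[start:start + look_back])  # look_back past values
        labels.append(series[start + look_back])        # the value right after the window
    return inputs, labels


# A short illustrative series (not loaded from the CSV).
raw = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118]
series = min_max_normalize(raw)

# Chronological split: the first 70% is the "past", the rest is held out.
split = int(len(series) * 0.7)
train, test = series[:split], series[split:]

train_x, train_y = create_dataset(train, look_back=3)
test_x, test_y = create_dataset(test, look_back=3)
```

With twelve values and a window of 3, the first eight values yield five training rows and the last four yield a single held-out row. Every row has the same fixed size, which is what lets a perceptron or an RNN consume it.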

Original Description

In this tutorial we are going to look at Recurrent Neural Networks and time series data. In future videos, we are going to show how to take these RNNs and apply them to text data.
GitHub repo: https://github.com/lukas/ml-class
See all classes: https://wandb.ai/site/tutorials
Weights & Biases: https://wandb.ai/site

Playlist

Uploads from Weights & Biases · Weights & Biases · 10 of 60

This video tutorial teaches how to use Recurrent Neural Networks (RNNs) for time series data, including simple RNNs, LSTMs, and GRUs, and how to implement them using Python and Keras. The tutorial covers the basics of RNNs, including back propagation through time, and how to use them for time series forecasting.
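
The simple RNN step can be sketched in plain Python, assuming a single-number input and a single-number state as in the diagram the transcript describes. The function name simple_rnn_forward and the weight values are hypothetical and untrained; in Keras the weights would be learned with backpropagation through time.

```python
import math


def simple_rnn_forward(window, w_input, w_state, bias):
    """Run a scalar simple RNN over a window and return its final output."""
    state = 0.0  # nothing has been seen yet
    for x in window:
        # Same calculation as a perceptron: weighted sum, then an activation.
        # The same three weights are reused at every time step.
        state = math.tanh(w_input * x + w_state * state + bias)
    return state  # the last output is taken as the prediction


# Hypothetical, untrained weights chosen only for illustration.
prediction = simple_rnn_forward([0.1, 0.4, 0.2, 0.9],
                                w_input=0.5, w_state=-0.3, bias=0.1)
```

Because the state is carried from one step to the next, the order of the window is part of the computation: feeding the same values in a different order generally produces a different output. The tanh activation also keeps every output between -1 and 1.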

Key Takeaways

Load Time Series Data from CSV File
Create Dataset with Inputs and Outputs
Set Look-Back Parameter
Use Sliding Window Approach to Create Fixed-Size Data Set
Implement Simple RNN
Implement LSTM
Add Dense Layer
Use Sigmoid Activation Function
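
One way the final architecture from the walkthrough might look in Keras, assuming TensorFlow with its bundled Keras is installed. The layer sizes follow the transcript (a window of 20, five numbers of state, one sigmoid output); the exact code in rnn.py is not shown on this page and may differ.

```python
from tensorflow import keras
from tensorflow.keras import layers

look_back = 20  # window size mentioned in the walkthrough

model = keras.Sequential([
    keras.Input(shape=(look_back, 1)),      # look_back time steps, one value per step
    layers.SimpleRNN(5),                    # carries 5 numbers of state between steps
    layers.Dense(1, activation="sigmoid"),  # reduces the 5 outputs to one value in (0, 1)
])

# Mean squared error, because the target is a number rather than a category.
model.compile(loss="mse", optimizer="adam")
```

Switching to an LSTM is a one-word change in this sketch: layers.LSTM(5) in place of layers.SimpleRNN(5). The sigmoid output only makes sense because the data was normalized to lie between 0 and 1.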

💡 LSTMs are useful for time series data and can be used for text data as well, but they run slower than simple RNNs.
