6. Recurrent Neural Networks [RNNs]
Key Takeaways
This video tutorial covers Recurrent Neural Networks (RNNs) and their application to time series data, including simple RNNs, LSTMs, and GRUs, using tools like Python, Keras, and NumPy.
Full Transcript
In this video we're going to talk about recurrent neural networks, assuming no background in them, and we're going to talk about handling time series data, because that's a great application of recurrent neural networks. In subsequent videos we're going to show how to take these RNNs, especially LSTMs, and apply them to text data. If you haven't watched the earlier videos on perceptrons or multi-layer perceptrons, you may want to go back and watch those first.

So far, if you've been following along in this series, you know enough to teach my little friend here to recognize images. But the real world happens over time, so today we're going to introduce recurrent neural networks, and along with that we're going to introduce time series and handling data that changes over time. (I keep saying "recursive" neural networks; the correct term is "recurrent".)

When we talk about time series data, there actually is a pretty famous data set, maybe not quite as famous as MNIST, but a really common data set to start with. And it's a super exciting data set: it's airline sales in the U.S.
from 1949 to 1960. So the goal here is to look at airline sales in the past and predict airline sales in the future. Here's a graph of that data set, and you can see that there are two patterns happening at the same time: first, airline sales overall are rising, but there's also a strong cyclic seasonality component. So how do we use neural networks to model this data?

If you remember, I'm constantly talking about the API of neural networks, and I mentioned over and over in my first couple of videos on perceptrons that all the data you input into your neural network has to be a fixed size. But with time series it feels different, right? You have these variable sizes of data. So how do we turn that into a data set, with a training set and a test set, that we can actually input into our algorithm?

There's one super common, super simple way to do it, and that's called the sliding window approach. The way this works is we take a window of some fixed size, maybe 10 elements, and we go across. We take the first 10 elements and put them in the first row of our training data, and from those first 10 elements we want to output the 11th element, which we call our label for that data. Then we slide over by one: we take elements 2 through 11 as our inputs, and from elements 2 through 11 we try to predict the 12th record in our data set. We keep sliding that window across, and we've created a data set that has exactly the same form as the data set we input into our perceptron in the very first video in this series. So this is a super common and super simple approach, and it can be extremely effective.

So should we go to the code? Here we are, and we need to go into the ml-class videos directory, then into the time series directory, and then open up perceptron.py. You'll recognize a lot of this code at the top: lines 1 through 10 are the standard boilerplate, and then lines 12
and 13 import a library that I use to plot our predictions. In this code, on line 19, I set a parameter called look-back, which is the size of the window that we're using; so we're taking 20 inputs and we're predicting one output into the future.

We have a function here, load data, and that loads in the data from a CSV, the international airline passengers CSV. You can easily add your own CSV file if you want to do time series prediction on some other data. Line 34 defines another function called create data set, and this does that thing I was talking about: taking an array of data and turning it into a matrix where you have inputs of size look-back and an output of size one. As usual, data X is the inputs and data Y is the outputs.

Line 42 calls the load data function and puts the airline data into the data variable. This is a single-dimension array, with one value for the airline sales in each month. And if you remember from our perceptron video, all of these things work better if you normalize the data to be between 0 and 1, so we do that in lines 45, 46, and 47, just subtracting out the min value and then dividing by the difference between the max and the min.

Lines 50, 51, and 52 split the data into a training set and a test set. Unlike with MNIST data or other non-time-series data sets, I think it's better to split it into chunks: rather than randomly taking values, we take the first 70% of our data, which is further in the past, and use that for training, and then we take the final 30% of the data and use that as the validation set. This is because what we'd really like to do is take our model and use it to predict into the future.

Lines 54 and 55 do that transformation I was talking about, where we take the single-dimension data and turn it into a matrix: train X is the inputs and train Y is the output that we want, and here
test X is the input on the test or validation data set, and test Y is the output that we're hoping for. Lines 57 and 58 add a new dimension to the data. This is similar to what we had to do to use convolutions on our data in a previous video; they basically just add an empty dimension to our matrix of data.

So here we are at line 61, and we're finally going to create our neural network architecture. It says RNN, but this isn't really a recurrent neural network; this is the exact same perceptron that you would have seen in the first video. Just like in the first video, we add a Flatten layer that takes our multi-dimensional data and smashes it down into a single-dimensional input, and then line 63 adds a single perceptron that outputs a single value based on the input data. In this case our activation function is linear, but we could certainly change it to something else.

Line 64 compiles the model. Here we use our Adam optimizer as usual, but we don't use categorical cross-entropy as our loss. You might want to stop and think about why. Do you remember why? We don't use categorical cross-entropy for our loss because we're predicting a scalar value; we're not predicting a category. Because we're predicting a number, we use mean squared error, but you could try something else, like mean absolute error, if you wanted to.

Our final line, 65, does the fit. Here train X, as usual, is the input data, and train Y is the labels we want to predict, that is, the future value of this time series. We're going to train for a thousand epochs, because we have a pretty small data set, and we're going to use a batch size of 10. We also pass in our validation data, test X and test Y, and then we have some functions to plot what's happening as the system runs.

So you can go back into your terminal and run python perceptron.py, and you can watch it run. It's super, super fast because it's a
perceptron with a small amount of input data. Check this out: here the blue line is the training data, the orange line is the test data that we held out, and the green line is the prediction that our system is making. You can see that our predictions here are pretty reasonable. The way we do these predictions is that we keep feeding each predicted value back in as input to the next prediction, so if we get a little off, these things can go haywire. But in this case our little perceptron, with a small window of past data, is doing a pretty reasonable job of forecasting airline sales. We're going to make this thing more complicated, but this is not a bad result on a data set like this, and you could really use this in production on certain types of time series data.

OK, but you didn't come to this video to learn about using perceptrons on time series data; I'm sure what you care about is recurrent neural networks. But you might stop and think: why do we use recurrent neural networks at all? What's better about a recurrent neural network than a dense perceptron? What is the perceptron missing?

I think what these perceptrons are missing is the element of time. If you scrambled the order of inputs 1 through 20 in the look-back window, the same way for every example, and fed that into this perceptron, it wouldn't make any difference to the accuracy of the algorithm: scrambling the order of the past has no effect on how well it can predict. So we're putting a lot of work onto our perceptron; we're making it learn the causality of time, in a way. There are lots of parameters, and on a small data set it works OK, but it struggles on bigger, more complicated data sets. Any time you can put some knowledge that you have about the world into the architecture of your model, it's generally going to make the model do better, especially when the data sets get more complicated.

Recurrent neural networks generally take the same kind of input as a dense neural network, right? They take in a set of data over time, in
numbers or vectors of numbers, and they output a single number or a list of numbers over time. But here's the difference: they keep a state that they pass through to themselves. This is a diagram of a simple recurrent neural network: it takes a state from the previous step of the network and it outputs something.

What happens inside these recurrent neural networks differs depending on the type that you use. In Keras you have a simple recurrent neural network, you have an LSTM, and you also have a GRU; those tend to be the most common recurrent networks that you'll see in the wild. So let's start with the simple recurrent neural network. How does that work?

Our simple recurrent neural network takes in an input; in this case it's a single number, but it could be a higher-dimensional thing. It also passes through some state; in this case that's also a single number, but it could be larger later. So now it has two inputs, one from the outside and one from its previous self. It takes those two numbers and does the exact same calculation as a perceptron, a weighted sum with an activation function, and it outputs a single number. It then passes its output on to the next step of the recurrent neural network. Now that step takes in a 4, and it also takes in the output from the previous one, and it does the exact same calculation with the same weights. It does that ten times, or twenty times, or however long our window is, and at the end it outputs a number, which we take to be its prediction of the next value.

We can do all the same things we did with a perceptron or a CNN, where we do backpropagation; in this case it's called backpropagation through time. We find the set of parameters that brings this output prediction as close as possible to what we want it to be.

So let's see how this looks in the code. That was a lot to take in at once, and you might have missed a little of it, but it's actually very easy
for us to swap in a simple RNN for the perceptron that we had. You can open up rnn.py, and you'll see that there's only one small change. Where before we had Flatten and a Dense layer, I added SimpleRNN, and then I have this one number here. What that 1 means is that its output, and also the thing that is passed from cell to cell, is a single-dimensional thing. That's important because our output is a scalar, a single number at each time step.

We can run this RNN by typing python rnn.py (I also saved a run, to save us time). Here the blue line is the training data of this airline time series, the orange line is the actual data, and the green line is our prediction with the RNN. You can see that this prediction is a lot, lot worse; it's not predicting anything like what the data shows us. This is a great case where we can look at the loss and the validation loss and see that they are both improving, so we could let this thing run for a while. But what actually happens if we keep running it is that it never learns to fit our data.

So here's what other videos don't show you: this is a non-working recurrent neural network. What I'm going to show you here, which I think is really important, is how to debug this problem and how to fix it.

One thing I like to do when I'm dealing with a broken neural network is run it on really, really simple data. How could we make this airline data even simpler? One way is to use synthetic data. I have a little program, make-sine.py, where I just output a sine wave; I just want to see whether this neural network can model an actual sine wave. If you go in and change this load data call to take the parameter "sin", our model is now trying to model a sine wave. This seems like it should be almost the easiest time series data to model. So let's make that change and let's run our
program. Whoa. You can see here that this neural network is not modeling the sine wave at all. In fact, it's predicting negative 1 for all these values of the sine wave, a value it never even sees in the data. So first of all, how is this thing even predicting negative 1? How is that even a possible prediction for it to make?

It reminds me that I forgot to tell you something about these neural networks, which is that they typically have a new activation function. We'll get into why that is, but this activation function is called a hyperbolic tangent. I don't know how many people remember hyperbolic tangent from their trigonometry class; I'm not sure I had thought about it much until I saw it appear in a neural network. The important thing to know about hyperbolic tangent is that it's basically like a sigmoid function, but instead of going from 0 to 1 it goes from negative 1 to 1. So a really, really negative number will output a negative 1, and a really, really positive number will output a positive 1. LSTMs and GRUs use the hyperbolic tangent activation function like crazy, so Keras makes hyperbolic tangent the default activation function even for a SimpleRNN, which probably doesn't really need it.

But the big issue here, the reason that this RNN can't learn such a simple thing, is that it's only passing a single number across, right? Passing just one number from one point in time to the next is not enough to learn even a pattern as simple as a sine wave. What we need to do is let it pass through more state, more than just one single number, and we can do that by changing the 1 to a higher number. So let's do that in the code. Here on line 62, where we see SimpleRNN(1), let's try changing that to a 5. Now it's going to pass across five numbers instead of one, and maybe that can encode the state of affairs.

Ah, so we get
an error. This is what the other videos don't show you: these errors. I'm going to debug this with you, but you might want to stop the video and think about why we got this error, because this is a really common error to get when dealing with neural networks of any type: it's a dimension error. It was expecting the SimpleRNN layer to have shape 5, but it got an array with shape 1.

So what's going on? We're outputting a five-dimensional thing. If you look at this diagram, a SimpleRNN outputs the same thing that it sends to the next cell in the recurrent neural network, so it's outputting a five-dimensional thing, and we can't use that, because our labels are single numbers.

So how do we turn this five-dimensional output into a single-dimensional output? One way is to just add a dense layer at the very end, and this is super common. What this does is take the five numbers that the network outputs and add a final perceptron that does a weighted sum, taking it down to a single output. We can do that here by adding a single line, model.add(Dense(1)), which adds our perceptron at the end. By default it uses a linear activation function, which lets it output any number, but we could use a different activation function here. We're trying to model a sine wave, and in this case our data is normalized to be between zero and one, so I think a sigmoid probably makes sense. So let's say activation equals sigmoid.

Awesome, let's run this network. You can see that this network is already doing a much better job of modeling the sine wave. At first it seems to get it, but its prediction dampens over time, so it doesn't swing as much as the sine wave swings. And this
sort of shows you how modeling time series is tougher than modeling other things, because errors that you make at the beginning of predicting the future then feed into your model and cause further errors as you predict out further. So this seems to start OK but then gets bad. But after 100 epochs it's starting to really model the nature of the sine wave, so I think we can stop and say that this network architecture is working much better than the previous one. Then we can try it back on the airline data.

Now that we've debugged our model, let's go back to the original data set by just removing the "sin" here, and let's try running it. Cool. You can see that this is starting to model the airline data much better, and over time it gets better and better and captures more and more of the way the airline sales data set really looks.

I know you came here for an LSTM, and we're going to go deeper in the next video into how LSTMs really work, but I want to give you just a taste of how easy it is to switch this SimpleRNN to an LSTM. If we go back into the code, into rnn.py, all we need to do is change this layer, which says SimpleRNN, to LSTM. We go into our terminal and type python rnn.py, and we're running our first LSTM.

One thing you'll notice about this LSTM right away is that it runs a lot slower than the simple RNN, and much, much slower than the perceptron. That's because there's a lot going on here, and that power is going to be really important in some of the subsequent things that we talk about, especially when you run these on text data, because if you think about it, one way to look at text data is as just a really complicated type of time series data.

Today we saw how to take a recurrent neural network and use it inside Keras on time series data, and we even saw a little bit about how to debug a recurrent neural network. And at the very end we
swapped in an LSTM for our simple recurrent neural network, which we're going to use quite a bit in the future. We also learned how to take data and turn it into the format that you need in order to apply any type of time series algorithm.
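As a rough illustration of the sliding window approach described in the transcript, here is a minimal plain-Python sketch. It is not the code from perceptron.py: the helper names `normalize` and `create_dataset`, the toy series, and the window size of 3 are assumptions chosen for illustration (the video uses a window of 20 on the airline data).

```python
def normalize(series):
    """Rescale values to the range [0, 1] using min-max scaling."""
    lo, hi = min(series), max(series)
    return [(v - lo) / (hi - lo) for v in series]


def create_dataset(series, look_back):
    """Turn a 1-D series into (window, next value) pairs."""
    data_x, data_y = [], []
    for i in range(len(series) - look_back):
        data_x.append(series[i:i + look_back])   # look_back consecutive inputs
        data_y.append(series[i + look_back])     # the value right after the window
    return data_x, data_y


# Hypothetical toy series standing in for the airline data (20 values).
series = [float(v) for v in range(10, 30)]
scaled = normalize(series)

# Chronological split: first 70% for training, last 30% for validation.
split = len(scaled) * 7 // 10
train, test = scaled[:split], scaled[split:]

train_x, train_y = create_dataset(train, look_back=3)
test_x, test_y = create_dataset(test, look_back=3)

print(len(train_x), len(test_x))
```

Splitting before windowing, as sketched here, keeps every validation window strictly later in time than the training windows.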
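The transcript mentions that forecasts are produced by feeding each prediction back in as input to the next one, and that early errors then cause further errors. A minimal sketch of that loop follows; `forecast`, `trend_model`, and `biased_model` are hypothetical stand-ins for a trained network, not functions from the course repository.

```python
def forecast(predict, history, look_back, steps):
    """Predict `steps` future values, feeding each prediction back in as input."""
    window = list(history[-look_back:])      # most recent look_back observations
    predictions = []
    for _ in range(steps):
        next_value = predict(window)
        predictions.append(next_value)
        window = window[1:] + [next_value]   # slide the window over the prediction
    return predictions


def trend_model(window):
    """Hypothetical stand-in for a trained model: continue the last step's trend."""
    return window[-1] + (window[-1] - window[-2])


def biased_model(window):
    """Same stand-in, but with a small constant error on every prediction."""
    return trend_model(window) + 0.1


history = [1.0, 2.0, 3.0, 4.0]
exact = forecast(trend_model, history, look_back=2, steps=3)
drifting = forecast(biased_model, history, look_back=2, steps=3)

# The gap between the two forecasts grows at every step, because each
# biased prediction becomes part of the input for the next one.
errors = [d - e for d, e in zip(drifting, exact)]
print(exact)
print(errors)
```

In this toy setup a per-step error of 0.1 does not stay at 0.1: it accumulates as the forecast moves further from the last real observation.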
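The simple recurrent cell described in the transcript (a weighted sum of the current input and the previous output, passed through an activation and repeated with the same weights at every time step) can be written out by hand for the one-number case. This is a sketch under assumptions: the weight values are arbitrary rather than trained, and `simple_rnn_forward` is a hypothetical name. It uses tanh, which the transcript notes is the Keras default for this layer.

```python
import math


def simple_rnn_forward(inputs, w_in, w_state, bias):
    """Run a one-unit simple RNN over a sequence and return the final output."""
    state = 0.0                                   # state before any input is seen
    for x in inputs:
        # Same weights at every time step: weighted sum of the new input
        # and the previous output, passed through tanh.
        state = math.tanh(w_in * x + w_state * state + bias)
    return state


# Arbitrary, untrained weights chosen only for illustration.
w_in, w_state, bias = 0.5, 0.8, 0.0

sequence = [0.1, 0.2, 0.9]
forward = simple_rnn_forward(sequence, w_in, w_state, bias)
backward = simple_rnn_forward(sequence[::-1], w_in, w_state, bias)

# The two results differ: the cell's output depends on the order of its inputs.
print(forward, backward)
```

Because the state is a single number here, everything the cell remembers about the past has to fit in one value, which is the limitation the transcript runs into with the sine wave.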
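To make the relationship between the two activation functions mentioned in the transcript concrete: hyperbolic tangent is a sigmoid rescaled from the range (0, 1) to the range (-1, 1), via the identity tanh(z) = 2 * sigmoid(2z) - 1. The short check below is only an illustration; `sigmoid` is defined locally rather than taken from any library.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


for z in (-5.0, -1.0, 0.0, 1.0, 5.0):
    rescaled = 2.0 * sigmoid(2.0 * z) - 1.0   # sigmoid stretched to (-1, 1)
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.4f}  "
          f"tanh={math.tanh(z):.4f}  rescaled={rescaled:.4f}")
```

This is why a tanh-activated recurrent layer can predict values near -1 even when the training data is normalized to lie between 0 and 1, while a final sigmoid layer keeps predictions inside that range.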
Original Description
In this tutorial we are going to look at Recurrent Neural Networks and time series data. In future videos, we are going to show how to take these RNNs and apply them to text data.
Github repo: https://github.com/lukas/ml-class
See all classes: https://wandb.ai/site/tutorials
Weights & Biases: https://wandb.ai/site