Autoregressive Process - Applied Time Series Analysis in Python and TensorFlow
Key Takeaways
The video, part of the course Applied Time Series Analysis in Python and TensorFlow, covers autoregressive processes, stationarity, and model estimation using the Yule-Walker equations. NumPy, matplotlib, pandas, and statsmodels are used to simulate and analyze AR processes and to model real-world data: the quarterly earnings per share of Johnson & Johnson.
Full Transcript
Now let's cover the autoregressive model. The autoregressive model uses a linear combination of past values of the target to make a prediction. Since we are talking about autoregression, the regression is made against the target itself. We refer to the autoregressive model as the AR(p) model, where p is the order. Mathematically, an AR(p) model is expressed like this, where p is the order, c is a constant, and epsilon is noise. The phis here are the parameters, or weights. The AR(p) model is very flexible in the sense that it can model many different types of time series patterns. However, keep in mind that the autoregressive model can only be applied to stationary time series, which will constrain the range of the parameters phi.
Here you see the simulation of an autoregressive process of order two. We will cover this example in depth and with code in the next lesson. If we look at the ACF plot, we see some oscillation as well as a slow decay. This is a hint that it is not a moving average process, and so an autoregressive process must be in play. Now we look at the PACF, the partial autocorrelation function plot, and we see that there is no significant peak after lag two. Therefore, the PACF can be used to determine the order of the AR model. As a side note, the PACF, or partial autocorrelation function, finds the correlation between the present value and the residuals at a previous lag. Therefore, it finds a correlation that cannot be explained with the ACF.
So, to recap: if you plot the ACF and you see a decay or sinusoidal pattern, then it suggests an autoregressive process. Plotting the PACF will allow you to estimate the order of the AR model. In this case, we saw that it is of order two, since after lag two the coefficients are not significant. So that's it for the autoregressive model. Let's cover an example now with Python.
All right, so with the theory about the autoregressive model covered, let's apply it now in Python. As always, we'll start off by importing the libraries that we will need.
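A minimal sketch of the simulation and estimation steps dictated in the rest of this walkthrough, assuming statsmodels and NumPy are installed. It departs from the dictated arrays in one labeled way: statsmodels documents `ArmaProcess` as taking lag-polynomial coefficients, so to simulate y(t) = 0.33 y(t-1) + 0.5 y(t-2) the AR coefficients are passed negated, as `[1, -0.33, -0.5]`. With that convention, `yule_walker` returns values near 0.33 and 0.5 directly and no sign flip is needed. The seed and the sample size of 10,000 are choices made here for reproducibility, not values from the video.

```python
import numpy as np
from statsmodels.regression.linear_model import yule_walker
from statsmodels.tsa.arima_process import ArmaProcess

np.random.seed(42)  # generate_sample draws from NumPy's global RNG by default

# Target process: y_t = 0.33*y_{t-1} + 0.5*y_{t-2} + eps_t.
# ArmaProcess expects the lag polynomial 1 - 0.33L - 0.5L^2,
# so the AR coefficients are negated after the leading 1.
ar2 = np.array([1, -0.33, -0.5])
ma2 = np.array([1, 0, 0])  # leading 1 only: no moving average part

process = ArmaProcess(ar2, ma2)
ar2_sample = process.generate_sample(nsample=10_000)

# Yule-Walker estimate of the two AR coefficients and the noise scale.
rho, sigma = yule_walker(ar2_sample, order=2, method="mle")
print(f"stationary: {process.isstationary}")
print(f"rho: {rho}")
print(f"sigma: {sigma}")
```

Passing `[1, 0.33, 0.5]` instead, as dictated below, simulates the process with both coefficients negated, which is consistent with the sign flip applied to `rho` later in the transcript. Note also that statsmodels documents the second return value of `yule_walker` as the residual standard deviation rather than the variance; for unit-variance noise the two coincide.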
So, from statsmodels.graphics.tsaplots, let's import plot_acf and plot_pacf. Then, from statsmodels.tsa.arima_process, let's import ArmaProcess. Now, you know that this is going to be useful to simulate an AR process later on. Then, from statsmodels.regression.linear_model, let's import yule_walker. You'll see later on how this is going to be useful. And from statsmodels.tsa.stattools, we will import adfuller. Of course, we need matplotlib.pyplot as plt. We'll need pandas this time, so import pandas as pd, and we'll need numpy as np, and of course %matplotlib inline. Awesome. And I made a mistake here: there's supposed to be a t, statsmodels. All right, and with this done, I will set the figure size for my notebook, so plt.rcParams['figure.figsize'] is going to be equal to 10 and 7.5.
All right, and now let's start off by simulating our autoregressive process of order two. So, simulate AR(2) process, and the process will have the following equation: y(t) is going to be equal to 0.33 y(t-1) plus 0.5 y(t-2). And so this is the equation that we are going to simulate, and of course it is of order two because we have t minus 1 and t minus 2. So, just like with the MA simulation, the moving average simulation, we start off by initializing our array of coefficients for both the AR portion and the MA portion. So in this case, ar2 is going to be equal to a NumPy array, and of course we always start off with the coefficient at lag zero, and like I said, this is almost exclusively equal to one. So the array starts with 1, and then the coefficient at lag 1 is 0.33 and the coefficient at lag 2 is 0.5. And now for the ma2 array: it's also a NumPy array, and again at lag zero it's always equal to one, but because this time we only want to simulate an autoregressive process, we'll put zero and zero for the following coefficients. That way we get rid of the MA portion. Awesome. With that, we can generate our samples, so ar2
process is going to be equal to ArmaProcess, pass in both arrays, and then generate samples. Actually, it doesn't take an s here, so the function is just generate_sample, and I will set the number of samples equal to a thousand. All right, and now we are ready to plot our simulated AR(2) process. So plt.plot(ar2_process), let's give it a title, so plt.title is Simulated AR(2) Process. Let's already zoom in on the graph because it will look ugly otherwise, so plt.xlim from 0 to 200, and finally we can show the plot. And you should get something similar to this. Again, probably not exactly the same; like I said, the initialization is a bit different every time you run it.
Awesome. So now let's take a look at the ACF, and if you remember well, looking at the ACF for an autoregressive process should not give us any relevant information. So let's see that for ourselves. So plot_acf of ar2_process, and don't forget your semicolon, and you should get the following. Now, as you can see, we get one at lag zero, but then we see this oscillation and decrease going on as the number of lags increases. And so there is no real information that we can get from the ACF plot. So now let's take a look at the PACF plot, the partial autocorrelation function. So here, let's plot_pacf this time of ar2_process, and you get the following. Now, as you can see, at lag zero we have one, and that is okay, but then there are significant peaks only up until lag number two, and the rest is not significant. So indeed, you can get the order of an AR process from the PACF.
And now let's try to model back our simulation to see if we can get the coefficients that we set up earlier, and for that we will use the Yule-Walker equation this time. So I will say that rho and sigma are equal to yule_walker: pass in ar2_process, pass in the order, in this case order two, and the method I will set to mle. And then we can print the results, so it will be an f-string here, and I will say that rho is going to be equal to the negative of rho
because it will give us negative coefficients, so we need to take the negative of the negative to bring it back to positive. And let's also print sigma, which is the variance of the simulated model, so here sigma. All right, and when we run this, I made a mistake here: it's not singa but sigma, sorry about that. When you run this, you get the following: we get coefficients of 0.37 and 0.47, and if we go back up, we have 0.33 and 0.5, so close enough, and we get a sigma variance very close to one.
Now let's do another simulation. This time let's try to simulate an AR process of order three. So here, quickly, simulate AR(3) process. Feel free at this point to pause the video and try it on your own once I give you the coefficients. We will basically go through the exact same steps and then model it at the end. Okay, so this time we are going to simulate the following equation: y(t) will be 0.33 y(t-1) plus 0.5 y(t-2) plus 0.07 y(t-3), so going all the way to lag number three and therefore of order three. All right, so like I said, feel free now to pause the video and redo the same process that we did before for order two, but this time with order three, as an exercise. Otherwise, follow along with me right now.
So we start off by defining our arrays. ar3 is going to be equal to np.array, and then one for lag zero, it's always equal to one, and then we go with the coefficients, so 0.33, 0.5, 0.07. And then for the MA portion, again we are going to cancel it out, so one and then zero, zero, and zero. All right, now we are ready to generate the samples, so ar3_process is equal to ArmaProcess, pass in both arrays, ar3 and ma3, then generate_sample, and this time I'll generate a few more samples: I will generate 10,000 of them. Awesome. And now we are ready to plot our simulation just to take a look, so plt.plot(ar3_process), and let's give it a title, so plt.title, Simulated AR(3) Process. Let's already zoom
in between 0 and 200, and we are ready to show the plot. All right, and I get something similar to this. And now let's take a look at the ACF and the PACF, and as you would expect, the PACF should have significant peaks only up to lag number three, and after that the peaks should not be significant. So let's plot the ACF first of our ar3_process, and then we will plot the PACF of our ar3_process. And as expected, from the ACF there is no relevant information that we can take. However, looking at the PACF: lag 0, 1, 2, 3, and after that no peaks are significant. And with this step done, we are now ready to use the Yule-Walker equation again to try and get back the coefficients that we set earlier. So as before, rho and sigma are going to be equal to yule_walker: pass in ar3_process, this time the order is 3, and the method is still mle. And then we can print out the results, so print, let's put in an f-string, so rho is going to be equal to the negative of rho, and sigma is going to be equal to sigma. And we get the following: 0.34, 0.49, and 0.07. If you remember well, we had 0.33, 0.5, 0.07, so very close, and again we get a variance here, sigma, of one.
All right, and now I would like to walk through a mini project with you guys where we will start to get our hands dirty a little bit with modeling. So this will be a mini project, and it will be about modeling the Johnson & Johnson quarterly earnings per share, also called the EPS. Okay, so we are going to read in a dataset and we are going to apply the autoregressive model to, like I said, try to model the earnings per share of Johnson & Johnson. So make sure that you download the CSV file; it is included as a resource in this lesson. So data is going to be equal to pd.read_csv, and the file is called jj.csv. In this case the file is in the same directory as this notebook, so make sure that you include the path to your file here. And then we can display the first five rows, so data.head, and you should get the
following. So as you can see, we have the date column, which is the date of the quarter, and data is the value of the earnings per share. So now let's quickly show what our data looks like by doing a scatter plot. Here I will simply make the plot a bit larger for you guys, so plt.figure, figsize is going to be equal to 15 by 7.5, so substantially larger. And now we are going to create a scatter plot, so plt.scatter, and we want the date on the x-axis and the column data on the y-axis. The title is going to be Quarterly EPS for, let's write it J&J here, and then let's give it a y label, so the y label of course is EPS in dollars, and the x label is simply going to be the date. I will rotate the x ticks, so plt.xticks and then rotation equal to 90 degrees, and we are ready to show the plot. All right, and you should get the following. So as you can see, it starts off very low and it is increasing over time, so we have a trend right here, and you might also see a bit of a cyclical behavior, right? So it's going up and down, up and down, and so on and so forth.
All right, so now before we start modeling, you know that we have to make the data stationary, and you know that it is not stationary in this case because we see an increase, right? So there is a trend, an increasing trend, here. So the transformation that I will apply to this dataset to make it stationary will be to take the log difference. The first step is that we are going to get the logarithm of the EPS, and then we are going to take the difference. So quickly here, I'll write a comment, take the log difference, and the way to do it: data['data'] is going to be equal to np.log of the data column, and then we are going to take the difference, so data['data'] is going to be equal to data['data'].diff. So this way we take the difference, and because we take the difference, we usually lose the first data point, right? Because the very first data point cannot be differenced
with a previous one, so we lose it. Therefore, I'm going to drop the very first data point, so data is going to be equal to data.drop(data.index[0]). That way we drop the data point at index zero, which would be a NaN in this case, and then we can display the head of the transformed dataset, and you should get the following. So as you can see, we still have the date, but now the data is the log difference of the earnings per share. So now let's plot our new transformed data to see if we still have this trend. So plt.figure again. Actually, no, let's not change the size here; let's plot it right away. So plt.plot the data column, plt.title, this will be the Log Difference of Quarterly EPS for J&J, and let's show the plot right away, and you should get the following. Now, as you can see, we do not have this trend anymore, right? So across time it is not going up or down. Awesome.
However, if you remember well, stationarity is both no trend and same variance. So as you can see here, maybe the variance is not the same across the dataset. It is very hard to say just by looking at the plot, and so that's why we are going to test it statistically to see if we have a stationary dataset or not. The way to do that is to use the adfuller test. For the adfuller test, the null hypothesis is that the dataset is not stationary. However, if you run the adfuller test and you get a p-value less than 0.05, then you can reject the null hypothesis and assume that the time series is stationary. So let's run that right away. I will say that the adfuller result is equal to adfuller, so the adfuller test, and we run it on the data column. And then let's print out the result, so it prints out an ADF statistic, so adfuller statistic, and that's going to be under the adfuller result at index zero. And then we can also print out the p-value, right? So again using an f-string, I will print out the p-value, and that's under the adfuller result at index one. And when you run the test, you should
get values similar to this. So in my case, the ADF statistic is very small and negative, which is a good sign, and looking at the p-value, we get 1.3 times 10 to the minus 28. So indeed it is below 0.05, and therefore we can say that the time series is stationary. So knowing that, now we can take a look at the ACF and PACF to see if we can derive the order of the AR process or MA process, but as you can guess, we are talking about the autoregressive model here, so we are going to use the AR process. So let's take a look at both the ACF and PACF. So plot the ACF of data['data'], and let's also plot the PACF of data['data'], and you get the following. So looking at the autocorrelation, no information can be retrieved from here, right? It is sinusoidal and decreasing. However, looking at the PACF, as you can see, at lag zero we get one, but then here it's significant, significant, significant, but the rest is not really significant anymore. And so we can say that maybe an AR process of order 4 in this case would be a good approximation for modeling the earnings per share for Johnson & Johnson.
And so let's try that right away using the Yule-Walker equation. So here I will say that we will try an AR model of order four, and the way we do it is that rho and sigma are going to be equal to yule_walker, and then we pass in the data column, and of course we want a process of order four. And then we can print out the coefficients. So let me bring this down a little bit. So here an f-string, so our coefficients, rho is going to be equal to the negative of rho, and let's also print out the variance used to model, so sigma is going to be equal to sigma. And you get the following. So as you can see, now we get the coefficients of an AR(4) model that models the quarterly earnings per share for Johnson & Johnson. And so congratulations, you have modeled your very first real time series dataset. And that's it for this mini project. I hope that you guys enjoyed it. So thank you very much for taking
this free preview with me. As always, there is a link in the description below if you want to take the full course. The link will have a promo code applied to it already, so you can click on the link and you'll get the course at 87% off. And if by any chance you click on the link and the promo code has expired, feel free to send me an email; it will also be in the description, and I will send you a coupon code so that you get the course on sale. So thank you very much, and I'll see you on the next one.
Original Description
👉 Get the course at 87% off: https://www.udemy.com/course/applied-time-series-analysis-in-python/?couponCode=TSPYTHON2021
📚 Link to the notebook: https://github.com/marcopeix/AppliedTimeSeriesAnalysisWithPython/blob/main/HOTSAP_AR.ipynb
📚 Link to the dataset: https://github.com/marcopeix/AppliedTimeSeriesAnalysisWithPython/tree/main/data
Email me for a coupon if the one above expired: peixmarco@gmail.com
-----------------------------------
Now, let’s cover the autoregressive model. The autoregressive model uses a linear combination of past values of the target to make a prediction. Since we are talking about autoregression, the regression is made against the target itself.
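As a small illustration of "a linear combination of past values", the sketch below computes a one-step prediction from the most recent observations. The function name `ar_predict` and its arguments (`history`, the weights `phi`, the constant `c`) are hypothetical names chosen for this example, not part of any library.

```python
import numpy as np

def ar_predict(history, phi, c=0.0):
    """One-step forecast: c + phi[0]*y[t-1] + ... + phi[p-1]*y[t-p]."""
    phi = np.asarray(phi, dtype=float)
    history = np.asarray(history, dtype=float)
    p = len(phi)
    if len(history) < p:
        raise ValueError("need at least as many past values as weights")
    # Reverse the last p values so the most recent one pairs with phi[0].
    recent = history[-p:][::-1]
    return c + float(np.dot(phi, recent))

print(ar_predict([1.0, 2.0], phi=[0.33, 0.5]))
```

The call evaluates 0.33 × 2.0 + 0.5 × 1.0, which is about 1.16: the latest value (2.0) gets the first weight and the value before it (1.0) gets the second.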
We refer to the autoregressive model as the AR(p) model, where p is the order. The AR(p) model is very flexible in the sense that it can model many different types of time series patterns. However, keep in mind that the autoregressive model can only be applied to stationary time series, which will constrain the range of the parameters phi.
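Written out with the symbols named in the transcript (c for the constant, phi for the weights, epsilon for the noise), and assuming y_t denotes the value of the series at time t, the AR(p) model is usually given as:

```latex
y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \epsilon_t
```

The constraint on the parameters is the standard stationarity condition: all roots of the polynomial 1 - \phi_1 z - \dots - \phi_p z^p must lie outside the unit circle. For p = 1 this reduces to |\phi_1| < 1.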
If we look at the ACF plot, we see some oscillation, as well as a slow decay. This is a hint that it is not a moving average process, and so an autoregressive process must be in play. Now, when we look at the PACF, the partial autocorrelation function plot, then we see that there is no significant peak after lag 2. Therefore, the PACF can be used to determine the order of the AR model. As a side note, the PACF or partial autocorrelation function finds the correlation between the present value and the residuals at a previous lag. Therefore, it finds a correlation that cannot be explained with the ACF.
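The PACF cutoff can also be checked numerically without a plot. The sketch below uses only NumPy and makes several assumptions: the helper names (`simulate_ar`, `sample_acf`, `sample_pacf`) are hypothetical, the AR(2) coefficients 0.33 and 0.5 are the ones used in the video, and the PACF at lag k is estimated as the last coefficient of an order-k Yule-Walker fit, which is one common estimator. statsmodels' `plot_pacf` may use a different default method, so its values can differ slightly.

```python
import numpy as np

def simulate_ar(phi, n, seed=0, burn_in=200):
    """Simulate y_t = phi_1*y_{t-1} + ... + phi_p*y_{t-p} + eps_t."""
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    eps = rng.standard_normal(n + burn_in)
    y = np.zeros(n + burn_in)
    for t in range(p, n + burn_in):
        # y[t-p:t] reversed puts y_{t-1} first, matching phi_1.
        y[t] = np.dot(phi, y[t - p:t][::-1]) + eps[t]
    return y[burn_in:]  # drop the start-up transient

def sample_acf(x, nlags):
    """Sample autocorrelations for lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(nlags + 1)])

def sample_pacf(x, nlags):
    """PACF at lag k = last coefficient of the order-k Yule-Walker fit."""
    r = sample_acf(x, nlags)
    values = [1.0]
    for k in range(1, nlags + 1):
        # Toeplitz matrix of autocorrelations r[|i - j|].
        R = np.array([[r[abs(i - j)] for j in range(k)] for i in range(k)])
        values.append(np.linalg.solve(R, r[1:k + 1])[-1])
    return np.array(values)

y = simulate_ar([0.33, 0.5], n=5000)
pacf_values = sample_pacf(y, nlags=8)
band = 1.96 / np.sqrt(len(y))  # approximate 95% significance band
for lag, value in enumerate(pacf_values):
    print(lag, round(float(value), 3), abs(value) > band)
```

For this process the lag-1 and lag-2 values should stand well clear of the band, while later lags should stay close to zero; an occasional later lag may still cross the 95% band by chance.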
To recap, if you plot the ACF and you see a decay or a sinusoidal pattern, then it suggests an autoregressive process. Plotting the PACF will allow you to estimate the order of the AR model. In this case, we saw that it is of order 2, since after lag 2, the coefficients are not significant.