Get Started in Time Series Forecasting in Python | Full Course

Data Science With Marco · Beginner · 📊 Data Analytics & Business Intelligence · 1y ago

Skills: ML Maths Basics 90% · Supervised Learning 80% · ML Pipelines 70%

Key Takeaways

This video course teaches time series forecasting in Python using statistical models such as ARIMA and SARIMA, and covers topics such as baseline models, cross-validation, and exogenous features. The course uses libraries such as statsforecast and utilsforecast, and provides hands-on examples and code snippets to illustrate key concepts.

Full Transcript

Hello and welcome to this video. If you are new here, let me quickly introduce myself. My name is Marco, and I am passionate about teaching and about time series forecasting. I currently work at Nixtla, which, in my opinion (and I thought so even before I joined the company), is one of the leaders in open-source time series forecasting software. There I get to work on neuralforecast, our package for forecasting with deep learning models, and also on TimeGPT, so very cool stuff. I also wrote a book on the subject, Time Series Forecasting in Python, which I'm very proud of, and there's a second one coming at the end of 2025. I also have a blog where I share my best tutorials on forecasting with Python. Now, enough about me.

If you are here, it means you are interested in time series forecasting, and this is the best starting point for you. We are going to cover everything you need to get started in the field. This video is perfect if you know how to code in Python and have done some data science projects here and there, but have never really handled time series data before. Don't worry, we're going to cover everything you need right now. So let's not waste any more time and jump right in.

A good starting point when learning forecasting for the first time is to study the statistical models. First, because they're still good models and still relevant today. Second, because they build the foundational knowledge you need before moving on to more advanced machine learning and deep learning techniques. And of course, if you are interested in those more advanced, more recent techniques, let me know in the comments or by liking this video. On the agenda for this lesson, we will first quickly explore the fundamentals of time series data.
We will define time series and look at their components, and then we are going to develop baseline models. Those are super simple models, but they are very important to have. Then we start forecasting with the ARIMA model. After that, we are going to explore some more advanced forecasting techniques, such as cross-validation, working with exogenous features, and generating prediction intervals, and finally we are going to see how to evaluate those forecasting models. So this video gives you a complete picture of forecasting, focusing mostly on statistical models, and of course everything will be coded in Python.

So let's start by defining time series. A time series is super simple: it's a set of data points ordered in time, and ideally the data is equally spaced in time, so you have data every minute or every hour, for example. There are many examples of time series that we see day to day: the closing price of a stock, the electricity consumption of a household, or even the temperature outside. As long as your data is indexed in time, it can be viewed as a time series.

Here, for example, is a time series that tracks the monthly milk production in Australia: every month, we record how much milk is produced. Feel free to pause the video and take some time to analyze this graph, and notice two things. First, there's a trend in the data: milk production is increasing over time. Second, there's a repeated pattern: the series peaks and then drops, peaks and then drops, and this is repeated over and over. This pattern is what we call the seasonality.
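To make the trend and the seasonality concrete, here is a minimal sketch of a classical additive decomposition in NumPy. This is my own illustration, not code from the course: the trend is a centered moving average spanning one season, the seasonal component is the average detrended value at each position of the cycle, and the residual is whatever is left. The noise-free synthetic series is an assumption standing in for the milk production data.

```python
import numpy as np

def additive_decompose(y, period):
    """Classical additive decomposition: y = trend + seasonal + residual."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Trend: centered moving average spanning one full season.
    # For an even period, use the "2 x m" average (half weight on both ends).
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)  # the edges stay undefined
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    # Seasonal: mean detrended value at each position of the cycle, centered on zero.
    detrended = y - trend
    profile = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    profile -= profile.mean()
    seasonal = np.resize(profile, n)  # repeat the profile across the whole series
    # Residual: whatever the trend and the seasonality do not explain.
    residual = y - trend - seasonal
    return trend, seasonal, residual

# Synthetic monthly series (an assumption): linear trend plus a 12-month pattern, no noise.
t = np.arange(48)
season = 10 * np.sin(2 * np.pi * t / 12)
y = 100 + 2 * t + season
trend, seasonal, residual = additive_decompose(y, period=12)
print(np.round(trend[6:9], 2))
```

Because this toy series has no noise, the residuals come out as (numerically) zero in the interior; on real data they carry the random part that no model can forecast.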
Seasonality represents a cycle that is repeated at fixed time intervals, and we can actually decompose most time series into the components I just mentioned. At the very top of the graph you see our original data. The graph right below it is the trend, which we define as the general direction of our series; here the trend is slowly increasing over time. The third graph is the seasonality, again the pattern repeated at fixed time intervals: the same pattern is simply repeated time and time again across the series. Finally, at the bottom, whatever is not explained by the trend and the seasonality is what we call the residuals. They represent quick changes in your series that the first two components do not explain, and ideally those changes are completely random.

So when you are forecasting time series data, you're really trying to model and forecast the trend and the seasonality, because those are mathematical components that you can actually model and project into the future. Assuming that your residuals are completely random, they are not something a model can capture: it is impossible to predict the next random value because, by definition, it is random. This is why our forecasts will never be perfect. We can model the trend and the seasonality, but the residuals will always be the errors left over after our models.

With all of that in mind, let's start forecasting using baseline models. A baseline model is crucial. It is a very simple model based on some kind of heuristic or statistic, usually the mean of your series or the last known value.
Repeating the last known value is what we call the naive forecast, and repeating the last season of data is what we call the seasonal naive forecast. To give you an example, here are some baseline models for our milk production dataset. I reserved the last 12 time steps so that we forecast those 12 steps and can compare the predictions against the actual values. The first line you see here is simply the historical mean for the next 12 months: we took the entire dataset, calculated the mean, and used that value as the forecast for each of the next 12 months. Nothing exciting, just a flat line that represents the average of the series, but this is one of the baseline models you can use in forecasting.

We can be slightly smarter about it. Data that is far back in history might not be as informative as the most recent period. I know there's a trend and my milk production is increasing, so why would I care about what happened in the 1960s when I'm trying to forecast 1976? So you can also take the mean over a more recent period, which is what I did here, using the mean of the last year only. We can already see it's a better forecast, because now it overlaps with the actual values. Like I said, it makes sense: what happened a long time ago might not be as useful for forecasting the future. Then we have the naive forecast, which simply repeats the last known value into the future: a super simple baseline, very useful, and often used. Finally, we can repeat the last season of the data, and in this case a season lasts 12 months.
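The four baselines just described fit in a few lines of NumPy. This sketch is my own illustration (the function name `baseline_forecasts` is hypothetical): in the milk example you would call it with `h=12`, `window=12` (the last year), and `season_length=12`, while the toy numbers below are made up so the results are easy to check by hand.

```python
import numpy as np

def baseline_forecasts(y, h, window, season_length):
    """Four simple baselines for a single series (a NumPy sketch)."""
    y = np.asarray(y, dtype=float)
    return {
        # Mean of the entire history, repeated h times (a flat line).
        "historic_average": np.full(h, y.mean()),
        # Mean of the most recent `window` observations only.
        "window_average": np.full(h, y[-window:].mean()),
        # Naive: repeat the last known value.
        "naive": np.full(h, y[-1]),
        # Seasonal naive: repeat the last full season, cycling if h > season_length.
        "seasonal_naive": np.resize(y[-season_length:], h),
    }

# Toy series with a season of length 4 (made-up numbers).
y = [10, 20, 30, 40, 12, 22, 32, 42]
fc = baseline_forecasts(y, h=6, window=4, season_length=4)
for name, values in fc.items():
    print(name, values)
```

Note that `np.resize` cycles through the last season, so the seasonal naive forecast still works when the horizon is longer than one season.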
Because we have monthly data, the season repeats every year, so I simply took the last 12 months of data and repeated them into the future. We can barely see that line on the graph, because it overlaps almost perfectly with the actual data. So the lesson here is: make sure to have a baseline model, and be smart about the baseline you use, because it can be a really good model. Your baseline must be strong so that it really challenges the more advanced models you're going to spend time tweaking, tuning, and selecting. With a good baseline, you can make a meaningful comparison and justify that the model you designed is much better than something really simple and naive.

At this point, it's time to jump into some code and write some Python to bring those baseline models to life, and then we'll come back to the slides to learn a bit more theory. I am working in a notebook right now. If you want instructions on how to set up the environment to reproduce the results I'm going to show you, I have them right here. You can either develop locally, creating your own environment and installing the required dependencies, or, the easiest way, start a Google Colab notebook and install the following dependencies. Don't forget that Google Colab already has the basics like pandas, NumPy, Matplotlib, and so on, so you really only need to install statsforecast and utilsforecast. And at any point, if you want to see the full solutions, you can head to the GitHub repository; all the links are available below the video. We are doing this one right here: yt03 forecasting stats.
This is the entire solution, and the dataset is in the data folder: the daily sales French bakery CSV file. This is the data we are going to use for this forecasting tutorial. So let's not waste any more time and import our dependencies. I'm importing the usual NumPy, pandas, and Matplotlib, and I'm also using utilsforecast, which will be very useful for plotting our series and evaluating our forecasts with different metrics; the metrics come from utilsforecast.losses. More on that as we go through the tutorial.

Then I read my data. Here I'm reading the data locally, but you can also replace the local file path with the URL of the CSV file in the GitHub repository: go to the data folder, click on the CSV file, click on Raw once it loads, and copy-paste the URL you see. It's going to do exactly the same. Next, we do a little filtering: I'm only keeping the series that have more than 28 time steps. This will be useful because we're going to do some testing and cross-validation as we go through this video. I am also dropping a column called unit price; we'll come back to it later in the tutorial. You should get the following three columns. unique_id identifies which series we are looking at (keep in mind we are looking at the sales of a French bakery). ds is the date stamp, simply the date, and y is the value, here the sales volume of a particular product. There are many, many products in this dataset, and here I am plotting just two of them.
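Here is a minimal sketch of the loading step, assuming the column names `unique_id`, `ds`, `y`, and `unit_price` (the last one is my guess at how "unit price" is spelled in the file). A tiny made-up frame replaces the CSV so the block runs on its own; the commented `read_csv` line uses a hypothetical placeholder path, not the repository's actual file name.

```python
import pandas as pd

# In the notebook the data comes from the repository's CSV, e.g.
# df = pd.read_csv(PATH_TO_CSV, parse_dates=["ds"])  # PATH_TO_CSV is a placeholder
dates = pd.date_range("2023-01-01", periods=40, freq="D")
df = pd.concat([
    pd.DataFrame({"unique_id": "baguette", "ds": dates, "y": range(40), "unit_price": 1.0}),
    pd.DataFrame({"unique_id": "short_series", "ds": dates[:10], "y": range(10), "unit_price": 2.0}),
], ignore_index=True)

# Keep only the series with more than 28 observations.
df = df.groupby("unique_id").filter(lambda g: len(g) > 28)
# Set the price column aside for now; the course revisits it later.
df = df.drop(columns=["unit_price"]).reset_index(drop=True)
print(df.head())
```

`groupby(...).filter` keeps or drops whole series at once, which is what you want here: a series is either long enough for testing and cross-validation or it is not.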
In the plot, the first series is baguette sales. We have sales over many months, and this is a daily dataset, so every day we get a new data point. Sometimes sales fall all the way to zero, and sometimes they dip without quite reaching zero. The scales also differ: baguettes oscillate between 0 and 100, whereas croissants oscillate between 0 and 300, so that's something to keep in mind as well. The nice thing about the plot_series function from utilsforecast is that you can zoom in using max_insample_length=56, so now we're only plotting the last 56 time steps for both baguettes and croissants. I'm not going to spend much time explaining the plotting functions. We'll make plots, of course, but I want to focus this tutorial on forecasting time series and on using statsforecast to build good forecasting models, so feel free to pause the video and study the plotting code a bit more.

With all of that said, let's get started with some baseline models. We are going to use statsforecast, which in my opinion is one of the best and also one of the fastest implementations of statistical forecasting models, so definitely something you should be using. Luckily for us, it already implements all of the baseline models: naive, historic average, window average, and seasonal naive. So let's see how to define them for this situation. I will set the horizon to 7: since we have daily data, I think it is reasonable to forecast the sales for the next week, so the next 7 days. Of course, this choice is entirely subjective.
I don't know if this dataset was necessarily built for a 7-day forecast horizon, but we'll go with it. The way this works is that we define a list of models: with statsforecast, we can fit many models at the same time simply by passing a list of the models we want to fit. We will use a Naive model; remember that the naive model simply forecasts the last known time step. Then we have HistoricAverage, which takes the average of the entire history of the series and uses that value as the forecast. We can also use WindowAverage, where you specify the window size, that is, how many time steps to take into account when calculating the average that you then forecast. Here I set window_size=7, which means I take the average of the last seven days and forecast that into the future. Finally, we have the SeasonalNaive model, for which we need to specify the season length. If I go back up here, we clearly see some kind of seasonality every seven days, which is pretty usual for daily data. So we have a seasonality of one week, and since one week is 7 days, season_length is also 7.

Now that this is done, we initialize our StatsForecast object. This object is responsible for fitting, predicting, cross-validation when we get there, and so on, so we always start with it. We pass in the list of models we want to fit, so models=models, and then specify the frequency of the data. Here the frequency is daily, so we pass in capital D.
It basically takes the same frequency strings as pandas. Once this is done, we call fit on the entire data frame. Then we can make predictions: preds = sf.predict(h=horizon), where the horizon here is 7. When we run this, it is almost instantaneous, which makes sense: there's no real fitting for these models, since we're using very simple statistics and logic to make very naive predictions. For each time series in our data, we now have predictions from the naive, historic average, window average, and seasonal naive methods. We can then plot those predictions; again, I'm plotting only baguettes and croissants. This is nothing amazing or exciting. Three of our forecasts are simply horizontal lines, which makes sense, since these are the naive, historic average, and window average models. Then we have the seasonal naive model, which simply repeats the last 7 days of our data. Nothing crazy, but if you are just getting started with forecasting time series, this is potentially your very first forecast: you have predicted the future! However, we do not know how those models are performing, because we made predictions for time steps whose actual values we don't know yet, so we cannot compare the predictions against actual values. So let's redo this exercise, but now with a train/test split.
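statsforecast does all of this for you. To see what the four baselines compute on long-format data (one row per `unique_id` and `ds`), here is a plain pandas re-implementation. It is my own sketch, not statsforecast's code: `predict_baselines` is a hypothetical helper, and only the output column names are borrowed from the statsforecast model classes used above.

```python
import numpy as np
import pandas as pd

def predict_baselines(df, h, window, season_length, freq="D"):
    """Pandas sketch of four baselines for long-format data (unique_id, ds, y)."""
    frames = []
    for uid, g in df.sort_values("ds").groupby("unique_id"):
        y = g["y"].to_numpy(dtype=float)
        # Future timestamps start one step after the last observed date.
        future = pd.date_range(g["ds"].iloc[-1], periods=h + 1, freq=freq)[1:]
        frames.append(pd.DataFrame({
            "unique_id": uid,
            "ds": future,
            "Naive": y[-1],
            "HistoricAverage": y.mean(),
            "WindowAverage": y[-window:].mean(),
            "SeasonalNaive": np.resize(y[-season_length:], h),
        }))
    return pd.concat(frames, ignore_index=True)

# Two made-up daily series of 14 days each.
dates = pd.date_range("2024-01-01", periods=14, freq="D")
df = pd.DataFrame({
    "unique_id": ["a"] * 14 + ["b"] * 14,
    "ds": list(dates) * 2,
    "y": list(range(14)) + [5.0] * 14,
})
preds = predict_baselines(df, h=7, window=7, season_length=7)
print(preds.head(7))
```

Each series gets its own forecasts, and the result has one row per series and future date, which is the same long layout as the prediction frame shown in the video.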
To set up the train/test split, test = df.groupby('unique_id').tail(7): for each unique ID in the dataset, I reserve the last seven time steps as the test set. The training set is the rest: train = df.drop(test.index).reset_index(drop=True). Once this is done, we redo our fit and predict: sf.fit(df=train), this time passing the training set, and then preds = sf.predict(h=horizon). Next, we create an evaluation data frame that holds both the actual values and the predictions by merging the two: eval_df = pd.merge(test, preds, how='left', on=['ds', 'unique_id']), a left merge on the date stamp and the unique ID. When we run this, again it's super fast, and now we have our evaluation data frame.

So how are we going to evaluate? We use the evaluate function from utilsforecast: evaluation = evaluate(eval_df, metrics=[mae]). We pass the data frame that contains both the predictions and the actual values, and then specify which metrics we want, in this case the MAE. Let me scroll back up quickly: this is what we imported here. I imported * from utilsforecast.losses, which brings in all of the available losses; mae is one of them, which is why I have access to it right here. The MAE, super simple, is the mean absolute error.
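A runnable sketch of the split and merge, using a tiny made-up frame. To keep the block independent of statsforecast, the "predictions" are a hand-computed naive forecast (the last training value of each series) standing in for `sf.predict`; only the pandas calls mirror the steps described above.

```python
import pandas as pd

horizon = 7
dates = pd.date_range("2024-01-01", periods=20, freq="D")
df = pd.DataFrame({
    "unique_id": ["a"] * 20 + ["b"] * 20,
    "ds": list(dates) * 2,
    "y": list(range(20)) + list(range(100, 120)),
})

# The last `horizon` rows of every series form the test set
# (assumes rows are sorted by ds within each series).
test = df.groupby("unique_id").tail(horizon)
# Everything else is the training set.
train = df.drop(test.index).reset_index(drop=True)

# Stand-in predictions: a naive forecast keyed by the same unique_id/ds pairs.
last_train = train.groupby("unique_id")["y"].last()
preds = test[["unique_id", "ds"]].copy()
preds["Naive"] = preds["unique_id"].map(last_train)

# Put actual values and predictions side by side.
eval_df = pd.merge(test, preds, how="left", on=["ds", "unique_id"])
print(eval_df.head())
```

Merging on both `ds` and `unique_id` matters: with many series, the date alone does not identify a row.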
To compute the MAE, we take the absolute difference between each prediction and the corresponding actual value, and then average all of those differences. When you inspect the evaluation data frame, you should get the following: for each series, we now have the MAE for each of our methods. For baguettes, the naive model got an MAE of 17, the historic average 5, the window average 7, and the seasonal naive 12, and we get the same for every unique ID in our dataset and every model. Since we have a lot of time series, it would be interesting to average across all of them to get an overall idea of each model's performance. So evaluation = evaluation.drop(['unique_id'], axis=1).groupby('metric').mean().reset_index(): we drop the unique ID column, group by metric, and take the average. Now, when we display evaluation, we get for each model the average MAE across all unique IDs: on average across all of the series, naive gets an MAE of 6, historic average 5, window average 5, and so on. Right below, I have some plotting code already written out to save time, and it shows the same metrics as a bar plot. Clearly, seasonal naive achieves the lowest MAE across all series. This makes sense: from what we have inspected, our series do have some seasonality.
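The MAE is (1/n) times the sum of |y_t − ŷ_t| over the n forecasted points. Here is a sketch of the per-series and overall scores in plain pandas, with made-up numbers and without utilsforecast. The layout is simplified (no `metric` column), but the logic matches the drop/groupby/mean step above: compute one MAE per series, then average those per-series MAEs.

```python
import numpy as np
import pandas as pd

# Made-up evaluation frame: actual values (y) next to two models' predictions.
eval_df = pd.DataFrame({
    "unique_id": ["a", "a", "a", "b", "b", "b"],
    "y":             [10, 12, 14, 5, 5, 5],
    "Naive":         [11, 11, 11, 5, 5, 5],
    "SeasonalNaive": [10, 12, 15, 4, 6, 5],
})
models = ["Naive", "SeasonalNaive"]

# Absolute error of every prediction, one column per model.
abs_err = eval_df[models].sub(eval_df["y"], axis=0).abs()
abs_err["unique_id"] = eval_df["unique_id"]

# MAE per series: mean absolute error within each unique_id.
per_series = abs_err.groupby("unique_id").mean().reset_index()
# Overall score: average the per-series MAEs across all series.
overall = per_series[models].mean()
print(per_series)
print(overall)
```

Averaging per-series MAEs gives every series the same weight, regardless of its scale; keep that in mind when series differ a lot in volume, like baguettes and croissants.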
So again, with seasonal data, the seasonal naive model is usually the best baseline, and it will serve as a benchmark for our future experiments throughout this tutorial. That's it for this portion of the code. Let's go back to the slides and learn about the ARIMA model.

At this point, we have used Python to forecast with baseline models. I know, nothing super exciting just yet, so now let's introduce a more advanced model. There are, of course, many statistical models we can use, and each has its own strengths and limitations. The ARIMA model, the one we are going to look at in this video, is one of the most fundamental models, and it can handle seasonal data and exogenous features, which is why I decided to use it for this tutorial. We also have exponential smoothing, which works well on seasonal data, and the Theta model, which uses decomposition to make predictions. Theta is actually a very good model and recently came very close to winning a forecasting competition. Then we have more advanced models like MSTL and TBATS, which can take multiple seasonal periods into account. And we have models like Croston, IMAPA, and TSB, which are specifically built for intermittent time series, meaning data that mixes zero and nonzero values, which is super important. For this video, we're going to focus only on ARIMA. If you want to learn about the other models and more advanced techniques, I suggest you check out my course, Applied Time Series Forecasting in Python.
There's a link with a discount code in the description. In that course, we cover all of these models and much more advanced techniques, so if you are interested and willing to really master time series forecasting, I suggest you take a look. But for now, like I said, we're going to explore ARIMA.

ARIMA stands for autoregressive integrated moving average. AR is the autoregressive portion, I is what we call the order of integration, and MA is the moving average portion. With ARIMA, you can forecast essentially any series that has a trend and a single seasonal period. If your series has more than one seasonality, you have to use models like MSTL or TBATS, but if it has a single seasonality, ARIMA can handle it.

Let's break it down a bit more. The autoregressive model, AR, is basically a regression of the series against itself: it says that future values depend on past values, which is pretty logical. Its mathematical notation is AR(p), where p is the autoregression order, which simply controls how many past values we consider. If p equals 1, we only regress against y_{t-1}: the current time step is y_t, and y_{t-1} is the time step before. If p equals 2, we look at the past two time steps, y_{t-1} and y_{t-2}. Simple as that. Next, we have the moving average model, the MA portion. MA says that future values depend on the present and past error terms. Basically, this model says that the series has some kind of average value and bounces up and down around that average because of random error terms, and those error terms are what we use to forecast the series. Mathematically, we denote it MA(q), where q is the order of the moving average process.
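"A regression against itself" can be taken literally. The sketch below (my own illustration, with hypothetical helper names) estimates an AR(p) model, y_t = c + φ_1·y_{t-1} + ... + φ_p·y_{t-p} + ε_t, by ordinary least squares on lagged copies of the series, then forecasts recursively by feeding each prediction back in. The MA part is not sketched because its past error terms are not observed directly; libraries estimate full ARIMA models by (approximate) maximum likelihood instead.

```python
import numpy as np

def fit_ar(y, p):
    """Estimate AR(p) coefficients [c, phi_1, ..., phi_p] by least squares."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Row for time t holds [1, y_{t-1}, ..., y_{t-p}], for t = p .. n-1.
    X = np.column_stack([np.ones(n - p)] + [y[p - k:n - k] for k in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef

def forecast_ar(y, coef, h):
    """Recursive multi-step forecast: each prediction becomes an input for the next."""
    history = list(np.asarray(y, dtype=float))
    c, phis = coef[0], coef[1:]
    out = []
    for _ in range(h):
        nxt = c + sum(phi * history[-k] for k, phi in enumerate(phis, start=1))
        history.append(nxt)
        out.append(nxt)
    return np.array(out)

# Simulate an AR(2) process with known coefficients (0.6, -0.2), then recover them.
rng = np.random.default_rng(0)
n = 2000
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] - 0.2 * y[t - 2] + eps[t]

coef = fit_ar(y, p=2)
print(np.round(coef, 2))
print(forecast_ar(y, coef, h=3))
```

One detail worth noticing: an AR(1) with c = 0 and φ_1 = 1 just repeats the last value, which is exactly the naive baseline from earlier.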
For the MA part, if q equals 1, you look at the most recent past error term, at t-1, and if q equals 2, you look at the two past error terms, at t-1 and t-2. Fairly simple. When we combine the autoregressive and moving average models, we get an ARMA model. The problem is that this model only works with stationary series, meaning, roughly, series with no trend and no seasonality. That's not very useful, since we are mostly interested in series with some kind of seasonal pattern and trend. To use the ARMA model on those, you would have to transform your series yourself. That's why we don't really use it anymore; instead, we add the order of integration, d, which internally transforms the series to make it stationary. That way, you don't have to make the transformations yourself, and you can model non-stationary series directly with the ARIMA model. Usually, d is set to 0, 1, or 2. We won't spend too much time on these parameters, because, as you will see, we now have methods that optimize them automatically for us, so there's no need to work out the best value of each one by hand.

ARIMA is great, but to model seasonal series, we adapt it and call it SARIMA, the seasonal ARIMA, which adds new seasonal parameters. Capital P is the seasonal autoregressive order, and it looks at multiples of the seasonal period; we'll see in a minute what that means. Similarly, capital D is the seasonal order of integration, and capital Q is the seasonal order of the moving average portion. Then you have the frequency, which is super important: the frequency is the number of observations per cycle.
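The transformation behind d and D is differencing, and a seasonal order of 1 reaches back exactly one season. The sketch below is my own illustration on a made-up monthly series: `difference` is a hypothetical helper, and with AutoARIMA you never do this by hand. It only shows what the orders mean. Seasonal differencing with lag m = 12 (D = 1) cancels the repeating pattern and leaves a constant (the trend slope times 12); one ordinary difference (d = 1) then removes that constant.

```python
import numpy as np

def difference(y, lag=1):
    """Return y_t - y_{t-lag}: lag=1 is ordinary differencing (d),
    lag=season_length is seasonal differencing (D)."""
    y = np.asarray(y, dtype=float)
    return y[lag:] - y[:-lag]

# Synthetic monthly series (made-up numbers): linear trend + a fixed 12-month pattern.
m = 12
pattern = np.array([0, 2, 5, 9, 12, 14, 12, 9, 5, 2, 0, -1], dtype=float)
t = np.arange(4 * m)
y = 100 + 2 * t + np.tile(pattern, 4)

# A seasonal AR term with P = 1 uses y[t - m]: the same month, one year earlier.
t0 = 30
print(y[t0], y[t0 - m])

seasonal_diff = difference(y, lag=m)     # D = 1: removes the repeating pattern
both = difference(seasonal_diff, lag=1)  # then d = 1: removes what is left of the trend
print(seasonal_diff[:3], both[:3])
```

A plain first difference alone would not be enough here: it removes the trend but leaves the 12-month pattern in place.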
That was a lot of information, so let's look at an example to understand it all. Going back to our milk production example, our period, the length of a season, is 12 months: we have monthly data, and the pattern repeats every 12 months. So our season length, our frequency, is 12 data points. If capital P has a value of 1, we look at the value at t-12, that is, the value one year (12 months) earlier. The same goes for capital Q: it looks at the error term 12 months before the present time step. That's how these seasonal orders work.

With all of that in mind, let's take a break from theory, go back to the code, and start modeling with ARIMA. In practice, all you really have to do is set the frequency, or season length, of the data. For the rest, we use what we call the AutoARIMA function, which searches for the optimal combination of lowercase p, capital P, lowercase q, capital Q, and even lowercase d and capital D. It automates the selection of the best combination for us, so all we have to provide is the season length.

So let's jump back into the code. We are now going to forecast with the ARIMA model, specifically with the AutoARIMA function, because, like I said, there is no need to optimize the parameters by hand when AutoARIMA can do all of that optimization for us. First, I am going to define a subset of my dataset. Why am I doing that?
With statistical models, when you have multiple series, you fit one model per series, and in this case we have quite a few unique IDs. I don't remember the exact number, but we have a few hundred unique series. And while statsforecast is very fast, ARIMA itself is not that fast; other models, like exponential smoothing and MSTL, are much faster to optimize. So here we are going to work only with baguettes and croissants, so that you don't have to stare at your computer for many minutes waiting for a model to train. That's the only reason I'm doing this, so of course feel free to train on everything or to select other unique IDs. The goal here is really to make this project your own.

So I define a subset of the training set called small_train: the training set from earlier, keeping only the rows whose unique_id is in unique_ids. We do the same for the test set: small_test = test[test.unique_id.isin(unique_ids)]. Once this is done, we define our list of models, and here I am going to try out two different models. The first is AutoARIMA with seasonal=False and the alias, that is, the name of this model, set to ARIMA. I want to run this little experiment with you: we fit an ARIMA model without the seasonal component, and we also fit an AutoARIMA model where we allow the seasonal component by specifying the season length. Then we will see which version performs best in our scenario.
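The subsetting step is a boolean mask built with `isin`. A minimal sketch with made-up frames: the ID labels below are hypothetical and may not match how the series are named in the bakery dataset.

```python
import pandas as pd

# Hypothetical IDs; use whichever labels identify the series in your data.
unique_ids = ["baguette", "croissant"]

# Tiny stand-ins for the train/test frames built earlier.
train = pd.DataFrame({
    "unique_id": ["baguette", "croissant", "coffee"] * 3,
    "y": range(9),
})
test = pd.DataFrame({
    "unique_id": ["baguette", "croissant", "coffee"],
    "y": [10, 11, 12],
})

# Keep only the rows whose unique_id is in the list, in both sets.
small_train = train[train["unique_id"].isin(unique_ids)]
small_test = test[test["unique_id"].isin(unique_ids)]
print(small_train["unique_id"].value_counts())
```

Filtering train and test with the same list keeps the evaluation consistent: every series you fit on also has actual values to be scored against.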
You should already have some intuition that, since the seasonal naive model performed better than the other baselines, a seasonal ARIMA model is likely to work better as well. Still, let's run the experiment so you can see how to do it yourself later on. Next, we initialize the StatsForecast object, pass in the list of models, and keep the frequency as daily. Then we fit on the small training set; otherwise, like I said, it would take quite a bit of time. Then we make predictions: arima_preds = sf.predict(h=horizon), keeping our horizon of 7 days. Once we have our predictions, we build our evaluation data frame: arima_eval_df = pd.merge(arima_preds, eval_df, how='inner', on=['ds', 'unique_id']), an inner join on ds and unique_id. That way, we can compare the predictions of all of our baseline models with those of the ARIMA and SARIMA models. Then we evaluate, arima_eval = evaluate(arima_eval_df, metrics=[mae]), simple as that, and display the evaluation data frame. Running this takes a bit of time, because it runs the ARIMA and SARIMA optimization for both series. I also made a small typo: I forgot the equal sign in metrics=[mae], so sorry about that. It actually didn't take that long to fit everything. And you get the following: our evaluation data frame for both series, baguettes and croissants.
And we now have the metrics for ARMA, SARMA and all of our baseline models. Again, these are per series. If you want the average across everything, we can run the same logic as we did earlier, and now you get the metrics for the models overall, across all of the series. You can see that SARMA is the best model, achieving an MAE of 8.9 versus 11.9 for ARMA, but both models are now better than any of the baseline models we used earlier. You can also plot the series if you want, to visualize the forecasts we got with SARMA and ARMA. With SARMA, which is the yellow line, we capture the peaks much better than with ARMA, where we forced the model not to use any seasonal component; as you can see, ARMA struggles much more to make accurate forecasts. But again, that makes sense: we have a seasonal series, so it makes sense to use the SARMA model. I just wanted to run this experiment with you so that you can see these types of results, what they look like, and how you can do it as well. Now, comparing the metrics for all of the methods we've used so far, keep in mind that we are only evaluating on the last seven time steps of our data. The worst model is the naive model, which has the highest MAE, and our best model right now is SARMA. But as I said, we are only evaluating on the last seven time steps, which is not a great test set. A much better way to evaluate our forecasting models is to run cross-validation. So let's go back to the slides, learn about cross-validation, and then come back to the code to implement it. So far, we have done some forecasting with ARMA, and we even compared ARMA versus SARMA.
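The "same logic as earlier" for going from per-series scores to overall scores is a drop, a group-by and a mean. A small pandas sketch with hypothetical numbers; the column layout (unique_id, metric, one column per model) is assumed to match the lesson's evaluation frame.

```python
import pandas as pd

# Hypothetical per-series scores: one row per (unique_id, metric),
# one column per model.
per_series = pd.DataFrame({
    "unique_id": ["baguette", "croissant"],
    "metric": ["mae", "mae"],
    "ARMA": [5.0, 3.0],
    "SARMA": [3.0, 1.0],
})

# Drop the series id, then average each model's score within each metric.
overall = (
    per_series.drop(columns="unique_id")
    .groupby("metric")
    .mean()
    .reset_index()
)
print(overall)
```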
But now let's explore one of the most important concepts in time series forecasting, and really in data science in general: cross-validation. By now you might have noticed that evaluating on a single forecast period, just a few days for example, is definitely not enough. There are too few data points, so it's not really representative of the model's performance. This is why we need a more reliable and robust evaluation, and this is done through cross-validation. Now, if you have some background in data science or machine learning, note that cross-validation with time series data is a little different, because our data is ordered in time and we need to keep that order fixed. You cannot start shuffling your data around; it has to stay in order. So with cross-validation in time series, we start with an input window, the darker window you see on the left. Then we make predictions over a certain horizon, the light blue window. Once this is done, we update the input sequence and forecast the next window. That process is repeated until you either run out of data or reach the maximum number of windows you want. As you can see, this really mimics the real situation: I forecast a certain horizon, I wait for new data to come in, and then I use this new data as input to my model to make the next set of forecasts. This is how you should really evaluate your forecasting models. Use cross-validation to forecast multiple windows: you then get many windows of predictions versus actual values, and evaluating on those gives you a much better picture of the model's performance and of how it would perform in real life.
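To make the window mechanics concrete, here is a plain-Python sketch that lists the training range and forecast indices for each window. It is an illustration, not statsforecast's implementation: it assumes an expanding input (everything before the window is used for training) and anchors the last window at the end of the series.

```python
def expanding_window_splits(n_obs, h, n_windows, step_size):
    """List (train_end, test_indices) pairs for time series cross-validation.

    The model sees observations [0, train_end) and forecasts the next h
    indices. The last window is anchored at the end of the series.
    """
    first_start = n_obs - h - (n_windows - 1) * step_size
    if first_start <= 0:
        raise ValueError("Not enough observations for the requested windows.")
    splits = []
    for w in range(n_windows):
        start = first_start + w * step_size
        # Training data is everything before the window; order is never shuffled.
        splits.append((start, list(range(start, start + h))))
    return splits


splits = expanding_window_splits(n_obs=20, h=3, n_windows=3, step_size=3)
for train_end, test_idx in splits:
    print(f"train on [0, {train_end}) -> forecast {test_idx}")
# train on [0, 11) -> forecast [11, 12, 13]
# train on [0, 14) -> forecast [14, 15, 16]
# train on [0, 17) -> forecast [17, 18, 19]
```

Each window's input includes everything forecast in the previous window, which is the "update the input sequence" step described above.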
So, fairly short, but that is it: this is cross-validation. Let's go back into the code and see how we can apply this method, so that we get a much more robust and representative evaluation of our models. For cross-validation, we don't need a train/test split anymore, because we use different forecast windows to evaluate our models. So I want to pass the entire dataset to cross-validation, because that function internally takes care of creating the windows, making forecasts, updating the input, making the next forecast, and so on. So small_df is equal to df where unique_id is in the small list of unique IDs we defined earlier, that is, baguette and croissant. Then we specify the models again. Here we'll use seasonal naive, because it was the best benchmark, as we saw, with a season length of seven. We'll also try the AutoARIMA model with seasonal set to False; again, this gives us an ARMA model, even though I think this one will not perform very well. And we'll use AutoARIMA once more, but this time we allow it to be a seasonal model, so the season length is seven and the alias is SARMA (of course, the alias must be a string). Once this is done, initialize the StatsForecast object; by now you should be comfortable with it. Pass in your list of models, and the frequency is still daily. Now we are ready to run cross-validation: my cv_df, my cross-validation data frame, is equal to sf.cross_validation. So let's take a look at the cross-validation method.
First, we need to define the horizon: how many time steps are you forecasting? In this case it's still seven days. You pass in your data frame, and this time I am passing the entire data frame because, as I said, the function builds the windows internally. Speaking of windows, how many do we want? Here I'm setting it to eight. Why eight? With a horizon of seven, eight windows of seven forecasts give 8 × 7 = 56 points per series to evaluate, so more than 50. This starts to be a reasonable sample size for evaluating your models. Of course, the more windows the better, but here I am sticking to eight. Then we set the step size, which determines the distance between the starting dates of consecutive windows. I like to set the step size equal to the horizon; that way the windows do not overlap and simply come one after the other. If your step size is less than your horizon, you will have overlapping windows. That's not necessarily a problem; just know that some time steps will be forecast more than once, so you might give more weight to those time steps in your evaluation. It's something to keep in mind. For most situations, I believe that using a step size equal to the horizon, so non-overlapping windows, is the best way to go. It's also the fastest way to run cross-validation, so I think you should stick with it. Then you have the refit parameter, and this one is entirely up to you. It determines whether you want to re-optimize, that is refit, your model every time the input changes, and that really depends on your use case.
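The arithmetic behind n_windows and step_size can be checked with a small plain-Python sketch that counts how often each time index is forecast. The series length of 365 is an arbitrary assumption, and the window layout (last window anchored at the end of the data) is an illustrative choice rather than a claim about statsforecast's internals.

```python
from collections import Counter


def forecast_counts(n_obs, h, n_windows, step_size):
    """Count how often each time index is forecast across all CV windows."""
    first_start = n_obs - h - (n_windows - 1) * step_size
    counts = Counter()
    for w in range(n_windows):
        start = first_start + w * step_size
        counts.update(range(start, start + h))
    return counts


# Lesson settings: h=7, n_windows=8, step_size=h -> 8 * 7 = 56 points, no overlap.
no_overlap = forecast_counts(n_obs=365, h=7, n_windows=8, step_size=7)
print(sum(no_overlap.values()), len(no_overlap), max(no_overlap.values()))  # 56 56 1

# step_size < h: still 56 forecasts, but over fewer distinct dates,
# and some dates are forecast up to 3 times.
overlap = forecast_counts(n_obs=365, h=7, n_windows=8, step_size=3)
print(sum(overlap.values()), len(overlap), max(overlap.values()))  # 56 28 3
```

With overlap, an unweighted average over all forecasts counts the repeated dates several times, which is the extra weight mentioned above.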
If your model is going to be updated every time you make forecasts and receive new actual data, then you should set refit to True. However, if you plan to train your model once, run forecasts for a while, and only refit after, say, three or four months of the model being in production, then you should set this parameter to False. Setting it to False makes the entire process much faster. Setting it to True makes it slower, because you re-optimize, or refit, your model for every window, but it might give better performance. And here I clearly made a typo in StatsForecast, sorry about that. Now this is going to take a bit of time to run, because we are fitting two ARMA models per series, and we do that multiple times because we set refit to True, so once for every new window of data. I'll be back once this is done running. And now our cross-validation is done. We can of course visualize the predictions that were made, and we should get the following. It's a little hard to see on the screen what is happening, but the yellow line is still the SARMA model, seasonal naive is the slightly lighter blue, and the ARMA model is this blue-green. Anyway, the point is that we have now made predictions over a longer period, or rather over multiple windows. Each is still a 7-day forecast, but we've done it multiple times, eight times to be precise. So we now have more points to compare against the actual values, and a better evaluation. Let's run that evaluation right away.
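Returning to the refit choice for a moment, here is a toy analogy in plain Python. Simple exponential smoothing stands in for ARMA, and a grid search over its smoothing parameter alpha stands in for model optimization. With refit=False, alpha is estimated once on the first window and reused while the input still grows; with refit=True, it is re-estimated for every window. This mirrors the idea described above, not statsforecast's internals, and all function names are hypothetical.

```python
def ses_level(y, alpha):
    """Simple exponential smoothing; the final level is the flat forecast."""
    level = y[0]
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def fit_alpha(y, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Toy 'fit': pick the alpha with the lowest one-step-ahead squared error."""
    def sse(alpha):
        level, total = y[0], 0.0
        for value in y[1:]:
            total += (value - level) ** 2
            level = alpha * value + (1 - alpha) * level
        return total
    return min(grid, key=sse)


def cross_validate(y, h, n_windows, refit):
    """Non-overlapping windows (step_size = h); returns (cutoff, alpha, forecast)."""
    first_cutoff = len(y) - h * n_windows
    alpha = None
    results = []
    for w in range(n_windows):
        cutoff = first_cutoff + w * h
        history = y[:cutoff]  # the input grows with every window either way
        if refit or alpha is None:
            alpha = fit_alpha(history)  # (re-)estimate the parameter
        results.append((cutoff, alpha, [ses_level(history, alpha)] * h))
    return results


y = [float(v) for v in range(30)]
fixed = cross_validate(y, h=3, n_windows=4, refit=False)
refitted = cross_validate(y, h=3, n_windows=4, refit=True)
print([a for _, a, _ in fixed])     # one alpha, reused for every window
print([a for _, a, _ in refitted])  # alpha re-estimated for every window
```

The cost difference comes from fit_alpha: it runs once with refit=False and once per window with refit=True.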
So cv_eval is equal to evaluate applied to the cross-validation data frame, but first I drop a column. I said two earlier, but we're only dropping one: the cutoff column, with axis equal to one. The cutoff simply says where the model stopped training. For example, for the forecast of the 6th of August, the model stopped at the 5th of August, and the same goes for the 7th of August: training stopped on the 5th. So the cutoff date tells you where the model stopped seeing actual data. We drop that column because we don't need it. Then metrics is equal to [mae], as usual. Then cv_eval is updated, and by now you know what we're going to do: we drop the unique ID, group by the metric, take the average, and reset the index. Then we can display the evaluation, and we get the following. Of course, we can also visualize it in this bar plot. Interestingly, the non-seasonal ARMA model is worse than seasonal naive. That's super interesting, and it's why we need good baseline models: seasonal naive is actually better than our trained and optimized non-seasonal ARMA model. I also hope you now see the importance of cross-validation, because before we thought ARMA was better than seasonal naive, but when comparing multiple forecast windows, so on more data points, we see that on average seasonal naive is actually better than ARMA.
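A hypothetical cross-validation frame (made-up values, one series, and a horizon of 2 for brevity) shows what the cutoff column means and why pooling windows matters. In this toy data, ARMA wins on the last window yet loses to seasonal naive once all windows are averaged; the numbers are illustrative only, not the lesson's results.

```python
import pandas as pd

# Hypothetical cross-validation output: two windows of horizon 2.
# cutoff = last timestamp the model saw before forecasting that window.
cv_df = pd.DataFrame({
    "unique_id": ["baguette"] * 4,
    "ds": pd.to_datetime(["2022-08-06", "2022-08-07", "2022-08-08", "2022-08-09"]),
    "cutoff": pd.to_datetime(["2022-08-05", "2022-08-05", "2022-08-07", "2022-08-07"]),
    "y": [10.0, 12.0, 11.0, 9.0],
    "SeasonalNaive": [9.0, 12.0, 13.0, 11.0],
    "ARMA": [13.0, 15.0, 12.0, 9.0],
})

# Every forecast date lies strictly after its cutoff.
assert (cv_df["ds"] > cv_df["cutoff"]).all()

models = ["SeasonalNaive", "ARMA"]
abs_err = cv_df[models].sub(cv_df["y"], axis=0).abs()

# MAE per window: the last window alone can tell a different story...
per_window = abs_err.groupby(cv_df["cutoff"]).mean()
print(per_window)

# ...than pooling all windows once the cutoff column is no longer needed.
pooled = cv_df.drop(columns="cutoff")
overall = pooled[models].sub(pooled["y"], axis=0).abs().mean()
print(overall)
```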
However, our seasonal ARMA model is better than seasonal naive and, of course, better than ARMA. This makes sense: we have seasonal data, so we expected this model to work best, and now we have confirmation that even over multiple forecast windows, SARMA still performs better than our baseline. So that's it for cross-validation. Again, cross-validation is a critical method to know when doing time series forecasting, for more robust and representative evaluations. The next step is working with exogenous features. Up until now, we have only used past values of the series to make forecasts. That's great and it works really well, but we may benefit from using information from external variables, or exogenous features. So let's see how we can do that. As I said, exogenous features are external variables: they are not part of your time series itself. An exogenous feature could be an entire other series, the dates of upcoming holidays, or any other factor external to your series. For example, take forecasting electricity consumption: it might make sense to include information about the outside temperature, the wind speed, or even the hour of the day. Colder days mean that people turn on the heating, so consumption goes up. As for the hour of the day, if you know that people come back from work at around 5 or 6 p.m., they are likely to start doing chores, like turning on the oven, cooking, or starting the washing machine, and again, consumption goes up. So as you can see, exogenous features can be very important inputs to your model and can improve your forecasts.
There are really three types of exogenous features in time series forecasting. First, static features, which stay constant in time, such as a product category. Then historical features, whose values we know in the past but not in the future, such as the price of gas: you know the historical price of gas, but you don't know what it will be in the future unless you forecast it, and we'll see in a minute why that can be a little dangerous. And then there are future exogenous features, whose values we know in the past and also know for certain in the future. An example is the dates of national holidays: we know when the holidays were in the past, and we know for certain when the next upcoming holidays will be. Statistical models actually support only future features. So to use features with statistical models, you need to know their values in the past and also provide their future values over your forecast horizon. As I said, you could technically forecast a feature in order to use it. However, you might amplify the error: we know our forecasts are not perfect, so whatever error is in that feature forecast gets carried into the next set of forecasts, and this is where the error can really be amplified. Sometimes it's better to remove that feature completely, but of course it depends on the situation, and you have to test. Now, if you don't have features in your data, it is still possible to create some by encoding time information as features. For example, we can use Fourier terms to describe the seasonality.
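Because future exogenous features must be supplied over the forecast horizon, it helps that calendar features can be computed from the date alone. A plain-Python sketch, assuming daily data with weekly seasonality (period 7): the holiday set is hypothetical, day_of_week uses Monday = 1 as in the lesson, and the Fourier pair uses `date.toordinal()` as a shared time index so past and future rows stay consistent. In statsforecast, future values like these are, as far as I know, passed through the `X_df` argument of `predict`/`forecast`; check the documentation for your version.

```python
import math
from datetime import date, timedelta

# Hypothetical holiday calendar for illustration only.
HOLIDAYS = {date(2022, 11, 1), date(2022, 11, 11)}


def calendar_features(day, period=7, n_harmonics=1):
    """Features computed from the date alone, so they are known in advance."""
    feats = {
        "ds": day,
        "is_holiday": int(day in HOLIDAYS),
        "day_of_week": day.isoweekday(),  # Monday=1 ... Sunday=7
    }
    # Fourier terms: sine/cosine pairs at multiples of the seasonal frequency.
    t = day.toordinal()  # integer day index shared by past and future dates
    for k in range(1, n_harmonics + 1):
        feats[f"sin_{k}"] = math.sin(2 * math.pi * k * t / period)
        feats[f"cos_{k}"] = math.cos(2 * math.pi * k * t / period)
    return feats


def future_features(last_date, h, **kwargs):
    """One feature row per date in the h-step horizon after last_date."""
    return [calendar_features(last_date + timedelta(days=i), **kwargs)
            for i in range(1, h + 1)]


rows = future_features(date(2022, 10, 31), h=7)
for row in rows:
    print(row["ds"], row["is_holiday"], row["day_of_week"],
          round(row["sin_1"], 3), round(row["cos_1"], 3))
```

Since every value repeats exactly one period later, the same function produces the historical columns for training and the future rows for the horizon; wrapping the rows in a data frame with a unique_id column is left to your setup.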
Fourier terms are basically sines and cosines; summed together, they can encode the seasonal information of your series as features, which you can then use to make predictions. Something else you can encode is the timestamp itself: the day of the week, the hour of the day, and so on. So you can say this is hour 1, 2, 3, 4, 5, all the way to 24, or for the day of the week, this is day 1, 2, 3, 4, 5, 6, 7. With those features, it is possible that your model benefits a little. And the best part is that you know them into the future as well. For example, if today is Monday, day number one, you know that next Monday, in a week, and the Monday after that, in two weeks, will still be day number one. So you can compute those features over the future with absolute certainty, which makes them relevant for statistical models. So that's it; let's jump back into the code and see how we can create exogenous features and use them with our models. Let's start forecasting with exogenous features, or external variables. How can we include them in our models? Not every model can support exogenous features, but I specifically chose the ARMA, or SARMA, model because I know it can handle them, which makes for an interesting use case for this tutorial. Now I'm rereading the dataset because, if you remember, earlier I dropped the unit price, but in this case we are going to consider it as an exogenous variable. Is the price of the product being sold an important feature when forecasting the volume of sales? And if we ta

Original Description

Serious about mastering time series forecasting and ready to take the next step? Enroll today in Applied Time Series Forecasting in Python and enjoy a 60% discount with this code: https://www.datasciencewithmarco.com/offers/zTAs2hi6?coupon_code=ATSFP60 Lifetime access to the course, modules are regularly updated, and your questions are answered by me directly!

Course material:
- Solutions notebook: https://github.com/marcopeix/youtube_tutorials/blob/main/YT_03_forecasting_stats.ipynb
- Dataset: https://github.com/marcopeix/youtube_tutorials/tree/main/data

This video is the perfect starting point for beginners looking to forecast time series data. We use 100% Python code to cover the fundamental concepts of time series forecasting:
- defining time series data
- time series decomposition
- forecasting with ARIMA
- cross-validation in time series
- using exogenous features
- generating prediction intervals
- evaluation metrics for forecasting models

Chapters:
- 0:00 Introduction
- 1:20 Define time series
- 5:38 Baseline models
- 9:10 Baseline models (code)
- 23:22 ARIMA
- 31:01 ARIMA (code)
- 38:21 Cross-validation
- 40:38 Cross-validation (code)
- 49:56 Forecasting with exogenous features
- 54:28 Exogenous features (code)
- 1:08:10 Prediction intervals
- 1:09:32 Prediction intervals (code)
- 1:14:31 Evaluation metrics
- 1:18:58 Evaluation metrics (code)
- 1:29:25 Next steps

Key Takeaways

Import necessary libraries such as statsforecast and utilsforecast
Read and preprocess time series data
Split data into training and testing sets
Train and evaluate ARMA and SARMA models using cross-validation
Use exogenous features to improve forecasting accuracy
Evaluate model performance using metrics such as MAE

💡 Using exogenous features can improve forecasting accuracy, but requires careful handling to avoid amplifying error.
