Stock Price Prediction Using Machine Learning | Python Machine Learning Projects | Simplilearn
Key Takeaways
Demonstrates stock price prediction using machine learning and Python
Full Transcript
[music] Hello and welcome to Simplilearn. Today I am going to talk about stock price prediction. You might be wondering how various algorithms, or the people who invest in the stock market, try to predict the future price of a particular stock based on financial statements. In this course we are going to talk about stock price prediction based on past values. We will look at data points such as the volumes traded on the stock exchange or the closing price, and try to predict the future price of a stock. For this case I will use synthetic stock data, and we'll work through it practically at the end of the session.
Multiple methodologies are typically used for stock price prediction, and two types of problems can be addressed with it. We'll come to that shortly. For now, stock price prediction is the process by which we estimate the future value of a company's stock or of any financial instrument, such as an equity or a bond, that is traded in the market. The idea is to look at the historical data, fit some analytical models, and then use those models to predict the price in the near future. Maybe today's price depends on yesterday's price, or on the price a week ago, or 10 or 20 weeks ago, depending on the nature of the data. Based on that we can predict closing prices, we can predict trends (if the price is going up in this manner, how is it going to perform in the next 10 or 12 days), and we can estimate future prices as well.
So forecasting trends, predicting closing prices and estimating future prices are the primary things you can do with stock price prediction. This type of problem can be divided into two major categories: regression and classification. If I have to predict the actual price of a stock, that is a regression problem, where we are trying to predict a continuous variable. Or it can be a classification problem, where I want to predict whether to buy a stock or not: based on whether its price will go up or down, we can decide whether to sell it or to hold it.
Traditionally, stock prediction was based on fundamental and technical analysis that we can perform as humans. But now that we have a large amount of data available, we can use advanced machine learning and statistical methods, which can give better predictions by capturing nonlinear relationships in the data.
Why is it important for us? There are four primary reasons. Number one, investment decision-making: an accurate price prediction helps any investor who is interested in the stock market make a correct decision, so that it can potentially generate better returns from the investment.
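To make the regression versus classification framing concrete, here is a minimal sketch of how the two kinds of target could be built from the same closing prices. The price list is made up for illustration and is not the data used in the demo.

```python
# Hypothetical closing prices, invented purely for illustration.
closes = [100.0, 101.5, 101.0, 102.2, 101.8]

# Regression target: the next day's closing price (a continuous value).
next_close = closes[1:]

# Classification target: 1 if the next day's close is higher than today's, else 0.
direction = [1 if nxt > cur else 0 for cur, nxt in zip(closes[:-1], closes[1:])]

print("regression targets:    ", next_close)
print("classification targets:", direction)
```

The same history feeds both problems; only the quantity being predicted changes.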
So if I can predict the future price of a stock correctly, or correctly to some extent, it will be very useful in investment decision-making. Many investment portfolios and mutual fund companies invest in stock markets for the growth of people's money, and they are attaining good returns: 13%, 14% or maybe 21%, and some are attaining even 31% to 35%. That happens because of a good decision-making process: their investment managers make good predictions and invest based on them.
Second, risk management. Suppose there is a stock whose price is going to fall: we can be prepared for that. If I can predict that a stock price is going to fall in maybe 10 or 20 days, we can minimize the losses by managing the risk.
Third, algorithmic trading. The value of a stock changes every millisecond. The time interval we have for making a decision is in milliseconds, sometimes even microseconds. A machine taking the decision instead of a human will be faster, so that we do not lose out on time. Algorithmic trading helps make sure that decisions are taken and processed instantly.
Fourth, it allows me to optimize my portfolio. There are many applications available these days that talk about how you can optimize your portfolio and maximize your returns. Depending on what funds and stocks you hold, you can use predictions to balance the portfolio so that an overall profit can be attained.
Now how do we go ahead with this particular problem? There are multiple approaches to price prediction. We are primarily concentrating on actual price prediction, the regression part of stock price prediction. Typically there are four or five machine learning models, or I would say statistical models, that are used for predicting future stock prices.
The first algorithm is called the autoregressive (AR) model. It is a very simple model: it takes the stock price as a single variable and predicts future values based on the past values of the same data. Let's say I'm trying to predict the closing price. I take the closing prices from the historical data and, based on them, predict the future price. The basic idea is that what has happened in the history is going to be repeated in the future. Put into equation form, it is a linear equation that decides how the current price is going to vary: $y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$. Here $c$ is a constant term and $\varepsilon_t$ is the random error at time $t$. The terms $\phi_1, \phi_2, \dots, \phi_p$ are nothing but the coefficients of regression: if I have to establish a straight-line relationship, these are its coefficients. For today's price, $\phi_1$ is the importance of yesterday's price $y_{t-1}$, and $y_{t-2}$ is the day before yesterday's price; the step may also be monthly, quarterly or yearly, depending on the interval. So each $\phi_i$ term gives the importance of a past price, and this is how the model takes the lag into consideration.
Where is it useful? It is useful when prices are being predicted over a very short period, and when there is no seasonality involved. You can use it for temperature as well as for prices: it's a generic time series algorithm, not only for stock prices. Stock prices will be our demo section, where I'll talk about the code. Here we'll try to understand the algorithms in short. Each of these models has more detailed topics that can be dealt with separately; this is an introduction for now.
Some points on where we can use the AR model. The typical idea is that the autoregressive model works well when today depends on yesterday: my price today depends on yesterday's, and tomorrow's price will depend on today's. It also requires the data to be stationary. Stationary means that the mean and the variance of the data do not vary; they are consistent throughout the period.
If I select any particular time interval, the mean and the variance behave in a similar fashion. That's where stationarity comes into the picture: there are no spikes or seasonality in the data. Seasonality means a pattern that shows up at a certain point in time: values shoot up for a while and then come down again. For example, sales of gift items increase in December and January because of the festive season.
Next is the autoregressive model written with the lag polynomial. It is the same equation we've seen before, $y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$, with a random error term, rewritten using the lag operator $L$. What is a lag? Suppose today's value depends not on yesterday but on a value several days earlier. Say today is the 1st of October, and the price on the 1st of October depends not on the price on the 30th of September but on the price on, let's say, the 20th of September. The gap of days in between is your lag, and we can use that lag as a factor as well. With the lag operator defined by $L^k y_t = y_{t-k}$, the lag polynomial is $\phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \dots - \phi_p L^p$, so the lag appears as powers of $L$, and the model becomes $\phi(L)\, y_t = c + \varepsilon_t$. Here $\varepsilon_t$ is the random error that we have, and it is called the white noise.
The white noise matters for the stationarity check, because we will not always have stationary data: the assumptions typically do not hold exactly, since the mean and the variance vary over time. The $\varepsilon_t$ term covers that white noise, the change in the mean and the variance.
For a particular AR model we look at the PACF. What is the PACF? It's the partial autocorrelation function, which tells us how the current value depends on each previous value. If the model has a lag of $p$ terms, the partial autocorrelation is non-zero up to lag $p$ and goes to zero after that. The ACF, the autocorrelation function, keeps decaying, sometimes like a damped sinusoidal curve. Okay, let's move on.
These are certain methods by which we fit autoregressive models. This is background processing you do not have to worry about, because Python libraries such as statsmodels give us these methods and we can apply them directly. One very common method for solving the equations of the autoregressive model is the Yule-Walker method, which solves a system of linear equations built from the autocovariances. Another method estimates the regression coefficients by ordinary least squares (OLS). Then there is Burg's algorithm, which is used specifically for AR models, and MLE, which maximizes the Gaussian likelihood.
Then there are some forecasting formulas that I have put on this slide. Forecasting means I'm trying to predict for tomorrow, or for a certain number of steps after tomorrow.
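As a rough illustration of the ordinary least squares route to an AR model and of forecasting a few steps ahead, here is a small NumPy sketch. The helper names `fit_ar_ols` and `forecast_ar`, the simulated series and its coefficients are all assumptions made for this example; they are not taken from the demo notebook, and in practice a library routine would do this work.

```python
import numpy as np

def fit_ar_ols(y, p):
    """Estimate c and phi_1..phi_p of an AR(p) model by ordinary least squares."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Each row is [1, y[t-1], ..., y[t-p]] for t = p .. n-1.
    lagged = [y[p - k:n - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p)] + lagged)
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef[0], coef[1:]

def forecast_ar(y, c, phi, steps):
    """Iterate the AR equation forward, feeding each forecast back in as a lag."""
    history = list(y[-len(phi):])
    forecasts = []
    for _ in range(steps):
        nxt = c + sum(phi[k] * history[-1 - k] for k in range(len(phi)))
        forecasts.append(nxt)
        history.append(nxt)
    return forecasts

# Simulate a stationary AR(2) series with known (made-up) coefficients.
rng = np.random.default_rng(0)
true_c, true_phi = 2.0, (0.6, 0.2)
y = [10.0, 10.0]
for _ in range(2000):
    y.append(true_c + true_phi[0] * y[-1] + true_phi[1] * y[-2] + rng.normal(scale=0.5))

c_hat, phi_hat = fit_ar_ols(y, p=2)
preds = forecast_ar(y, c_hat, phi_hat, steps=5)
print("estimated phi:", phi_hat)
print("5-step forecast:", preds)
```

The estimated coefficients should land close to the simulated ones, and the multi-step forecast reuses its own earlier outputs wherever a lag points past the last observed value.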
There are two things to consider. First is the learning part: when we learn, there is the number of lag terms. Then, when we forecast, how far ahead we are trying to predict also comes into consideration. If I have to predict a future price, it's the same equation, nothing new. As you can see here, $\hat{y}_{t+1|t}$, the value at time $t+1$ given the data up to the current time $t$, is $\hat{y}_{t+1|t} = c + \sum_{i=1}^{p} \phi_i\, y_{t+1-i}$. That is exactly the same equation as before. The only thing that has changed is the timeline: the time window has moved from today to tomorrow. When I iterate this for a horizon $h$ greater than one, the lag terms change as well, and the minus terms change to plus terms: $\hat{y}_{t+h|t} = c + \sum_{i=1}^{p} \phi_i\, \hat{y}_{t+h-i|t}$, where any term at or before time $t$ is the observed value. That's the only difference. If this is the current day, then towards the left I get the minus terms, and towards the right are the plus terms. The minus side is the learning part of the model and the plus side is the prediction part: we use the past data to learn, and the future side is what we forecast. We'll try to understand each of these algorithms a little and then implement them.
The next model is the moving average (MA) model. What we consider in a moving average model is not only the actual values but also the error terms. The understanding is that the data is stationary, meaning the mean and variance are not varying. However, there can be sudden spikes because of white noise, some external factors that are unpredictable. Those can be addressed with the moving average model. The moving average is a very simple thing: we take a certain number of lag points and take the average of the last few points. For example, we can think of an example like this.
Suppose there are the values 1, 2, 3, 4, 5 and 6, and I'll talk about a very simple moving average concept. Let's say I want to start with an average of lag three. For the first three values I will not have any prediction: not applicable, not applicable, not applicable. The fourth one can now be predicted as the average of the last three: 3 + 2 + 1 is 6, and 6 by 3 is 2. So the fourth value is predicted as 2. Then the window moves by one: 4 + 3 + 2 is 9, and 9 by 3 is 3. Then you have another window and, since I've chosen a lag of three, the window 4, 5, 6 gives 15 by 3, and so on. It moves on like this. That's the moving average concept.
The model equation is given over there: $y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}$, where $\mu$ is a constant and the $\varepsilon$ terms are the random errors.
When do we use the moving average model? When there are sudden spikes in the data, some very small events happening, we can use it at that point. The advantage is that we can capture short-term spikes and short-term losses, and irregular time series can be modeled with it. With the lag operator it is the same equation: $y_t - \mu = \theta(L)\, \varepsilon_t$ with $\theta(L) = 1 + \theta_1 L + \dots + \theta_q L^q$, up to however many lag periods $q$ we use.
Now, the invertibility condition. What it says is that an invertible moving average model is equivalent to an autoregressive model with an infinite number of terms. So a moving average model can be viewed as a stationary autoregressive model of infinite order.
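The windowed-average arithmetic from the 1 to 6 example can be checked with a few lines of plain Python. The function name `trailing_average_forecast` is made up for this sketch. Note that this is the simple rolling average used in the spoken example; the MA(q) model equation itself is written in terms of past error terms rather than past values.

```python
def trailing_average_forecast(values, window):
    """Predict each position as the mean of the `window` values before it.

    Positions without enough history get None ("not applicable").
    The final entry is the prediction for the next, unseen point.
    """
    forecasts = []
    for i in range(len(values) + 1):
        if i < window:
            forecasts.append(None)
        else:
            forecasts.append(sum(values[i - window:i]) / window)
    return forecasts

values = [1, 2, 3, 4, 5, 6]
forecasts = trailing_average_forecast(values, window=3)
print(forecasts)  # [None, None, None, 2.0, 3.0, 4.0, 5.0]
```

The first three slots are empty, the fourth value is predicted as 6/3, the fifth as 9/3, the sixth as 12/3, and the window 4, 5, 6 gives 15/3 for the next point.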
These are two different concepts which, when combined, create a very good model that we call the ARMA model; when they are integrated, we get another model called the ARIMA model.
So how do we identify a moving average model? Its autocorrelation behavior is exactly opposite to that of the autoregressive model. In the autoregressive model the PACF, the partial autocorrelation function, is non-zero for a period of $p$ lags and then becomes zero, while the ACF decays, continuously going towards zero. In the moving average model it's the reverse: the ACF cuts off after a certain lag, and the PACF starts decaying. So they are mirror images of each other, and both are very elegant algorithms to use.
There are two types of MLE we can use: one is called the conditional MLE and the other the exact MLE. What is MLE? Maximum likelihood estimation. We can compute the maximum likelihood estimates either by setting all the past errors to zero, which is the conditional MLE, or by using the exact maximum likelihood estimation based on the full probability calculation.
The forecasting method uses the same equation again: $\hat{y}_{t+1|t}$, the value at $t+1$ given time $t$, is equal to $\mu$ plus the weighted past errors. It is the same equation for learning and for prediction; the minus terms change to plus terms, as we had earlier.
The next model is called ARIMA, autoregressive integrated moving average. We are now combining both models. The ARIMA model is a statistical forecasting technique that is used for time series data.
It again uses the past values to predict the future values, and it uses the moving average method to make use of the past errors. So I use both the past values and the past errors, and I integrate them so that I can remove any kind of trend, any non-stationarity in the data. There are three important components: AR, the autoregressive part; I, the integrated part; and MA, the moving average part. For the AR part I use the term $p$, for integration I use $d$ (d as in delta), and for the moving average I use $q$.
What is the role of these three terms? The autoregressive component uses the relationship between an observation and a number of lagged observations, that is, the past values I am using to predict the future value. The integrated part applies differencing between the raw observations, by which I can make the time series stationary. Ideally I want stationarity, but you may not have it in the data. Suppose I have data that keeps increasing like this: that is not stationary. I need something like a flat line, where the mean and variance do not vary over time, so that whether I look at this chunk, that chunk or another chunk, things do not vary. How do I get that? Subtract each value from the next one: this minus this, then this minus this, and so on for every time interval. That generates a stationary series. So the integration part takes care of the stationarity of the data, making it more stationary, and that is what makes the AR model or the MA model applicable.
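The "this minus this" differencing step can be sketched in plain Python. The helper names `difference` and `undifference` and the price list are invented for this illustration; they show first-order differencing (d = 1) removing a steady upward trend, and how the original series can be rebuilt from the differences.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both values exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def undifference(first_value, diffs):
    """Rebuild the original series from its first value and its first differences."""
    rebuilt = [first_value]
    for d in diffs:
        rebuilt.append(rebuilt[-1] + d)
    return rebuilt

# A made-up series with a steady upward trend (not stationary: its mean keeps rising).
prices = [100, 102, 104, 106, 108, 110]

diffs = difference(prices)                 # the trend is gone: every step is 2
restored = undifference(prices[0], diffs)  # integrating the differences recovers the prices

print(diffs)
print(restored)
```

The differenced series has a constant level, which is the property the AR and MA parts rely on.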
The mean and variance are taken care of by making the data stationary. You can consider this similar to standardization of the data, where you make the mean of every feature zero and the standard deviation one. A similar concept is applied in the integrated, or I, part.
Then comes the MA part. It is the relationship between an observation and the residual errors of the model at the lags: if I am predicting a price today, the errors in the last few terms are what the moving average part takes up.
Where can I use ARIMA? When the data shows strong trends but no seasonality: economic time series, energy consumption, website traffic. It is very good for data with a linear trend and it is very interpretable, but it cannot capture seasonality on its own, because we are only making the data stationary. That's the mathematical equation; you can pause and note it down.
Now, how do we identify stationarity and which ARIMA model to apply? You can plot the data to examine the trend and seasonality, which we'll do. Then we can perform the ADF test or the KPSS test, or use them together, to check for stationarity. We can use differencing to make the data stationary. Then we look at the partial autocorrelation and autocorrelation values. If the PACF cuts off after some lag, it's an AR model. If the ACF cuts off, it's an MA model. If both tail off gradually, it's an ARMA model. That's how we identify the model. The same conditions apply for parameter estimation with the maximum likelihood computation.
Then there is another model called the SARIMA model. SARIMA is nothing but an extension of the ARIMA model that also includes the seasonality part. It is called the seasonal autoregressive integrated moving average model.
So into the ARIMA model I also include a seasonality part. Till now we have seen that the ARIMA model has three components. One is $p$, the autoregressive order. Then there is $d$, the differencing order: how much differencing we do to remove the trends. Then we have $q$, the order of the moving average, by which I identify the dependency of the observation on the residual errors. These were represented by the small letters $p$, $d$ and $q$. In the SARIMA model I add the capital letters $P$, $D$ and $Q$: $P$ is the seasonal autoregressive order, $D$ is the seasonal differencing degree, and $Q$ is the seasonal moving average order. I also introduce another term, a small $s$, which is the number of time steps in each seasonal cycle. If there is monthly data, then 12 tells me that the cycle repeats every 12 months, a yearly cycle; if it is daily data you can set a weekly or quarterly cycle, whatever you want. So 12, 13 and so on, depending on the nature of the data.
So we add one more term, $s$, the season term. SARIMA is ARIMA plus seasonal modeling for patterns that repeat daily, weekly or monthly. If something repeats daily, weekly, monthly or annually, you can use the SARIMA model. As before, the parameters are $(p, d, q)$ and now also $(P, D, Q, s)$: the seasonal autoregression, differencing and MA orders, and the length of the season, for example 12 for monthly seasonality.
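To show what the seasonal differencing degree and the season length s do, here is a small pure-Python sketch on made-up monthly data with a yearly cycle. The helper `seasonal_difference`, the data and the order values are all hypothetical. The two tuples at the end only illustrate the (p, d, q) and (P, D, Q, s) layout, which is the same layout statsmodels' SARIMAX takes in its `order` and `seasonal_order` arguments.

```python
def seasonal_difference(series, s):
    """Subtract the value one full season earlier: series[t] - series[t - s]."""
    return [series[t] - series[t - s] for t in range(s, len(series))]

# Made-up monthly data: a rising trend plus a pattern that repeats every 12 months.
monthly_pattern = [5, 3, 0, -2, -4, -5, -4, -2, 0, 3, 6, 12]
series = [100 + t + monthly_pattern[t % 12] for t in range(36)]

# With s = 12 the repeating pattern cancels, leaving only the year-over-year growth.
deseasonalized = seasonal_difference(series, s=12)
print(deseasonalized)

# Illustrative (hypothetical) orders in the layout described in the transcript.
order = (1, 1, 1)               # (p, d, q)
seasonal_order = (1, 1, 1, 12)  # (P, D, Q, s) with s = 12 for a yearly cycle in monthly data
```

Every entry of the seasonally differenced series is the same, because the December bump and the other monthly effects are subtracted from themselves one year apart.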
Typical examples are sales peaking every December, electricity usage with daily and weekly patterns, and weather with its seasonal cycles. Whenever we have seasonal data, SARIMA is very good: it can capture both the seasonality and the trend of the data. It is computationally heavy, so we'll not get into too much of the mathematics. Pause and view this equation, the seasonal lag polynomial equation. Here you can see the $s$ term, which is the seasonal period. $\Phi(L^s)$ is the seasonal AR polynomial and $\Theta(L^s)$ is the seasonal MA polynomial; $\theta(L)$ is the MA polynomial and $\varepsilon_t$ is the error term.
Where can I use SARIMA? If I have to forecast monthly retail sales peaking around holidays, predict weather patterns where I can see seasonal trends, or forecast energy consumption, where maybe during wedding seasons the consumption goes high, you can use this kind of forecasting technique.
Lastly, I'll talk about another model called the SARIMAX model. It is an extension of the ARIMA or SARIMA model, where X stands for exogenous variables. What is the SARIMAX model? Seasonal autoregressive integrated moving average with exogenous regressors is a very powerful time series forecasting technique. It extends the traditional ARIMA model and takes into consideration both seasonality and certain external factors. It accommodates the autoregressive component, the moving average component and the integration part that handles the trend by making the data stationary, and it also allows external variables to be included, so that external factors affecting the regression are taken into account. Again it has a few components; I'll go through them. There is a component S that covers the seasonal part. I will have the AR component. I will have the integration component.
I will have the MA component, and I will have the X component, which is the exogenous regressor. It adds external predictors, as stated over there. With these external predictors I am putting in additional information beyond the given data. So far we have believed that the given past values are what impact the predicted values. However, there can be external factors. Think of a situation where a stock price may not depend directly on a statement made by Donald Trump, or by India's president for that matter, but prices will still move if such a statement is made. That is where SARIMAX, with the influence of an external factor, allows me to work well. So the forecast is the ARIMA or SARIMA effect plus the impact of an external variable. It can be anything: the Russia-Ukraine war, an oil crisis in the world, a particular bank going bankrupt, or anything else that can be considered an external factor. That's the complex equation for SARIMAX: it's a regression plus a time series model. This is where SARIMA moves to SARIMAX: here you can see another term, $\beta x_t$, which is nothing but the external factor affecting the SARIMA model.
So what are the different challenges that you face in stock price prediction or forecasting? Number one, market volatility. Markets these days are very volatile: geopolitical events, news and investor sentiment make them highly unpredictable. Next, noisy data: we will not have a stationary pattern in the data, and there will be a lot of anomalies in it, so data noise is a problem. Then overfitting with very complex models. Take the complexity of models like the SARIMAX model.
These are very complex models and they tend to overfit, so overfitting is a problem you can come across. Then external influences, as we said before: along with market volatility, they can lead to unexpected behavior from the market. Political decisions and economic changes are very difficult to quantify. Then there is a lack of causality, because we are primarily depending on the correlation of the values. You remember the partial autocorrelation and the autocorrelation functions: these are the two major components on which we base our predictions of future values. But correlation does not tell me the cause. Correlation tells me that there is a relation, but the reason behind that relation is not explained by the correlation. That can be a problem in justifying or explaining the predicted values, so at times these models are also termed black boxes.
Next, the demo. In the demo I'm going to use simple stock price data. I have already executed the notebook. I will provide this document as well as the notebook file, which you can use. Here we are going to implement all the models whose names I've just given, and look at the predictions and the behavior of each. For this purpose I am using dummy data, a synthetic stock price series, which I have in a CSV file named stock data 1. With this I'm training simple time series models using statsmodels, one of the libraries with which we can train models on any time series data. I'm importing pandas, NumPy and matplotlib for visualization. Now this is where my model imports start.
From statsmodels.tsa (tsa stands for time series analysis) I'm importing AutoReg, the autoregression model that uses the OLS method. The ARIMA model is ARIMA and the SARIMAX model is SARIMAX, which covers SARIMA too. Because this is a continuous value prediction, we'll look at the mean squared error and the mean absolute error to judge the models; you can look at R squared values too if you want.
Let's read the data file very quickly. I load the data set with pd.read_csv on the stock data CSV file. I parse the dates and tell it which column I want as the index: the date column. Then I set a business day frequency, so the dates are filled in Monday to Friday and Saturdays and Sundays are left out. Then I interpolate the close column to fill in any missing value in my data frame. This is synthetic data, so there were no missing values; it is just an additional step. That's how my data looks now, with df.head(). This is a Jupyter notebook environment, and I'll provide you this notebook file.
Now some basic exploratory data analysis that I like to do. I've shown the shape of the data, then the two columns, close and volume, and the describe output, and then I plot a figure. I have 1565 rows and two columns. In the describe output, the mean is around 102 and the standard deviation is 3.47, and so on. That's for the full data; when we look at a particular time interval, those values should not vary much. Let's plot the graph and see how it looks. That's the synthetic close price graph from 2018 to 2024. It's a simple line plot: I'm plotting the close column of df as a line on one of the axes.
On the x-axis I've got the date, which is my index. On the y-axis I've got the closing price. Next I performed the train-test split. I did not use the scikit-learn train_test_split method; I used simple multiplication directly. The training percentage is 80%. My split is the integer value of the length of the data frame multiplied by the train percentage, so that is 80% of the rows. Then I slice the data frame with df.iloc up to the split for training and from the split onwards for testing. I got a train size of 1252 and a test size of 313 samples.
Now a few functions. This function does the evaluation and prints the results; that's the evaluate series function. true is the actual values given in the data, pred is the predicted values, and label is the model name. By default it's model, but later on I pass the model name. It calculates RMSE, the root mean squared error, and MAE, the mean absolute error. You can use these to compare how large the difference is between the actual and predicted values.
First model: the autoregressive model. I'm using a lag of five; you can vary that lag. The AR model is AutoReg: I need to provide the data column and the number of lags, and then .fit() trains my model. Now I can use the .predict method to predict on the test data, starting at the first test index and ending at the last test index. Then I print the results by evaluating the test close values against the AR forecast. So I've trained on the train data and I'm predicting on the test data.
Now look at this; this is how the graph looks. The split I've done is in sequence: the first 80% of the time period, roughly up to 2022, is my training data. Let me just show it.
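The split arithmetic and the evaluation helper described here can be sketched in plain Python. The function name `evaluate`, its signature and the three-point example are assumptions for illustration; the notebook's own helper may differ. Only the row count (1565) and the 80% training share come from the transcript.

```python
import math

def evaluate(y_true, y_pred, label="model"):
    """Print and return RMSE and MAE between actual and predicted values."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    print(f"{label}: RMSE={rmse:.3f}, MAE={mae:.3f}")
    return rmse, mae

# Chronological 80/20 split by simple multiplication, as described in the demo.
n_rows = 1565
train_pct = 0.8
split = int(n_rows * train_pct)
train_size, test_size = split, n_rows - split
print(train_size, test_size)

# Tiny made-up example to exercise the metric helper.
rmse, mae = evaluate([100, 102, 101], [101, 101, 101], label="demo")
```

Because the data is a time series, the split keeps the rows in order: the first 80% is used for training and the last 20% for testing, with no shuffling.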
The last 200 points of my training data are shown here, because I only plot train.close.iloc for the last 200 points. Not all the points are displayed, because that would make the graph too cluttered. After the last 200 training points, I plot the test close values with the label test, so this is my test data, and then the AR forecast values: the straight line is my predictions. You can think of it this way: the autoregressive model assumes stationarity, the data is not stationary, and hence there is the difference you can notice there. Next I'm implementing the moving average model, and you can compare it against these results: the RMSE, root mean squared error, is 0.92 and the MAE is 0.73. There's still a noticeable difference with that lag, so we can try increasing the lag order. Let's try a lag of 10. That's still too high.
Now we'll move on to the moving average model. In this scenario the moving average model is not a great idea. Why? Because the data does not contain the kind of residual error structure the model relies on, so after a certain number of lags the forecast becomes fixed, or constant. Let me try with 10. Here I'm only looking back at the errors, with no autoregression. The moving average model is implemented using the same ARIMA class. I'm not applying any differencing for stationarity. I'm using ARIMA with the order (0, 0, q). An order of (0, 0, q) means I'm not applying any autoregression and I'm not applying any integration; I'm only applying q lags of the errors to look back on.
So the MA model is built from ARIMA on train with this order, followed by .fit(). I can forecast over the same range: ma_model.predict, with the test index starting from position zero and going to position minus one, that is, from the first test point to the last. Then I can evaluate the model in the same way. So with the MA model at q equal to three, it's going to take some time because I changed the value. Okay, that was quick. After a point the forecast starts flattening out. You can see the RMSE value is very high and the MAE value is very high, a huge difference compared with the AR model, which gives very small RMSE and MAE values. To put that in context, if you look at the close price, the mean value is 102 and the minimum value is 98. With a minimum of 98, an RMSE close to 0.92 is very small, a very good picture. So the AR model is doing the best job so far, and the MA model is not doing a good job, because there is too much error in both RMSE and MAE.
Next is the ARIMA model with all of the p, d, q values. I'm taking a lag of five, one order of differencing, so today's price is differenced with yesterday's price, and two as my q value, looking back at the errors of the last two points. It's ARIMA on train, the same model with no changes at all; I just need to supply all the p, d, q values, and the rest of the process remains the same. Okay, that's a flattened line. As you can see, stationarity is now introduced through differencing, and it's giving me a fixed price throughout. It was able to reduce RMSE and MAE, but not to the level of the AR-only model, because of that integration part. The line has flattened out. Now coming to the SARIMA model, the seasonal model. Again, for p, d, q I'm taking fixed values.
The additional part is for the seasonality, and I'm setting the season to five, that is, one business week per season. The SARIMAX model takes the p, d, q order and a seasonal order P, D, Q, s. Then I forecast on the test data and evaluate the series, passing just the name of the series. Now that looks like a very good line. The only thing is that in this SARIMAX model I did not include any external factor. Without an external factor it shows good seasonality: RMSE is 0.93 and MAE is 0.74. That's far better than the previous result. Here I have displayed both the train data and the test data. In the later examples I'm not showing the train data, so please understand that this is why you don't see it in those plots.
Now I'm also adding an external factor, which is the volume. The train external factor is the train volume and the test external factor is the test volume: for both train and test I'm taking the volume column and taking its log. It's SARIMAX on train again, and I just need to pass one additional argument, the train external factor, as exog. So volume is going to be the external factor. When forecasting again with model.predict, I pass the test external values as exog, which is the log of the test volume. You can take log values, square root values, or any other transformation, or you can use the raw values directly. With the external factor, this is how it looks: a straight line that is still making errors. RMSE is 1.89 and MAE is 1.37. So far, SARIMAX without the external factor appears to be the better one. Finally, I've saved everything into an external model forecast CSV file: all the predictions we've got so far from the AR model, MA model, ARIMA model, SARIMA model, and SARIMAX model are saved there. This was my original data. Let me just show you.
So I have a date, a close, and a volume. That was my original data. Let me also show you the predictions. I've got this other file, the model forecast file. Here the actual value is 106.17; AR predicted 106.43 and SARIMA predicted 106.42. So there is a close relationship between the AR and SARIMA forecasts. You can see that the MA model, from about 10 points in, becomes constant: it is fixed across all the remaining data points. The SARIMA model and the AR model, however, stay in line with each other. They are the two good models; on another row the values are 109.08, 109.136, and 109.7, so they are very close. So far we can say that both the AR model and the SARIMA model are the good models, the ones capturing the trend and seasonality.
If I have to make a choice between the AR model and the SARIMA model based on the results, we can look at the RMSE values and compare. I have an RMSE of 0.9337 for SARIMA, which is a good model, and here, going back to the last-200 plot, you can see 0.9226. So the AR model with a lag order of five is giving me better results than SARIMA. Maybe we can fiddle with the lag order and include a bit more lag; that may enhance the results. You can see that introducing more lag gives slightly better results: the RMSE went down further to 0.9058. So the lag value can be tuned here as well, but we also need to look at the computational cost and whether efficiency is hampered by it. On that note, let me draw the curtain on this topic. In this course we have discussed an overview of the different time series algorithms for stock price prediction.
We have discussed the autoregressive model, the moving average model, the ARIMA model, the SARIMA model, and the SARIMAX model. We have also looked at the implementation of those models using Python, primarily with the statsmodels library. Those are the methods. For further questions, you can reach out to us. Thank you very much.
Original Description
This tutorial on Stock Prediction Using Machine Learning by Simplilearn, explores how machine learning can be used to forecast stock price movements using real market data and Python-based modeling techniques. You’ll learn how to collect, clean, and prepare financial datasets, along with understanding patterns, trends, and correlations that influence price behaviour. The session covers key ML models such as Linear Regression, LSTM, and Random Forest, explaining why each performs differently in volatile markets. You’ll walk through step-by-step Python implementations, feature engineering methods, and evaluation metrics. The tutorial also highlights risks, limitations, and the importance of avoiding overfitting when dealing with unpredictable financial markets. By the end, you’ll understand how to build, test, and interpret a complete stock prediction pipeline—an essential project for anyone learning machine learning with Python.
Following are the topics covered in the Stock Prediction Using Machine Learning Tutorial:
00:00:00 Introduction to Stock Price Prediction Using Machine Learning
00:01:01 What is Stock Price Prediction?
00:03:34 Why is Stock Pric