Kaggle Challenge (LIVE)

Siraj Raval · Intermediate · 🧠 Large Language Models · 7y ago

Skills: LLM Foundations 90% · LLM Engineering 80% · Prompt Craft 70% · Fine-tuning LLMs 60%

Key Takeaways

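The video covers the Two Sigma Financial Modeling Challenge on Kaggle, in which participants predict a target value from anonymized financial features. It walks through time series forecasting techniques (naive, average, moving average, exponential smoothing, Holt's linear trend, and the Holt-Winters seasonal method), exploratory data analysis in Google Colab with pandas and matplotlib, and framing the problem as a Markov decision process with the Kaggle Gym environment.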

Full Transcript

Streaming in three, two, one, go. Event is starting, stream has begun, the world is here and ready for reinforcement learning. I think we're live. Great. Hello world, it's Siraj, and welcome to my live stream. In this live stream I'm going to attempt this Kaggle challenge called the Two Sigma Financial Modeling Challenge. It's a hundred thousand dollars' worth of prize money, and I'm going to create an algorithm that's hopefully going to get into the top fifty on the leaderboard. We'll see how well it does.

I want to start off by saying that the point of this video is, first of all, to talk about some time series forecasting techniques. For some reason, in all the videos that I've made, I haven't talked about time series forecasting outside of the context of deep neural networks, but I will in this video. The other part of this video is for me to show that reinforcement learning can be used in the real world, in an applicable setting, and this is part of Move 37, so that's why I'm doing this.

How this video is going to be structured: it's going to be an intro Q&A, so I'm going to answer two questions, so go ahead and start asking them now. Then I'm going to go over a time series lecture, a brief Q&A, exploratory data analysis, or EDA, for our data set, a brief Q&A, and then reinforcement learning. The point of this is to predict a value. Obviously all machine learning is about predicting a value, but we're trying to predict a specific target value, and I'll talk about that when we get to it. But let me just start off by answering two questions and then we'll get into the code.

OK, the first question is, hi everybody, good, thank you for being here. The first question is, "Can you suggest some gesture recognition algorithms?" Sure. Right now pose estimation is the state of the art, so go use TensorFlow.js; it's probably the easiest-to-use implementation for pose estimation in the browser. Anybody can do it, so that's the one to use. OK, so that's the first one. The next question out of
two is, "What does it require to create a bot like a human?" Great, advanced question. Now I'm going to minimize this, and hello everybody. So that question, a bot like a human: that is AGI, artificial general intelligence. You know the Turing test, which hasn't really been passed yet. That would be an open-domain chatbot, trained on data that is not a closed loop but just, you know, the entire internet. This hasn't been solved, but the best way forward, I would say, would be using deep reinforcement learning in the context of text data and the web, in an open domain, where there's a cycle: you're using a reward signal to train a deep neural network, and it's searching the internet itself. It's querying the internet and building off of these queries. It's using natural language processing, hence the deep neural network, to create abstractions from the text data, and then based on those abstractions it's learning to maximize a reward. You could frame it so that a human gives a binary yes/no on whether this is a good response or not. So that's kind of a research direction that I'm thinking of, which hasn't been done but would be cool for human-level chatbots. OK, so that's it for the Q&A.

Let's start talking about time series analysis, because in this data set they're asking us to predict a target variable based on the past. So let's talk about time series analysis in general, where we have two variables. Let's start off with univariate, single-variable time series analysis. We have some price data; let's say this is for Bitcoin. This is the Bitcoin price over a period of days. Now, if we want to forecast the price for the next day, how do we do that? This is a time series, where the variables depend on the time. What the values are, what the target variables are, is completely
dependent on the time step. That's what makes it different from a regular data set: a time series data set depends on the time.

So let's start off with a naive approach, where we say that the next data point in this graph is going to be the target variable. We're just going to say that the predicted variable, and here's the equation, is going to be the variable from the previous time step. That's it; that's the equation right there, and we call this the naive approach. What happens is, when we have a data set like this, just imagine the entire thing is one data set and we have split it into training and testing data, where we say, OK, this is the entire training data set; now, based on this last data point, predict the next point. Well, it's going to say, based on this one, let me just do that again, because that's our equation, our forecast model. And it'll do that again and again and again, and what happens is it's just a straight line. So this is a very bad approach, and this is the naive approach.

But let's see how we can improve on this. Check out this graph. It's got volatility, it's going up, it's going down, but notice that there is an average line. You can imagine this average line, the line of best fit you could call it, the average between the ups and the downs, and we could draw it mentally through this model. If we do that, then we can make the assumption that the next price is going to be the average of all the prices that came before it. So if we have that sequence of values, all of those y values, the y values are right here on this axis and the x values, the days, are here, then we could use this equation. Now don't be afraid of the fact that we're using a little bit of math here. What this says is the target variable,
which we can call y-hat, the one we want to predict, is going to be equal to the sum (this is sigma notation, this Greek-looking letter) of all of those values that came before it, divided by the total number of them, x. So from i to x, where x is the number of values, add them all up, that's what sigma notation means, and then divide by the number of them. That's the average, and that's our prediction. If we do that, then this is what our line is going to look like. It's saying, based on this last data point right here, what's going to be the next one? Well, it's not going to be up here, it's going to be down here, because we're taking into account all of those data points from the very beginning to the very end. But notice that this is not ideal either. We need something that's going to be better than that.

So how do we improve on that? We would use a different technique called the moving average. What the moving average says is: the points at the very beginning and the points near the end are going in completely different directions, so let's only consider the points immediately before our forecast, our target variable that we want to predict. So we'll have a window, and we'll just average those, and we'll leave out the beginning. What that equation looks like is this: y-hat, the predicted variable, is going to be the sum of all of those values that came before our target variable, up to a certain threshold which we define as p, say the previous five or the previous six values, divided by the total number of them, and that's the average. If we do that, notice it's getting a little bit better, our prediction. It looks like this.

Now how can we improve on this? Notice I'm going through a lot of techniques very fast, so slow me down if you feel like it's too fast. One way we can improve that is by using a technique
called simple exponential smoothing. What that means is, let's take into account all of those values, because clearly all of them matter, but let's weight them differently. We say the values that came immediately before our target variable, we'll weight them more than the values that came at the very beginning, because these matter more. So how do we do that mathematically? Here's how: we take this constant value, which we're going to call alpha, and we do the same thing where we're adding them all up, but we're multiplying by this constant value, then squared, cubed, to the fourth, to the fifth. Notice this trend here of exponentially increasing powers. This is called simple exponential smoothing, and what it means is that these values are going to be weighted differently.

Now here's a question for you. I'm very excited, and I'll be very impressed if someone can answer it: what does this formula look like that we already know about from the reinforcement learning literature, such rich literature? If anybody can answer that, I'm going to be very impressed. Let me keep going though. OK, ready? It looks very similar to the discount factor from reinforcement learning. In the reinforcement learning context we have an agent. It acts in an environment: it takes an action and it receives an observation of the next state. In order to maximize reward, here's how we calculate reward: at every time step we can predict what the reward will be for being in a specific state, up to our end state, the terminal state, and we'll add up all those rewards, multiplied by a constant factor called the discount factor. We are weighting those rewards so that the rewards that came immediately previously count more; we're saying that they're more important than the
rewards that came at the very beginning, and that's the discount factor. Wow, actually people got that. I'm very impressed. Good job guys, very good. By the way, the reason I wanted to say that is just to give you some intuition behind reinforcement learning. It's a framework for viewing the world, really, and how intelligent agents interact in the world. It's not the actual mathematics of intelligence, of pattern recognition; it's more about framing this pattern recognition in the context of a dynamic world that adapts to that intelligent agent. More on that at the end.

So let's keep improving here. Holt was a mathematician who, in 1964 I think was the year, invented a linear trend model, where he said, you know what, this idea of single exponential smoothing works, it's fine; however, let's improve on that, because it doesn't take into account the idea of a trend. A trend is a general direction that we see a graph moving in, and the way to mathematically define a trend, as Holt suggested in his linear trend model, is to create a forecast equation that consists of two other equations. We have a level equation and we have a trend equation, and we use both of those equations to compute the final forecast equation. So it's l plus h times b, where l is the level and b is the trend. We have two constant factors, alpha and beta; they're both different, and we can tune them accordingly. It's the same idea of exponential smoothing, but applied to both the level, the average value in the series, and the trend. If we do that, then notice our graph's forecast is getting much better.

Now there's one more technique I want to talk about, and this is an improvement that Holt made to that linear trend model. We'll start coding in a second. It's called the Holt-Winters seasonal method. There's another concept in forecasting called
seasonality, where in a set of data there are going to be seasons. In time series data, not any but most real-world data, there's going to be some seasonality, some kind of predictable ups and predictable downs. Let's say, for retail stores, there are going to be more people buying toys in December because of Christmas in a lot of Western countries, or wherever. There's going to be some kind of trend in the seasonal direction for stock markets as well: based on what's happening, here's how the market's going to go.

In order to mathematically define seasonality we now have three equations, so we're adding on to what we had before. Again we're using our level, we're using our trend, and now we add a third equation, which is the seasonal equation. The level equation shows the weighted average between the seasonally adjusted observation and the non-seasonal forecast for time t. The trend equation is the same as in Holt's linear method. And the seasonal equation shows the weighted average between the current seasonal index and the seasonal index of the same season last year. So each of these equations is interdependent with the others. And what happens when we do that is now we are getting somewhere. Check out this graph; it's a much better graph. So that's the idea of seasonality.

Now, that's for the case of univariate time series. If we have a multivariate time series, that's multiple input data, multiple predictor variables for whatever our target variable is going to be, which is the case for our Two Sigma financial modeling contest, then we're going to use a model that is very similar to the Holt-Winters seasonal method, taking into account the level, the trend, and the seasonality to make the forecast, but it's also finding linear interdependencies between
these predictor variables. So it's the same idea of multiple equations that relate to each other in a way that, once they relate to each other, we can create a graph. Some popular models for that are ARIMA, ARIMAX, et cetera, and I'll make a dedicated video on those, because you really need a dedicated video on those. Or we could treat this as a supervised problem, as many people have done, where we use the power of LSTM networks. We say the predictor variable, let's say pollution, sorry, the target, in other words the target variable, is going to be the result of the predictor variables: the temperature, the human waste use amount, et cetera. So there's a mapping between those two. The reason we use LSTM networks, long short-term memory neural networks, is because they take into account long-term sequence data, and they store memory in a way that is beneficial to sequential data, which is the case for time series data. That's why we've seen a lot of LSTM networks being used on time series data.

OK, so let's get into some EDA, and like I said, I'm going to answer some questions now. All right, what are some questions? "What's out of focus?" I know it is, OK. "When did you start programming?" That's a hard question. I guess the earliest time that I started programming was, it's almost embarrassing to say, modifying Halo 2 when I was, I think, 13 or 14, so that was a while ago. But I wasn't even programming; I was more like downloading scripts and just hacking it and stuff. So it's been a while, but really seriously programming, probably a couple of years. "Who is paying you?" Nobody's paying me. I mean, YouTube ads are paying me, Patreon, you guys are paying me to do this. Nobody's paying me, nobody's paying me.
Kaggle? Nobody's paying me. I would have to say that if somebody was paying. "Which is the best book for RL?" Sutton and Barto have the bible of RL, which is called Reinforcement Learning: An Introduction. Find it on the internet; it's all available for free. And the last question is, I'm going to get a haircut, for sure. "Will you attend?" I will. Time series data definitely depends on other factors; it depends on a lot of factors if there are multiple variables.

OK, now to the data set. Let's go ahead and do this. First of all, the data set is in the video description, so check the video description. We're going to do this in Google Colab together. So we're ready for our exploratory data analysis step. By the way, with Google Colab, with these two lines you can mount whatever data set you want into Google Colab and then call it directly. What I did was I downloaded it, it's an h5 file, uploaded it to my Google Drive, and then called it with these two lines of code. Very simple. Thank you, Google Colab, for making it much easier to do.

All right, let's get into this code. Our first step is going to be to list out our data set. I have this data set in my Google Drive, and I have a link for you in the video description, and I just want to see if it's there. OK, good, it's there. Once I've seen that it's there, I'm going to convert it into a pandas DataFrame, but before that I've got to install this dependency called tables; that's going to let me do that. Now we can import pandas, our handy-dandy data preprocessing Python library, to then say, let's import this data set, train.h5, recursively, and we're going to import it as train, so that's what we're going to call it. Then we're going to say our data frame is going to be train.get, and we'll call it by its name, train, and that's it. And hopefully, good. So now we have it as a
data frame, and now we can see how big our data set is. How big is this thing? We've got to check it out. This thing is massive: it is over 1.7 million data points, which is big. So let's examine this data set, just the head, the first few rows. OK, so here's our data set. We have an ID, we have a timestamp, which is going to be a different time, and these are all of our predictor variables. Now what do these mean? There are more than 40, 44 variables, and then we have y, so y is our target variable. In this competition, what Two Sigma did was they said these are a bunch of financial instruments. Financial instruments are like derivatives, bonds, mortgages, stocks, assets, all of these different types of financial instruments, but they anonymized them, so we're calling them just technical_41, technical_42, and then we have a target variable. What this target variable is, they didn't reveal to us, but we can think of it as a price. Let's just think of it as a price in a trend, and this price for this asset is dependent on all these other anonymized financial instruments. So based on all of these financial instruments, can we predict the price for whatever this is? Let's just say it's a stock for this case.

OK, let's keep going here. Our next step is to say, well, how many, we'll call them labels, so y is going to be labels, how many labels and how many values do we have? What we're going to do is list them both by creating two lists, and saying the labels are going to be appended with the columns that we have, the values are going to be appended with the number of non-empty values we have, and then we'll print out all of those columns and all of those values, starting from the very beginning. All right, df, let's see if that works. Good. OK, so these are all of our variables, all of our labels
and values. OK, just like that. So now we want to see, since we have to do some data cleaning, how much missing data do we have? We can use matplotlib to see just how much missing data we have, because we probably have a lot. So plt is our matplotlib, and then we'll use this inline call to say that we want to be able to show a matplotlib graph inside of the browser. We're going to create a matplotlib graph, and we're going to give it a figure size of 12 by 50, so we'll keep it relatively small. We're going to start from those labels, which I named ind, and connected to those labels we have all of our values. I'm going to color them, I'm going to label them y, so in my graph it's going to say y. Now we set the y ticks. These are going to be the intervals between these variables; let's say it's going to be half of the width that I defined, which is 0.9, because those values we saw before seem to be in that range. So now we'll say yticks, and let me just do that again, set_yticklabels, so that's the labels, and then we have our other line, which is our horizontal line. I'm going to name it horizontal. That's for y, and then count of missing values, that's what we're looking for, the count of missing values, as the x label. And one more, which is our title for our graph: number of missing values in each column. OK, that's it, and show the plot.

OK, let's see. Of course, invalid syntax. ax.set_yticks, ind plus width, just like that. Uh-huh. And np is not defined. Is it really not defined? I didn't import numpy up there. OK, fine, numpy as np. OK, figsize, right. Sometimes you've just got to deal with these errors. Nice. OK, so it looks like we've got quite a lot of missing values in our data. We could just clean them all out, but this is a good step too. Wow, so fundamental_61 has a lot of
missing values. So there are a lot of missing values in this data. That's what we wanted to do, just to see that. Let's show one more pretty graph. It's a rainbow graph, and we can use the other plotting library, called Seaborn, to do this. It's just one more very simple graph, and I'll take questions right after number eight here. So, six. How many people do we have in here? OK, 233, cool. So that's it for this. Now, at each time step, we want to see what the data looks like. We'll say, how much of each predictor variable do we have at each time step? OK, so now we can see that. Cool. That's the count for each versus the time step, so it's going up. The trend of the data is going up, so that's just one thing to know: it's a linear trend upwards as time goes on. And lastly, it's just one line of code and we're done with this EDA part: how many unique assets do we have in total? That prints the length of df.id.unique(): 1424.

OK, so let's answer some questions, then I'll talk about reinforcement learning. "Can we use an RNN with LSTM to predict these scenarios?" Yes, you can, like I mentioned before, and deep reinforcement learning is the cutting edge for that too. Phillip asks, "Why not use pandas DataFrame methods to call the columns instead of using loops?" Phillip, that's a totally valid question, and we could have done that. And lastly, one more question: what is your opinion of, no. "How deep do you need to know math for reinforcement learning?" That's a great question. Compared to supervised and unsupervised learning, it is more necessary to know the math behind it, because that ecosystem is not as developed as the supervised learning ecosystem. While you can use OpenAI's Gym to do a simple, you know,
random policy for an agent inside of a game, if you want to do anything more complex, deep Q, if you really want to understand these algorithms, then yes, you're going to need to know what the idea behind policy functions is, and the idea behind value functions, both for a state and an action. You're going to need to know how the Bellman equation works, and that's really what it comes down to: understand the Bellman equation and everything else will follow. There are four of them, actually, and I will continue to talk about them, but let's continue going here. So that's it for my Q&A.

Now to RL. That's our EDA; now for RL. How do we use reinforcement learning on time series data? In reinforcement learning there is an agent that is acting on the outside world. It is observing the effects on the environment, and it's learning how to improve its behavior. That's why we see it being used so often in games. In contrast, a time series forecast is a setting where there is a passive observer. The agent is passively observing the data set, and it's not really interacting with the environment, because the environment is not reacting to the agent. It is a one-way action. Whereas in a game world, for example CartPole, where the pole is trying to balance itself, if the agent's action in a given state is to move to the left, then the environment, the platform that it's balancing on, will then move. It's reacting.

So in the real world, how do we use this? What is a system that adapts to changes that an agent, an AI, makes? Well, the stock market could be one, where the state will change because the state is the account balance. If you have an account balance, when an agent makes an action like buy, sell, or hold, the balance will change. Or, if we want to get more meta, the entire stock market will change. So if an agent makes a trade, then the market
will change. So that is a reactive environment. What's another reactive environment? Electricity grids, sensor networks, interconnected routing grids of data, of connections. Any kind of adaptive system that reacts to an agent interacting with that environment is a use case for reinforcement learning. So a static data set is not necessarily a reinforcement learning scenario.

So how do we solve this, though? Because there are a bunch of companies out there that have these systems. Google, for example: they used reinforcement learning to improve the quality of their power usage in their giant data center, and they reduced their cooling bill by, I think it was, 40%, and even more after that. So there are companies out there, electricity companies, power utility companies, public works companies, that have these systems that need to be optimized but they don't, and they have these real-time data sets, right, meters that are happening in real time. What they need, then, is a reinforcement learning solution. But right now, and here's a startup idea I want to give to you guys, this stream is going up and down; there were 200 people here, now there are 600 people here. This is crazy, by the way. So where was I? This is a call to action for startups, because I see a real need here. Here's a pain point: there are companies that need a reinforcement learning solution to help optimize their profits for their systems, and there are data scientists out there that want to use reinforcement learning to solve these systems. So what there needs to be is an intermediary that offers simulation as a service. What these simulation-as-a-service startups will do is, and here's how I would do it, I would approach one of these companies and say, you know, I understand reinforcement learning, I understand that we can offer
you a 30% reduction in your costs if you give us access to your real-time API. We'll create a simulated environment based on that, and then we will give it to, say, Kaggle, to then allow their data scientists to create RL algorithms. So there is an intermediary step here. Now, Kaggle can do this themselves, and they have thought about this, and who knows what's going to happen there, but this is an idea whose time has come. More and more people are getting interested in reinforcement learning, and there need to be more simulated real-world, not game-world, environments out there. So that's my suggestion. Hopefully you understand the difference between time series forecasting and reinforcement learning from what I've said so far, why there's a need for it, and how we can apply reinforcement learning to time series if there is some reactive component to the data set itself. It can't just be a static data set; it has to be a real-time API. So there is a possibility that we're going to see more of that in the future.

Now, what I did find was a library. The closest thing on Kaggle to this idea of reinforcement learning was created by this guy, and it's called Kaggle Gym. What he did was he framed the Two Sigma problem of predicting the target variable as a reinforcement learning problem, as a Markov decision process. I think this was the pioneering step in saying, let's create a simulation of a data set and then solve the data set in the context of a simulated setting. So he created this library called Kaggle Gym, and what I've done is I've pasted it in here, and we're going to talk about it and then we're going to use it. So we're going to use that Kaggle Gym library to solve this problem. OK, so, little
refresher here. In reinforcement learning we have a Markov decision process, where we have an agent that performs a set of actions in a given state to maximize reward, and the action that it takes given a state is considered the policy. A policy is a function that says, given this state and given this action, or no, given this state, what's the best action to take? That's how a policy works. There are two other functions here that are part of a Markov decision process: the transition probability, which says what is the likely next state to go to if you take an action in this given state, and a reward function, which is going to help the agent maximize what reward it receives for taking a given action. Now, these two functions could be learned over time, and that would be considered Q-learning, a model-free method, or they can be given to us beforehand, in which case this will be a complete Markov decision process. But in the real world we will almost never have a complete Markov decision process; we will almost always have a partially observable Markov decision process. What that means is that we won't have these transition probabilities, we won't have this reward function; we'll have to learn them, or we could just avoid those functions and learn what's called the Q-function directly. Let me talk about that at the end. I just wanted to introduce the idea of a Markov decision process before we get into this code.

In this Kaggle Gym environment that Frans Slothouber suggested, we have an R score. What Kaggle suggested, let me just go back, was that we evaluate the scores using this equation right here. Let me make this bigger. This is called the R score. The R score is 1 minus the difference between the target and the predicted variable, squared, the sum of all of them, divided by the predicted variable minus, what
was u again, this constant value u? It's not called u, it's called... I'm forgetting the name of it. But this constant value, and 1 minus that, and that's R squared. Then we can derive R from R squared by saying R equals the sign of R squared times the square root of the absolute value of R squared, and that's going to give us R. We can consider that a loss function, because it's going to give us a scalar value which we can use to measure how good our predicted target is. Then, based on that R score, we can see what the leaderboard says, and we can see what everybody's R score is here. The highest one was 0.02, so we'll see what we can get using this Kaggle Gym library that was created before.

So let me answer any other questions. Mu, yes, thank you very much. Mu. Mu Alpha Theta: I was in Mu Alpha Theta in high school, how can I forget mu? Okay. "How to start the basics?" Move 37 is my course, it's all on YouTube for free, check it out. Right, mu. All right, great, guys, thank you.

Okay, so, let me start off with this: the R score. This function is just the programmatic version of the equation that I just showed, and that's it. That's how we're going to compute it. So let me go through this. Inside of this Kaggle Gym environment we have an observation, and the observation holds two things: the predicted variable, what we want to predict, and the target variable, what is already there, because we already have those targets. We can call them labels, right? Man, I'm sweating today. Okay, we can call them labels. And our environment, what is it? Our environment is our static data set; we'll split it up into training and testing data. Okay, and then here's
the step function. This is basically recreating that OpenAI Gym environment, where an agent takes a step: the parameter is the action that it takes, and then it receives an observation and a reward. In this very naive implementation, the R score is computed just by saying that the predicted variable is only going to be the variable from the previous time step. Well, we could choose our own policy based on this. But this is really the key right here, this part right here: the reward for taking a step in this environment is going to be the R score of our predicted variable and our target. Okay, that's our reward, and we return that, as well as an observation, which is going to be the values of both, as we saw before, a boolean that says done or not, and an info, which is a logging variable, right? So based on that we can create a policy. Let's write one using this. The reason I pasted it all is because this could be its own Python file, right, kagglegym.py.

Okay, so let's test this out. Let's create our own agent-environment loop. We'll define our own policy, and then based on that we'll try to improve it. So inside of this test function we'll say: go ahead and create the environment using make, which is the function that I just defined; get the initial observation, which is going to be our variables that we defined before; and we'll print them out so we can see them, just for logging purposes, you know, what is the observation of both the target and the predicted value. Then, based on both of those, we'll create our training loop. Okay, so this is the agent-environment loop. What we'll say is: while true, here's where the loop begins, the target value is going to be the initial observation, and then we'll choose some
starting point to just start from, like, what is the predictor variable that we want to start from, and we'll just say 6.06. And then, observation... well, what are we going to get returned when we take a step? I'll continue to explain this, guys, let me just write this out, I'm not done explaining this. It's going to return, so this is the class that we just talked about, it's going to return all three of these things based on the action, which is the target, we take. And if we're done, break, we're done with the loop. Else, now what do we do with the rewards, right? We can choose any policy, and here is where we actually show what that policy is going to be. What I'm going to do, as you're seeing right now, is print out three variables and then I'm done: I'm going to print out the info, I'm going to print out the amount of rewards, and I'm going to print out the first few rewards, 0 through 15. Okay, so that's that.

Invalid syntax for make: our environment equals make. Okay, and then I'll test it out. All I do is just run test, and that's gonna give us... what? Environment's not defined? Did it really? No, it is, it is, check this out. Right, right. And then, yep, okay: observation's not defined, line 9, observation. Rewards is not defined: rewards.append. Okay, okay, let me answer some questions now, because we're definitely gonna have some questions here. Okay, oh, break, how did this not catch it? Okay, gotcha. Okay, let me answer some questions here. Thank you.

Let's see what we get here. Okay, so our public score is going to be 0.017, so we're like number forty-three. So guess what: here's our policy, right here. This is a naive method, but in the context of a Markov decision process, that's it. The basic idea here is that we framed this as a Markov decision process, where an agent is taking an action in an environment to move from one state to the next state
and we're trying to maximize reward. The policy to choose that action is: the variable that we want to predict is going to be the variable from the last time step. Then, to compute how good it is, we're just going to find the difference between the predicted and the actual variable, so it's gonna be the variable at t minus 1 and at t. That's it. We could have done this in one line of code. However, in the context of a Markov decision process, which we have here, we could then add to it by creating a better policy that's going to improve on this.

What would be an example? Q-learning. Okay, so Q-learning is where an agent is taking an action given a state in order to maximize a reward, and we are computing this Q table, which is a giant matrix of possible actions that we can take in any given state. Then we're gonna optimally choose what those actions will be by iteratively updating the Q table using what's called the Bellman equation. What the Bellman equation does is relate one state to another, and if we can relate any one state in an environment to another state, then we can compute those variables that are different between them, right, like the state value function and the action value function, and using those we can compute an optimal policy. So that's how we could improve on this. However, like I said before, we need an environment that's going to be reactive.

This is also just to show that you don't necessarily have to have the greatest, you know, cutting-edge algorithm in the world to place well on a challenge, or to do well in general in machine learning. Sometimes linear regression can work better than a deep neural network, if your data set is small or, you know, for a variety of reasons. So my point is that we placed using this very simple methodology. Obviously it's a very naive method, but I wanted to really sneak in a lecture on reinforcement
learning, on Q-learning, on the difference between time series forecasting and reinforcement learning, into this problem of this Kaggle challenge. This specific challenge was the most RL-friendly challenge that's available on Kaggle right now. And like I said, this is a great opportunity for aspiring data scientists out there to create a service that creates simulated environments that can be offered to real-world companies, and, you know, just to create a business out of that. So I see a real need for that, and that could be a use case for this.

I'll answer two more questions. "Q-learning?" I actually have a great Q-learning video coming out this weekend. I'm gonna have some great links for you in the video description; the data set's in the video description. And let me do a rap. So just say a topic, and I'll do a freestyle rap on the topic before I end this livestream. Okay, so, tremendous. School of AI, and we're actually a nonprofit organization, and it is the adventure of a lifetime, and it is a story that's going to be told decades from now. And it's not even about me, it's about the deans, it's about the people running this. Yeah, really, it's a family: the students, the Wizards, we're all a family. The people watching this, we are all a family. If you are here at the end of this livestream, you are a dedicated data scientist who cares about AI research, or who cares about the future of AI and using it to solve real-world problems, and that's our mission, our values.

Okay, so, um, "OpenAI". Okay, no, no, I want to do a different one. "Chatbot". Okay, okay, here we go. There's always a little tag at the beginning. I try to use a chatbot, I try to make matplotlib plot it out with the graph, but you can't because it's text data, man, you gotta use math. I don't know what you're using, man, you're out of class. You got to take a chatbot and visualize it in a way that people can't
realize it. It's okay, let me show you: instead of matplotlib let's use something else, like, I call it matplotjive. It's a new library, I just invented it, it's made for chatbots, it visualizes in the browser, it's like a laptop, it runs on anything: browser, in the cloud, GPU, CPU, GPU, I don't care. That's it, that's it for you. All right, that's it for the rap.

All right, thank you guys for showing up. I hope I made this enjoyable for you. Time series forecasting, RL, Markov decision processes, and the accessibility of Kaggle as a way to earn a passive income and a way to hone your skills as a data scientist: these are all the things that I hope you've learned in this live stream. I love you guys. We're about to hit 500,000 subscribers, so I can't wait. Tell all your friends, we want to grow this community as fast as possible. So thank you guys, I love you, and thanks for watching. For now, I've got to go work on School of AI stuff, so yeah, thanks for watching.
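
The R score and the agent-environment loop described in the stream can be sketched roughly as follows. This is a minimal illustration, not the Kaggle Gym code shown on screen: `ToyEnv` and `run_naive_policy` are hypothetical names, the toy data layout (one row per time step, one column per instrument) is an assumption, and the denominator uses the mean of the targets, which is the constant the speaker could not name.

```python
import numpy as np


def r_score(y_true, y_pred):
    """Signed square root of R^2: R = sign(R^2) * sqrt(|R^2|)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # squared prediction errors
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # spread around the mean
    r2 = 1.0 - ss_res / ss_tot                      # assumes targets are not all equal
    return float(np.sign(r2) * np.sqrt(abs(r2)))


class ToyEnv:
    """Hypothetical gym-style wrapper around a static table of targets."""

    def __init__(self, targets):
        # Assumed shape: (time steps, instruments).
        self.targets = np.asarray(targets, dtype=float)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.targets[0]  # initial observation

    def step(self, action):
        # The action is the prediction for the next time step.
        self.t += 1
        actual = self.targets[self.t]
        reward = r_score(actual, action)
        done = self.t == len(self.targets) - 1
        info = {"timestep": self.t}
        return actual, reward, done, info


def run_naive_policy(env):
    """Naive policy from the stream: predict the previous step's values."""
    observation = env.reset()
    rewards = []
    while True:
        action = observation
        observation, reward, done, info = env.step(action)
        rewards.append(reward)
        if done:
            break
    return rewards
```

Calling `run_naive_policy(ToyEnv(data))` on a small array returns one reward per step. Any smarter policy would replace only the `action = observation` line, which is the point the stream makes about framing the problem as a Markov decision process.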

Original Description

Two Sigma Investments published a $100,000 code competition on Kaggle that asks data scientists around the world to try their best to create an algorithm that can make predictions about anonymous financial instruments (like derivatives, assets, bonds). Normally, reinforcement learning is not used on Kaggle, but in this live stream I'll use reinforcement learning to help solve this challenge. This will serve as a great real-world use case for RL, and I'll also discuss some other common time series forecasting methods. Get hype!

Code for this video: https://github.com/llSourcell/Kaggle_Challenge_LIVE-Two-Sigma
Dataset: https://www.kaggle.com/c/two-sigma-financial-modeling/downloads/train.h5.zip

Please Subscribe! And like. And comment. That's what keeps me going.

Want more education? Connect with me here:
Twitter: https://twitter.com/sirajraval
Instagram: https://www.instagram.com/sirajraval
Facebook: https://www.facebook.com/sirajology

This video is a part of my Machine Learning Journey course: https://github.com/llSourcell/Machine...

More Learning Resources:
https://www.kaggle.com/kanncaa1/machi...
https://www.kaggle.com/rtatman/beginn...
https://machinelearningmastery.com/ge...
http://blog.kaggle.com/2017/01/23/a-k...

Join us in the Wizards Slack channel: http://wizards.herokuapp.com/
Sign up for the next course at The School of AI: https://www.theschool.ai
And please support me on Patreon: https://www.patreon.com/user?u=3191693

#SirajRaval #KaggleChallenge

Sign up for my newsletter for exciting updates in the field of AI: https://goo.gl/FZzJ5w
Hiring? Need a Job? See our job board!: www.theschool.ai/jobs/
Need help on a project? See our consulting group: www.theschool.ai/consulting-group/
Hit the Join button above to sign up to become a member of my channel for access to exclusive content!
Join my AI community: http://chatgptschool.io/
Sign up for my AI Sports betting Bot, WagerGPT! (500 spots available): https://www.wagergpt.co

This video teaches viewers how to participate in a Kaggle challenge using reinforcement learning for time series forecasting, covering key concepts like Markov decision processes and Q-learning, and providing practical steps for implementation.

Key Takeaways

Use TensorFlow.js for pose estimation
Apply simple exponential smoothing for forecasting
Implement Holt's linear trend model for time series analysis
Use Kaggle Gym for reinforcement learning
Create an environment using the Kaggle Gym library
Define a policy using the reward and observation
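
The two forecasting methods named in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code from the stream: the function names, the initialization (first value as the level, first difference as the trend), and the smoothing parameters `alpha` and `beta` are choices made here for the example.

```python
def simple_exponential_smoothing(series, alpha):
    """One-step-ahead forecast: a weighted average that favors recent values."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def holt_linear(series, alpha, beta, horizon=1):
    """Holt's linear trend method: smooth a level and a trend, then extrapolate."""
    level = series[0]
    trend = series[1] - series[0]  # assumed initialization: first difference
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend
```

Simple exponential smoothing suits series without a clear trend, while Holt's method adds a second smoothed term so that forecasts keep following a steady rise or fall.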

💡 Reinforcement learning can be effectively used for time series forecasting, especially when combined with simulated environments and Markov decision processes.
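
The transcript describes Q-learning as iteratively updating a Q table with the Bellman equation. A minimal tabular sketch of that single update step is shown here; the function name, the discrete state and action indices, and the default learning rate `alpha` and discount `gamma` are assumptions for illustration, and the stream itself did not implement this.

```python
import numpy as np


def q_learning_update(q_table, state, action, reward, next_state,
                      alpha=0.1, gamma=0.99):
    """Move Q(state, action) toward reward + gamma * max_a Q(next_state, a)."""
    best_next = np.max(q_table[next_state])  # value of the best next action
    td_target = reward + gamma * best_next   # Bellman target
    q_table[state, action] += alpha * (td_target - q_table[state, action])
    return q_table
```

A policy is then read off the table by taking the highest-valued action in each state, for example `int(np.argmax(q_table[state]))`. As the stream notes, this only pays off when the environment reacts to the agent's actions.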
