If Top Chess.com Players were STOCKS - Live Coding Data Analysis Stream
Data analysis of Chess.com game history for the top players. Live stream from Oct 27, 2022. We code in Python and pandas and use machine learning.
Follow me on twitch for live coding streams: https://www.twitch.tv/medallionstallion_
Community Competition:
- Link to the competition: https://www.kaggle.com/competitions/kaggle-pog-series-s01e03
- Register and join NVIDIA's GTC using this link to qualify: https://nvda.ws/3Qb0b9x
My other videos:
Speed Up Your Pandas Code: https://www.youtube.com/watch?v=SAFmrTnEHLg
Intro to Pandas video: https://www.youtube.com/watch?v=_Eb0utIRdkw
Exploratory Data Analysis Video: https://www.youtube.com/watch?v=xi0vhXFPegw
Working with Audio data in Python: https://www.youtube.com/watch?v=ZqpSb5p1xQo
Efficient Pandas Dataframes: https://www.youtube.com/watch?v=u4_c2LDi4b8
What You'll Learn
The video analyzes Chess.com game history for top players using Python, Pandas, and machine learning, and creates visualizations to compare player performance to stock market trends.
Full Transcript
Welcome to the stream. It is Thursday, October 27, 2022, and I'm so glad that you are here with me, if you are here with me, or if you're watching this later. We're gonna have some fun tonight, do some coding, and maybe learn something new about ourselves. Maybe not, maybe we won't learn anything. I am cracking open a blackberry Bubly right now, don't judge, and then we're gonna get the coding started. Let's go ahead and switch over. Hello in chat, hitch him, what's up, welcome. Let's switch over to this. What is this website? This website is called Kaggle. I spend a good bit of time on it, and if you're trying to learn data science you should too. I'm not paid to say that, I am just a fan of it. They're really good people over there and they make a great product. So we're gonna go over to my datasets and check out this archive that I made. We started working on this a while ago. We have someone from Egypt watching. Welcome from Egypt, hope you're doing well. Let's get my headphones up. Let me know if you can hear everything okay too; hopefully everything sounds okay. So the plan tonight is it may be a short stream where we create a notebook for this dataset. What this dataset is, let's load up my terminal and remember what this dataset is. I like this music, by the way. I hope it's not copyrighted. Watch them take down my video because it says copyright even though it says it's not copyrighted. Let's turn the volume down just in case; always got to worry about that. Okay, so hit the upvote and like. Jason, yes, you're right, we got someone from Nigeria. Welcome from Nigeria. All over the world, we are a global phenomenon here over at Medallion Data Science, AKA Rob streaming coding at night. We are gonna go into my repos directory. Where'd my music go? Oh, it's just a different song. Okay, so we're gonna go into my repos, and we're gonna conda activate kaggle 2, and then we're gonna, should we go into
Twitch stream projects? No, it was in chess. Let's start up a Jupyter... no, let's do chess cheating analysis. This was a while ago. Then we're gonna open up JupyterLab in here, and of course it's going to open a window way over there where I don't want it, so I'm gonna pull it over here so we can all see it together. Okay, so going into notebooks, we're going to remember how I actually pulled all this game stuff down. We took a function that I found online to download all of the top 50 players' Chess.com playing history, and now we're going to do some exploration of those. It's the top 50 Blitz players as of the date that I pulled them, so the top 50 could have changed since then. Sorry if you're number 50 right now and you were 51 when I pulled it; you're not in the list. That's just the list of data that we're starting with. So I took all of their games, downloaded them, and then we ran this function that I created a while ago, PGN to DataFrame, which takes PGN files, which are chess game history, and converts them into a pandas DataFrame with all the metadata right there for us to analyze. That is what this dataset is. That's confusing for me to say, but this dataset, is it public? It is public. It has these top 50 chess players' archive of their games, with the Parquet version. I'm gonna paste this here. What time zone am I in? I am in EST, Eastern Standard Time, or I'm not sure, it changes when the time zone changes, but I'm in Eastern Time, whatever that is. What time zone are you in, xcode? Oh, my dad is stronger than my uncle, hey man, I remember your username. It's EDT now, yeah, I always forget, so it's EDT. Just a quick sidebar here: do you guys know that here in the United States our government voted to get rid of daylight saving time? I'm excited for that to be over. Hey look, Jesse PK, thank you so much for the subscription, subscribed with Prime.
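The PGN-to-DataFrame step mentioned above is not shown on screen in this part of the stream. As a rough illustration only, the sketch below reads just the header tags of each game with a regular expression and builds one row per game. The function name `pgn_headers_to_dataframe`, the sample games, and the rule that an `Event` tag starts a new game are assumptions made for this example; the stream's own function uses its own logic, and a real PGN parser such as python-chess would be the sturdier choice.

```python
import re

import pandas as pd

# Matches a PGN header line such as: [White "player_a"]
TAG_RE = re.compile(r'^\[(\w+) "(.*)"\]\s*$')


def pgn_headers_to_dataframe(pgn_text: str) -> pd.DataFrame:
    """Collect the header tags of each game in a PGN string, one row per game.

    Hypothetical helper: assumes every game starts with an "Event" tag.
    """
    rows, current = [], {}
    for line in pgn_text.splitlines():
        match = TAG_RE.match(line)
        if not match:
            continue  # movetext and blank lines are ignored in this sketch
        tag, value = match.group(1), match.group(2)
        if tag == "Event" and current:
            # A new "Event" tag means the previous game's headers are complete.
            rows.append(current)
            current = {}
        current[tag] = value
    if current:
        rows.append(current)
    return pd.DataFrame(rows)


# Made-up sample games, not taken from the dataset.
sample = '''[Event "Live Chess"]
[White "player_a"]
[Black "player_b"]
[Result "1-0"]
[TimeControl "180"]

1. e4 e5 2. Nf3 1-0

[Event "Live Chess"]
[White "player_b"]
[Black "player_a"]
[Result "1/2-1/2"]
[TimeControl "60+1"]

1. d4 d5 1/2-1/2
'''

games = pgn_headers_to_dataframe(sample)
print(games)
```

Each header tag becomes a column, which is the "metadata right there for us to analyze" shape described in the stream.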
That's to tell you all, if you're watching on Twitch, which you should be right now: if you have a Prime account with Amazon, you can subscribe to my channel for free. Subscribing on Twitch is different than on YouTube; it usually costs money, but Amazon will give you one for free, so why not use it? If you're gonna have it, might as well use it. Whenever that happens I spin the wheel, and the wheel tells me something random I have to do. Thank you, Jesse. Let me know in the chat how you found the channel and what led you to want to follow or subscribe. Uh oh: scream "Kevin" as loud as I can. This wakes up my wife and kids when they're asleep. All right, here we go. Come on. All right, if someone comes down soon and says "what's going on down there," it's because I screamed that as loud as I can. All right, let's go into the data again. So I pasted the actual data to you all. You can go there and upvote this if you want, and you can check it out yourself. We could go here and create an exploration notebook in Kaggle, or I could just do it locally and upload it. Let's stick locally today. I say we do that, shall we? I just need to remember how I have my data set up. Let's just start from scratch; let's pretend like I haven't looked at this data before. Hey Chris jaska, welcome, thanks for subscribing. Gotta spin the wheel for you. Let me know how you found the channel. Hello it Nava, welcome to the chat. Oh, I almost had to scream "Kevin" again, but instead I do a 10-second hamstring stretch. This gets me nice and ready for coding. You gotta stretch before coding, it's very important. Not really. Ah, I lost my chair. All right, is that 10 seconds? That's about 10 seconds. Thanks so much for subscribing, Chris, chriso. All right, so we're gonna call this Chess.com EDA, for exploratory data analysis, and we're going to do some data analysis. Let's make this a little bit bigger for you all. Exploring the dataset, we'll put it
here. Hey xcode, thank you for gifting one sub. Thank you, that's so nice of you, you gifted a sub. Am I gonna just be spinning this all night? Maybe, if we get a hype train going. 10-second hamstring stretch again. All right, so while I'm doing this, I'm going to talk about what we're going to analyze. We have 50 players. We can focus in on a specific player, or we can just look at them all. We want to see maybe some streaks that they've had, like when they've won a bunch of games in a row or lost some in a row. Initially we were looking at this to see if we could identify chess cheaters, but it kind of devolved into just scraping the data. Hey Mr Gabriel bins blinds, I hope you're doing well. So let's go ahead and import pandas as pd, import numpy as np. Should we switch those around? Would that be cool, import pandas as np today? No. Let's import matplotlib, and let's also import chess, which is the python-chess module that lets us read in the game positions and all that stuff. All right, so let's read in the Parquet file. I think I put this in a chess.com folder. Let's import glob; this glob will let us list out everything in this folder, star dot parquet. The way I organized this folder was I made a Parquet file in the DataFrame format of that player's history, one for each year of that player's data. Some of them only played in certain years, so we have to keep that in mind. Is this too big just to combine? Do I remember why I did it this way? No, I don't remember why. So let's go into the directory, into the data chess.com folder, and let's remember why. Each of these is just a few megabytes. Okay, so it really depends on the player. Let's see. Another cool project would be to make the data scraping pipeline into something that's automated, like every month or every week on Kaggle, and then it would update this dataset
because right now it's kind of static. But we're gonna grab Daniel, because I know that guy plays a lot; he's a Twitch streamer too. So even his biggest one is like 9.3 megabytes. Why did I split all these up? All right, so one thing to keep in mind is there could be, and there probably are, situations where players play against each other, so in theory I could have duplicates for those games: players in the top 50 would play against each other. Just to explain what I'm talking about, let's look at this first one. Let's pick two players we know who played each other. So let's do Daniel, parquets... I can't spell parquets. "Like creating a cron job?" What are we talking about? Oh yeah, yeah, just like creating a cron job. That's the cool thing about what Kaggle released recently: it's basically like running a cron job in Kaggle on their compute for free. If you do exclamation point kaggle, I'm not sure if it'll work, then you can look at some of my datasets. Like this Zillow Home Value Index one actually runs a script every month and updates. I also have one that updates daily, like this MrBeast YouTube stats one. I wrote the script that feeds this in. Look, updated an hour ago. I'm not sure where the script is that loads this in. Yeah, here it is. So basically this runs every day; this one has run 204 times, and it consistently goes and pulls the stuff. So my point is, probably the best way to have done this chess script is to have it run and update this dataset daily or weekly. Okay, so let's just go into everyone's 2022; actually that would be an easier way to look at this. So now we're only looking at the last year of games, and we're going to pull these in with read parquet. This will read in all of the Parquet files and then we'll concatenate them together. Uh oh, my exclamation kaggle did not work. Let's do this on Twitch. Thank you for
saying it's dope to watch this as a sophomore; I'm glad you find this enjoyable. All right, there's the link to my Kaggle profile if you want to see some of the other datasets I made that run daily. Okay, I should make this, all right, DataFrame 2022. So the point I was trying to make is: for a given player, like Hikaru, I went through their games, pulled them in, and saved them into the Hikaru Parquet file. Then I also went in for all the other players and pulled in their games. If they play each other and I concatenate all these together, then we'll have duplicates, but we sort of need those duplicates in order to do aggregations by player. So I pulled in the game, but then I also marked if the player won or lost, and all of this is relative to the player that I cared about when I pulled their games. So let's try to find some duplicates. We're going to go into this mainline moves column, which by the way has all the game data. This is an actual game that we could take: we can make a chess board and then we can load in the FEN. I'm remembering I need to load in this PGN a certain way, which we won't do now. Let's just take this mainline moves column, do value counts, and see if we have a bunch of games that are the same. Okay, so this first game that appears 88 times is just e4 and then no other move, so this is like an abandoned game after one move. That's not what we're looking for. It's ones like this, where the game appears twice. Another way we could do this is duplicated; this will give us an array with a True value if that row is duplicated. But the thing is, I don't think any of them will be duplicated. Oh wait, some of them are duplicated completely. All right, so let's look at that: Anish Giri. We also have to expand our display dot max columns. Let's not do the chess board thing yet, although I can show you that you can load it up like this. We want
to load at any point, any game, we can load like that. We can also load in... Hey muggington, welcome to the stream, thank you so much for subscribing. Let's spin the wheel for you. If I don't hear the spin noise, you guys need to yell at me in chat. What did someone else say? There's a long chat: "Hey friend, do you plan to do live streams where you model data using Bayesian, nearest, or regression models? Would be cool to see you approach that topic." I have to yell Kevin, hold on a second. All right. I have modeled on stream; I can do modeling on stream. JJ back from NYC, June Jai go, welcome back. So why I am surprised that we have duplicates is because there is this player win/loss column, which when we run it for player A should be a win and for player B would be, oh, a loss. But there's also a draw. zyz, thank you for subscribing. We got too many subscribers tonight. I'm complaining about it; I shouldn't. Play a sweet bass lick. Okay, super fast. [Music] I really need to change all the things that I do so they're not so intrusive into the stream, but I appreciate it, thanks zyz. No, it was totally worth it. So back to this. I'm confused, because I guess there could be draws where this would be the same, and also the player Elo should change. That's a little strange. Maybe I just accidentally pulled in duplicates. So we can just drop duplicates, reset the index, and make this our new DataFrame. Let's just power through this and not get too worried about it. All right, put this up here, and then we're going to take DataFrame 2022 and look at duplicates with the subset of Black and White players' names, and then, oh, we can even do just the link; that would be unique. Oh yeah, duplicated, and then with df 2022 we'll locate these: six thousand three hundred. And then what we saw before is zero if we do it this way. Okay, I think this is still fine. Why don't we just pull
in all the years too? There's no reason to... What we've got to do is pull in all the years and then get the player's name, the player that we are considering as the main player that we're analyzing in the game, as we loop through. We can do that pretty easily. So we're gonna do for p in parquets, and we're gonna make this a star, everything. Then we'll do read parquet; we did this yesterday. All right, so df temp, and then we'll do assign. Oh, then we need to split this. That's right, we need to split the actual Parquet file name and get just the username. So let's split on this backslash and take the end of that, which will be just the file name, and then we'll split on an underscore and take the first value, which will be the username. Then we'll assign that to something called main player, and then we'll append df with df temp. This should run pretty quick. We'll concatenate, we'll drop duplicates, and let's just call this df; it doesn't need to be 2022 anymore, this is just everything. Guys, we're gonna do it. "Maybe a main player column to check which player file it came from?" Yes, yes, you got me. Chat has got my back. All right, now we got it. Let's do an info on this just to see how big this file is. I don't know why I didn't combine it initially; maybe it's because I thought they were big enough that they needed to be split up into years. Now, I do also have in this folder subfolders for each player that actually have their PGN files. So if we open a new tab, go into chess cheating analysis, and go into data, like if I do Hikaru here, I can see all his PGN games, and we could look at... oh yeah, it's split up by month. That's the way that Chess.com, I guess, archives them and gave them to us. Now the other interesting thing is we do have the clock time. It would be really cool to try to split and parse out the clocks to get some sort
of... because I don't think anyone that I know of, correct me if I'm wrong, has done an analysis where they actually look at the details of how long a player typically takes to make their moves. I have not seen that. Hey go for gopher, welcome to the chat. Oh, so we got this. And the size of this is 107 megabytes. Tiny. Still almost half a million games, but tiny in terms of actual file size. Now let's go ahead and... hmm. Okay, so now that we have main player... why isn't main player there? Oh geez, I did this wrong. I need to concatenate these temp files. All right, so I'm re-running it again; I accidentally kept my old code that did it incorrectly. Let's also load this extension. Oh sorry, you were trying to tell me that in chat; thank you for having my back. All right, so now we do have a main player. Let's group by the main players, and I think there's a player Elo, yeah, player Elo, which is for the main player that we care about. Let's find their max value, sort by values, and plot a beautiful bar plot: kind equals bar. Hey, by the way, I want to mention this: two streams ago we made a visualization using flight cancellation data, and guess what, I posted it on Data Is Beautiful and it got to the very top of Data Is Beautiful, the subreddit. So we are famous from that. Then I posted responses on it so much that I think I got it blocked, so I took it down and reposted it, and it didn't get as popular. But the first time, like 7,000 upvotes. Clout chasing, yeah, I can't help it, man, it's my Achilles heel. All right, so these are your top 50 ratings of all time in Blitz. Let's make a better style sheet. Oh, we can use one of these new style sheets that I was trying to use the other day but we didn't have installed: seaborn-v0_8-dark. If I just use seaborn dark, at least this looks a little better. We'll do a width of one, definitely make this not as high
and maybe only show the top 15. This needs to be a tail. Edge color. Top 15 Chess.com Blitz rankings... ratings. Still a little bit large. All right, should we add the values to this? I feel like adding the values will make it look nice. The way we do that is using this built-in method that they've added. Let's see if this just works out of the box. Yeah, so now we have the values. We do padding of like negative 10... negative 50, and font size of maybe 12. Let's also set this title. Gotta make the font size of the title bigger. There we go. Maybe make this 40, 45, and color is white. Now should we do something a little sneaky here? We need to set the y label as the player, something instead of saying main player, and then we need to set the x label as top Blitz Elo. And should we maybe do bold text? Okay, I could do format bold. Nope, that just makes it say bold. Font DejaVu Sans Bold, maybe? I don't know how to do bold; let's look it up: matplotlib make font bold. Oh, weight equals bold. There we go. Does that look better? Bold font weight. Oh, there's fontweight too; there are a lot of synonyms for different variables in matplotlib and other libraries like this. "What extension do I use to format?" Look up nb_black, that's what I use; here that's load extension lab black. All right, so the funny thing we can do is set the x limit to maybe be like 3,000 to 3,600.
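The plot styling steps above (top 15 via `tail`, value labels on the bars, bold text, and a truncated x-axis) could be sketched roughly as follows. The ratings here are made-up numbers and the player names are placeholders, not values from the dataset; the 3,000 to 3,600 range is a reading of the garbled caption; and the style name falls back to the default if `seaborn-v0_8-dark` is not available in the installed matplotlib.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up ratings purely for illustration; not values from the dataset.
top = pd.Series(
    {"player_a": 3050, "player_b": 3120, "player_c": 3210, "player_d": 3330},
    name="player_elo",
).sort_values()

# Use the newer seaborn style sheet when this matplotlib version ships it.
style = "seaborn-v0_8-dark"
plt.style.use(style if style in plt.style.available else "default")

fig, ax = plt.subplots(figsize=(10, 5))
# tail(15) keeps the 15 highest values of an ascending-sorted Series.
top.tail(15).plot(kind="barh", width=1, edgecolor="black", ax=ax)

# bar_label writes each bar's value; negative padding pulls the text inside the bar.
ax.bar_label(
    ax.containers[0], padding=-50, fontsize=12, color="white", fontweight="bold"
)
ax.set_title("Top 15 Chess.com Blitz Ratings", fontsize=16)
ax.set_ylabel("Player")
ax.set_xlabel("Top Blitz Elo")

# Truncating the axis exaggerates the gaps between bars, as discussed in the stream.
ax.set_xlim(3000, 3600)
```

Calling `fig.savefig("some_file.png")` afterwards would write the figure to disk. `fontweight` and `weight` are interchangeable text properties in matplotlib, which matches the "synonyms" remark above.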
Now it looks a lot more drastic. This is what they teach you in How to Lie with Statistics. I mean, sometimes I think it makes sense, because with the absolute numbers of their top ratings here you can't really compare them that well. We're not lying here per se; we're just kind of exaggerating the fact that this is the very top. But I'm actually very surprised that there's a huge drop-off. Also keep in mind, some of these players are really good, they just don't play on Chess.com a lot, so maybe they haven't gotten to that point. How to Lie with Statistics is actually a really good book. "Bar plots should start at zero." True. So do you think that it's incorrect to have it like this? It's still a comparison. What if we said the title was Blitz ratings minus three thousand for the top? "Second version will get more upvotes." I actually don't think this is as deceiving as some plots are. Hey, thank you for the subscription, clipped, seven months, seven months in the house. It does exaggerate the difference, but... we're not playing another bass lick, I'm sorry. Please land on Kevin. If it lands on bass lick again then I'll do it. 10 push-ups, there we go. Yeah, I think where changing the axis is really kind of sketchy is when you see plots that are comparing how things have changed over time, like "look at the stock market, it's just falling," and then you zoom out and you're like, actually it's just random noise at the top, or it's not really that big. "How many hours do I study per week?" I'm done studying, I'm not doing school; I just learn constantly through working and doing streams. So yeah, top 15 ratings. Now let's also get a dictionary of these as the top players. So we'll do this group by max to dict, and we'll say this is top ratings. Maybe as a DataFrame is better, and then sort values by top rating. Yeah, there we go. False will make it so that we
see the... so then we can kind of focus in on these top guys, look at the top five as if they were stocks. Oh, another thing we could look at is how much each player actually plays on Chess.com; does playing relate to scores? So we could do just a main player value counts. Yeah, so we can see that there's definitely a big difference between those players that play a lot and those that don't, and maybe we want to focus only on the ones that also play a lot. Look at this, we're gonna do something really cool: we're gonna map this. We should also reset this index, and then this will work. So now we have a summary DataFrame called top ratings that has their top rating and the number of games they played. Magnus Carlsen: under a thousand games, yet still in the top five all time. Very powerful. "What's the end game? You get all this data and then what?" We're learning, we're having fun, this is just laid back. I mean, I'm not doing this for the paycheck. Also, exclamation point youtube; those things work on Twitch. If you're not on Twitch you should jump over there, I encourage you to, but it would just bring you back to my YouTube, which you're already at, so it's a circular loop, an infinite loop, people. If we plot this with x as top rating and y as games played, will this look interesting? Kind is scatter. I don't think this is interesting; what do you guys think? Let's make it an even thing, and let's also make our marker size like 20 or something big. Oh, just got to make it something actually big. There. Not that interesting. I mean, the main interesting thing is there are two guys that stand out, and they're both Twitch streamers: Hikaru and Daniel Naroditsky. Those are these guys. And then I guess it's basically what we saw just by looking at it before. "What data are we exploring?" So we have all the top 50 chess players' past games. So let's look at the top five with over... What's this
called? Top ratings, with at least 10,000 games, so let's not look at the guys who haven't played a lot. Let's query where games played is over ten thousand. We can reset this index, and we're gonna call this, I don't know, top 10K. This is our cohort of people that we'll look at, who actually play a ton of games online, and then we'll look at them like they're the stock market. We did this before: Plotly candlestick, yeah, we did this before. We're gonna iterate, for player in the top players, and then we'll do something for each player. For each player we will take this DataFrame and query where the main player is this player; that's just their games. Then we're going to set the index to be the time when the game was played, UTC datetime. I think it's always nice to have a datetime index. Then we're going to look at their player Elo. So this is basically our temp player data. Oh yeah, and we did this before, so we could also look at their closing and opening for the day. Yeah, so let's do this: we'll group by UTC date. The only time this gets a little bit weird is when they play over midnight in the UTC time zone. The reason why this works for the stock market is that you have the open and the close and then there's no trading off hours, but these players can be playing at any point of the day. "I'm learning to print my name on something called Python, any advice?" print froak go. Just write this and you've done it. Go into your terminal and type python; if that loads into Python, that means you have Python, and then you can just print it in there. There, you did it, you did it in Python, good job. Okay, so we still need to sort values by this UTC datetime or it won't work. So that's our open and close. Oh, we also want high and low, which would be max and min. So we're aggregating by the UTC date for this
player: their first and last, that's open and close. First is open, last is close, max is high, and min is going to be low. Our goal here is to get Hikaru to watch this, because Hikaru is one of the top, or is the top, chess player and he likes the stock market, so we want him to look at this and see what his candlestick looks like. All right, so we're going to take this player DataFrame, do the open, high, close, and see what this looks like. Player df, the index is our date. Why is it saying that we don't have a... oh, this needs to be uppercase: Open. Boom. Okay, I realize there's something we need to do here still. There's something on my lip. Uh oh, people are saying I said something weird? Oh, I'm sorry if I did that. I'm looking... there is something on my lip. I got it. Okay, good, I don't want that to have been there all night. All right, so one thing I'm realizing that I didn't account for: filter games by type, by rating group. If we just took our DataFrame and queried where the main player equals Hikaru, and this is fine, set the index to UTC datetime, and plot this, we see that there's a bunch of ups and downs here. This is kind of weird. Let's also make this just dots for each one. The reason why it looks like this is that there are different rating groups. So on Chess.com, if I go to chess dot com, Hikaru, where are his stats... view full stats. All right, so he has a Blitz rating, a Bullet rating, a Rapid rating, and then a Daily rating, which I don't think these guys really play that many of. Oh, like he's played zero daily games. So the way we need to find that is by... and then there's also these weird other variants. Is it actually the text None? All right, so let's make sure we filter where the variant is None; this might clean it up a little bit. All right, but still a lot of other stuff there. There is also a time control. All right, so there's a lot of them: 107.
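The daily open/high/low/close aggregation described above (first is open, last is close, max is high, min is low, after sorting by time) might look something like this. The column names `main_player`, `utc_datetime`, and `player_elo` are assumptions based on how the columns are spoken in the stream, and the five games are invented for the example.

```python
import pandas as pd

# Invented games for one player; column names are assumptions, not the dataset's.
games = pd.DataFrame(
    {
        "main_player": ["hikaru"] * 5,
        "utc_datetime": pd.to_datetime(
            [
                "2022-10-02 05:00",
                "2022-10-01 12:00",
                "2022-10-01 10:00",
                "2022-10-02 01:00",
                "2022-10-01 23:00",
            ]
        ),
        "player_elo": [3130, 3120, 3100, 3095, 3090],
    }
)

# Sort by time first: "first" and "last" only mean open and close on ordered rows.
player = (
    games.query("main_player == 'hikaru'")
    .sort_values("utc_datetime")
    .set_index("utc_datetime")
)

# Group by calendar day (UTC) and compute the four candlestick values.
ohlc = player.groupby(player.index.normalize())["player_elo"].agg(
    Open="first", High="max", Low="min", Close="last"
)
print(ohlc)
```

The resulting frame has one row per UTC day with the four columns a candlestick chart expects. As noted in the stream, a session that runs past midnight UTC gets split across two days.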
What is this 1/2 9... I have no idea what that is. We need to look that up; I've never heard of that before. We can look up these games really quickly. So there are games with a 1/259200. "Which notebook are you using? Is it different from Jupyter?" This is Jupyter. You can look on my YouTube channel; I have a whole tutorial on how I set this up on Jupyter and all that stuff. It's a great one. So let's look at this game link and try to figure out what's going on. 24th chess tournament. Oh, so this was three days; each game was three days. I'm guessing all of these, if you divide by 60 and then by 24... so this is 120 days? Oh, maybe divide by 60 again, because it's in seconds. Seconds, minutes, yeah, so this is three days. That makes sense. All right, so what we're going to do is map these. For time control, for tc in this: tc split equals tc split on this plus. The plus means that you're playing at a certain time control but you have an increment; like in 180 plus 2, every time you make a move you get two seconds. Then we'll just take the zero index, and this will be the integer version of it. All right, so it won't work for that one; we'll also split on the slash and take the negative one of that. All right, so now we have all these tc ints. Now we need to figure out the thresholds. I think if tc int is less than or equal to 60, we're making a mapping here now, right, then the tc map for this time control should be bullet. Now let's look at tc map. So yeah, these are all the bullet time controls. Why you'd play a 43-second game is beyond me. Again, I'm just gonna have to look up some of these because I don't believe it. Oh yeah, this is a string. All right, these guys were just going back and forth playing 43-second-long games. Yeah, there you go, look at the start of this. It doesn't have the time... oh yeah, here we go, 43 seconds each side. Very funny. "Does he have an edu page for data science?" Mine...
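The time-control parsing walked through above can be sketched with plain string splitting. The helper name `base_seconds` is made up for this example; the three string shapes (`"60"`, `"180+2"`, `"1/259200"`) are the ones mentioned in the stream, and only the bullet threshold (60 seconds or less) has been stated at this point.

```python
def base_seconds(time_control: str) -> int:
    """Return the base time in seconds from a time-control string.

    Hypothetical helper covering the three shapes seen in the stream.
    """
    base = time_control.split("+")[0]  # "180+2" -> "180" (drop the increment)
    return int(base.split("/")[-1])    # "1/259200" -> 259200 (daily format)


for tc in ["60", "43", "180+2", "1/259200"]:
    secs = base_seconds(tc)
    # Only the bullet cutoff has been decided so far; other groups come later.
    group = "bullet" if secs <= 60 else "not bullet"
    print(f"{tc!r}: {secs} seconds, {group}")

# Sanity check on the daily format: seconds -> minutes -> hours -> days.
days = base_seconds("1/259200") / 60 / 60 / 24
print(days)
```

The last calculation mirrors the on-stream arithmetic: dividing by 60 twice and then by 24 turns 259,200 seconds into three days.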
There's 120-day games? Not surprised. Those games are like, you'll die before they're over. If you want to just die instead of losing, that's the ultimate win. If you play a 120-day-long game, I guess you wouldn't have made a move in 120 days, so then they would win, but then you go on vacation on Chess.com. "What's the bev?" What does that mean? "Isn't the data only Blitz?" No. We pulled in the list of players based on their Blitz rating, so that's a good point: this plot up here is not correct yet. It's incorrect because it includes all their ratings. So we're gonna fix this. This is actually gonna go in our analysis above everything else, to make sure that we then make our Blitz plot correctly. So loading in data is the first thing we're doing, then classify by time controls, and below that is EDA of top ratings, and now we'll be able to do one for the other time controls. So I think five-minute games are... just because we're lazy, let's do this: we'll name it blitz first and then we'll overwrite it with bullet if it's less. So 60 times 3, I think three minutes, there's five-minute blitz, and then let's just say 120 is rapid, and we'll initially set it to be daily. I don't know if this is gonna work. Oh, tc map, there we go. I think this is right. No? So I was just being lazy and set them all as daily, but if it's less than this, set it as rapid, and so on. So if something's rated bullet, this is like the not-efficient but lazy way to write this code; it basically had written over this four times by the time it got down to bullet. So let's take this time control and map this tc map to rating group, maybe, and then let's do this plot down here where we group by rating group. We have to do the group by after our queries and setting our index. We're just grouping by for the plot, to make sure the colors are different, and look, they are. I think we got
it. No wait, we didn't get it right. I think some of these up here don't make sense, so let's do a legend. So blue is blitz, orange is bullet here, green is daily, but I thought we saw Hikaru didn't play daily, maybe not rated daily, and then rapid should be these. Why do we have these blue dots that are down here? So let's query where our rating group equals rapid and our player Elo is greater than three thousand, because it doesn't look like that should be the case. This was a rapid game; it was a six-minute or ten-minute game. Instead of that, let's look at where it says it's blitz and less than 2500. "Chess openings." What is this variant? What game is this, what's going on? Oh, it must be something weird where he's playing a not-rated game. Is there any way to see if it's rated? Yeah, he's playing with viewers, and also we don't know if some of these are just friendly games anyways. I'm thinking one way to do it is to check if their Elo didn't change after this, but that's going to be kind of hard to do, so let's just forget about that. But one thing we do know is that this data frame now should be queried to just blitz. Oops. All right, so actually this switches up the ordering, so what I had before was not legitimate. Throw it out. This is the blitz ratings, and let's put these all side by side so we can do blitz ratings. So let's do for i, group in bullet, blitz, and rapid, and let's enumerate this. See if this works. Group is what we're going to want to do, and then here we're going to want to do, uh, making this an f-string doesn't make sense, but I might want to add more text to it later, and this will be i. There we go. We need to make it wider and remove this figsize here. All right, so then we also need to change this range. We're going to make this into something separate where we do d temp. d temp we're going to plot. Stick with me here. All right, so now we have all the players, but we don't have a good range for them, so for each d temp we need the min
uh, sorry, d temp min, and we can just set the plot limits by the min and the max, and then we can add some padding. See, there, now it's just the min. Let's do minus some padding and plus some padding, and we'll set padding to be... Paddington Bear... we'll set padding to be a hundred. Let's make a left and right pad: our left pad can be 500 and our right pad can be 100. Boom. Let's also get a color palette, so we're going to import seaborn. There we go, now our palette. Oof, I feel like my nose is dripping. All right, so now we take our palette, and what do we do here? We're going to color this with our palette at i, just to give it a little bit of diversity in the coloring of the groups, and then we're gonna do fig suptitle, plt suptitle. Look how small that is. Font size is 24, I don't know, something bigger. There we go. Hikaru is at the top of all of these. Yes, we do have the tight layout here. Not perfect, but it's kind of cool. Maybe make this less wide. What do you guys think, like this, or maybe something taller? "Oh, change x label." You're right. So this is going to be our group; this is actually where we needed the f-string. So top bullet Elo, blitz, rapid. Now, we're still not sure this is perfect, because there could be these weird variants where their rating is high, but let's just assume their highest rating is going to be from a rated game and not from a weird variant. I think I queried out the variants; let's make sure by removing the variants. Okay, I don't know if it changed. Hikaru's still at the top of everything. All right, let's look at the bullet. Let's do this with the bullet now, because bullet is like a game of extremes; they're really fast games. So we're gonna put all this together: make our player data frame, make sure our variant is None, and make sure our rating group equals bullet. All right, what's it complaining about? I don't think we even need that anymore, right? There's no height. So can I do figsize on this? Let's see what this candlestick one looks like,
it's got a little bit more space with it. Okay, so we should be able to do something like this and change what the figure looks like. Oh look, it's blue in the background, beautiful. I still think we have some anomalies here. 2437, let's see what's going on there, where the player Elo equals 2437. Okay, so this is the one where it's like high and low. All right, so this first one's in 2014. It probably was a legit rating, which is kind of crazy... maybe it's not legit. I don't know what was going on here, but this one is definitely not legit, and that's this one that was in 2020. It's not "Too Legit to Quit". It's three-check, it's a three-check game. tltl. So do you guys know how three-check works? I haven't looked at chat in a while. "2437 is a string, it should be an integer." Uh, okay. "fig update layout height." I think that's what we did here, yeah. "What data set are you studying? I guess it's chess." Yes, chess. "Is this Python?" Yes, it is Python. "Check if there's an Elo change after every game he plays." Yeah, so then I would have to find the next one. Yeah, so three-check is a variant where all you have to do is check your opponent three times and you win, but it should list that in the variant column, which it did. Oh wait, why did I do variant not equals None? That would explain a lot. All right, so I want variant isna. So this was basically keeping all the variants. I want it to be notna; let's see if this fixes some of it. What? Oh no, no, I want it to be isna. Yes, this looks a lot more normal. Let's zoom in here. If you were gonna buy Hikaru stock, when would you buy it? Look at his trends. It's interesting how he has these moments of just going up, up, and up, and then kind of takes a dive. I've experienced this in my chess career. "Self-join and shift." Um, so yeah, now we can actually explore some details about them. Let's also try to add this height thing that you suggested, as like 200.
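The variant filter and the "stock chart" idea above can be sketched in a few lines of pandas. This is a minimal sketch, not the notebook from the stream: the column names (`end_time`, `player_elo`, `variant`, `rating_group`) and the sample rows are assumptions made for illustration. It keeps only standard games (rows where `variant` is missing), then collapses each day's ratings into open/high/low/close values, which is the shape a candlestick chart expects.

```python
import pandas as pd

# Hypothetical sample of one player's games; column names are assumptions.
games = pd.DataFrame({
    "end_time": pd.to_datetime([
        "2022-10-01 10:00", "2022-10-01 11:00", "2022-10-01 12:00",
        "2022-10-02 09:00", "2022-10-02 09:30", "2022-10-02 10:00",
    ]),
    "player_elo": [3200, 3215, 3190, 3195, 2437, 3210],
    # A missing variant means a standard game; the 2437 row is a variant game.
    "variant": [None, None, None, None, "threecheck", None],
    "rating_group": ["bullet"] * 6,
})

# Keep standard bullet games only: isna() keeps rows WITHOUT a variant,
# while notna() would keep exactly the games we want to drop.
standard = games[games["variant"].isna() & (games["rating_group"] == "bullet")]

# One row per day with the first, highest, lowest, and last rating of that day.
ohlc = standard.set_index("end_time")["player_elo"].resample("D").ohlc()
print(ohlc)
```

The resulting frame has one row per day, so it could be handed to a candlestick trace such as Plotly's `go.Candlestick`, with `x=ohlc.index` and the four columns as `open`, `high`, `low`, and `close`.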
It's gonna be really short. 10K, um. Let's query... let's forget about this first one where he just opened his account. Now let's also figure out what's going on here. Player Elo has to be at least 2800 for us to care about it. There we go. The height is way too crazy. What am I doing here? 500. Okay, now we can kind of do it. Can we do a title in here? I don't know why it cuts off the title. Oh, it's probably because of this margin thing. I do like having the margins smaller, though. This is "Hikaru Bullet Stock: When Would You Buy?" What happened here? How did he make this jump? Cheating, cheating, I think we found some proof of cheating. No, just kidding. All right, we gotta get Hikaru to comment on this, make him watch this on his stream, and we'll tell him, hey, would you have bought here and sold here? Because if so, you were smart. Maybe he's gaming the system. But has he also been going down in bullet ratings? All right, so now we're going to loop this over every player, basically, and see if this works. We'll just do the top five to see it first. So here's Hikaru bullet stock, here's Daniel Naroditsky. Whoa, look how different they look. How does Hikaru have these long streaks of just day after day going up, and Daniel's just all over the place? He plays a lot of bullet, though. Nihal. All right, so those went up; those are interesting. Oh man, this looks like Brandon was not having a good day when he made this fall. Was this a month? 2021, March to April was just downhill. He went all the way from almost 3,300 to under two, uh, a rating that I would die to have. Massive hit. All right, so we're buying and we're selling stocks. Let's do the same thing for bullet, I mean, sorry, blitz. "Will I upload this notebook somewhere?" Yeah, I'll try to get it up on Kaggle. By the way, we're using this data set that I've made public; it's all the chess.com top 50 blitz players, all of their chess.com games in one archive, so upvote this if you like it and I'll try to make a notebook with this in it
so you can check it out later. We still need to figure out what these big jumps are. I must have mislabeled some of these; they could be other variants that I'm not accounting for, potentially. "When they reach a new Elo height, don't they play harder opponents more frequently?" Yeah, so usually there's like a band. If they just go to random games, if they're not challenging someone specifically, then you're gonna only be paired against someone plus or minus a certain rating from where you are. Val boolin, I'm doing some Google Translate on what you're saying and it doesn't make any sense when I converted it, so that might just be a lost-in-translation sort of thing. Look at Nihal Sarin, he's just going up. Can someone explain to me... all right, Hikaru has to explain this on his stream. Either I did something totally wrong in my data analysis or he needs to explain himself. Why does he have these big ups and downs when all the other players are kind of jumping up and, um, uh... you know what this could be? Is it the y scale? No, it's fairly... Daniel and him have fairly similar scales. Maybe he just doesn't play bullet that much. Daniel also plays these crazy 30-second games, and that's maybe why he has so many more ups and downs. Um, yeah, that's probably it. You know, that's it, because if he's playing like 400 games in a day, the high and low that he has the opportunity to reach are going to be much wider than for someone just playing a few games a day. So I think that's what we're seeing. It's still kind of an interesting trend. And I think, as of today, what do we do? Are we going to buy, are we going to sell Hikaru stock? "Bullet is definitely more unstable." Yeah, it's more unstable. "Hikaru too volatile." Okay, you guys don't want to spend any money on it. All right, so I think that's it for this stream. Like I said, it was gonna be kind of short. I mean, the only other thing, one thing I might want to look at before we end here. This is
just an idea I had: can we find openings? All right, here's a, like, your boss comes at you and you work at Chess Inc. You work for chess.com and your boss comes to you and he says, or she says, or they say, "Hey, I need you to find some of the hidden gems among the top chess players. What are some of the hidden gems in terms of chess openings that people are winning with a lot that no one knows about?" Oh, Mr. Gabriel had a question too. "Sell, is it down trend, bull again." "My problems PP orov", what is that? Okay, what's your question, Mr. Gabriel? Oh, you had the same question as me. Okay, I thought you were saying you had another question. "When's the next stream and what is the time?" So usually it's Tuesdays, Thursdays, and Sundays, 9:30 Eastern-ish, but sometimes I can't stream for one reason or another and then there just is no stream, so you kind of just have to catch me when I'm on. That's why you make sure that you like this video, comment on it, subscribe on YouTube, follow on Twitch, do all that monster stuff, do all that awesome stuff. Look, we got 2,725 followers on Twitch; let's get that to 5K. Tell all your friends. But yeah, those are the nights that I usually stream, and maybe sometimes I'll stream randomly, like I've done Saturday mornings randomly, all Eastern time zone. All right, so the way we're gonna do this is ECO. Should we do ECO? Okay, so ECO is like the opening database, and then ECO URL. Okay, so the ECO is going to be a more specific, more in-depth version of an opening; ECO URL is going to have the more generalized opening. Should we go specific? So let's group by ECO. "Let's talk about, Rob, let's talk about the stock market video, stock analysis." Yeah, you need to explain that to me. So what do we consider to be a winning opening? I guess we need to see when white wins, so that would be when the result equals 1-0. These are white wins. And what do we call it, time grouping equals blitz? Rating group is blitz. Oh, and variant isna. We
basically should drop all the variants at the very beginning of this analysis, because we probably aren't ever going to look into those. All right, so there we go, now we have all of the blitz games where the white pieces won. But that's not what we want. We want to group by the ECO, and then we want to do the result value counts and unstack this baby, this puppy, fillna as zero, astype int, and let's call this "openings". Granted, these are top-player blitz openings, but blitz openings nonetheless. "I think to attract new subscribers, a very interesting topic is analysis of stocks and Bitcoin and cryptocurrencies." Yeah, totally, we've done that on stream before, and maybe that's a great idea for next week's stream, to actually do some data analytics in stocks and stuff. I do have some videos on that, like if you want to look up my time series forecasting, and I also have some videos on working with economic data, where you can pull up specific stock prices from FRED. "Okay, so what is ECO?" So yeah, great suggestion, Alps; we'll consider it for future streams. One thing I will say, though, is check out my other videos on the topics, because I have a few of them. Jupyter-style question: "How often do you see docstrings and/or object-oriented programming used in notebooks?" It depends on what you're doing in the notebook, but most of the time you don't see it, because people are defining those outside in scripts and then they're importing them. All right, "so we can find the uncommon"... that's a good point, we can't find the uncommon part here, you say, but we can. So what we want to do is sum, axis equals one. Oh, and someone was asking what ECO is. ECO is like the name of the opening; there's codes for certain types of openings. So we do A00. Oh shoot. Chess opening... this is just called an uncommon opening, but I'm not sure what the move is. Yeah, so sum of axis one will give us times played, uh,
and let's look at the times played: plot, kind equals hist, bins... let's make a bunch of bins, because I bet it's going to be really skewed. Yeah, okay, so I guess I could have also done a bar plot, but what I'm expecting this to show is, yeah, there are all these super uncommon openings down here at the bottom, and then the stuff that everyone plays. So what we want is something in the sweet spot. If it's rarely played, then the odds of it being considered a commonly won opening could just be random chance, but if we pick a minimum threshold... So this is white win percent. Can we find any outliers here? Maybe not, and maybe making this as a Plotly plot would be better, a Plotly scatter plot, because then we can do hovers. Do I do data here? Oh, Plotly Express. Oh, it takes this data frame. There we go, now we can hover, and hover data is ECO, which actually might be the index. Let's reset this index. Now what we can see... "Predict win over wins, plays, or have you got that backwards?" Uh, what did I do? This is the number of times white wins over the number of times that opening was played. And let's do this openings query. Let me know if that's wrong or if I'm missing something. Let's only look when the times played is greater than one thousand. All right, one thousand might be too much. 500.
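The opening analysis described above (group by ECO, count results, unstack, then divide white wins by times played and apply a minimum-games threshold) can be sketched as follows. This is a hedged illustration, not the stream's actual notebook: the column names `eco` and `result`, the PGN-style result strings, and the toy rows are assumptions, and the threshold is tiny only because the sample is tiny (the stream settled on 500).

```python
import pandas as pd

# Hypothetical sample; `eco` and `result` column names are assumptions.
games = pd.DataFrame({
    "eco": ["C28"] * 5 + ["B01"] * 4 + ["A00"],
    "result": [
        "1-0", "1-0", "1-0", "0-1", "1/2-1/2",   # C28
        "1-0", "0-1", "0-1", "1/2-1/2",          # B01
        "1-0",                                    # A00, played only once
    ],
})

# One row per opening, one column per result, missing combinations as 0.
openings = (
    games.groupby("eco")["result"]
    .value_counts()
    .unstack()
    .fillna(0)
    .astype(int)
)

# Row sums give how often each opening was played.
openings["times_played"] = openings.sum(axis=1)
openings["white_win_pct"] = openings["1-0"] / openings["times_played"]

# Rarely played openings can look great by chance, so require a minimum sample.
MIN_GAMES = 4  # illustrative value for this toy data
candidates = (
    openings[openings["times_played"] >= MIN_GAMES]
    .sort_values("white_win_pct", ascending=False)
)
print(candidates)
```

Here A00 has a 100 percent white win rate from a single game, which is exactly the kind of row the threshold is meant to drop.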
All right, this is our key opening, C28. Hey, what do we got here? Well, five months. "Here I learned more about how to do data science." Nice. Super LOL, thank you so much for subscribing on Twitch, I really appreciate that. "Hope everyone soon run." J apple J is raiding with a party of one. J apple J, thank you for doing that. Hey look, Kevin! All right, I did that for you, Super LOL. Don't ask me why I yelled the name Kevin, I just do. Um, yeah, so this is our Plotly; px does stand for Plotly Express. All right, so what we want is this opening. This is when white wins most often: C28. What's chess opening C28? The Vienna Game, but not only the Vienna, but the Vienna that goes to this position. So the Vienna is e4, knight out... when I saw Harmon in here I thought it was a Beth Harmon game. So this opening was pretty popular a while ago and then kind of went out of favor, so in the 1970s and 80s it was popular. So what does it go like? Oh wait, knight out next, then develop the bishop, then here. So this is like the winner for white, apparently, in these very fast games. Let's see if we can filter this down to my games. Hey, I played this 37 times and I've won 54 percent of them, so that proves that this is a good opening for me. Maybe I need to play it more. But how often did they actually bring their knight out? Can't force that. All right, we found it, we found it. This is the key one, though, over 60 percent of the time. And let's see which player actually plays this a lot: C28, and then we'll find the white player, because this is a position that white usually wins. We'll do a value counts. What do you know, Hikaru's played it the most. Yeah, Hikaru... well, he also plays the most. Actually, Daniel plays more than him, so the fact that Hikaru plays this a lot means that Hikaru knows what he's doing. "Steal this." "Kind equals kde would be better for this." "If Hikaru plays C28, buy the stock." Could we do, yeah, could we do win percent for this opening? So basically do a
group by white and then result value counts. I'm just wrapping this up, guys, don't worry. I think I want to do normalize. Uh, I'm doing this really lazily, so this will give us the sum that we can then add to this one. What is going on here? Why are these all zeros? They should be values. Um, all right, whatever. Uh, getting really nasty here, and that's not gonna work. Let's go back, go back. All right, so at least we have this: the number of times they played it, and then the times that they won, lost, and drew. I wanted to get this as a percentage, but it doesn't really matter when these numbers are so small for these players. "The query is wrong, times played." "Please, can you do object-oriented programming for data analysis?" Why does it have to be object-oriented? Everything in Python's an object. Everything inherits from object, so this data frame that I'm working in is a Python object, right? Okay, people, we did some stuff. What did we learn today? We learned Hikaru wins chess.com; he's at the top of all the things. Best blitz opening for white is the Vienna. All right, so honestly, though, we have to keep in mind that we don't have an unbiased sample data set here, and it may be that Hikaru wins games because he's good and he also happens to play this opening. So, disclaimer out there, it could be
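The per-player win percentage for one opening, which came out as zeros on stream, is not shown in enough detail to debug here. One way to get it, offered as an alternative rather than the approach used on stream, is `pd.crosstab` with `normalize="index"`, which turns each player's result counts into shares that sum to 1. The column names (`eco`, `white`, `result`), the player labels, and the rows below are all hypothetical.

```python
import pandas as pd

# Hypothetical sample; column names and player labels are assumptions.
games = pd.DataFrame({
    "eco": ["C28", "C28", "C28", "C28", "C28", "B01"],
    "white": ["player_a", "player_a", "player_a",
              "player_b", "player_b", "player_a"],
    "result": ["1-0", "1-0", "0-1", "1-0", "1/2-1/2", "0-1"],
})

# Restrict to the opening of interest.
c28 = games[games["eco"] == "C28"]

# How often each player had white in this opening.
times_played = c28["white"].value_counts()

# Share of each result per white player; every row sums to 1.
result_share = pd.crosstab(c28["white"], c28["result"], normalize="index")

# Attach the raw game count so small samples are easy to spot.
summary = result_share.assign(times_played=times_played)
print(summary)
```

Keeping `times_played` next to the shares matters for the caveat raised at the end of the stream: a high win share from a handful of games by a very strong player says little about the opening itself.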
Uploads from Rob Mulla · Rob Mulla · 51 of 60