Data Science Uncut - Kaggle Community Competition & Chess Data Analysis - Oct 4, 2022
Key Takeaways
The video discusses a Kaggle community competition for image classification of corn images and analyzes chess data using the Stockfish engine, covering topics such as fine-tuning, retrieval augmented generation, and computer vision.
Full Transcript
[Music] Hello everyone, it is October 8th, 2022... Tuesday, October 4th. Sorry, not October 8th: October 4th, 2022. Welcome everyone to the stream, and thank you for hanging out. If you're in the chat, let me know by saying hi. I'm going to test the chat. Oh, and I'm realizing that I forgot to change this view, but we're going to get into it today. I'm really excited for our stream today; we have a lot to go over. If you've been following along with the stream, you know we have been putting on our very own Kaggle competition, and just yesterday it completed and we have a winner. Hey, good evening, hello everyone in chat, how's it going? We have niss there, we have kadisha, what's up. So let's go ahead and switch over to here. Yes, we do have the chats now coming in. Good evening architect, how's it going?
So we do have our competition, which ended this week, and we're going to go over it a little bit. It was a great finish; thanks to everyone who competed. You see that GPU back there? Well, it's right here, that 3080 Ti. I'm going to send that GPU out to our winner this week, and I'm very excited for them. Let's just search for the corn competition. Oh, it's not going to pop up. Community competitions, corn, POG Champs number three. So we ended up with over 92 teams, which was a pretty strong showing given that this is just our third Kaggle competition, and there were a lot of really awesome submissions up there too. It's not like we had 92 teams and no one was really competing; the top teams really were pushing it. Hey Andrew Spears! We have our winner in chat. How's it going, Andrew? Congratulations on your win. So let's look over at the leaderboard. We see here at the top Andrew had a big jump. Now, one thing I want to mention about this competition is we knew that there might be a shake-up on the leaderboard, given that I only had 20 percent of the test data as our public test set, so
you know then that you need to trust your local validation, because with only 20 percent of the test data being public there's a potential for a big shake-up. Now, there wasn't as big of a shake-up as you would think, with some teams moving up 20 spots, more or less. Andrew did jump up 18 spots. So yeah, awesome to have you in the live stream, Andrew, great job. We're going to look over your solution in a second. Awesome work here. I was surprised, and I'm not going to lie, Andrew, I was a little bit worried, because I saw on your account that you were only a contributor, not really that active on Kaggle. Well, I guess you have a lot more stuff now, but before you had no competitions, no real discussions, and I thought, hmm, maybe this is someone's fake account and they're just cheating here somehow. You know, it was possible to cheat. But Andrew did a great job writing up his solution with all the details and documenting everything in a notebook, so congratulations, Andrew. We have two Majin, who got second, and ayuka Shaman as third. Hey, welcome to the family, we have someone new here: go for gopher, welcome to the family, glad to have you here.
So we're going to take a look at some of the top solutions. I also want to let you all know that we do have a survey here. I know everyone's favorite thing is to fill out surveys, but we do have a survey that I'm going to put in the chat. This is a SurveyMonkey for anyone who competed in the competition or knew about the Kaggle competition that we're putting on. Please check that out and give me some feedback as to how you felt the competition went, so I can make it better next time and we can keep on improving it. So yeah, thanks everyone for hanging out with us tonight. We're going to go over some of the solutions. Just a reminder, if you don't know what this competition is or what it was all about: it started about two weeks ago, 16 days ago to be
exact. We released a bunch of data that was images, a bunch of images of corn, and the competition was to design an algorithm capable of deciphering what classification to give each image of corn. The classifications were either pure (meaning they were perfect), broken, discolored, or silk cut, and it's a multi-class classification competition. Hey zedtel, welcome to the fam. So it was cool to see everyone's different approaches to this. Obviously it's a very standard sort of setup for image classification, where you're given a bunch of different images and their labels and you have to train a machine learning model that can detect which classification to put each image into. The evaluation metric was just classification accuracy, so the fraction that you got correct.
khadisha is asking when's the next competition. You guys are already jumping on me for the next one. Hopefully soon. Honestly, we need to get some more sponsors. If you guys have any suggestions as to who we could reach out to, that would be cool, but I think it definitely makes it a lot more fun when there's something like that GPU on the line. So yeah, I think we might take a little bit of time off, reevaluate, and then definitely launch another competition at some point, which would be our fourth POG Champs. Pretty awesome. So yeah, we're given all these images, you've got to evaluate how well you did at predicting, and then of course thanks to Nvidia, our sponsor here, we're giving away this 3080 Ti GPU to our top-place finisher. So Andrew, that's going to be sent to you, congratulations. I do want to get this in early on: for the second and third place competitors we have Deep Learning Institute vouchers that I'm going to be giving out, and we also said we're going to give out a Deep Learning Institute voucher to a random chatter in the Twitch chat, and we are going to be
giving one away for the best notebook, as voted on by you. "Live coding: time for everything to run perfectly on the first try." Database success, that's exactly right. kadisha said you can help out with getting sponsors. Yeah, definitely reach out to me on Twitter or through Kaggle if you have any hookups; that would be very cool to hear about.
"Is Kaggle an appropriate site to start getting involved with data science with zero previous coding experience?" It may be a little bit tough if you have zero coding experience. I would probably recommend checking out a basic Python beginner's course if you're just getting started, and then after that maybe an intro to machine learning course. But I will say that Kaggle does have some of its own learning material, so if you're going to start brand new, start by doing some of these courses, like Intro to Programming and Python. These will probably get you started out right. So you could do your intro programming through here, though I haven't heard of that many people doing it this way. There are also some courses I recommend, like the MITx intro to Python. Actually, if you do !youtube you can check out my YouTube channel, which those of you who are watching on YouTube probably already know about, and I have a video on getting started in data science. I would recommend checking that one out: "Machine learning: how to get started." Yeah, this I'll put into chat, and this would be the one that I recommend to watch. But great question there.
"Yo, how's that data scientist life on a Tuesday night?" Pretty good, Herbie Hoover, how are you doing tonight? "POG stands for what?" POG is like a Twitch meme. I think its origins go way back to this old video where... I don't know if you guys know pogs, it's like these little bottle caps, like paper bottle caps. Let's look up PogChamp origins. Oh look, there's a whole Wikipedia article on it, so
it's intended to express excitement, intrigue, joy, or shock. So I did kind of steal the POG Champ name from the chess.com site. They have a Pog Champ series also, where they take famous Twitch streamers and have them play chess, and they're not necessarily all amazing chess players, but it's fun to watch. Felipe says they're Brazilian. Welcome from Brazil. "Yeah, POG was a sweet game for us 90s kids." I'll tell a little bit of a side story. I remember when I was in middle school, I think, kids would play pogs, which is this game where you just have these little circular pieces of cardboard paper with different images on them. They're kind of like trading cards, but you play this game where you would try to flip them. So you would challenge someone else, you'd have them both face down, and then you'd try to flip them, and if you flipped both of them then you would win them. So you would actually take them from the other people. Sometimes you played for keeps and sometimes not. But I remember there may have been some kids in our middle school who put tape on the bottom of their shoes, and they would just walk around and kick over people's pog holders and then step on the pogs to see how many they could walk away with. There may have been some kids who did that. I'm not going to name any names, but that may have gone on in my middle school. Not cool, looking back on it, very not cool to do. By the way, don't do that. "Rolling with these custom POG slammers." Yeah, slammers, those were the things you used to flip over the pogs. Hey Nick momby, welcome.
So back to this: we're going to give a prize to the top notebook. There was a lot of awesome sharing in this competition, and honestly I've learned a lot. There's still a lot to learn, but I've learned a lot just from reading the top solutions to this competition. There was a lot of use of Transformer-based vision models, which have, I guess, gained a lot more popularity
in recent years. I haven't competed in an image competition on Kaggle in a little while, but back then it was like EfficientNet was everything, and now it seems like these Vision Transformers are really becoming powerful. People use ViTs; I've used those before, but especially in this competition we're going to see how those Vision Transformers work really well. data Crusade 1999, welcome to the family, good to have you here, hope you're doing well.
So here's the way we're going to do this, and if you're not on Twitch, come over to Twitch and join us. It's right here, I'm going to put this in the chat. Definitely come over to Twitch and vote with us. We're going to vote on the best notebook. Now, I could take every single notebook and try my best, but I think it's just too much. So what I'm going to do is take however many the voting lets me take of the top voted-on notebooks, and we're going to vote individually on those. Andrew Spears, hey, you're on Twitch now, nice, awesome, glad to have you here. So we are going to make this survey, and I'm going to highlight these top five. Let's start with this one. I'm going to put this in as number one into chat. So the first one is the corn image labels and dimensions. The second one is a fastai baseline. Now, fastai was used in the top solution here. That's going to be number two. Another fastai baseline, and this is a CNN base, number three. Hey R1 auger, welcome to the chat, I hope you're doing well. All right, number four is this one, it's corn by so. I don't think I've seen this one in that much detail, so definitely check this out. And Mariella Prada's CNN MobileNet model, and there are some memes in that one. So we're going to vote on these. I don't know if that'll let me get a sixth in there; we'll try. Manage a poll, new poll, and we're going to make this 10 minutes. Yeah, it only gives me five spots, so I'm going to make it "vote on your fav corn notebook". All right, so you guys have the links there. We are going
to look at these in a second, but number one is this corn seed, number two is the augmentations fastai, number three is this one, CNN fastai baseline. Number four (I'm surprised none of the exploratory ones got here in the top) is corn by so, and CNN MobileNets is number five.
"How many projects do you recommend I do to become proficient enough at data analytics?" 2310 Sammy, I don't think I've ever felt like I'm proficient enough. It's always something where you feel like you can constantly get better and learn. "Your image processing video really helped with the EDA." Thanks so much, Nick momby.
All right, so we're going to launch this poll. One vote per participant; I don't think I allowed extra. That's the poll right there. Feel free to vote on this one with your favorite notebook. I linked all of them in the chat; hopefully you guys can open them up if you want. Again, number one we have the it's corn image labels and dimensions. This is some exploratory analysis, looking at the dimension relationship. I think there were some box plots, some bar plots to see what the labels were. This is a fastai Albumentations (am I saying it right? Albumentations) notebook showing things like cutout augmentations, where you add these black squares to the images when training or during test-time augmentation, and this is a great example of that. This one, the ningi notebook, I remember this one, it's got this very, very cute corn at the top, and did some exploratory analysis. I'm going to put my vote in... I mean, maybe I shouldn't vote, that wouldn't be cool, but this one is really good is what I'm saying: a very clean notebook, I would say, well organized, with really good explanations. And then it's corn by so, got some corn kernels here at the top, so check this one out. We've got some augmentations, talking about overfitting, and then
Mariella's one, which does have some pretty dank memes. Okay, so you guys are voting on that. While that's going on, we can see, okay, we've got two votes coming in. Feel free to vote, people. "You are not asking me my take, but I say do as much as possible, go all the way and never stop." That's a good suggestion, Pawn 23 110. "I want to complete at least three by the end of this month." Complete three what? Complete three competitions? That's pretty ambitious, but definitely keep going, never stop.
All right, so let's talk about some of the top solutions. How did people approach this? Let's look at recently posted. Should we start from the top and work our way down, or go the opposite way? "When is Rob going to tell us that he was the one who labeled all the corn images?" Yeah, Matt, I went through all the corn and I'm the expert. There were people asking about who did it. Oh, by the way, let's shout out the corn data set. I mean, it's not that hard to find this data set. So this is the data set; we're going to give it its proper shout-out in chat and I'll put up the link. I obviously didn't want to link to it during the competition, because that would kind of point everyone to where to look for the answers, but I did use this data set, which was provided and is open for use in research like we were using it for. Thank you very much to the research team that created it and labeled it. They also did their own approach, if you want to read their paper: what they did was create a GAN to actually make more images. I was surprised that no one in this competition did that. I don't know if it actually would work in this Kaggle competition, but they used a generative adversarial network to create more samples to train on. So definitely thank you to the people who created it. "I meant three projects, from analyzing data and reports," 2310 Sammy says. Oh yeah, I totally agree, that makes sense. Three projects
in a month, one a week, sounds about good. I support it 100 percent. All right, we're tied two to two in the race for the best notebook. Alrighty, so let's go from number one down. Let's start at the top, and we'll see a lot of common threads here. What you'll see when you look at the top solutions for any competition on Kaggle is that a lot of the time there's a lot of overlap, and then surprisingly you'll see people that had completely different approaches also do well. That just shows that diversity in your model solution usually gives you a better overall score on the private leaderboard. If you're taking a robust solution with a bunch of different approaches and combining them, it usually does pretty well. Hey Europa, welcome to the family, how's it going?
So let's start with the solution from Andrew, who's here with us. I'm not going to read everyone's solution in complete detail, but I am going to go over the general points. The one thing I will say about what Andrew did here, which was really impressive, is he linked to his GitHub repo where he had every step written out. I haven't gone through and actually looked at all of these, but he did use fastai. You can see the first one is initial experiments, and you can go through here and see the steps that he took in order to get to his winning solution. So there was a lot of work that went into it, and it becomes clear when you see it here. You can also see some good documentation written up on what was actually used in the solution. I think it's the same write-up as we're going to read here on Kaggle. So one common thread I've noticed here, and you guys can tell me... Mr Gabriel blintz: "First time watching, good night to you as well." First time watching live? Okay, cool, well, welcome, I'm glad to have you here. So let's look into what this solution was. "Honestly, I thought it was going to be a candy corn poll." What's a candy corn poll? So a lot of people used the fastai deep learning
framework, which is great to see. I've said it in the past: I've tried fastai and it never really caught on for me, but that was a few years ago. fastai was heavily used by really successful participants in both this competition and the last one, so I think that's to say, if you're getting started in deep learning, it may be something that you want to go and check out. fastai is, number one, a framework, so a package or library you can use in Python for training these types of neural networks, but there's also a course that goes alongside it that steps through how to use it and the reasoning behind each of the steps. I think that's a good explanation of what it is from someone who hasn't really done the course in a while. Correct me if I'm wrong.
"Leaderboard changes were insane." Yeah, we talked about that, Mr Gabriel Blends. That tends to happen, especially when you have a small public leaderboard, so people can really overfit that leaderboard. ishwari says "give some tips to study Python." Check out my YouTube channel, I've got a bunch of tips; I won't go into too many details. "What distinguishes a library from a framework?" That's a good question. I don't really know the difference technically, so it's whatever you want to call it. It's a library slash framework, let's call it.
So what sort of models were used? Most of them are from the timm repository. timm is the PyTorch Image Models repository; it's standard and often used across Kaggle competitions. Definitely check out this GitHub repo if you haven't already, but basically it gives you an easy way to access all these different architectures using that package. So not only is fastai a framework, but I guess it's built on top of other libraries. Everything in code kind of inherits from other stuff, right? Almost signed up for an
introduction to the Linux class, but it's 169 dollars. Don't spend money, spend time. Jack Dawes, thank you for the appreciation, I appreciate that back, thank you, namaste. "We stand on the feed of giants." Lopta, was that a correct statement? Was that a Freudian slip?
All right, so ConvNeXt large, ViT, that's a Vision Transformer based model, and then these Swin models. Swin models are new. I don't know much about them other than they're Transformer based, and I think they're good for picking up on small changes, like at the pixel level, between images, and really noticing what to focus in on in the image. Now, in this competition some people had noted that the classifiers were actually picking up on the shape in the background of the image, and since we had top and bottom image views of each of the corn, it's kind of important for the model to be able to determine what part of the image to focus in on, to sort of ignore the background and focus in on the seed part, because that's what the classifier should actually be using, right? So I guess these Swin models do really well on that. "I had two days at home this week on account of a positive COVID test." Oh, sorry, Lopta. Yeah, at least you can breathe, that's always better. You know, when you get COVID you've got to take a chill pill, hang out, ride through it. You'll get through it.
All right, so one thing you'll notice too is that there are a bunch of different experiments that were done. That's another thing that you'll see in a lot of winning solutions: you try a lot of things, some may fail, most may fail even, and then you just focus on the things that work and do well. And you evaluate what does well based on cross-validation, not only on the leaderboard, but if you see a correlation between the leaderboard and your local cross-validation, you want to stick with that. Is it possible to get the same results by using TensorFlow/Keras? PyTorch and
fastai are being very dominant. So in theory, with this timm library you should be able to import all the architectures into TensorFlow, Keras, or PyTorch, or fastai (fastai, I guess, is built on top of PyTorch), but regardless of the back end, the architecture should stay the same. So if the architecture is the same and you're training with the same learning rate, all that TensorFlow or PyTorch is doing is the math to find the optimum weights within this model to satisfy whatever loss you're giving it. So in theory, if you're using the exact same parameters, you should be able to, in a sense, recreate the exact same results in PyTorch or Keras/TensorFlow. Now, in practice it's not always that easy, and part of what I think fastai does really well is it helps you find that learning rate to start on. That can be really important, especially for training these image classifiers, where getting the correct starting learning rate matters in order to reach that optimal point. So definitely check out Jeremy Howard's fastai course if you haven't already.
All right, so let's look at augmentations: random flip and rotate only; random flip and rotate with small cropping; rotate with small lighting, cropping, and warping allowed. So that's interesting that the augmentations weren't that strong. Sometimes you'll see solutions with incredibly strong augmentations, and I guess the amount of augmentation you add may actually impact how much you have to train, because the more augmented your training data, the longer it's going to take to fit the model. "It's doing the grunt work for the algorithm." Yeah, kind of, Lopta. It's automating the part that's not automated already. Mostly it follows some learning rate schedule and freezes parts of the network in ways that make most networks train well. Yeah, that is Andrew Spears, that's a good point,
explanation at a high level in 10 seconds. Yeah, you can freeze up to certain points, right, and you can freeze certain layers in the model, so that's a good point, and something that fastai definitely does well. It also kind of makes it more of an iterative process to reach that optimum. loq, welcome to the family, thanks for joining us, glad to have you here, hope you're doing well. So yeah, Andrew, I know you're in chat, if there's anything else from this that I should jump into. He gives a bunch of shout-outs here. Definitely everyone should check this out, and congratulations to Andrew for your first place solution, great job.
We can look at the second place solution and maybe note some of the differences. So Chum engine did not qualify for the prize because the submission was not made through a notebook. That's one thing to note here, so I guess this is like a two with an asterisk on it. So, a bunch of different models. Let's see if Swin large is here: Swin large patch 4. Yeah, thank you Andrew, congratulations, and I'll be sending that GPU to you very soon. So fastai again, fit one cycle, fp16. That basically gets you twice as much bang for your buck: what floating point 16 allows you to do is use less space on your GPU and less computation time, and really get the same results as if you used floating point 32.
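As a rough illustration of the fp16 point above, here is a minimal sketch that uses only Python's standard library. It makes no claim about how fastai or any GPU handles fp16 internally; it only compares the storage size and rounding error of 16-bit and 32-bit floats through the struct module's "e" (half precision) and "f" (single precision) formats. The helper names bytes_per_value and round_trip are made up for this example.

```python
import struct

# struct format codes: "e" = 16-bit half precision, "f" = 32-bit single precision
HALF, SINGLE = "e", "f"


def bytes_per_value(fmt: str) -> int:
    """Number of bytes needed to store one value in the given format."""
    return struct.calcsize(fmt)


def round_trip(value: float, fmt: str) -> float:
    """Store a Python float in the given format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


half_bytes = bytes_per_value(HALF)
single_bytes = bytes_per_value(SINGLE)

# Hypothetical tensor with one million values, for scale only
n_values = 1_000_000
print("fp32 bytes:", n_values * single_bytes)
print("fp16 bytes:", n_values * half_bytes)

# The price of the smaller format is a coarser approximation of each number
x = 0.1
half_error = abs(round_trip(x, HALF) - x)
single_error = abs(round_trip(x, SINGLE) - x)
print("fp16 rounding error:", half_error)
print("fp32 rounding error:", single_error)
```

The halved storage is where the "less space on your GPU" comes from. The larger rounding error is one reason training setups commonly use mixed precision, keeping some values in fp32, instead of running everything in fp16.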
So most people use floating point 16. Pseudo hard labeling was used for some models, so that's interesting. Oh, the other thing is, I think Andrew mentioned it, using test-time augmentation, which is an approach where you actually augment the images that you're predicting on, and it looked like a lot of people did that. "Is that single-precision floating point?" Maybe, Lopta, I'm not sure. Yeah, a lot to learn from all these solutions out here. Congrats everyone, great job.
All right, third place. The interesting thing about this is are you good men said they mainly wanted to use Transformer attention-based vision models. So this is where the Swin model comes in, and these BEiT models, B-E-i-T; it's like a BERT-based image Transformer model. Man, these models are getting crazy, it's hard to keep up with all of them. So yeah, these are the state of the art, the really cutting-edge type of models that are being used in this solution. Also showing us here the use of Albumentations, all the different augmentations being used, and then these are the four-times TTA, the test-time augmentations that were applied during the prediction phase. Oh, okay, and then also using some mixup. I remember back in the day I had to write up the mixup augmentation myself from scratch, but I guess there's an implementation now. Yeah, so I think I may have used this implementation too; timm has mixup within their library. What mixup does is, instead of feeding the exact image into the model when training, it blends two of the images into one image for the model, and it gives it the average of the two labels, weighted by what level of mixup you apply. Hey kenjito two, welcome to the family, thanks for joining, let me know in chat how you found it. Yeah, some great work here, so definitely good job on third place. Let's go down the line: fourth place we don't have yet, hopefully we'll get it. Fifth, seventh, eleventh. Oh, also let's go
back here: Albumentations fastai is the winner of the notebook challenge, so congratulations to whoever won that. Mr Gabriel Blends, thank you for subscribing. You don't need to subscribe, but when you do, do you know what happens? We spin the wheel of destiny. So let's do this picker wheel and spin it. That's what I do when we get a subscription. Thank you so much, Mr Gabriel. By the way, if any of you out there have Amazon accounts (oh, I've got to type pizza 100 times) and you link that to your Twitch account, you get one Prime subscription for free every month. Doesn't cost you a thing. So let's go to Sublime Text, find my pizzas. All right, I have typed pizza 1500 times on stream, 100 more to go thanks to that subscription.
So while I'm typing this, let's talk a little bit more about what we're going to do tonight. We're going to look a little bit more over people's solutions, maybe not get much more into it. A reminder, there is a survey out there. If you took part in the competition and you want to give some feedback, that would be great. Let me know what you want to see in the next competition (man, mistyped that one), or if you have any other feedback as to how you thought it went. I thought it went great. You guys did such a great job of sharing the right amount, not too much. Also, afterwards, all the different solutions that were written up: I was very impressed and encouraged by the way people came together on this competition. Someone noted this in the discussion, but working on a small-ish image data set like this, in a controlled environment like a Kaggle competition, I think can be really helpful, because a lot of times in Kaggle competitions nowadays the data sets are so large that you kind of need to know where to start. All right, there we go, 100 for you. "Why are you writing pizza?" okabe side, because every time I get a subscriber on Twitch I spin this wheel and I do whatever it says,
so if I had landed on 10 push-ups I would have done that. We just got a new subscriber and I had to type pizza a hundred times, and as you can see I've written it on my stream 1600 times before, so we've been here before. Yeah, so I really think this size is great for people trying to learn, because you have a good amount of data that you can still train the model on, but you don't have to spend hours and days just to get results back from your experiments. So I hope that everyone enjoyed this and learned a lot. Keep your eyes open, keep your eyes peeled (maybe I'll do a banana competition, keep your eyes peeled) for the next competition, whatever that may be; that's to be determined. Anything else we want to go over? In chat, let me know if you have any questions about this competition before we officially close it out. We have our winner of this, the one with just augmentations. Need to remember this. Congrats, you won the notebook challenge; reach out to me and I will send you your DLI voucher. There we go. Any other questions?
Okay, we have Andrew saying: "Finishing my PhD in computer science, computer vision, during the pandemic, not ML based. Been skilling up during my jobs and learning as much as I can. I've been prototyping ML-related things in TensorFlow, PyTorch, and fastai. When the competition was two weeks long I knew I had to go fast: fastai for rapid training and prototyping." Ah, that's cool. So fastai was a choice based on how quick the competition was, too. Just like the name says, you can iterate over things quickly with it. "So when will the next chance come out?" I don't know yet. We will find out soon; when I find out I will let you guys know. Great work, awesome job everyone. Claps for chat.
All right, you guys ready to change gears a little bit? Close this. So I have been doing some more analysis. We're going to switch gears now. If you don't know the background behind some of the stuff
we've been going over previously on stream, there has been some drama in the chess world. clouded Spirit, welcome to the family, thanks for joining, let me know in chat how you found us. So there's been some drama in the chess world, and there's been some analysis out there of potential cheating. I don't know if you guys saw, in the Wall Street Journal there was a new article that came out just today about this. So this guy Hans Niemann, a U.S. chess player, beat the best chess player, Magnus Carlsen, in an event last month, and out of that came some accusations, direct or indirect, of potential cheating. Since then a lot of people have pointed to different data sets that supposedly prove that this guy cheated. Now, he admitted to cheating in games online, but he said he's never cheated over the board, so in a real live tournament. Now, in this exclusive that came out today... I don't have access to the Wall Street Journal version of this, but there's actually a report that chess.com just released that kind of contradicts what Hans said about how much he cheated. They're saying he cheated in over a hundred games, many of which were for cash money. Hey baby restroom, welcome to the channel. "Andrew Spears, do you have Discord? Been trying to do ML and you seem very knowledgeable." Yeah, hey guys, by the way, !discord. Let's find our Discord channel. If you want to join my Discord, maybe you guys can connect over there. I'll put the link in there. If anyone didn't get it already, please feel free to join, we would love to have you.
So there have been some allegations that have come out about over-the-board chess results, both by chess.com, kind of indirectly, and also with other people releasing their YouTube videos and analyses out there. So we're trying to see: okay, if we are going to find a chess cheater, how would we do it using just the data? So let me open
up VS Code here and talk about some of what we've done so far. "Some of the data showed Hans had a hundred percent accuracy, which is impossible." Yes, so that's what we looked at last time. n g h ing, I hope I'm saying that right, welcome. Also, great job on your work in the competition; you had great notebooks and I really enjoyed how active you were in the competition that just ended. So yeah, that analysis, and I think we looked up the YouTube video before and were watching through it, has been disproven. That analysis relied on ChessBase for the amount of accuracy that they were giving Hans Niemann for certain games, and that was, I guess, doctored. I'm sure I can find the post here and we can talk about that. So basically what that had shown was that for some of the games that Hans was playing in, they showed 100 percent accuracy. I was actually writing this code in order to, not disprove that, but to double check it, and it's been kind of disproven since then. We were not seeing 100 percent accuracy when we were running this analysis on the games that they said he was doing that in. So let's see. I thought this was pretty interesting: putting this all in context, the Reddit chess subreddit, this is what people on the subreddit believe about what Hans has done. So most don't believe that he cheated in the Sinquefield Cup, which is the event that Magnus Carlsen left after playing and losing to Hans. And then it's kind of interesting how divided it really is in terms of what people think, at least on the chess subreddit, which I don't know is a good sample population, but it's a population nonetheless. The thing I wanted to see though was... wow, there's a lot of infographics. Oh, this is another statistical analysis of Niemann's centipawn loss. So this is kind of what we're getting at; we're going to be looking at the centipawn loss stuff. This is a Brazilian scientist who kind of went into
detail about his analysis, which I guess came out a few days ago and goes in line with what we're looking at, but I think some of his conclusions are a little bit stronger than what I would say. So what this guy has actually been doing is... all right, so let's start out here at the top level. What even is centipawn loss? So there's no clear definition of the best move unless you're in the very end games where it's been theoretically proven what the best move is. It's very hard to determine, in any given position of a chess game, what the best move you can make is, and all you have to go off of is what the engine says is the best move. But if you make the engine run deeper or longer, and I'm running Stockfish 15 here, if you make it run longer or with more depth, it may find that another move is better. What centipawn loss is telling us is, at least based on what the engine thinks is the best move, how many pawns of an advantage one side or the other has after the next move is taken. So let's actually look up centipawn loss to make sure we know it. It's a unit of measure used in chess as a representation of the advantage. So if you're playing a game on chess.com and you see like a plus one, you're up a pawn; the engine is saying that White is ahead by a pawn, right? A centipawn is equal to one hundredth of a pawn, so when we're talking about centipawns, just divide by 100 if you want to see the amount relative to a pawn. These values play no formal role in the game but are useful to players. The top computer move will lose zero centipawns, so that's assuming that the top computer move is the best, and like AlphaZero doesn't beat that computer or whatever, but lesser moves will result in a deterioration of the position. The value may be used as an indicator of the quality of play. So think about it this way: if the engine's best move has zero centipawn loss, then if you also play the best move you have zero centipawn
loss based off of that. So what does our code do? Let's step through it again. What I've written this code to do is, number one, we can download the... let's find this new repo, the chess cheating analysis repo that I created. So I downloaded all the different grandmasters' games using this PGN Mentor website, which has a zip file full of their PGNs, which represent all the moves from their games so we can actually walk through them (or sorry, not PNGs, PGNs of their games). Then I created a script that took these PGNs and converted them into kind of an easy-to-read data frame format. So let's actually walk through that here. I know I'm jumping all over the place; let me know if none of this makes sense or if you guys want me to slow down. "Are you continuing with your chess cheat analysis?" imposter engineer asks. Yeah, I'm trying to take it a little bit forward. So yeah, there's that high-level analysis where they've taken the centipawn loss, and you can look at an average over your entire game of what your centipawn loss is, and then you can also look at your specific centipawn loss for each move. Another thing you can look at is the player's rating compared to their centipawn loss. So that's one of the things that came out with this new video, was this guy was looking at... close this. People have been looking at the centipawn loss versus the rating of the players and making some, I guess, assumptions that if you are a certain level player you should have this correlation between your average centipawn loss and your rating. So this is a very clear example where, I guess, this is across all of the players, it's this straight correlation. That doesn't mean that you can't deviate from that and not be cheating. "Were you able to verify some of it?" So imposter engineer, what we found was that prior analysis where the person had released the
older, like, the accuracy of each of Hans Niemann's games was actually corrupt, and that's why we were seeing so many games that showed a hundred percent accuracy. When I tried to recreate it with my analysis, I could not actually find the same results. There was some weird stuff going on with ChessBase, apparently, that caused those results to be incorrect. But what this guy is doing is kind of similar to what I was looking at, which is actually analyzing the centipawn loss by running your own computation of the player's moves versus the top engine move, or the centipawn difference between those moves. Because if you just look and see how many times the best move was picked, you can actually win a lot of games and not pick the number one top move; you could pick the second top move and it'd be almost as good as the top one. So you kind of want to compute, for each move, what the centipawn loss difference is. "Are there any beads recognized during the calculations?" Nope, nope, no beads recognized. So what this guy is showing is a bunch of different players and their correlation here, and then he gets to Hans, which is kind of interesting, because he's first showing that up until 2018 there's the same correlation that you kind of see for other players, right? "Raphael Milk, you know who this is? Milky Chess." Yeah. But then when he adds in this other data point... now granted, there are four data points here, so you don't expect this to necessarily always be correlated. Let's say you had a fifth data point that was down here at 45 and it was 2700, right, then maybe you would start to see actual correlation here, but with four data points it's pretty sketchy to make some conclusions here. But the statement that's being made here is that because this correlation is not here, obviously he's a cheater. Now, I think this is a little bit too broad of a statement to be making, and you'd have to actually
see this for a bunch of different players to actually make that a statement that I would be able to stand behind. "How would that even work? You're supposedly without any actual nerve endings to have any reaction." I think that whole beads thing is just a funny joke. Yeah, I don't know if this is good enough for me as analysis to say that for sure he cheated. So I've been running this in the background, and I'm going to go ahead and stop this, and we're going to walk through this code that I've been running in the background. You haven't been noticing it, but my computer's been hard at work actually running analysis on all of the games from the Sinquefield Cup. Now, my plan is to ramp this up to more games, but we're going to look at exactly what I'm doing in this code here step by step. So let's just go here and step through this code. What I have here is the same PGN... oh, I need to import this up here. So I have downloaded this Sinquefield Cup 2022 PGN file, which has each of the games. "A Stockfish for Python, nice." Yeah, this is Stockfish in Python. So you could actually use Stockfish through the chess API, or you could just use Stockfish directly, and I'm using Stockfish directly and linking it here to my Stockfish 15 version, which I downloaded and built here on my operating system, so it's good to go. Run this. Then I'm going to run my little script here that converts all these PGNs into a data frame format. We're going to see what that looks like; it happens real quick. And then we now have, and I keep on saying PNG, I think, so forgive me, but the PGN file converted into a tabular format, where now we have the player's name, who's playing Black, their Elo, this is the opening that was played, the result of the game, the round numbering, which is actually pretty clean in this format, and then this is what we care about: the mainline moves. So if we go to mainline moves in this data frame and look up this first one, these are all
the moves in the game between Wesley So and, who's playing as White, that's way over here, uh, darville darf. So this is this game, and we can see the mainline moves. Then we can actually load this into a game board, a chess board, which would look like this, and we have this board which we can then step through all the moves of. So in this opening position we could run Stockfish here, but it's pretty worthless. Then we could do "for move in board dot move" [Music] oh, mainline moves. So basically what we're doing here is loading this main board, then looping through every move in these mainline moves, and we can break after this and actually take this board and push this next move. Now if we look at the board, our new position has this first move done. "I was reading PNG instead of PGN and I was wondering how you would use an image." Yeah, it's a little confusing, Mr Gabriel, I'm getting confused myself as I do it. But you can see we can use this board object to push the different moves, and then from here we can get the FEN notation. The FEN notation is showing us the position of all the pieces on the board, so it's the same representation as this image version, but the FEN can be fed into Stockfish. Stockfish is one of the strongest computer engines out there. So we can take this FEN and create our Stockfish... before we do this we have to create our Stockfish object. When we create this object there are a few things that we have to set. So the number of threads that we want our computer to run: you can see here my machine has 64 threads that we could be using. I am also streaming right now so I don't want to use all of them, but I think 32 is fine. And then minimum thinking time, we might not need to actually set that; I think it defaults maybe to 30.
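To make the FEN idea a bit more concrete, here is a minimal sketch, using only the Python standard library, that expands the piece-placement field of a FEN string into an 8x8 grid. The helper name `expand_fen_board` is made up for illustration; it is not part of the chess or Stockfish packages used on stream, and it only reads the first field of the FEN (it ignores side to move, castling rights and so on).

```python
def expand_fen_board(fen: str) -> list[str]:
    """Expand the piece-placement field of a FEN string into 8 rows of 8 characters.

    Hypothetical helper for illustration only. Rows come back in FEN order,
    from rank 8 (Black's back rank) down to rank 1. Uppercase letters are
    White pieces, lowercase letters are Black pieces, '.' is an empty square.
    """
    placement = fen.split()[0]  # first space-separated field holds the pieces
    rows = []
    for rank in placement.split("/"):
        row = ""
        for ch in rank:
            if ch.isdigit():
                row += "." * int(ch)  # a digit means that many empty squares
            else:
                row += ch
        if len(row) != 8:
            raise ValueError(f"rank {rank!r} does not describe 8 squares")
        rows.append(row)
    if len(rows) != 8:
        raise ValueError("FEN piece placement must have 8 ranks")
    return rows


START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
AFTER_E4_FEN = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"

for row in expand_fen_board(AFTER_E4_FEN):
    print(row)
```

Printing the rows gives a plain-text version of the board picture, which is roughly the same information the engine receives when it is handed a FEN.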
We want it to go faster if it doesn't actually need to think about a move that's an obvious move. And then this is the big one, the depth. So the more that you increase depth, it's not a linear relationship with the amount of time it takes to run. So a depth of 10 could go pretty fast; a depth of 10, I think, would take on this machine like 20 seconds to maybe go through a whole game, a depth of 15 maybe two minutes, 17 maybe four, and then when I was doing 20 it was like 10 minutes to run. "Pure electricities." Yeah, it's a decent machine. I built this myself a few years back, so it's held up over the years. So yeah, let's actually set this depth to be pretty low, and then we can show that with this Stockfish object we can set the FEN position. So let's call this the FEN. Now with our Stockfish, if we get the FEN position, it's this, which we could then load into our board. I think we load it this way, right? Thank you. Let's not worry about that, just take my word for it that we loaded in the FEN of this position, and then we can get the top moves. So by default this gives us the top five, and I want to have it give us the top 10 moves. Now, this was created with a depth of 15.
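Once you have a list of top moves with evaluations, the centipawn loss of the move that was actually played can be sketched as the gap between the best listed score and the played move's score. The snippet below is a rough illustration, not the code from the stream: the function name `centipawn_loss` is hypothetical, the input is assumed to be a list of dicts with "Move" and "Centipawn" keys, the scores are assumed to be from the point of view of the player to move (if your engine wrapper reports from White's side, flip the sign when Black is moving), and the numbers are made up.

```python
from typing import Optional


def centipawn_loss(top_moves: list[dict], played_move: str) -> Optional[int]:
    """Centipawn loss of played_move relative to the best move in top_moves.

    Hypothetical helper. Assumes each entry looks like
    {"Move": "e2e4", "Centipawn": 35}, with scores from the point of view
    of the player to move (higher is better for that player).
    Returns None if the played move is not in the list or has no
    centipawn score (for example, a forced-mate line).
    """
    scored = [m for m in top_moves if m.get("Centipawn") is not None]
    if not scored:
        return None
    best_score = max(m["Centipawn"] for m in scored)
    for m in scored:
        if m["Move"] == played_move:
            return best_score - m["Centipawn"]
    return None


# Made-up evaluations for illustration only.
example_top_moves = [
    {"Move": "e2e4", "Centipawn": 35},
    {"Move": "d2d4", "Centipawn": 30},
    {"Move": "g2g4", "Centipawn": -80},
]

for move in ("e2e4", "d2d4", "g2g4"):
    loss = centipawn_loss(example_top_moves, move)
    print(move, "loss:", loss, "centipawns =", loss / 100, "pawns")
```

With these made-up numbers the second-best move only costs 5 centipawns, which is the point made earlier about why counting "top move picked" alone can be misleading. Averaging the per-move losses over a game gives the average centipawn loss.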
So, five. So yeah, let's do this: let's just see what the timing is to run for a single move with a depth of five. We're going to run timeit on this, so timeit will run this a bunch of different times in a row, seven times or so. ashay, thank you for your positive words about the PogChamps competition. I had a lot of fun; we had a lot of fun reviewing it today too, so if you want to go check that out, let me know. "What's FEN?" Southern TW says. FEN is the actual notation for the board setup. So if you think about PGN, Portable Game Notation, PGN actually walks you through an entire chess game with each move that was made, so it's saying these are the moves in order starting from a certain setup. You also have some metadata in it, like who's playing as White, who's playing as Black, the date, etc. FEN, Forsyth-Edwards Notation, lets you set up the board in any position. It could actually be a position that's not legal to have in chess, because you're just telling, for each of the rows in the board, what piece should be on each square. So uppercase letters are the white pieces, lowercase are the black pieces, and then these eights are just saying that those rows are blank. Someone else said "depth is how many moves ahead the Stockfish engine is looking." More depth, yeah, more depth equals more variations of the game. So let's see if there's a good image of it. Think about this as like a tree that it's looking in. Hey, welcome to the stream, hope you're doing well, let me know in the chat how you found this. "Is the theory to compare the frequency of all players using centipawn loss on all moves, if a cheater only chooses to use the engine at one or two difficult points in a game?" Good point, Matt, that's kind of what I want to get at. "Seems like your data might be drowned out by all the easier moves to assess, thus centipawn loss per move does not equal y." Yeah. The other thing to keep in mind, and also welcome to the family, we had
someone join, I missed who it was. spiked punch man, welcome. So yeah, we want to actually see what the centipawn loss is at certain moves. Also, if your centipawn loss is low in a move but it's forced, you've got to consider that too. If it's a forced move, or if there are really no other options other than that one best move, then you might pick that move and it's not necessarily an indicator of cheating. So there are a lot of complicated things going on at the same time. So yeah, this is maybe an easier way to picture what depth is. The number of simulations or situations that it's looking into depends on a few things, but depth here is how deep down into moves it's going to look to evaluate who is doing better. So you can imagine these trees: as your depth gets larger and larger this grows into a huge amount, and that's why as we increase our depth here we're going to see that it's going to become slower and slower. Okay, so about half a second, plus or minus 40 milliseconds per run, to get the top 10 moves at a depth of five. Now let's run it at a depth of 10. Excuse me. "Number of children in a binary tree?" I think so, kind of like that. I'm sure Stockfish does some smarter pruning of the trees; there are certain trees that you might go down that are obviously lost situations, which you wouldn't actually go and see the full depth of, but it's some sort of representation of that. All right, so then we ran a depth of 10: 641, so actually not that bad. Now we're doing a depth
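The tree picture can be sketched with a few lines of arithmetic. This is a deliberately naive model, not how Stockfish actually searches: it assumes every position has the same number of legal moves (the branching factor of 30 below is just an assumed round number) and that nothing is pruned, so it overstates the real work. It only illustrates why run time is not linear in depth. The function names are made up for illustration.

```python
def positions_at_depth(branching_factor: int, depth: int) -> int:
    """Positions exactly `depth` half-moves ahead in an unpruned, uniform tree."""
    return branching_factor ** depth


def total_positions(branching_factor: int, depth: int) -> int:
    """All positions from the root down to `depth` in that same naive tree."""
    return sum(branching_factor ** d for d in range(depth + 1))


ASSUMED_BRANCHING_FACTOR = 30  # illustrative guess, not a measured value

for depth in (1, 2, 3, 5, 10):
    print(
        f"depth {depth:>2}: "
        f"{positions_at_depth(ASSUMED_BRANCHING_FACTOR, depth):,} leaf positions, "
        f"{total_positions(ASSUMED_BRANCHING_FACTOR, depth):,} total"
    )
```

A binary tree is the special case with a branching factor of 2. A real engine prunes and reorders the tree aggressively, which is why the timings on stream grow far more slowly than this naive count.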
Original Description
Timeline:
00:00 Intro
02:30 PogChamps #3
14:40 Voting on Best Notebooks
23:38 Top Solution Reviews
45:00 Chess Data Analysis
1:27:02 Coding
Follow me on twitch for live coding streams: https://www.twitch.tv/medallionstallion_
Community Competition:
- Link to the competition: https://www.kaggle.com/competitions/kaggle-pog-series-s01e03
- Register and join NVIDIA's GTC using this link to qualify: https://nvda.ws/3Qb0b9x
My other videos:
Speed Up Your Pandas Code: https://www.youtube.com/watch?v=SAFmrTnEHLg
Intro to Pandas video: https://www.youtube.com/watch?v=_Eb0utIRdkw
Exploratory Data Analysis Video: https://www.youtube.com/watch?v=xi0vhXFPegw
Working with Audio data in Python: https://www.youtube.com/watch?v=ZqpSb5p1xQo
Efficient Pandas Dataframes: https://www.youtube.com/watch?v=u4_c2LDi4b8
* Youtube: https://www.youtube.com/channel/UCxladMszXan-jfgzyeIMyvw
* Twitch: https://www.twitch.tv/medallionstallion_
* Twitter: https://twitter.com/MedallionData
* Kaggle: https://www.kaggle.com/robikscube
#chess #python #livestream #datascience