Community Kaggle Competition Overview - Corn Classification

Rob Mulla · Beginner · 🎨 Image & Video AI · 3y ago
In this live stream we talk about the Third PogChamp Competition - corn classification. Join for your chance to win a brand new GPU
- Link to the competition: https://www.kaggle.com/competitions/kaggle-pog-series-s01e03
- Register and join NVIDIA's GTC using this link to qualify: https://nvda.ws/3Qb0b9x
Follow me on twitch for live coding streams: https://www.twitch.tv/medallionstallion_
My other videos:
Speed Up Your Pandas Code: https://www.youtube.com/watch?v=SAFmrTnEHLg
Intro to Pandas video: https://www.youtube.com/watch?v=_Eb0utIRdkw
Exploratory Data Analysis Video: https://www.youtube.com/watch?v=xi0vhXFPegw
Working with Audio data in Python: https://www.youtube.com/watch?v=ZqpSb5p1xQo
Efficient Pandas Dataframes: https://www.youtube.com/watch?v=u4_c2LDi4b8
* Youtube: https://www.youtube.com/channel/UCxladMszXan-jfgzyeIMyvw
* Twitch: https://www.twitch.tv/medallionstallion_
* Twitter: https://twitter.com/MedallionData
* Kaggle: https://www.kaggle.com/robikscube
#kaggle #python #livestream

What You'll Learn

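In this live stream, Rob Mulla launches a community Kaggle competition in which participants classify images of corn into one of four categories: pure, broken, discolored, or silk cut. He walks through the rules, the data files, and the prizes (first place wins an NVIDIA GeForce RTX 3080 Ti GPU), explains that entrants must register for NVIDIA GTC through his link to qualify, and starts an exploratory notebook with pandas and matplotlib.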

Full Transcript

hello everyone it is Sunday September 18 2022 and I can hear myself in an echo and I muted that um but we are streaming live tonight big announcement for you all big announcement so if you're in the stream you're in the right place to be you can join us over on Twitch that's the easiest way to chat and be engaged I'm also streaming this on another platform YouTube so um if you're watching on that stick around and watch there too but I hope you all are having a great uh Sunday or Monday morning wherever you live and you're ready for this announcement so let's test out the chat just to see let's just make sure it works hey powwow thank you for uh saying you're a big fan I appreciate it I appreciate that you're here watching the stream too so I have a big announcement that I'm going to make on the stream which is that I am launching the third in the series of POG Champs competitions so I know people from the twitch streams are still coming in so hopefully um they'll catch this or maybe I'll just repeat it but um if you weren't aware next week we have NVIDIA GTC which is coming oh you're in Powell you're in Poland and it's 3:25 a.m. did you wake up for this or just have you not not gone to sleep yet so NVIDIA's GTC conference is happening this week and they gave me a little something to uh hype it up a little bit to support our competition I've showed this on the stream before when we were given this but this is a brand new and I've just put some fingerprints on it which is nasty but 3080 Ti GeForce RTX much better GPU than I have in my computer I'm jealous of whoever's going to win this this will be going to the first place finisher in our latest POG Champs competition so um we're going to go over what that competition is yeah Power pow Y is that am I saying that right um someone in the YouTube chat so the YouTube chat doesn't appear here but Poway is saying that I'm pronouncing his or her name incorrectly I apologize for that I do it all the time 
I'm really bad about that but I try so that's what matters man twitch has taken a while for people to come in or maybe just no one's going to come in tonight so let's just let's just go for it let's just go for it shall we um we have this kaggle competition that I'm going to launch and it is based around drum roll please the most popular song these days meme song right now is about corn and so is our competition it's about corn so let me show you all what I tweeted out the other day I tweeted out um this which is alluding to the fact this has been our logo for the POG Champs competition in the past alluding to the fact that this is corn related stay tuned and we're gonna we're gonna launch this competition tonight and you can join it and uh anyone's allowed to join anyone is able to either win this GPU or uh an NVIDIA Deep Learning Institute uh voucher and you can get that for free just by finishing in the top three or four I forget what we're gonna say for this so what is the competition what's it all about you may be asking well we're gonna go through that tonight we're gonna talk about it we're gonna make a starter notebook and to summarize to just put it in a few words we're given a data set with a bunch of images of corn and we need to label them so this is an image classification competition um you can see here some examples of the corn images now um I'm not gonna talk too much about the source of the data but it should not matter where the source of the data came from because one of the rules in this competition is you must pretty much train your model from scratch so don't go searching for this data set and try to does a corn have the juice it does when you try it with with uh butter I I've heard everything changes it just changes everything so we're launching this competition you can see here at the top it says this competition is not yet live only competition hosts can currently view it well guess what people we're gonna go live tonight on 
stream get hyped and you're gonna witness this um hopefully I've done a good enough job of explaining it here it's a pretty simple and straightforward thing that you're trying to do here it's classify into four different categories one of four categories each image into either if it's pure so I guess that's like uh no issues with the corn broken discolored or silk cut we can Google silk cut uh to qualify for oh the evaluation metric it's very simple just classification accuracy how accurate you are of classifying each of them each uh image it has a unique seed ID and you provide it the label that's all you have to do for this competition super straightforward prizes we already talked about this RTX 3080 Ti which is sitting here the only thing I will say is if you are not in the U.S and you win this prize I will pay the shipping but you have to take care of any sort of customs uh taxation that you make it uh that was a little unexpected thing last time I sent out a GPU I didn't realize that we had to pay for the custom stuff so that's the only thing hey what is this what is this imposter engineer is that corn the emoji doesn't show up in my window but I see that you did some sort of an emoji let me paste it somewhere else oh it looks like a crying face teary eyes yeah it doesn't show up in the actual stream but uh teary eyes because you're happy hopefully anyways I hope you guys are as excited about this as I am we're gonna launch it here tonight I'm gonna answer any questions anyone might have uh we're going to talk a little bit about the rules and then maybe we'll create a notebook or something to kick off the competition I want you all to have submitted something so that someone's on the leaderboard now random people are going to win uh these vouchers so second through third place can win an NVIDIA Deep Learning Institute voucher I'm also going to give a best notebook award that will go to the top notebook from the competition and also a random twitch or 
YouTube viewer is going to win some things you need to do and I think everyone watching this right now have probably already of this uh Salah city likes image comps nice are you please tell me you're in for this competition then in order to qualify for the prizes you must register for this is key and I'm going to put this into the chat you must register for the GTC conference why isn't it posting that into twitch chat hey David Jade Jackson shout out to David J he he streams data stuff too um he's in chat David J click that link register for NVIDIA's GTC so if you click that link and it's important that you use that link so they know it's me and that they'd support this stuff in the future but it'll take you to this page where you can register for free for NVIDIA's GTC which is going on this week starting tomorrow however don't worry if you're watching this like no September 23rd you can still join and watch one of the previous you just have to register and then you can uh watch one of the oh the pre-streamed sessions there are a lot of good ones out there let's look at the session catalog shall we shall we take a gander look they already have stuff on Thursday September 27 oh okay so that's late uh so you can watch some of the keynotes obviously keynote's great because you get to see like a lot of awesome stuff hey who's sending me Discord oh wait someone's checking out my Discord man what's up uh we also have uh uh a lot of other good sessions so talks and panels deep dive into RAPIDS now we know I haven't used RAPIDS as much as I'd like to but RAPIDS is like pandas on steroids it lets you do uh manipulation of data on your GPU so if you own a GPU which you will if you register for GTC and win this competition then and hopefully you do already have one but then you can use RAPIDS so like that would be a good one I kind of want to check out that one oh look at this applying lessons from Kaggle winning solutions to real world problems all of the grandmasters from 
the Kaggle Grandmasters of NVIDIA team are going to be on this panel I'm definitely adding this one to my schedule I don't think I'm logged into my account but that is one that you do not want to miss so you this these are things you should want to watch anyways but in order to qualify it's important important important that you register using the link that I sent that Robert what are you talking about bro um so yeah so this is the POG Champs third run we're going to get a lot of engagement so just to remind you all of POG Champs season one episode two trying to keep it organized by making seasons and episodes was this music classification uh competition competition you got first place and was sent a GPU so if you're guessing if this is legit or not go ask him um but this was a super fun competition we did with music classification this year it's a little bit different and we're doing um we are doing let me organize this we are doing corn classification so it's all images super simple it's beginner friendly but it's also hopefully going to get really competitive at the top of the leaderboard I think there's a lot of interesting augmentations and training techniques that might be used to win this competition so definitely join using this link after we launch which I haven't launched yet um so you just need to attend one of the sessions you can attend the session after GTC is over so don't worry about a lot um that even though you should register now you need to register using this link if you've already registered using a different link just re-register with that one uh follow me on Twitch and YouTube that's a requirement and be a totally awesome person I think you guys got that covered are they satellite images corn or photographs of the ears of corn there's there's individual corns what do you call so corn is like the whole what do you call one of the things a knob that's what it says in the song I don't know what we call one of those things in corn um so yeah you 
can check out here the description this is what they look like so actually they're like top and bottom images of of it and Okay so another thing is you might say to yourself I have never worked with image data how does this work like how do I even use Python to work with image data well you can check out my YouTube channel where I do have this video it's a little cringe for me to watch my own videos but uh where I talk a little bit about the basics of working with image data and we're also going to go over this today but definitely check out this video which I've linked there and I'm also going to send into our chat right now let's go check that out if you haven't already and if you haven't already followed my YouTube channel give that a gander all right so um we haven't launched yet I'm also going to talk about the rules so I I uh give myself the liberty of changing anything about the competition at any point although I do not plan to if something goes terribly wrong we may have to restart it let's hope that doesn't happen and it definitely won't happen if people follow these rules so you must only discuss within a team I don't want to see any of that funny business where people are talking with non-teammates about details of how they got all the way to the top of the leaderboard that's not cool there is unfortunately we're not gonna allow any external data the reason behind that is uh yeah you I mean you could try to go out there and like hand label these or find the labels to them hey hello one pot dish hi I'm new to your channel what should I expect here you should expect some awesome data science in in enjoying life that's what we're doing here that's the number one is enjoying life and number two is data science like a big gap between the enjoying life and the but yeah no you should you shouldn't expect from this competition a lot of fun and really the point of this competition is to learn it's so that people have a chance to learn to grow maybe experience 
doing some modeling with images which you've never worked if you've never worked on before it can be really fun to learn about just like some of the setup um okay Reuben you are the first follower of the night on Twitch welcome to the family thanks so much for hanging out with us so uh some other rules you cannot hand label the test data clearly that would be against the rules so when you submit this your submission notebook must be shared with me so all of the submission inference if you train a model somewhere else that's fine on this data but you cannot have the inference done somewhere else and then upload the CSV that will not count it has to be done through a notebook the reason for that is I don't want any funny business where you went in and hand labeled all the images and the test set and I have no way to know that for sure unless I see your submission notebook and I also may request to check your training code if you have trained multiple models I'd want to make sure that you've only trained on the training data you are away from keyboard hey Medallion how's it going hey Killen a 13 are you killing it hey I need some advice ask me man just ask I'll try to answer as much as I can can't guarantee it's going to be a good answer but I will try to answer uh look look at this look at this no hand labeling of the test set I accidentally put in twice so I'm going to delete this second one but that's how much it was on my mind if someone has a tech lead who isn't qualified and isn't mentoring junior devs what should they do find a new job no I'm just kidding I mean you're gonna have to deal with this in every company every job that nothing's ever perfect so uh it's so nuanced I can't answer that question can we use GANs to create synthetic data that's fine that that's fine because because people um that would not be external data that would just be creating new data that you've created and if someone wants to argue or like convince me that external data should be 
used and if there is like an external data set I don't know that could be used in to make this better that's not actually the like true labels of the test set that's my concern is if you allow external data then people could just find the test set and make a perfect model and that's not what we're here for we're here to take the training data and try to make a as good of a classifier as you can on the test data imposter Engineers asking if Robert E has graduated I was actually Robert it's it's strange but I was thinking today when I was about to stream earlier today I was like man I haven't seen Robert E in chat for a while I wonder I wonder if that's just the ebb and flows of streaming like some people will watch for a while and then not be into it but it made me really happy to see you pop up here today so welcome back but we have one more grads glass you can't even type class you too imposter engineer you guys are the ogs of this of this uh streaming Endeavor that I've taken on uh let's talk about data data is pretty simple you have your train CSV which has a unique seed ID for all the training data I hope I didn't screw this up foreign hope I didn't let's let's make sure the test so where are my like hearts gonna drop if I screwed something up the test what are the train IDs for here started why is there number two okay so yeah this makes sense so the training IDs are just random I thought that well the reason why I got scared here is I thought that I saw like no gaps here in the numbers but they're just randomly assigned fake speak welcome to the family let us know in chat how you joined what led you to join but um glad to have you so yeah so the the seed ID is just a unique identifier to get you to the image and also the identifier that you're going to be using to make your submission as you can see in the train CSV we have the label we also have one additional field which is if the image is taken from the top or the bottom of the Corn I don't know what you 
would consider top of corn or bottom of corn but they know and then all the labels and then for test it's identical to this except for we don't have the labels because that's what you're predicting sample submission looks like this so this is just a random sample submission I don't think I would like to see you all submit this just to see if it works but this is just random noise that we're predicting like one of the four classes in this submission it needs to be a submission for every image in the test set any questions about this oh yeah oh then we have our training folder with our images of corn and our test folder with our images of corn it's corn man Robert E is suggesting that should fire this person isn't this person above you in the company if the if you're the Junior and they're the if Junior devs could fire senior devs who would be chaos all right so what should we do number one is we should launch this if you launch checklist launch competition now why wait oh it's launched the competition has launched so now I'm gonna put this in chat and I want people to click on it make sure it works for them and confirm with me not only has it launched well now's a good time to switch over to something else and just say a big thank you to people um how should I show this in my YouTube studio um but but something I was very excited about that happened this weekend where is this happened a few days ago was that we hit 10 000 YouTube subscribers so thank everyone who is subscribed if you haven't click exclamation YouTube okay rajneesh can see the competition that's a good sign but thanks to everyone who has subscribed hit 10K let's go for a hundred how about it why not let's keep it going I mean this is like a fun this is a fun learning Endeavor for me as well so I appreciate that people have been so positive back towards me I it genuinely makes my day when I get someone in my YouTube videos or in chat here saying things like hey I learned a lot from your your stream or 
I learned a lot from your video and I helped someone out because it took me a long time to learn a lot of this stuff I'm not like advanced perfect at every area of data science but at least if I can share the little bit that I know to you all especially those you are new starting out that would be that's going to be fun all right oh I didn't talk about team size I think yeah I made the maximum team members three so here's another thing is if if you are a team of more than one person and you win yeah I'm only going to send the GPU I only have one GPU so keep that in mind I can't send I can't cut this GPU into three pieces and mail it out or I could but that would not be smart image neural next is the hardest thing in data science uh p k is it it depends on what you're doing there's so much like it's gotten easy to do classification not easy not easy but it's gotten a little bit easier but but that's a good take I think that like it's doing video stuff doing segmentation object detection like details within the image has gotten a lot more advanced too and then obviously all this stuff like open Ai and stable diffusion image stuff is cool so people can see this no one's on the leaderboard but I would like someone out there to go and go to the data tab make sure you accept everything go to this sample submission download this killing a says I'm new to ml so definitely have found your resources useful awesome thank you so much Killen I can't read your name without thinking about the Rage Against the Machine song Killing In The Name Of uh to use functions is one thing to understand how to fine-tune it is me it's hard oh I agree okay so I agree there yeah being able to really optimize image classification classification models is hard getting the right learning rate getting the right schedule getting good augmentations it's definitely an art chicken man has lost 20 pounds in two months congrats man that's awesome please add no external GPU and RAM that would be fun what 
does that mean that's awesome man though anyone able to submit by downloading this sample submission and then going to submit predictions and putting it here [Music] if someone could do that that'd be great oh another thing I didn't mention this is going to be a pretty quick competition as of now unless I change the date and actually who's gonna look it over but um the the closing date is in about two weeks so you you guys need to get on this are you allowed to train locally yes you can you can per the rules training locally is fine submission needs to be done through a notebook and I may request to check your training code so don't just go training like crazy and not tracking things I want to be able to if you want that GPU you need to be able to show to me how you trained it um I might ask things like what's the learning rate what's the data set what splits did you do those sorts Jason Liam is part of the family now thanks so much for joining um we're talking about a new kaggle competition just launched and you could win this GPU um so super important again I'm going to reiterate this so we don't forget but the prizes you must uh the the biggest thing is you must register and attend a GTC session using this link this link nice nice happy to be part of the chat Jason thank you thank you for becoming part of the chat made a sample submission seems to work okay 0.27 0.27 we've got someone on the leaderboard this GPU is yours if we were to end today that's it all right should we start a discussion let me tweet this out let's do a tweet I'm gonna say just launched Kaggle community competition POG Champs number three hey welcome to the family real apotheus you are part of the family now so I'm saying register here can I use this image in it I'm trying to make a tweet you'll guys sorry I hope this is the right link too if I screwed up this link compete for a brand new uh let me make sure I type it right GeForce [Music] RTX 3080 Ti GPU there we go just 
launched this now people have the link they have the link to register it's corn uh should we do a baseline exploratory data analysis notebook [Music] may maybe you can add one more constraint that I have a use of limited use kaggle resource only limited RAM GPU time that would be fun yeah that would be fun but I understand that other people have competitions going on that they're involved in that also will use GPU so um yeah I I don't want to restrict that much I want to I want to let people because you can people have won major prize money on kaggle training only using um only using Google Cloud instances or um Google Colab or Colab Pro so if they can do it I think you guys can do it too I don't think that the GPU is going to be the limiting factor um yes man this is a a Das Keyboard oh wait should I do this do you guys want to see the keyboard it's a little off-center but so let's let's take a look at the data let's also start a discussion let's try to reuse as much as I can from the last competition um I probably should have done this stuff beforehand but [Music] welcome to the third let's link to the overview URL which is here a little redundant and rules which is here let's link this it's corn I just like saying it's corn a lot it's a little corny I'm gonna pin this and can I upvote my own no I can't forget what you cannot feel you can upvote your own notebooks but you can't upvote your own posts apparently okay so any questions from chat or anyone else about what this competition's about there's what 22 people here on Twitch a few more on YouTube watching so if if these are the only people entering this competition you have pretty good odds of winning like a thousand dollar that thousand some dollar GPU right now wait corn or music yeah campus I'm on both now for for the time being the description said second where oh the description yeah you're right good thing there's an edit button no wait there's no edit button where's the edit button 
unpin foreign my post says music okay this is you guys are you guys are right this this is how how I'm trying to take a shortcut here and no one's letting me do it and that's good because that means you're keeping me honest here so um let's go back to this music one I'm just trying to take the last one and not write stuff writing things is not my favorite thing to do like writing text um in school and stuff I'm always like ah it's not worth my time I like coding Welcome to our third this time we are creating models that can classify classify images of corn I hope you can join read the overview I kind of was expecting there to be that edit button so that I could change things like this but still on me that we're having to do this right now rules rules rules we're going to link that here preview it before I publish look good everyone there was an edit button where was the edit button oh it's hidden here it wasn't always here or maybe I just don't start discussion topics much you're right not nod Kings you are on top of things I'm just blind I swear the edit button is here like if if I do a test post here test uh let's do this yeah the edit is like right here for so I wasn't expecting to see it bottom left user error let's delete this now it's going to show it as deleted right this comment has been deleted that's fine uh how can I join this comp I don't see it in the kaggle competition list so that's a great question that you can use exclamation competition oh wait we don't have that yet I'm going to link it here I'm going to link it here I also just tweeted about it so it's in the Twitter verse and um yeah it's a community competition so I think if you go to competitions and you go to community tab then you'll see it here but you got to do like search for POG maybe you can see all the POG competitions I this is the test one and these were the last two so go to community tab search for POG or just use the link uh any code session today will there be an overview of the 
best solution at the end yes hopefully the the whoever wins will share their solution in full uh they're only required to share it with me in order to get the GPU but hopefully there'll be a lot of sharing and discussion going on um look we already have a second person on the leaderboard so we have phaedrus and we have levent two people this person second place currently is winning a Deep Learning Institute uh voucher just for submitting the baseline we will do some coding today so let should we start a new notebook anything else that I've forgotten keep me honest people all right let's turn the dark theme on just a little bit nicer for doing this sort of coding and we'll take a look at the images shall we POG Champs number three oh whoa I didn't know I could make the title this long it used to be like really restricted in the length uh but we got it it's corn classification so let's uh in this competition we are given images of corn and need to classify them into one of four labels the labels are uh we could look it up somewhere else or we could just go into this description pure broken discolored or silk cut um and now I'm realizing I need to change that exclamation competition so how's my 3080 hey clipped you this is got your name on it it's got your name on it you can win this one for real this time um I need to go into my Nightbot and change I forget how to change the command automatically [Music] but I can do it here manually let's make this now link to the third all right let's try it now competition exclamation point competition now it links to the third one perfect perfect I've got my YouTube subscriber count up there all right so we need to classify images uh will I participate in the competition I'll participate in watching it and commenting on what I see but I won't I won't I I'll probably just do a baseline I don't see myself competing to the point where I'm gonna get high I and I actually don't think that it will let me show up on the 
leaderboard as a host but we'll see if it does I mean I might try to get up on there um so let's do data overview we have a train CSV and test CSV with the with metadata about the images the train data set includes labels the test data set is what we need to design our model to predict or that's probably bad English but predict for should you not end a sentence in for uh from glob import glob or we might not actually do that so let's read in train and test train this is usually how I like to do it so we're going to read in the training data set autocomplete you're not very good on kaggle notebooks autocomplete POG series corn oh I didn't know it all shows up in a subfolder called corn that's not intended but not a deal breaker all right we have our train our test data frame now and we also are going to bring in the sample submission sample submission is there anything else in there you would think I would know because I made this data set but no uh train so our labels we can do a value counts on this label this will tell us how many in the training data set of each label we have we can also do a plot and make this a kind equals horizontal bar plot figsize of ten by five and let's do let's import matplotlib shall we hey welcome to the family five shot 97 great year so you are part of the family right now you just gotta let me know in chat how you found the channel and what led you to follow so we're launching the kaggle competition right now where you can win this GPU this GPU you need to attend GTC which is going on this week NVIDIA's hosting it using the link in the description so exclamation point competition will take you to all the details and you can join us why would you not want to it's so fun it's just so fun it's so fun uh title our title is uh uh count of labels in training set oh I also want to do style use maybe ggplot this will give us a better looking background we can show oh whoa that was weird show this all right what is silk 
a description of silk cut stress related loss of Kernel integrity that's what it's called it's a kernel a kernel of corn I was trying to think of what it was called earlier it's a kernel what's cool about 97 I don't know it's a good year good morning laptop GTC but jtt Jonathan Taylor Thomas I remember 97 like it was yesterday no it's it I remember 97 like it was 25 years ago was it that long Okay so uh now we know the numbers let's get let's plot plot some examples of each class so um remember in our training data set we have our image site so we're going to query where the label equals or let's do train Group by label so this is going to be for label D in this in this group by uh we're gonna break here so then that gives us our label so the first one is broken and then D is going to give us a bunch of the images so we're going to just go into the image of this values and let's do a grid so like and let's make this like uh three by three which will give us nine a fig size is going to be like a round number maybe this is going to be too big 15. 
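Editor's note: the loading and label-count steps described on stream can be sketched roughly as below. This is a minimal sketch, not the notebook from the video. The `BASE_DIR` path, the column names (`seed_id`, `view`, `image`, `label`), and the sample rows are assumptions made so the example runs without the competition data; the helper names are hypothetical.

```python
import io

import pandas as pd

# Assumed location: the stream mentions a "corn" subfolder inside the
# competition input directory. The exact path is a guess; adjust as needed.
BASE_DIR = "/kaggle/input/kaggle-pog-series-s01e03/corn/"

# Made-up stand-in for train.csv so the sketch runs anywhere.
# Column names and label spellings are assumptions, not confirmed by the video.
SAMPLE_TRAIN_CSV = """seed_id,view,image,label
0,top,train/00000.png,broken
1,bottom,train/00001.png,pure
2,top,train/00002.png,pure
3,top,train/00003.png,discolored
4,bottom,train/00004.png,silkcut
5,top,train/00005.png,pure
"""


def load_train(path_or_buffer):
    """Read the training metadata into a DataFrame."""
    return pd.read_csv(path_or_buffer)


def label_counts(train):
    """Number of training images per label, most frequent first."""
    return train["label"].value_counts()


def plot_label_counts(train):
    """Horizontal bar chart of label counts with the ggplot style."""
    # Imported here so the counting helpers work even without matplotlib.
    import matplotlib.pyplot as plt

    plt.style.use("ggplot")
    return label_counts(train).plot(
        kind="barh", figsize=(10, 5), title="Count of labels in training set"
    )


train = load_train(io.StringIO(SAMPLE_TRAIN_CSV))
# With the real data this would be: train = load_train(BASE_DIR + "train.csv")
counts = label_counts(train)
print(counts)
```

Calling `plot_label_counts(train)` in a notebook cell would draw the bar chart; it is left out of the demo so the sketch has no plotting side effects.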
and then we're going to take in each of these images, so for i in this, and we're just going to take the first five images. So we're doing for i, file name in enumerate; we're going to enumerate over these, so this will give us an index. Then we're going to also flatten this. We're gonna imshow. First we need to read in the image, so we're going to read in this file name. Oh, but this file name is not going to be correct, because the file name is not relative to this directory. So let's make it relative to this corn directory. Let's call this our base directory. And I could use pathlib, which is probably the correct way to do this, but I'm just going to stick with what I know for now. So this is base dir plus the file name. Is that going to work? Missing one positional... oh, imshow, we're gonna miss that. There we go. All right, now let's also take these grids off, and we're going to make a suptitle for this, which is going to be the label name. Let's make an f-string: example, label. Um, I need to pick the subplot. Okay.

So what's up, people? Okay, what did I miss? "I found you browsing software and game development; not many people stream data science besides you and nikwon." Yeah, well, welcome, five shot. Also, if you want to join this competition and win a GPU, you can do that too.

"Hey, I already registered for the GTC event using another giveaway link, so does it mean I can't have a chance of winning your event?" No, it doesn't mean that you can't win from my event. It just means that you should register again using my link. Click my link like 50 times too, that'll help. I don't know how they track it. Just, yeah, it would be helpful for me if, when you register, you register using my link. Yeah, I know a lot of people are doing giveaways too. Uh, just make another account when you register and use my link. I'm kind of late to the show because they gave me my GPU late. But, um, "are we playing corn kernel versus asteroid?" That would be a fun
competition uh super loli welcome to the stream did you just join we just launched a competition for kaggle competition Community kaggle competition where you can join and win a GPU this GPU that's in my hands I haven't opened it still sealed Signed Sealed Delivered and I will send that to your door if you win this competition we're we're classifying corn it's corn I want to play that song but I'm sure I'll get like copyright struck for doing this I should use botnet to sign up which well you you can't be overly uh suspicious Robert I appreciate it but um I want it to be legitimate people who sign up they're gonna if they see an outlier you never want to be the outlier right if you're you're like in second place that's perfect like you don't want to be like uh what's his name Hans uh Neiman in in the chess tournaments and in doing sketchy things online and then they find out that you're cheating not that I would cheat but I'm just saying it can't be that obvious and machine learning algorithms are pretty smart these days aged Gmail accounts they also need to attend too so just use five gmails sure uh I did five here but this should be nine all right so what did we do here in case you're just watching this and you want to learn what what we're plotting here um what I did is I I we're We're looping over each of these labels in the training data set then we're making a sub slice of that data frame called D for each label then for that we're creating a subplot a nine a three by three grid of nine images we flatten this axis because otherwise this is like a 2d array and we want to just iterate over it and then We're looping through the first nine images in this sub slice and we are plotting it to uh location in this grid so you can see these are example and then we make a subtitle which is the which is the label let's make this font size a little bit bigger and let's also make the Y I always get thrown off by this because yeah there there's a little bit further down 
otherwise it's like way up high I don't know why matplot live subplot subtitles always do that um so these are broken images uh should we also should we also show what uh if it's the top or bottom I guess that doesn't really matter I'll let you guys figure out if that matters I don't I honestly don't know uh what's up hat night how are you doing welcome broken kernels yeah this is my uh Ubuntu kernel right here broken so these are all broken these are discolored so I guess you can tell like the actual breaks yeah these are discolored these are pure um I just want to eat these ones all up that one looks a little sketch but it's okay I'll let it through and this is silk cut which I'm still not clear on what that exactly means but I think it has to do with like the silk uh that you know when you're shucking corn and that silks there it like cuts it from when it you pull in the silk off hey you go with data and dropping by before you go to dinner I hope you have a great dinner we just launched a kaggle competition you should totally join tell your friends to join uh everyone follow Yucca with data if you're not already big shout out to I forget how to do this shout out to Yuko with data she streams data science stuff too but we're doing image classification of these images so guys we we know we know what we have on hand here now let's just make a dum-dum classifier image quality is like 1997. 
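Editor's sketch of the exploration described so far (label counts, the horizontal bar plot, and a 3x3 grid of examples per label). It is a minimal illustration, not the notebook from the stream: the column names `image` and `label`, the label spelling `silkcut`, the `fontsize` and `y` values, and the randomly generated stand-in images are all assumptions so that the block runs without the competition files.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in for the competition data: tiny random images plus a table that
# mimics train.csv (assumed columns: "image" = relative path, "label").
base_dir = Path(tempfile.mkdtemp())
(base_dir / "train").mkdir()
rng = np.random.default_rng(0)
rows = []
for label in ["broken", "discolored", "pure", "silkcut"]:
    for i in range(9):
        rel_path = f"train/{label}_{i}.png"
        plt.imsave(str(base_dir / rel_path), rng.random((32, 32, 3)))
        rows.append({"image": rel_path, "label": label})
train = pd.DataFrame(rows)

# How many training images carry each label.
label_counts = train["label"].value_counts()

# Horizontal bar plot of the counts with the ggplot style.
plt.style.use("ggplot")
ax = label_counts.plot(
    kind="barh", figsize=(10, 5), title="Count of Labels in Training Set"
)
plt.close(ax.figure)

# One 3x3 grid of example images per label.
figures = {}
for label, d in train.groupby("label"):
    fig, axs = plt.subplots(3, 3, figsize=(15, 15))
    axs = axs.flatten()  # 2-D array of axes -> flat list we can index with i
    for i, filename in enumerate(d["image"].values[:9]):
        # File names in the table are relative, so prefix the base directory.
        img = plt.imread(str(base_dir / filename))
        axs[i].imshow(img)
        axs[i].axis("off")
    # y < 1 pulls the title down toward the grid instead of floating high above.
    fig.suptitle(f"Example {label}", fontsize=20, y=0.92)
    figures[label] = fig
    plt.close(fig)
```

In a notebook you would drop the `Agg` line and the `plt.close` calls and call `plt.show()` instead, and `base_dir` would point at the competition's input folder.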
yeah I mean these are small um but it's real life this is what you can expect to see in real life now here's that's a good point real real apparatios one of the issues you run into a lot when you're creating a machine learning model is that you train it on perfectly clean data and then expect it to perform on unseen data to the same level of accuracy or whatever your evaluation metric is as to what it does on the perfectly clean data but in most cases you want your training data to be of the similar quality as to what you expect it to predict in real life so let's just let's just imagine here that eventually you want this model to be predicting based on images or or on this like going down uh like a a product line or something and you want to pick out a machine that will automatically pick out the bad ones maybe this is the quality of the image it'll be because there'll be thousands of them going down I don't know there are things we know we know I have a solution train on shitty data and then just make really poor generalized models that's a solution too another solution is to up the quality of whatever your inference stuff is but you know it's a lot harder in practice than you think create a classifier um what should we do I want to do like a very basic Bare Bones image classifier [Music] foreign so let's do from sklearn model selection import train test split we're just going to do a train test split for easy sake is there a term to describe a distribution shift in machine learning usually sometimes it's referred to as like drift in real life if you're predicting on your training set and then like there's a there's a fundamental shift in the type of data that you're getting in and how it Associates with your label then you call it like a drift but I don't know if that's a technical term uh you're in the process of watching my k-fold validation video Rob yes then that means hat night you can create that's by the way one of my least popular videos but probably one 
of the most important videos. Um, but you could create, hat night, a full k-fold cross-validated notebook and share it with everyone. I'm trying to think of the easiest way to classify all this. Um, I don't really want to go through the process of doing a full classifier. So there's one way: you can just take the pixels. "You haven't been able to see many sources that do that in Python, so my video helped." I'm glad that helped, hat night. "Maybe the average color of each picture." "Everything is in R when it comes down to textbooks." Yeah. Do I recommend a bad classifier? Sure, go for it. We can try to do it just like in Keras. "Use the mean image for each kernel type." Is lit, I agree. Let's see if we can just use this setup. All right, uh, [Music] so I'm going to link to my video. Oh wait, it's here. Link to that here; hopefully people will watch it. [Music]

Um, and we're going to do a train test split on this training data. Um, hey, the cawthorne Gotham, welcome to the family, glad to have you here. How are you doing tonight? Um, let's do test size. We're just doing a train test split here. Test size is 0.2, and then we need to take our train dot index; this is the array we're going to feed into it. I guess we could do seed ID, maybe that'd be more appropriate. Uh, this is train IDs and this is val IDs. Then we could take our training data and we can loc, and we can make this called group, which is, uh, this will be train and this will be val. We didn't stratify or anything. Oh yeah, we also need to look at when seed ID is in these train IDs. There we go. All nice and tidy. Now we have our train set split into train and validation.

Uh, now I may need to turn the GPU on for this. Accelerator should be GPU. Let's turn it on. I might have to reset our instance. "Very bad, I did that for MNIST, it performed very..." oh, okay. What are you guys saying? Average the RGB values for each pixel, group by each kernel, then define a distance function
between the image and the classify yeah yeah that's like a I guess you could do K nearest neighbors too you could just try to like neighbors group I don't want to get too much into what the best solution is though I want you guys to find it I have a quick question you know a way to introduce punishment in ml example if the task is to predict a tomato saying it is an apple is kind of okay but saying it's an Apache helicopter needs to punish so the next training iteration won't predict that yeah so you can make custom loss functions [Music] um that's interesting I don't know if there's any papers on that specific example but um that would be cool to see you can you can definitely change your optimization function but it needs to be something that is um something that you can like apply gradients to and and minimize so I don't know if that's like a non-linear example that you gave where it wouldn't actually work that's cool how do you turn on the Dark theme so this is just like a plug-in for uh called Dark reader on Chrome browser would that be more related to reinforcement learning maybe maybe I don't know I haven't hey TP rototype welcome to the family glad you joined us tonight we're just uh training a classifier on our corn look at our beautiful corn let's save this version for now I'll make this public someone needs to go out there and make a really good notebook on this it's corn uh do you guys like how I use the corn color to make this logo I'm really proud of this logo I think it looks pretty um still two people on the leaderboard foreign trying to see how I can get the path of images all right so we're gonna generate a data set this looks very similar to uh Pi torch is this foreign so they actually split here and why would you make a validation split Within your validation data set that I don't know image size so uh we need to do some pre-processing because the images are not all the same size now we notice that here that you first your data loader is going 
to need to take this in and resize these images to the same size, because the model is going to need that. So, dataset from directory... resize... will this automatically resize? Okay, so this automatically takes care of that; it'll take it in and resize our data to the correct dimensions. So let's make this image size, yeah, 180 by 180. Batch size 32 seems fine. Because we haven't done any of this yet, we have to rerun from the top, like a new baby. "Gotta shutil those directories, right?" Yeah, I need to see how this... no such file or directory. Okay, so dataset from list? Like, I just want to give it a list of images and not the directory. The problem is that, um, I guess I can shutil, like, move the files into the given directory. So let's, um, excuse me, that's what I'm doing right now. So it takes a label from the base directory.

You know what I think I can do is I can just do our train test split within this; I think this validation split will take care of it. So if I just make a training data set, and granted this is just for an example, and we take our base dir and we add the train, will this work? Is it thinking? No images found. What are you talking about? There's a bunch of images. Oh, that's right, this is the correct directory, yes? No. Yeah, so basically, I see what you mean, I need to move all these into folders with the right name. [Music]

So I can make a dir called train, broken (I don't know how much room they have), pure, broken, discolored, and, what's it, silk cut, and then I can move the data into each of those. So for train, let's take our base dir and add this as a column, so that we can just... or we could have just done full path, which is the base dir. Yeah, that's probably better. We don't need to do it that way. Train dot image, base dir plus this. Um, there we go. So image path, that's going to be the full path of the image. And let's query where label equals... uh, or let's do train group by label, just like we did before. So
for label D in here and then let's do sh util copy should I do them one by one so we're going to take for each of these images we're gonna copy the image into a new folder called did we make these directories uh is there any video in your story of becoming Grandmaster no I've thought about making it do you would that be interesting to people to go through my story because I know Abhishek has a really good video that I've watched number of times about how he became a Grandmaster I thought it was really cool um so yeah that would you watch that real apotheos so this needs to be the train and then the uh I don't think I have permission to write to kaggle do you Pi torch or tensorflow or more Rob um I prefer pie torch but there are times when I like tensorflow as well I don't know it it kind of depends on the project I think pie torch is a little bit more flexible but I I don't know I thought I was going to do something quick here but like always it's never quick um do I do you need Advanced degrees to be good at machine learning no five shot you don't need it foreign is it a bad thing to get an advanced degree that's another question I don't necessarily I'm not as anti degree getting as some people are but all right so we need this label so we're gonna do sh util move copy move copy from this source which is going to be our D2 image path Into A New Path which is going to be train plus this label directory yeah okay so now it moved this into the broken directory and if we LS this train broken zero zero or or LS this directory we will see this zero zero zero is now in there so this should work yeah let's just let that run what are we doing here we are taking we're iterating through each of these and we're putting images into the folder name of their labels I don't think you need a degree to get good but unfortunately I believe in today's Industries you might need one for your resume to even be read by someone yeah unless you figure out a way to like making it make a 
name for yourself some other way like let's say just in theory or like as a thought experiment maybe that's not even a good way to say it one way to think about it though is if you had no degree and had no desire to get a degree and all that stuff and you just started grinding on kaggle and became number one world ranked on kaggle competitions and you just won everything and you blew everyone out of the water and you're just like knew all the best techniques and you were awesome at it I think you would be able to get your way into a job networking works but as Rob said you need to somehow impress people with something tangible yeah I I think that um I think that it it depends like what what is your ultimate goal are you just trying to get a job because you think data science is cool or do you actually enjoy this stuff because otherwise just find a different get a different job um but if it's something you're really passionate about and you feel like the roadblock for you is that you don't have money for a degree or time well time is probably more valuable than anything but if you don't have the ability to get a degree or you don't have a degree yet then you can still learn and become good at it and then everything else will just follow and if you decide you want to get a degree you can go do that um are you allowed to use pre-trained weights in this competition you are allowed to use pre-trained imagenet weights is what I wrote in the rules because starting from scratch can be tough because it's the the amount of uh uh training that you need to get something from from no weights is kind of tough so I I said imagenate is fun it's a very small data set yeah so if you look at here in and let's see if anyone else has joined nope rules imagenet pre-trained models Allowed no other pre-trained models you can convince me other pre-trained models are are necessary and I'd probably say it's okay non-player Craig is that create Craig Craig the Craig that I know search 
dreaming crew to make your title row to 270k until you're making at least 200k for your stream for eight hours a day like a job watch the money roll in I don't know my streaming stuff has given you know how much money I've made on my streaming probably negative money um like these competitions that we host and even mailing out the GPU costs money and I'm not doing the streaming for money so you need more time and a better schedule all right so now we have this new directory called just called train like in our base directory uh found these many files belonging to four classes oh geez what what just happened here what is this successful number node read from CIS negative value but there must be at least one number node so we're turning the number node zero let's this is from 2020. they're saying this worked it's a non-fatal warning so basically we can ignore this but this guy gave just like a line of code that I could also run and at 71 upvotes uh and there's a read-only file system so I can't run this all right TR let's let's plot these images using this cleanup called clean up on aisle three I don't know what this what this is all about isn't a data science great degree a bit of overkill for basic analyst position unless you plan on getting into data scientist roles later on um yeah that's true I mean I definitely think it's cool if you're gonna go get your degree if you're already gonna go get undergrad hey all vimpinero thank you for joining the chat and for joining our family you're part of our family now and you can tell me and chat how you found the family uh
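
Editor's sketch of the hold-out split described earlier in the stream (test size 0.2, split on the ID column, then a `group` column marking each row as train or validation). The `seed_id` column name, the `random_state` value, and the small stand-in table are assumptions made so that the example runs on its own; it is an illustration, not the exact code typed on stream.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for train.csv (assumed columns: "seed_id" and "label").
train = pd.DataFrame(
    {
        "seed_id": range(20),
        "label": ["broken", "discolored", "pure", "silkcut"] * 5,
    }
)

# Hold out 20% of the IDs for validation. As in the stream, no stratification.
train_ids, val_ids = train_test_split(
    train["seed_id"].values, test_size=0.2, random_state=0
)

# Mark every row with the split it belongs to.
train["group"] = "train"
train.loc[train["seed_id"].isin(val_ids), "group"] = "val"

print(train["group"].value_counts())
```

Passing `stratify=train["label"]` to `train_test_split` would keep the label proportions similar in both parts, which may matter if the classes are imbalanced.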

The video walks through the launch of a community Kaggle competition on corn kernel classification, in which participants classify images of corn into four categories (pure, broken, discolored, or silk cut). It covers the competition data, a first exploration of the training set, and the first steps toward a baseline image classifier.

Key Takeaways
  1. Launch the competition
  2. Read in the training and test data sets
  3. Run value counts on the labels
  4. Create a horizontal bar plot to visualize the label counts
  5. Use the sample submission as a reference for the model's output format
  6. Build the full path of each image and group the training data by label
  7. Copy the images into folders named after their labels
💡 The competition rules allow ImageNet pre-trained models (other pre-trained models are not allowed unless the host agrees), and the competition is a chance to practice image classification techniques on corn kernel images.
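
Steps 6 and 7 can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the column names `image` and `label`, the label spelling `silkcut`, the placeholder files, and the separate writable output folder are all made up for the example (the stream runs into a read-only input directory, so copying to a different location seems necessary).

```python
import shutil
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for the competition files: placeholder "images" plus a table
# that mimics train.csv (assumed columns: "image" = relative path, "label").
root = Path(tempfile.mkdtemp())
base_dir = root / "corn"
(base_dir / "train").mkdir(parents=True)

labels = ["broken", "discolored", "pure", "silkcut"]
rows = []
for n in range(8):
    rel_path = f"train/{n:05d}.png"
    (base_dir / rel_path).write_bytes(b"placeholder image bytes")
    rows.append({"image": rel_path, "label": labels[n % 4]})
train = pd.DataFrame(rows)

# Step 6: full path of every image = base directory + relative file name.
train["image_path"] = str(base_dir) + "/" + train["image"]

# Step 7: copy each image into a folder named after its label.
out_dir = root / "working" / "train"  # hypothetical writable location
for label, d in train.groupby("label"):
    label_dir = out_dir / label
    label_dir.mkdir(parents=True, exist_ok=True)
    for src in d["image_path"]:
        shutil.copy(src, label_dir)

copied = {p.name: len(list(p.iterdir())) for p in sorted(out_dir.iterdir())}
print(copied)

# With TensorFlow installed, a one-folder-per-class layout like this is what
# tf.keras.utils.image_dataset_from_directory reads, for example:
#   tf.keras.utils.image_dataset_from_directory(
#       out_dir, validation_split=0.2, subset="training", seed=1337,
#       image_size=(180, 180), batch_size=32)
```

The image size of 180 by 180 and batch size of 32 in the commented call are the values mentioned in the stream; the `seed` value is arbitrary.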
