Kaggle Competitions: A Beginner's Guide to Winning
In this video, Kaggle Grandmaster Rob Mulla goes over his tips for first-time Kagglers joining a Kaggle competition. A must-watch before starting. Learn the basics of how to get started on your first Kaggle competition. Kaggle is a great way to learn data science and machine learning.
Timeline:
00:00 Intro
01:00 Selecting a Competition
03:51 The Competition Overview
08:27 The Data Section
10:14 The Discussion Section
12:21 The Code Section
14:20 The Leaderboard
16:04 Rules, Team and Submission
18:25 Outro
Join the competition: https://www.kaggle.com/c/kaggle-pog-series-s01e01
Follow me on twitch for live coding streams: https://www.twitch.tv/medallionstallion_
Intro to Pandas video: https://www.youtube.com/watch?v=_Eb0utIRdkw
Exploratory Data Analysis Video: https://www.youtube.com/watch?v=xi0vhXFPegw
* Youtube: https://youtube.com/@robmulla?sub_confirmation=1
* Discord: https://discord.gg/HZszek7DQc
* Twitch: https://www.twitch.tv/medallionstallion_
* Twitter: https://twitter.com/Rob_Mulla
* Kaggle: https://www.kaggle.com/robikscube
#Kaggle #DataScience #MachineLearning
What You'll Learn
How to get started with your first Kaggle competition: selecting a competition, reading the overview and evaluation pages, and working through the data, discussion, code, leaderboard, rules, team, and submission sections of the Kaggle platform.
Full Transcript
Hey YouTube, my name is Rob. I'm a data scientist and I make videos about machine learning and how to get started in data science. If you've worked in data science before, or you're getting into data science, you've probably already heard about the website Kaggle. Kaggle is an online community of data scientists. It's a place where you can go to find code examples, it also hosts datasets, and there are a bunch of forums where you can talk about different machine learning topics. But Kaggle started out as, and still is, a competitive platform where people compete in competitions to predict things. I know that for someone new to Kaggle it can seem very overwhelming to get involved in your first competition, but I think it's a great way to learn. So in this video I'm just going to take you through some basic tips that I would give to someone looking for their first Kaggle competition, and hopefully by the end of it you'll feel comfortable enough to jump into one yourself. All right, let's get started.
So of course the very first thing you're going to need to do to join a Kaggle competition is make an account. If you don't already have an account and you go to the Kaggle website, it'll look something like this. All you have to do is click on Register, make an account, link it with your email or your Google account, and then you'll be in. Let's assume you have a Kaggle account. You'll have a home page sort of like this, and if you look here on the left side there's a bunch of different things you can choose from: Competitions, Datasets, Code, Discussions, and Courses. We're going to look at Competitions, since that's what this video is about. There are a lot of things here. If you're not already signed up for a competition, they won't necessarily show up as active, but we're going to click on All Competitions anyway to see everything that's out there.
Now, a few tips for picking the competition you want to be involved in, a few things I would keep in mind. First, you don't want to join
a competition that's ending soon or has already ended. The real benefit of joining a Kaggle competition is the fact that you get to work in a community, ask questions, and look at notebooks as they come out when people make them public, and you really don't get that sort of benefit if you join a competition that's already completed. So after we click All Competitions, we're going to filter by Recently Launched. We could also go by Closing Soon, but let's go by Recently Launched. Under each competition here (let's close this left side) you can see that there is a type of competition, the number of teams that are involved in it, and how long there is to go. Right now there are a few competitions with days to go or months to go, but we wouldn't really want to join one that's ending in just a few days, because Kaggle competitions are usually one to three or four months long and you want to get the whole benefit by joining early. So let's pick as an example this one with three months to go.
Now, another thing I want to mention before we get too far is that there are also things called community competitions, which were just launched this year. Community competitions are not necessarily going to show up on this page, but I'll show you one as an example. These are put on by anyone who wants to launch one, and I've actually launched my own Kaggle competition, the Pog YouTube video series, where we're trying to predict video likes on YouTube videos given some data. You can find the link in the description of this video. But what we're going to go through here is one example of an active competition. So let's go back to this market prediction competition, and I'll take you through some of the steps I would take, almost like a checklist of items I would want to go through before I joined the competition.
Now, one thing you definitely will want to do is read everything on this Overview tab. The Overview tab usually has a description here that talks a little bit about
the competition: where the data comes from, what you're trying to predict, and how it would be a benefit to have a predictive model that can predict whatever the target is. So you definitely want to read the full description and get a good idea of what's going on first. At that point you might decide, "This competition isn't for me; maybe I want to work with images, or I want to work with text, or I want to predict stuff like the stock market." Then you would choose to move on to a different competition. But if it looks like something that you're interested in, then you're going to want to keep on going through the overview. I can't stress enough that reading and re-reading the Evaluation page is very important. In every competition you're trying to optimize your predictive model toward some sort of target, and the way that you're being judged on that is shown in the Evaluation tab. If you're not optimizing your model for this evaluation metric, then you're doing the wrong thing. This competition is using the mean of the Pearson correlation coefficient, which is pretty standard, but you might have a competition that has a more complicated way of calculating its evaluation, and you want to make sure you understand that fully before you move forward and try to create your model.
Another thing to keep in mind is that there's a timeline for every competition. In most competitions it'll launch, there'll be a period where anyone can join and submit predictions, and then usually about a week before the competition is set to end they close things off so you can't join teams with anyone else anymore. In the final week, the idea is that you don't publish any public information or notebooks about that competition on the website, and you kind of hunker down for that last week. Then, when the competition ends, for many competitions they'll release the final scores and the rankings right then, and a few days after that they'll check
to make sure and remove anyone who has been identified as cheating from the leaderboard, and it'll be finalized. You'll see down here on this competition there's a little timeline that tells you how long ago the competition started, how long we have until the rules acceptance and team merging deadline, and then the close of the competition. Now, this one's a little bit unique because they're actually using the predictive models on future data, so even after this competition ends there'll be an additional time window where they're going to be scoring all the models on future data. This is interesting, but it's not typically how competitions go.
Then there's the Prizes page. It's nice to know what the prizes are, but that shouldn't be your main goal if you're starting a competition for the first time. Then there's understanding that there are different types of Kaggle competitions. The most common ones nowadays are code competitions. In a code competition you actually have to submit your predictive model in a notebook, and that notebook will run on the test set, which is hidden from your model. You don't actually get a CSV or the data files for the test set that you create predictions for; your code creates the predictions for it when it runs in submission. Another type of Kaggle competition you'll see is where they provide you all the feature data for the test set and you create your predictions as just a flat file that you can upload or push to the website. But code competitions are much more common these days for new ones that have launched. It looks like they also have some contact information.
All right, so once you understand everything on this page you can move over to looking at the data. The way Kaggle competitions run is that for most competitions there's a public training dataset, and what this contains is a bunch of features about your data.
This could be images, it could be a CSV file, it could be audio files, but they're giving you the training data and they'll also provide you the target value that your model needs to predict. Let's say it's a bunch of images and they're asking you to predict what each image is of: the training dataset will have both the feature data and the targets provided. For the test data you don't know the target, so your model needs to predict it. In this data description tab they will go over all the files provided. In this one there's a train CSV file that they provide with some unique identifiers, like the row ID and the investment ID, and then a bunch of anonymized features that will be provided in the dataset. You can also see that they'll go into some detail about how the data was collected. Understanding this very thoroughly is going to be important for you to perform well in the competition. You can also scroll down to the very bottom of the page and actually see the dataset by clicking on the CSV files. This can be helpful if you just want to quickly see what the distributions of certain features are before you join. You can also see the size of the dataset itself; this one is about 18 and a half gigabytes.
Okay, so I'd actually suggest skipping the Code tab at first and checking out the Discussion section next. Usually in the discussion there are some pinned discussions at the very top: a competition question-and-answer thread, or usually the competition hosts will post a message just saying, "Hi, welcome, let us know if you have any questions." This is a great thread to always read thoroughly, because especially right after the competition launches there'll be a lot of questions or uncertainties about certain things in the data or what's going on. This thread will usually have a lot of your questions already asked, or you can post your questions in here and get a response from the competition host. So always check out these Q&As
before posting any new discussion threads yourself, because the question could already be answered there. There is also usually a looking-for-a-team thread, where if you want to team up with other Kagglers to compete in this competition you can do it here. Now here's a very important thing: you don't want to discuss details about the competition with anyone outside of this discussion forum or the code forum unless you are on a team with that person. It's against the rules of most competitions to discuss privately, and you don't want to get kicked out of the competition. So if you do have a question, post it on the forums. People are usually really friendly and helpful with answering questions. I would also recommend, once you accept the terms of the competition, that you follow it. What that does is email you anytime there's a new thread, and then you can follow threads and you'll get emailed every time there's a new discussion topic. You'd be surprised how many times really important tricks to doing well in the competition are actually discussed in the forums, and if you just read the forums thoroughly you pick up on some things that might not be obvious otherwise.
Now let's look at the Code tab. The Code tab will provide a bunch of notebooks that competitors in this competition have made. You can sort these by most votes or most comments. Oh look, my notebook is one of the top ones for this competition. You will see that there is a lot of EDA, which is exploratory data analysis of the data. You'll probably want to do that yourself, but it's always good to look through and see some examples that work. You can also sort by best score. These are the best scores on the public test dataset, not necessarily the best scores that will be revealed on the private one, but it is helpful sometimes to go here, sort by best score, and just see what type of code is being used to submit to the leaderboard and get a somewhat decent score. I would recommend going through and
reading, trying to understand what's going on. This person has made a deep neural network that they've trained on this data. You will also get a better idea of some nuances about how to submit to that specific competition. As I mentioned before, there are code competitions and then there are competitions where you can just submit the prediction file for the test set, and if it's a code competition this will provide you all the steps needed in your code to submit successfully. It can be one of the frustrating parts when you submit to a competition and you do something wrong and the code runs for hours but then fails. So at least knowing that you can write the code to run end to end, submit, and get on the leaderboard can be very helpful.
Now let's talk about the leaderboard. As I mentioned before, while the competition is going on, for however many weeks or months, there will be a public leaderboard. This updates live and shows who has submitted to the leaderboard and how they place on the public test set. Now, it's important to understand that the public leaderboard is not the final ranking that determines the winner of the competition. It's just a guide to show you how well people are doing on the public test set. Once the competition ends, this Private Leaderboard tab will populate and show you all of the private leaderboard ranks. In this competition it's a little different because it'll be rerun in the future, but in most other competitions, as soon as the deadline arrives (when these three months to go end), the private leaderboard will be revealed and you'll see who the winners are. Part of the nuance of doing well in a Kaggle competition is learning how to create a predictive model that is general enough that it predicts well on both the public leaderboard and the private leaderboard. One thing I'll note about this is that when you scroll through the leaderboard, if someone used a notebook to submit to the
public leaderboard, you can see that linked here on the leaderboard itself. So we click on this and we can see this may have been the notebook we were looking at before. Just another thing to keep in mind.
Now, I did mention the rules. You'll probably want to read all these rules through thoroughly, but the main ones are that you can't share the data, and you can't share your approaches or your solution with other people outside the discussion forums unless you've joined a team. There usually are also some rules about what you can and can't do with the dataset or publish about the dataset.
That leaves us with this Teams tab. The Teams tab will let you invite other people to join your team; you can do that here by requesting to merge. You can also change your team name. Some people like to make funny team names so it looks fun on the leaderboard. You can also see if you have any pending merge requests or requests to merge with other people. On your Submissions tab you'll see all the times you've submitted and the public leaderboard score; after the competition is over you'll see the private leaderboard score. And with this Submit Predictions button, if it were a competition where you could just upload a file of your predictions you'd be able to do it here; otherwise you can select the notebook of code that you're going to use to submit to the leaderboard.
So let's just wrap up by going here to this competition that I'm hosting. It's predicting YouTube video likes, and what we're given is a bunch of data about the YouTube videos: information like how long the video is, the description, the category of the video, the title, and in this competition we're even providing the thumbnails for every YouTube video. You can see in the leaderboard here that people are doing a pretty good job of predicting the target, which is actually the like-to-view ratio. And this is a competition you could join
right now. I have another video where I go through a code notebook and discuss my approach to a baseline model, which you can watch, and I hope you decide to join this. We'll have a series of these competitions that people can join in the future as well. Thanks so much for sticking around this long. I hope that you learned something new about how to join a Kaggle competition and that you decide to join one. Please like and subscribe if you enjoyed, and I will see you in the next video.
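Editor's sketch: the transcript stresses understanding the evaluation metric before modelling, and names "the mean of the Pearson correlation coefficient" for the example competition. The Python below is only a rough illustration of what such a metric might look like, not the competition's official scorer. The transcript does not say what the mean is taken over, so the grouping column `time_id`, the column names `target` and `prediction`, and the helper name `mean_pearson` are all assumptions made for this example.

```python
import numpy as np
import pandas as pd


def mean_pearson(df: pd.DataFrame,
                 group_col: str = "time_id",      # hypothetical grouping column
                 target_col: str = "target",      # hypothetical target column
                 pred_col: str = "prediction") -> float:
    """Average the Pearson correlation of target vs. prediction over groups."""
    per_group = []
    for _, group in df.groupby(group_col):
        # np.corrcoef returns a 2x2 matrix; entry [0, 1] is the correlation
        # between the two input vectors.
        r = np.corrcoef(group[target_col], group[pred_col])[0, 1]
        per_group.append(r)
    return float(np.mean(per_group))


# Toy data: predictions match the target ordering in group 0
# and reverse it in group 1, so the two correlations cancel out.
toy = pd.DataFrame({
    "time_id": [0, 0, 0, 1, 1, 1],
    "target": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "prediction": [1.0, 2.0, 3.0, 3.0, 2.0, 1.0],
})
score = mean_pearson(toy)
print(f"mean Pearson correlation: {score:.3f}")
```

Scoring your own validation predictions with the same function you expect the leaderboard to use is one way to check that you are optimizing for the right thing; always confirm the exact definition on the competition's Evaluation page.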