Introduction To Titanic Kaggle Competition | Part 1

Data Science Dojo · Beginner ·🧠 Large Language Models ·9y ago

Key Takeaways

The video introduces the Titanic Kaggle competition: what Kaggle is, how the training and test sets differ, and how to build and upload a two-column CSV submission (PassengerId, Survived) so that Kaggle can score it on the leaderboard.

Full Transcript

Hello. My name is Phuc Duong, I'm the senior data engineer at Data Science Dojo, and I'm here to walk you through day two's homework. I hope you've been enjoying the boot camp so far.

Quickly, just go to portal.datasciencedojo.com; all the homework is laid out there for you. If you prefer a video, this is a video walking you through the homework, but just so you know, the homework is elaborated on in both places. There are two parts to the homework. The first part is to apply what you learned today: take the Titanic data set and apply a predictive model to it. So go ahead and use rpart, or if you've gotten to random forest, go ahead and use a random forest model. The second part of the homework is to actually enter a Kaggle competition. Both parts are elaborated here; I'm just going to show you how to do the Kaggle competition real quick.

This really is your data science capstone project for this course. You'll be working from Tuesday all the way to Friday to perfect your model, and then you'll be ranked among your peers, your peers being everyone at the boot camp. There are prizes on the line, and I'll talk more about the prizes later. For now, this whole page talks you through how to create a Kaggle account, how to submit, and all that good stuff. I will talk you through that here as well.

What you want to do is Google "Kaggle Titanic". Notice that we've actually entered you into a Kaggle competition since day one: the Titanic data set comes from this Kaggle competition. And what is Kaggle? Kaggle is a crowdsourced way of doing data science. Real companies like Home Depot, Liberty Mutual, Allstate, and Netflix come and post real data sets, each with a data mining problem attached, and you're ranked among your peers on what are called leaderboards as you work on these problems. The Titanic competition is the introductory Kaggle competition, and it's the one we'll do together.

If you go to Data, there are a bunch of data sets associated with this Kaggle competition. Notice there is a train.csv, and let me tell you what that is real quick. Throughout this Kaggle competition you've been given this data set with 891 rows. This is the training set, the set you've been working with. Some of you should have been suspicious, though: if you've been paying attention to history, the Titanic actually carried about 2,000 people, yet we only have 891 passengers. I wonder where the rest of the passengers went. Well, it turns out Kaggle is withholding the other passengers in this test set. So your capstone is to build a predictive model on the training set and apply it to the test set.

I'm going to go ahead and download this test set so we can see what is inside it. If you open it up, you'll notice that the passenger ID starts at 892, so these are the remaining passengers that were on the Titanic. You'll also notice that we have one less column: Survived is now missing. That is your job. You're supposed to predict whether or not these people survive or die. That is all the Kaggle competition is: Kaggle wants your answers. They want to know whether individual passengers lived or died. For example, passenger 897: did they live or die, based on the demographic attributes that are going to be read in by your predictive model?

I'm going to show you real quick how to submit to Kaggle, just for the purposes of introduction. For tonight's homework you don't need to hook up a predictive model and submit it to Kaggle; you just need to submit. Kaggle wants two things from you: the passenger ID, and whether the person corresponding to that passenger ID lived or died. Kaggle just wants two columns from you, so the other columns here are irrelevant and we're going to delete them. Kaggle wants a column called PassengerId; notice that the I is capitalized, the P is capitalized, and it's one word. It also wants a column called Survived; notice that it's past tense and there's a capital S. Kaggle will check for that.

We're going to build a very simple model, a model where everyone dies. It's not even a predictive model; we're just going to say that if you step on this boat, you will die. But remember from day one, when we did data exploration and looked at the class distribution of survived versus dead: we noticed that there was about a 62% chance of death just from stepping on the boat. So by saying everyone died, we have a statistical likelihood of doing better than a coin flip, better than 50%. I'm going to go ahead and say everyone dies here, and save that as a CSV as my own model, "everyone dies.csv".

Next you need to go to Kaggle and upload this file. There's a Make Submission button here, so click on it, upload "everyone dies.csv", and submit.

Notice that we don't even give Kaggle our predictive model; we just give Kaggle the answers. That means we can build the predictive model in Python or in Azure; it doesn't matter. Kaggle is agnostic to the tool and only cares about your answers. We are just submitting predictions, and Kaggle is going to score them and give you an accuracy. That's because Kaggle holds the true labels: it actually knows whether each person lived or died. If you remember from evaluation, comparing predicted versus actual gives you a confusion matrix. So you submit the predictions, Kaggle has the actuals, from those Kaggle builds a confusion matrix, and from the confusion matrix you get accuracy. It spits out an accuracy for me: my submission got 62% accuracy and I rank 5,517 in the world.

The capstone here is that we're going to enter all of you into a Kaggle competition within the class. To enter yourself, save the name that appears on the Kaggle leaderboard. Notice that I'm Phuc Duong, so I'll save my username, then go back to the Kaggle submission homework page and paste it into the form down here. This form will enter your Kaggle username into our internal leaderboard.

On Friday after lunch we're going to end the Kaggle competition. The first-place winner, the person who ranks highest, will get a prize: an advanced statistical R book. It's a very good book if you want to do some of the more advanced data mining processes in R. We can only teach you so much, and that book contains a lot of the other stuff that we couldn't teach you. For example, there is more than one way to cross-validate. We taught you just k-fold cross-validation, but there's also leave-one-out cross-validation, and there are four other ways to cross-validate that we were not able to cover in class; that book covers them. The second- and third-place winners will get an O'Reilly book called Doing Data Science. I also really enjoy that book; I was raised by O'Reilly, and hopefully you will be as well.

And yes, I know you can buy these books, and I know you can just pass this off. But this is really important: you want to do this Kaggle competition and be able to ask the instructor questions while you're still in class, because there are a lot of minute little steps along the way that might trip you up when you go back to work and try to work on your own Kaggle competition or your own data sets. More importantly, your honor is on the line. You have to defend your honor, and you will get big bragging rights from all this. All right, happy modeling.
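The scoring step described in the transcript (predictions compared against true labels, giving a confusion matrix, giving accuracy) can be sketched in a few lines of Python. This is only an illustration, not Kaggle's actual scoring code: the function names are hypothetical, and the class counts (549 died, 342 survived out of 891) are assumed training-set figures used as a stand-in, since the real test labels are held by Kaggle.

```python
def confusion_matrix(actual, predicted):
    """Count the four outcomes for binary labels (1 = survived, 0 = died)."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1   # predicted survived, did survive
        elif a == 0 and p == 0:
            counts["tn"] += 1   # predicted died, did die
        elif a == 0 and p == 1:
            counts["fp"] += 1   # predicted survived, actually died
        else:
            counts["fn"] += 1   # predicted died, actually survived
    return counts


def accuracy(cm):
    """Accuracy is the share of predictions that match the true label."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())


# Assumption: illustrative class counts for the 891 training passengers.
actual = [0] * 549 + [1] * 342

# The "everyone dies" baseline predicts 0 for every passenger.
predicted = [0] * len(actual)

cm = confusion_matrix(actual, predicted)
baseline_accuracy = accuracy(cm)
print(cm)
print(round(baseline_accuracy, 3))
```

With these assumed counts the baseline lands at roughly 62%, in line with the figure quoted in the transcript. Any real model should beat this number to be worth submitting.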

Original Description

In this talk, we will explain what Kaggle is and show you how to create a Kaggle account and submit your model to the Kaggle competition.

Titanic data set: https://www.kaggle.com/c/titanic

At Data Science Dojo, we believe data science is for everyone. Our data science trainings have been attended by more than 10,000 employees from over 2,500 companies globally, including many leaders in tech like Microsoft, Google, and Facebook. For more information please visit: https://hubs.la/Q01Z-13k0

💼 Learn to build LLM-powered apps in just 40 hours with our Large Language Models bootcamp: https://hubs.la/Q01ZZGL-0
💼 Get started in the world of data with our top-rated data science bootcamp: https://hubs.la/Q01ZZDpt0
💼 Master Python for data science, analytics, machine learning, and data engineering: https://hubs.la/Q01ZZD-s0
💼 Explore, analyze, and visualize your data with Power BI desktop: https://hubs.la/Q01ZZF8B0

Unleash your data science potential for FREE! Dive into our tutorials, events & courses today!

📚 Learn the essentials of data science and analytics with our data science tutorials: https://hubs.la/Q01ZZJJK0
📚 Stay ahead of the curve with the latest data science content, subscribe to our newsletter now: https://hubs.la/Q01ZZBy10
📚 Connect with other data scientists and AI professionals at our community events: https://hubs.la/Q01ZZLd80
📚 Check out our free data science courses: https://hubs.la/Q01ZZMcm0
📚 Get your daily dose of data science with our trending blogs: https://hubs.la/Q01ZZMWl0

📱 Social media links
Connect with us: https://www.linkedin.com/company/data-science-dojo
Follow us: https://twitter.com/DataScienceDojo
Keep up with us: https://www.instagram.com/data_science_dojo/
Like us: https://www.facebook.com/datasciencedojo
Find us: https://www.threads.net/@data_science_dojo

Also, join our communities:
LinkedIn: https://www.linkedin.com/groups/13601597/
Twitter: https://twitter.com/i/communities/167


This video introduces the Titanic Kaggle competition and explains how to participate by creating a Kaggle account, submitting a model, and using the Titanic dataset. It covers the basics of predictive modeling, data science, and supervised learning.

Key Takeaways
  1. Create a Kaggle account
  2. Submit a model to the Titanic competition
  3. Use the Titanic dataset for predictive modeling
  4. Save submission as a CSV file
  5. Upload submission to Kaggle
  6. Enter Kaggle username into the internal leaderboard
💡 Participating in Kaggle competitions can help improve data science skills and provide a platform for crowdsourced data science.
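
The CSV steps in the list above can be sketched in Python. This is a minimal illustration rather than the method shown in the video, where the downloaded test file is edited by hand. It writes the two columns Kaggle expects, PassengerId and Survived, with every passenger predicted as 0 (died). The function name write_baseline_submission and the three sample IDs are hypothetical; in practice the IDs would be read from the downloaded test file.

```python
import csv
import io


def write_baseline_submission(passenger_ids, out_file, prediction=0):
    """Write a two-column submission: one row per passenger, same prediction for all."""
    writer = csv.writer(out_file, lineterminator="\n")
    # Header names are case-sensitive: capital P and I, capital S.
    writer.writerow(["PassengerId", "Survived"])
    for passenger_id in passenger_ids:
        writer.writerow([passenger_id, prediction])


# Assumption: a tiny hypothetical sample; the test set IDs start at 892.
sample_ids = [892, 893, 894]

# An in-memory buffer keeps the sketch self-contained; a real run would use
# open("everyone_dies.csv", "w", newline="") instead.
buffer = io.StringIO()
write_baseline_submission(sample_ids, buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

Because Kaggle only receives this file, the predictions in the Survived column can come from any tool; swapping the constant 0 for a model's output is the only change needed once a real model exists.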
