5 websites to get Free Real-World Datasets for Data Science/ML Projects
Skills:
Data Literacy 80%
Key Takeaways
Provides 5 websites for free real-world datasets for data science and machine learning projects
Full Transcript
Hey friends, welcome to 1littlecoder. In this video we are going to see how best you can find open datasets for your data science project. It could be that you are a beginner trying to make a hobby project, or a researcher who wants a dataset for a particular piece of research or analysis, or a blogger who wants to write an article and needs a relevant real-world dataset for it. So in this video I'm going to show you a few places where you can find industry, or close to industry, real-world datasets on the internet.

The first place you want to look at is this one, which is a collection of a lot of the public datasets available on the internet. This place has already collected them, but the problem is that it is quite an exhaustive list, so unless you know exactly what you want you might get lost, and it doesn't have any search on it. So this is a good place if you know exactly what you want. Let's say you want a dataset in economics: you go to economics, find the relevant link, go to that particular place, and look for the dataset there. If you do not know what you want, I'm going to show you other places where you can look for datasets.

The second place where you can search for datasets is Google Dataset Search. For example, let's assume you wanted a dataset about UK Internet. When you search for UK Internet you get a list of places. This is a search engine just like Google, but only for datasets. You can see results from Statista, from Google, sorry, from the UK government website, so you can see all the places where you can actually download your dataset. If you have a specific need, say you want a dataset only in a given format, let's say a table, then you can specifically mention that you want a table and no other format. So Google Dataset Search is a good place if you want to search and understand where a given dataset lives. Let's try something else, say car usage, and again it lists places where you can get relevant data. The only downside, like I said, is that it is a search engine, which means that once again you get a lot of options, and you may not get the raw dataset. For example, in this case it would probably give you the dataset that was used to make this chart, but not a raw dataset on which you can build your project or analysis. That is another problem with it, but it is still a good start if you want to begin with something smaller, where you have something in mind and you just want data to prove or analyze that point.

The third one is data.world. data.world is a good place. It is a growing community and you can find datasets for a lot of things. The only problem is that you have to register and log in for the community edition. If you go to the pricing page you might see "free trial" and be put off, thinking this is a paid solution. They do have a lot of options for paid users, for organizations, or for professionals doing some kind of consulting, but they also have a free community edition where you can search across hundreds of thousands of datasets, and you can even connect directly from your project. The catch is that you get only three private projects and your private storage is capped, and for that you have to sign up. So this is data.world; you can also just search for data.world, and you have to sign up for the free community edition. It is still a good option because these datasets are curated; it's not like anyone can simply upload anything. In that way it is a good option for you.

The next option that I'm going to show you is actually one of the most well-known places for machine learning datasets. It is kind of a cliched place, but I still wanted to count it because of the good datasets that they have and the constant updates that you get. It is the UCI Machine Learning Repository. You can simply google "UCI machine learning" and you will be taken to that place. Most of the things covered here you can simply google: you can look for awesome datasets, you can look for Google Dataset Search, you can look for data.world, which also has a straightforward URL, and finally what we are looking at right now, the UCI Machine Learning Repository. This is a place that is predominantly used by researchers. In fact it gives you information about which datasets are the most popular. If you are from the R world you would probably know iris; if you are from the Python world you probably know wine quality, or the adult income US census dataset. These are all very familiar datasets. There are also datasets like human activity recognition using smartphones. Let's say you are a college or university student and you want to build a project using deep learning, or some other sort of machine learning, to detect human activity based just on the smartphone sensors; then this is the dataset to go for. The repository is constantly updated, so this is again a good place. Another advantage is that you can pick datasets based on your needs. Let us assume you want to do machine learning with only numerical data; then you can pick numerical. Or you want a machine learning task that is only regression; then you can pick regression. It has these kinds of attributes or tags defined, which make it much easier to use the platform. This is quite helpful if you are a data science content creator: say you want to make a YouTube video or a blog tutorial, or you are a developer advocate, then this is a very good place to pick a dataset and build your analysis or model on top of it. So this is again a very good place: quite cliched, very well known, heavily used by a lot of people, but still one of the best places in the world.

The next one is for when you are interested particularly in economics-related, macroeconomics-related, or world-related data. The World Bank's data website is a really good one. For example, if I want to know something about India, I can simply type India and it takes me to the datasets that are available for India. They have a couple of good options: you can download the dataset, or you can do the visualization online yourself, build small plots, and share them with your network. So it is a really good place to look for datasets, especially for your own analysis, or, let's say you are a journalist and you want to build something, this is exactly the place where you want to look for a dataset, because you know the legitimacy of the data and how genuine it is: it comes from a body that deals with this kind of data. So this is definitely a good place to pick datasets for your economics projects, whether you come from economics or statistics or are doing some kind of research. Let's say you want to find an agriculture dataset; let us see if they have anything related to agriculture. Okay, so it didn't show anything related to agriculture. You can instead search for wholesale index or official price index, or you can look for inflation. So you have all these economic indicators on this website, and it is a very good website to pick economics-related datasets.

The final one that I'm going to show you, okay, so the pre-final one, is Our World in Data. The website's name is Our World in Data, dot org, not dot com. This is a really good website. There are a lot of datasets that are relevant to the recent things we come across, for example Internet-related data or data on other current topics. It is a really good platform that is often ignored by a lot of people who talk about datasets. For example, let's say in this case I again want to look for agriculture. It can also show you the data by country, and where there is a plot, below that plot you can actually see the dataset that was used to make it. So this is a really handy website. It also provides a lot of insight for a given dataset, and you can pick good data either to support your hypothesis, to test your hypothesis, or even to start an analysis from scratch. So this is a really good place to look for datasets related to everything that is around us. The website is ourworldindata.org, which is highly ignored and very underrated, so I would recommend that people use it and build analysis on top of it.

And finally we are going to get into something that is quite popular with the machine learning community, which is Kaggle. Kaggle was a platform that used to be only a competition platform. If you do not know about Kaggle, I have a webinar taking people through the Kaggle platform, so I will link that in the description as well and you can look at it. Kaggle has a section called Datasets, and it has a really good number of datasets, especially because, if you do not know, Kaggle is currently owned by Google, and Kaggle has been pushing the Datasets section a lot. Let's say you want to build an attrition prediction model: you can just type attrition and you get something related to employee attrition. Let's say you want to understand a marketing funnel; let's see if there is a dataset. Yes, you have a marketing funnel dataset. You want a dataset related to reviews; you get datasets related to reviews. So Kaggle is a really good platform, again, for a lot of community-driven datasets. If you want to do some analysis on YouTube, you also get a popular YouTube videos dataset. So this is a really good platform: one, because you get the dataset itself, and two, because you can use Kaggle notebooks to build analysis on top of those datasets. For a lot of people who are getting started with data analytics, data science, machine learning, whatever you would like to call it, Kaggle Datasets is a really good platform to look for data. You can upvote a dataset, and you can also look at the license information for a given dataset: what the license is and whether you can use it for commercial purposes. This makes it a lot clearer whether you can use the data in a given project. It is a good place to look for datasets of any size, starting from smaller ones. So this is a really good platform if you want to use a dataset to build hobby projects, or if you want to write articles: download a dataset that is quite industry relevant and write blogs based on it.

So that is all. I wanted to name this video five places where you can look for industry-related datasets, and I think we have exceeded five. To quickly summarize: we started with two places that are like aggregators. One is Awesome Public Datasets, which is a GitHub repo that takes you to all the places where datasets are available. The second aggregator is Google Dataset Search, which is like a search engine for finding datasets. Then we moved on to proper websites that list datasets. We started with data.world and moved on to the UCI Machine Learning Repository, which is again a very popular place. Then we went to the World Bank data website for macroeconomic and financial indicators. Then we went to ourworldindata.org, which is a very good place for a lot of relevant world-related datasets. And finally we looked at the Kaggle platform, where we get a lot of community-driven datasets, sometimes uploaded by companies themselves, which is again good if you want to build hobby projects or real-world, or close to real-world, analysis that can add value to your portfolio.

I will link all these websites in the description. Please let me know if I have missed any place where you usually find good datasets. I hope this video was helpful for you to kick-start your machine learning or data science portfolio, or simply to find a dataset that can keep you occupied. If you have any suggestions, please let me know in the comment section, and see you in the next video. Take care, bye bye.
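As a rough illustration of the World Bank part of the walkthrough (searching for India and an inflation indicator), here is a minimal Python sketch that builds a request URL for the World Bank's public Indicators API and parses its JSON reply. The video only uses the website, so this is not the presenter's method. The helper names `indicator_url`, `parse_indicator`, and `fetch_indicator` are hypothetical, the sketch assumes annual data (dates like "2020"), and the indicator code FP.CPI.TOTL.ZG (inflation, consumer prices, annual %) is chosen here only as an example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.worldbank.org/v2"


def indicator_url(country, indicator, per_page=100):
    """Build the URL for one country/indicator pair, asking for JSON."""
    query = urlencode({"format": "json", "per_page": per_page})
    return f"{API_ROOT}/country/{country}/indicator/{indicator}?{query}"


def parse_indicator(payload):
    """Convert the API's [metadata, rows] reply into {year: value}.

    Rows with a null value are skipped. Assumes annual data, where
    each row's "date" is a plain year string such as "2020".
    """
    if not isinstance(payload, list) or len(payload) < 2 or payload[1] is None:
        return {}
    return {
        int(row["date"]): row["value"]
        for row in payload[1]
        if row["value"] is not None
    }


def fetch_indicator(country, indicator):
    """Download and parse one indicator series (needs network access)."""
    with urlopen(indicator_url(country, indicator)) as response:
        return parse_indicator(json.load(response))


# Offline demonstration with a made-up reply in the same shape
# (the numbers are placeholders, not real World Bank figures).
sample_reply = [
    {"page": 1, "pages": 1},
    [{"date": "2020", "value": 1.5}, {"date": "2019", "value": None}],
]
series = parse_indicator(sample_reply)
```

With network access, something like `fetch_indicator("IN", "FP.CPI.TOTL.ZG")` would be the equivalent of typing India and inflation into the site's search box, under the assumptions above.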
Original Description
In this video, you'll learn about 5 websites where you can find free real-world datasets to use for your data science or machine learning projects, for writing articles or blog posts, or for academic research.
1. https://data.world/
2. https://archive.ics.uci.edu/ml/index.php
3. https://data.worldbank.org
4. https://ourworldindata.org/
5. https://www.kaggle.com/datasets
Search/Aggregator
1. https://datasetsearch.research.google.com/
2. https://github.com/awesomedata/awesome-public-datasets
Kaggle Overview: https://www.youtube.com/watch?v=sstKQYgZRPo
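Once a file has been downloaded from one of the sites above, a first look at it can be very short. The sketch below is a hedged example, not something shown in the video: it reads a semicolon-delimited CSV (the layout the UCI wine quality files use, as far as I know) with Python's standard `csv` module. The inline `SAMPLE` text, its three columns, and the helper name `read_semicolon_csv` are illustrative assumptions, and the rows are placeholders rather than a copy of the real file.

```python
import csv
import io

# Illustrative stand-in for a downloaded file: a few columns only,
# placeholder rows, semicolon-delimited.
SAMPLE = """fixed acidity;volatile acidity;quality
7.4;0.70;5
7.8;0.88;5
11.2;0.28;6
"""


def read_semicolon_csv(text):
    """Parse semicolon-delimited text into a list of {column: float} rows."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return [{name: float(value) for name, value in row.items()} for row in reader]


rows = read_semicolon_csv(SAMPLE)

# A first sanity check: the average of the target column.
mean_quality = sum(row["quality"] for row in rows) / len(rows)
```

For a real download, the same function can be fed the file's contents, for instance `read_semicolon_csv(open(path).read())`, where `path` is wherever the file was saved and every column is numeric.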