Data Science learning roadmap for 2021
Key Takeaways
Harshit Tyagi presents a data science learning roadmap for 2021, covering skills such as programming, data extraction, and machine learning.
Full Transcript
Hello everyone. First of all, a very happy new year to all of you; I wish you all a fulfilling and excellent 2021. Nothing really changes except the date, but a new year fills everyone with the enthusiasm and hope to start things afresh. Everyone starts making new resolutions and setting goals, but it's very hard to achieve those goals without any planning. Adding a bit of planning and a roadmap always helps you keep a clear mind and protects you from going astray or getting lost along the way.

So in this video I'm going to share with you the data science learning roadmap. It is an extension of the curriculum, but it charts out a multi-level skills map that details which skills you should hone, how you should measure yourself at each level, and techniques to master each of those skills. I have created this roadmap based on my past experiences, interview experiences, frequently asked interview questions, and the requirements and eligibility criteria mentioned across many job descriptions. Along with the roadmap, you also get a learning tracker that I've created, which will help you keep track of your progress and how much is left. I have also provided a few learning resources for each of the skills and topics that you should master. So, without any further ado, let's dive into the roadmap.

There are a number of skills and tasks under the field of data science, and the roadmap is based on these tasks. I've ordered them here in multiple strata, or levels, based on their level of complexity and how commonly they are used in the industry. We start off with programming at the bottom, which is the most common one, or I should say a prerequisite for the majority of tasks in data science. Then we have data extraction and wrangling, which gets slightly more complex, as you have to extract data, manipulate it, clean it, and take care of other
issues with the data. Then comes exploratory data analysis and storytelling, which requires business acumen to be able to define good questions and create insightful reports and dashboards for management to make decisions. Then we have applied statistics and mathematics, which is commonly asked about in data science interviews and includes a few topics that require deep understanding. After that we have machine learning, with its three types of learning: supervised, unsupervised, and reinforcement learning. Not many organizations actually need you to do machine learning modeling, so it is low on common usage. Lastly, we have AI, or deep learning, which gets more niche. It helps us solve problems in computer vision and NLP, is getting a lot of traction these days, and uses deep neural network architectures. It's a field in itself now, so it's beyond the scope of this particular roadmap and this video; we'll probably have to create a dedicated roadmap for deep learning.

Here we have the data science roadmap. You can see I have taken out many branches; these branches are the skills, the major sections or levels that we talked about in the pyramid. I have programming; data extraction and wrangling; EDA, business acumen, and storytelling; then data engineering, which could be a separate roadmap in itself; then statistics and mathematics; and machine learning at the bottom, which comes at the end. You start from the top left and go down the left lane, then you come to the top right and go down in that order. Data engineering is something that you can take up on the side, but I wanted to cover it here, so I've added it as well. If you are applying for a full-stack data scientist job, I think they expect you to know a little about cloud services and how you're going to productionize
your model, deploy it on production servers, and all of those things. So let's take a look at each of these one by one.

First of all, we have programming. You are required to have sound programming skills: every data science job description asks for programming expertise in at least one language. Most of them use Python or R. I personally prefer Python for most of my tasks, so this whole roadmap has been constructed in a way that is biased towards Python. You should be good with Python or R scripting, and you should have sound SQL skills as well; you should be able to write complex queries. You should be able to write functions, conditional logic, and control flow, know list and dictionary comprehensions, learn object-oriented programming, work with external libraries, know fundamental algorithms, and so on.

I've also written a blog post about this, in case you feel you are not able to find the right resources for learning all of these skills and topics. There I've covered in detail what you should cover, along with the resources. These resources come from Coursera, DataCamp, Kaggle, and a few other good sources. This is not a be-all and end-all list, but these are really good resources that I followed at some point in my career, and some of them are ones I still want to cover myself, so it's not that I've completed all of them.

Coming back: make sure that you are able to write fundamental algorithms, such as searching and sorting algorithms, trees, and graphs. Make sure that you are good with all of these basic data structures and algorithms. Once that is done, you should also practice with Git and GitHub and learn
how to handle the command-line interface.

How will you measure yourself? That's another part of the roadmap. You'll have to solve a lot of problems on HackerRank and LeetCode, so do all of those problems. You can build projects, such as scraping data from a website, which first requires you to find legitimate websites that allow you to scrape data from them. You can build games like rock paper scissors, tic-tac-toe, hangman, and so on. You can also build small applications like a YouTube video downloader or a website blocker.

Moving on, the next part is data extraction and wrangling. A significant part of data science work is centered around finding appropriate data that can help you solve your problem. You can collect data from different legitimate sources, such as scraping (if the website allows it) and APIs. A lot of applications, like Twitter, TMDB, Quandl, and many financial instrument applications, allow you to query data from their APIs; you just have to set up your account and write your scripts to extract the data. Most of the time people find it very hard to collect the right dataset for the problem they're solving, so that's one task in itself. Then you'll have to write scripts to extract data from different sources.

The next step is to format the data and clean it, including type conversion. Once you have the data in hand, an analyst will often find herself or himself cleaning data frames, working with multi-dimensional arrays, doing scientific computing, and running descriptive computations as well. A lot of data manipulation and aggregation is required, so you have to learn to use two very important libraries called pandas and
NumPy. Then there is data transformation, which again uses pandas: joining, slicing, subsetting, and indexing. You should feel comfortable doing all of these operations. Then comes handling missing values. So you'll basically have to master these two libraries, pandas and NumPy. I've added a few resources here for learning or mastering them: a few from Kaggle, and a freeCodeCamp course on YouTube, which is free of cost.

For project ideas, collect data from any website or API. There are many public repositories as well; find them and start working on a dataset. You'll have to do some initial exploration, define some questions, and then get down to crunching those datasets. That is the second step. You can also contribute to the analysis in any open-source repository, or start a new project at the organization where you work; that is another initiative you can take.

Next up we have EDA, business acumen, and storytelling. Drawing insights from the data and then communicating them to management in simple, concise terms, using the right visualizations, is the core responsibility of a data analyst or marketing analyst. There's another job profile, data product manager, which requires good knowledge of the business you're associated with. Defining business-focused questions is very important, because most of the time you go down a rabbit hole where you are just trying to answer questions that would be good for your coding but not for the business as a whole. So make sure that you define questions that have metrics directly related or directly correlated
with the business outcome.

Then comes studying data distributions, learning about outliers and how to handle them, and moving on to univariate and multivariate analysis. Learn to perform these analyses; there are many other methods that you'll have to dive deep into. I've provided courses for univariate and multivariate analysis. There are many lower-level techniques and algorithms that you'll have to learn in order to handle, say, numerical-numerical, numerical-categorical, or categorical-categorical data, so there are different combinations that you'll have to learn to address.

Data visualization is a very important aspect of data analysis: being able to choose and plot the right chart in order to convey what the data is actually telling you and to draw out insights. Then there is building dashboards. A very underrated skill is doing all of the analysis in Excel or Tableau. Most of the time people start off by writing Python code and crunching the data, when it's actually a task of a few clicks in these really great tools. So you should learn to use Excel, Tableau, or Power BI, whichever tool you feel comfortable with. I've also provided a list of resources here; for example, you can look at this one on data visualization in spreadsheets, Excel, and Tableau. Master any one of them; there's a huge demand for that in the market as well.

Then comes writing concise and insightful reports, and business acumen: learn about the business and the product itself. I've added a few books: Measure What Matters, Decode and Conquer, and Cracking the Product Manager Interview. Most of these would suit a profile that sits in between core data analysis and product management. So there's another profile called
data product manager that you can target after this one.

Then we have data engineering. Data engineering is, again, a very cool technical profile. First of all, you should have very strong programming skills, with a really good command of Python or whichever language you're using. The second thing is working with the CLI: you'll be required to work a lot with the terminal and the shell, so working with Linux-based operating systems is a very important prerequisite here. Then comes building extract, transform, load (ETL) pipelines, which is the core responsibility of a data engineer: making sure the data comes in smoothly, in a clean and formatted way. You should be able to support the data team, where scientists and analysts work with the data coming in from the data warehouses that you maintain.

Then there are tools such as Spark, Kafka, and Airflow, along with many more open-source tools, as well as the tools offered by cloud services like AWS and GCP. You should try to master at least one of the cloud services. AWS has really good community support and customer support, so you can pick up AWS; GCP is coming up, and Azure is also progressing. So pick any one of your choice. Then, for algorithms, you should be familiar with MapReduce and YARN. There are many other topics; I've just highlighted the high-level picture, but the resources that I've provided dive deep into each of these.

Deploying ML models in production is another one of the tasks that you would be responsible for. Once the data scientist, the machine learning engineer, or other people in the data team have come up with a model, it would be your responsibility to go ahead and deploy it in production.

Then, let's
say you have gone through it and are ready to appear for a certification program; you can do that. AWS offers a certified machine learning certification, and GCP offers one called Professional Data Engineer. You can take these examinations. It's not required and it doesn't guarantee any job, but it does add weight to your profile, and it is something you can do to measure your knowledge or assess yourself at this level.

Then comes the very important part, which is statistics and mathematics, especially if you are interested in applying for a job as a data scientist or a quantitative analyst. You are required to be thorough with concepts like descriptive statistics, which require you to be able to summarize data using the mean, median, mode, standard deviation, trimmed mean, weighted mean (weighted average), and so on. It then goes deep into inferential statistics and experiment design, which is again a very crucial part of data science. You should be good at designing hypothesis tests, A/B testing, and analyzing A/B testing experiment results; learning what a confidence interval is and how to draw a conclusion from the p-value; ANOVA, which is analysis of variance; and the chi-square test. All of these are core statistical methods that you will need in order to make sense of the results that you derive or collect during those experiments. You should also be thorough with sampling, data distributions, t-tests, and, when it comes to mathematics, linear algebra. You should know at least this much in order to move ahead and start with machine learning.

I've added the resources here, so you can look at the books that I have recommended: Practical Statistics and Naked Statistics. As for Naked Statistics, it's
non-technical, but it gives you a general picture of how you can use statistics in the everyday world. Practical Statistics then dives deep into each of the topics that I've talked about. There are also two different courses by Udacity, which are free of cost, that you can look up. I've added a few project ideas that you can take up from here as well.

The next part is, of course, machine learning. We have supervised, unsupervised, and reinforcement learning. In supervised learning, you have to master classification and regression algorithms, and in unsupervised learning, clustering and dimensionality reduction such as PCA. You'll have to master all of these in order to stand out, or to stand a chance of passing machine learning interviews. Reinforcement learning is something that you might want to skip, because very few organizations will actually ask you to know it; decide based on whatever you are applying for.

You should be very good at choosing the right performance metrics and making sense of them: root mean square error, accuracy, the confusion matrix when it comes to classification, and ROC AUC. Learn all of the performance metrics associated with the particular algorithm that you are working with. Then comes hyperparameter tuning: how to optimize your algorithm and your model. Statistical ML includes KNN, decision trees, bagging, and boosting, which come with optimization and ensemble modeling: ensemble models like random forest, voting classifier, and AdaBoost. You should be thorough with all of these models. Build a lot of them, train a lot of them, and see how they differ and how they optimize or enhance the performance of your model in your entire system for solving the problem. Again, I've provided here the very important resources that I found useful. Again, you
can add or look for any other resource if you want; this is just a list. At the end I've added a deep learning course: you can take up the very popular specialization offered by deeplearning.ai, with Andrew Ng as the author of the course.

At the end, I have also provided a learning tracker for all of you. The learning tracker simply lists all the resources, along with the project, assignment, or any remarks or notes that you want to add. You can follow it and set the status to completed or in progress. I'll provide the link in the description; you can duplicate it and customize it to your needs, then go ahead from there. I've added all the lists and tables here, as you can see: programming, data extraction, EDA, and so on. You can use this Notion template and make one for yourself as well.

This is a pretty wide spectrum of data science that I've covered here. Everything that comes under the data domain is here except for deep learning, which requires special attention, so I'll create a dedicated roadmap for deep learning in a separate video. But I think you are not required to cover every branch and every skill out there; it depends on the kind of job and the kind of organization that you want to apply to. For example, there are organizations that do not want you to do any machine learning modeling; they simply want you to understand data from a statistical point of view. So if you're good with statistics, you're good to go for that particular job.

There are also areas that I personally want to enhance and work on. For example, data engineering is something that I've worked on in the past, but cloud service providers, or using a particular AWS tool, is something that I want to go back and learn more about.
Programming, for example, is very basic, but I might want to go back and solve some more LeetCode problems or learn about best practices in order to be able to contribute to open source. Machine learning is something that still requires you to stay up to date, as there are new things coming up every now and then. And statistics is something that I'm working on: I have been working on a course on statistics, which will be out soon.

I hope you find this video and the tools that I have provided, the tracker, the roadmap, and the resources, useful. If you do, please share it with your peers, like this video, and subscribe to this channel to help us grow. One of the most important things I want to say is: comment down below if you think I should add something to any of the branches or add any new skill, or if you want me to create a series, a tutorial, or a project video on any one of these topics. I'll catch you guys in the next one.
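As a rough illustration of the descriptive statistics named in the statistics part of the transcript (mean, median, mode, standard deviation, trimmed mean, weighted mean), here is a minimal Python sketch that uses only the standard library. The `orders` sample, the 15% trimming proportion, and the helper names `trimmed_mean` and `weighted_mean` are assumptions made for this example; none of them come from the video.

```python
import statistics


def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # values to drop on each side
    return statistics.fmean(ordered[k:len(ordered) - k])


def weighted_mean(values, weights):
    """Mean in which each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample: daily order counts with one extreme day.
orders = [12, 15, 15, 18, 20, 22, 95]

summary = {
    "mean": statistics.fmean(orders),
    "median": statistics.median(orders),
    "mode": statistics.mode(orders),
    "stdev": statistics.stdev(orders),  # sample standard deviation
    "trimmed_mean": trimmed_mean(orders, 0.15),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In this sample the single extreme day pulls the plain mean well above the median, while the trimmed mean stays close to the median. That contrast is one reason to compute more than one summary before drawing conclusions.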
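The transcript mentions analyzing A/B test results and drawing a conclusion from the p-value, without showing how. One common approach, assumed here rather than taken from the video, is a two-proportion z-test. The sketch below uses only the standard library; the function name `two_proportion_z_test`, the conversion counts, and the 0.05 threshold are all hypothetical choices for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 2,000 visitors per variant.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)

alpha = 0.05  # assumed significance threshold
verdict = "reject" if p_value < alpha else "fail to reject"
print(f"z = {z:.3f}, p = {p_value:.4f}: {verdict} the null hypothesis")
```

A small p-value only says that a difference this large would be unlikely if the two variants truly converted at the same rate. Whether the difference matters for the business is a separate question, which ties back to the point about business-focused metrics earlier in the transcript.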
Original Description
This is just a high-level overview of the wide spectrum of data science and you might want to deep dive into each of these topics and create a low-level concept-based plan for each of the categories.
Blog: https://www.freecodecamp.org/news/data-science-learning-roadmap/
Notion: https://www.notion.so/Data-Science-learning-tracker-0d3c503280d744acb1b862a1ddd8344e
You can also connect with me on:
LinkedIn: https://www.linkedin.com/in/tyagiharshit/
Twitter: https://twitter.com/dswharshit
Instagram: https://www.instagram.com/upgradewithharshit
Medium: https://dswharshit.medium.com/