Foundations for Data Science & ML - First steps for every beginner!

Harshit Tyagi · Beginner ·🛡️ AI Safety & Ethics ·4y ago

Key Takeaways

The video covers the three foundations for data science and machine learning: programming, mathematics, and statistics. The programming part focuses on Python and essential libraries such as NumPy, pandas, and Matplotlib.

Full Transcript

Hello everyone, I am back after a very long break of three months. What was I doing in these three months? This video is all about that. I was working on a very important course, and it has finally come to completion, or rather, I'm about to complete it in another 10 to 15 days. So what is it about, and why did I start it? Let's find out in this video.

[Music]

At the beginning of this year I published a roadmap on data science learning. It was widely accepted: the article was translated into many different languages, and there is a video on my channel as well, which has close to 10,000 views. People thanked me for publishing it, students were asking questions, everything was really good. But a few students pointed out that the article and the video were loaded with resources. There were resources on every topic (programming, data engineering, machine learning, deep learning, mathematics, statistics), but most of those resources were paid. Some of them were free, and some of the free ones were really good. One flaw of that article and video is that I didn't talk about the foundations. People asked me which foundations one should work on when aspiring to learn data science or dive deep into machine learning.

So here I am going to talk about the three pillars of data science and machine learning that will give you a very solid foundation for your career as a data scientist, data analyst, machine learning engineer, practitioner, researcher, whatever you want to be.

Let's talk about the first pillar of data science and machine learning, which is programming. Always start off by learning programming first. I personally prefer Python over any other language because of its versatility as well as its ease of learning. You can develop end-to-end projects using Python, so my personal go-to language for beginners would be Python. Now, which concepts should you master, and how much should you learn? Let's talk about that.

If we head down to the curriculum here, you see that in the programming section you should first focus on how to set up an environment, how to work with Jupyter notebooks or Google Colaboratory notebooks, and how to do analysis in them. Then you should start with an introduction to programming: what variables are, what data types are, then strings, Python lists, control flow, loops, how to iterate over different data structures, dictionaries, iterating over a dictionary, list comprehensions, sets, tuples, and functions. Then you move on to object-oriented programming, learn about classes and objects, and further on move to Python scripts. By now you would be very comfortable with Jupyter notebooks, but there are also Python scripts, modules, and libraries that are written in VS Code or any other text editor. Learn how to work with external libraries, how to work with files, how to read files, how to write to files, and best practices. Lastly, you should be able to extract or collect data from different APIs or databases.

In the NumPy module you see two lectures as of now, but there are 15 to 18 more lectures that I'm working on, and they will be published by the end of this month, both for the NumPy module and for the pandas module. Here you should be able to handle multi-dimensional arrays, indexing, slicing, transposing, broadcasting, creating pseudo-random numbers, and performing vectorized operations for scientific computing. Then for pandas you should be able to manipulate data: you should know how to create Series, how to create DataFrames, indexing in a DataFrame, comparisons, boolean indexing, merging DataFrames, mapping and applying functions, and then data cleaning and wrangling as well. By the end of your pandas module you would have a really good understanding of how to analyze and crunch data.

Lastly, data visualization. Visualization plays a crucial role as well. You should know the Matplotlib API hierarchy, you should know how to add styles, colors, and markers to a plot, and you should have a very good understanding of what kinds of plots are used in what scenarios: line plots, bar plots, scatter plots, histograms, box plots. You should be familiar with all of these concepts when it comes to programming for data science.

The second pillar of data science and machine learning is mathematics. This is again a very controversial topic. Some people do not want to learn mathematics; they say that they can do just as well without it. I personally do not agree. I think you should really be familiar with the essential mathematics in order to understand how an algorithm works. If something very custom, a very specific niche problem, comes your way, you won't be able to handle it if you do not really understand how those algorithms work, and the back end of all of those algorithms is computational. You must have seen that job descriptions for data scientists, machine learning engineers, and analysts require you to come from a computational background. They want people who have done an MS or PhD in physics or mathematics, because these people are really good at mathematics. If you think that you can build on top of a very brief or superficial understanding of very simple mathematics, then I don't think you would have a long-lasting career, in this particular domain at least.

I'm not saying you have to be a gold medalist or anything. Don't go too deep into it. Just learn enough linear algebra, enough basic algebra, enough calculus, and some of the important functions that we use in common mathematics, some high school mathematics, and so on. You can then build on top of that to understand what the algorithm does. After learning these topics you would have a really good base to understand all of those really heavy machine learning or deep learning algorithms. You would understand how backpropagation works and how the chain rule supports backpropagation, and you would understand how partial derivatives help you compute those gradients. Those are very important topics that one must pay attention to.

Now comes the third and last pillar of data science and machine learning, specifically data science I would say, which is statistics. Every organization wants to be data driven. They want people who can drive decision making, data scientists who can crunch numbers and help them make decisions that help the organization grow. Statistics plays a very crucial role across every stage of this whole process. Data scientists are required to explain data and describe data. They need to look at how the data is distributed, they need to design experiments, they need to quantify risk, they need to quantify uncertainty, and they need to understand how metrics work. So statistics, I would say, is a must. Every data science interview is going to grill you on statistics, so it is absolutely essential, and this course talks about those essential topics that one must start off with. You can always keep building on top of it.

There are two branches that I basically teach: one is descriptive statistics and the second is inferential statistics. Inferential statistics talks about hypothesis testing, different measurements, significance testing, and other types of tests. The third branch of statistics that's very important is probability. Probability helps you quantify uncertainty and quantify risk, and all organizations and businesses want to know how much risk is involved in a particular decision. That's what you are able to do once you understand these topics: conditional probability, probability distributions, PDF, CDF, PMF (probability mass functions). All these things are very important in order to have a long-lasting career and a really good foundation.

I feel that there is a need for a very compact course that talks about the first steps for learning data science or machine learning and about developing the foundation that is required. If you take the example of Google's machine learning course, here are the prerequisites and pre-work that are required before you start learning machine learning. This is what Google has recommended. Take the example of Andrew Ng's very famous course on machine learning, again a very good and free resource, now on YouTube. It requires mathematics as well: you should be familiar with partial derivatives, with linear algebra, with basic algebra, with important functions, all of those things. But there aren't enough resources that give you a compact course that goes just deep enough to cover all of those topics and tells you how they are related to artificial intelligence or data science.

This is why I have started this academy called Wiplane, and here my aim is to help you master data science or AI. It might take time. It is a slow process, I know, but you have to pick a domain, pick a field, and then dive deep into it after building a very solid foundation. When students reached out to me after going through my article on the data science learning roadmap, they asked me what the first steps would be, what foundational concepts they could start off with. To be honest, I was unable to find any compact yet affordable course that covers all of these topics in the right amount of depth. So here I present to you wiplane.com, which is basically "WIP lane": the lane where you want to toggle yourself into, work in progress.

The Wiplane academy is all about mastering data science and AI, and I will be publishing courses on a regular basis on this platform. People can enroll, there will be a community, and there will be Discord channels coming up really soon. The important thing is that I will not just keep publishing courses on this platform; I will be updating these courses every month based on the input of the students.

Here I present to you the first course, which is Foundations for Data Science and Machine Learning. It covers the essentials of programming, mathematics, and statistics. All of the concepts and topics that I have just talked about are covered in this course, and after completing it you would be able to start doing projects on data analysis and data science. You will need to learn a little bit more about machine learning algorithms. I would say first do data analysis projects, learn how to crunch data, and then move on to the machine learning side of things. Data analysis always comes first, but you would be in a very good position to understand these concepts really quickly.

The course not only covers the essential programming, the prerequisites or pre-work required for data science or machine learning; you cover every topic computationally as well as programmatically. We learn how to code those concepts as well, be it any topic from mathematics or statistics. You would be coding a lot in this course. You would be working on assignments, projects, and exercises to get comfortable with each of those topics.

I am currently in the process of finalizing this entire course. There are a few videos left for the NumPy module, the pandas module, and Matplotlib, but mathematics, calculus, linear algebra, descriptive statistics, and programming are all complete. I've marked some of the videos as free, so you can preview them and find out whether the course meets your expectations or not.

I'm releasing this course for pre-sales, so I'm pre-selling it now, and it will be completed by the first week of September. Right now the course is priced at a very affordable 35 US dollars, or 2,500 rupees if you are in India, and after the 30th of August it will be 50. I personally feel that for the amount of content that has been put into it, as well as the cost that has been incurred in setting up this whole platform, it's something that I have to charge. One more thing I personally feel is that if you do not pay for something, you're not that sincere or serious about learning, so the price does that job as well.

So there's a whole lot out there to help you build a solid foundation for data science and machine learning. Now it's on you whether you want it or not. I'll catch you guys in the next one.
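The transcript's point that partial derivatives and the chain rule are what let you compute gradients can be made concrete with a small sketch. The example below is not taken from the course; it assumes a hypothetical toy dataset and a straight-line model, and the function names (`mse_loss`, `analytic_gradient`, `numeric_gradient`, `fit`) are made up for illustration. It also uses the NumPy ideas the transcript lists: seeded pseudo-random numbers, broadcasting, and vectorized operations.

```python
import numpy as np

# Hypothetical toy data: y is roughly 2*x + 1 plus a little noise.
rng = np.random.default_rng(seed=0)      # seeded pseudo-random generator
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + 0.05 * rng.standard_normal(x.shape)


def mse_loss(w, b):
    """Mean squared error of the line w*x + b, using vectorized operations."""
    residual = w * x + b - y             # broadcasting: scalars against an array
    return np.mean(residual ** 2)


def analytic_gradient(w, b):
    """Partial derivatives of the loss, derived with the chain rule.

    loss = mean(r**2) with r = w*x + b - y, so
    d(loss)/dw = mean(2*r * dr/dw) = mean(2*r*x)
    d(loss)/db = mean(2*r * dr/db) = mean(2*r)
    """
    residual = w * x + b - y
    return np.array([np.mean(2.0 * residual * x), np.mean(2.0 * residual)])


def numeric_gradient(w, b, eps=1e-6):
    """Central finite differences, used only as an independent check."""
    dw = (mse_loss(w + eps, b) - mse_loss(w - eps, b)) / (2.0 * eps)
    db = (mse_loss(w, b + eps) - mse_loss(w, b - eps)) / (2.0 * eps)
    return np.array([dw, db])


def fit(steps=2000, lr=0.5):
    """Plain gradient descent on (w, b), starting from zero."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w, grad_b = analytic_gradient(w, b)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    print("analytic:", analytic_gradient(0.3, -0.7))
    print("numeric: ", numeric_gradient(0.3, -0.7))
    print("fitted (w, b):", fit())
```

Comparing the analytic gradient with the finite-difference one is a common sanity check: if the two disagree, the derivation (or the code) is wrong. Backpropagation in a neural network applies the same chain-rule step layer by layer.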

Original Description

Check out the course: https://www.wiplane.com/p/foundations-for-data-science-ml
You can follow me on:
Newsletter: https://dswharshit.substack.com/
LinkedIn: https://www.linkedin.com/in/tyagiharshit/
Medium: https://dswharshit.medium.com/

This video introduces the foundations of data science and machine learning, which it groups into three pillars: programming, mathematics, and statistics. It argues that a strong foundation in these areas is needed before moving on to machine learning algorithms, and it stresses practicing with assignments, projects, and exercises to reinforce learning.

Key Takeaways
  1. Install Python and essential libraries such as NumPy and Pandas
  2. Set up a programming environment with Jupyter notebooks or Google Colaboratory (Colab) notebooks
  3. Learn basic programming concepts such as data types, control flow, loops, and object-oriented programming
  4. Practice data manipulation techniques with Pandas
  5. Create data visualizations with Matplotlib
  6. Study mathematics and statistics concepts essential for data science and machine learning
💡 Building a strong foundation in programming, mathematics, and statistics is crucial for success in data science and machine learning.
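
Takeaway 4 can be tried out with a very small pandas sketch. The tables and column names below are hypothetical, invented only for illustration; the operations are the ones the transcript names for the pandas module: creating DataFrames, data cleaning, boolean indexing, merging, and applying a function.

```python
import pandas as pd

# Hypothetical toy tables; the column names are made up for this example.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3, 4],
        "customer_id": [10, 11, 10, 12],
        "amount": [250.0, None, 90.0, 400.0],
    }
)
customers = pd.DataFrame(
    {"customer_id": [10, 11, 12], "city": ["Delhi", "Pune", "Mumbai"]}
)

# Data cleaning: fill the missing amount with the column median.
orders["amount"] = orders["amount"].fillna(orders["amount"].median())

# Boolean indexing: keep only the rows where the condition is True.
large_orders = orders[orders["amount"] >= 250.0]

# Merging: attach each customer's city to their orders.
merged = orders.merge(customers, on="customer_id", how="left")

# Applying a function: label each order by size.
merged["size"] = merged["amount"].apply(
    lambda amount: "large" if amount >= 250.0 else "small"
)

# Grouping and aggregating: total order amount per city.
total_by_city = merged.groupby("city")["amount"].sum()

if __name__ == "__main__":
    print(large_orders)
    print(merged)
    print(total_by_city)
```

Filling with the median is only one possible cleaning choice; dropping the row or using a domain-specific default may suit real data better.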
