What is machine learning, and how does it work?

Data School · Beginner · 📐 ML Fundamentals · 11y ago

Key Takeaways

This video opens a series on machine learning with scikit-learn in Python. It defines machine learning, introduces its two main categories (supervised and unsupervised learning), and explains at a high level how supervised learning works.

Full Transcript

Have you heard about this concept called machine learning, and you're trying to figure out exactly what that means? Or maybe you've checked out a few machine learning competitions on kaggle.com, but you don't know how to get started? If so, I'm here to help. My name is Kevin Markham, and I'm a data science instructor in Washington, DC. This is my brand new video series about how to use the scikit-learn library in Python for machine learning. This is material that I love to teach, and I can't wait to share it with you.

In this series, I'm going to cover scikit-learn from the basics all the way through advanced techniques. I'm not going to presume any familiarity with machine learning, and in fact we're going to spend the next few videos talking about machine learning before we write any code. The reason is that there's really no point in using scikit-learn if you don't know how to do proper machine learning. You will need to have at least minimal experience with the Python programming language, but I'll suggest some resources in the next video if you don't yet know Python.

So with that, let's get started. In this video, I'll be covering the following topics: What is machine learning? What are the two main categories of machine learning? What are some examples of machine learning? And how does machine learning work?

So what exactly is machine learning? There's no universal definition, but at a high level, I would define machine learning as the semi-automated extraction of knowledge from data. Let's break that down into three component parts.

First, machine learning always starts with data, and your goal is to extract knowledge or insight from that data. You have a question you're trying to answer, and you hypothesize that your question might be answerable using the data.

Second, machine learning involves some amount of automation. Rather than trying to gather your insights from the data manually, you're applying some process or algorithm to the data using a computer, so that the computer can help to provide the insight.

Third, machine learning is not a fully automated process. As any practitioner can tell you, machine learning requires you to make many smart decisions in order for the process to be successful. We'll cover many of those decisions throughout this video series.

Next, let's talk about the two main categories of machine learning, which are supervised learning and unsupervised learning.

Supervised learning, also known as predictive modeling, is the process of making predictions using data. For example, if my dataset is a series of email messages, my supervised learning task might be to predict whether each email message is spam or non-spam, which is also known as ham. This is supervised learning because there is a specific outcome we are trying to predict, namely ham or spam.

In contrast, unsupervised learning is the process of extracting structure from data, or learning how to best represent data. For example, if my dataset was the characteristics and purchasing behavior of shoppers at a grocery store, my unsupervised learning task might be to segment the shoppers into groups, or clusters, that exhibit similar behaviors. I might find that college students, parents with young children, and older adults have characteristic shopping behaviors that are similar within each group but dissimilar from the other two groups. This is an unsupervised learning task because there is no right or wrong answer about how many clusters can be found in the data, which people belong in which cluster, or even how to describe each cluster.

Let's do a quick quiz. This is the Kaggle website, which is a popular platform for machine learning competitions. This is their well-known Titanic competition, and the goal is to predict which passengers survived the tragic sinking of the Titanic. Is this supervised or unsupervised learning? This is supervised learning, because your goal is to predict a specific outcome, namely survival, for each passenger.

In this video series, I'm going to primarily focus on supervised learning, though I may cover unsupervised learning in later videos. We've talked about what supervised learning is, but we haven't yet talked about how it works. So how does it actually work? At a very high level, here are the two main steps of supervised learning.

First, you train a machine learning model using your existing labeled data. Labeled data is data which has been labeled with the outcome, which in the case of the email example is whether each message is ham or spam. This is called model training because the model is learning the relationship between the attributes of the data and the outcome. These attributes might include the message text, the number of embedded links, the length of the message, and so on.

Second, you make predictions on new data for which you don't know the true outcome. In other words, when a new email message arrives, you want your trained model to accurately predict whether the email is ham or spam without a human examining it.

To summarize these two steps, you could say that the model is learning from past examples made up of inputs and outputs, and then applying what it has learned to future inputs in order to predict future outputs. Because you're making predictions on unseen data, which is data that was not used to train the model, it is often said that the primary goal of supervised learning is to build models that generalize. In other words, you want to build machine learning models that accurately predict the labels of your future emails, rather than accurately predicting the labels of emails you've already received.

This simplified description of machine learning might raise some questions in your mind, such as: How do I choose which attributes of my data to include in the model? How do I choose which model to use? How do I optimize this model for best performance? How do I ensure that I'm building a model that will generalize to unseen data? Can I estimate how well my model is likely to perform on unseen data? These are excellent questions, and they hint at the complexity of doing effective machine learning. All of these issues will be addressed later in the video series.

If you'd like a more in-depth introduction to machine learning, there are two resources that I recommend, which I've linked to below the video. The first resource is my favorite book on machine learning, An Introduction to Statistical Learning, by Trevor Hastie and Rob Tibshirani. It's available as a free PDF download, and section 2.1 introduces machine learning in a thorough yet accessible way. The second resource I recommend is a 13-minute video from Caltech's Learning from Data course, which uses some excellent examples to compare supervised and unsupervised learning, and also introduces another type of machine learning called reinforcement learning.

In the next video in the series, I'll be covering the benefits and the drawbacks of scikit-learn, as well as my recommended way to set up Python for machine learning. In the meantime, I'd love to hear from you in the YouTube comments if you have a question about machine learning, or if you just have a cool example of machine learning that you'd like to share. Please do subscribe on YouTube if you'd like to hear the moment my next video comes out. Thanks for watching, and I'll see you soon.
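
The video itself contains no code, but the two steps of supervised learning described in the transcript (train on labeled data, then predict on new data) can be sketched in a few lines of plain Python. Everything below is an illustrative assumption rather than material from the video: the class name `NearestCentroidSketch`, the nearest-centroid rule, and the tiny made-up email dataset. The `fit`/`predict` method names are chosen to echo the interface that scikit-learn estimators use.

```python
from collections import defaultdict
from math import dist


class NearestCentroidSketch:
    """Toy classifier (hypothetical): predicts the label whose average training example is closest."""

    def fit(self, X, y):
        # Step 1 (model training): learn one average feature vector per label.
        grouped = defaultdict(list)
        for features, label in zip(X, y):
            grouped[label].append(features)
        self.centroids_ = {
            label: tuple(sum(column) / len(rows) for column in zip(*rows))
            for label, rows in grouped.items()
        }
        return self

    def predict(self, X):
        # Step 2 (prediction): give each new example the label of the nearest centroid.
        return [
            min(self.centroids_, key=lambda label: dist(features, self.centroids_[label]))
            for features in X
        ]


# Made-up labeled data: (number of embedded links, message length in hundreds of characters).
X_train = [(0, 3), (1, 5), (0, 8), (7, 2), (9, 1), (6, 3)]
y_train = ["ham", "ham", "ham", "spam", "spam", "spam"]

model = NearestCentroidSketch().fit(X_train, y_train)

# New messages for which the true outcome is not known.
X_new = [(8, 2), (1, 6)]
predictions = model.predict(X_new)
print(predictions)
```

The split between `fit` and `predict` mirrors the split in the transcript: the labels are needed only while learning, and prediction works from the attributes alone.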

Original Description

Have you heard of "machine learning", and you're trying to figure out exactly what that means? I'll give you my definition, provide some examples of machine learning, and explain at a high level how machine learning "works".

Download the notebook: https://github.com/justmarkham/scikit-learn-videos
An Introduction to Statistical Learning: http://www-bcf.usc.edu/~gareth/ISL/
Learning Paradigms: http://work.caltech.edu/library/014.html

WANT TO GET BETTER AT MACHINE LEARNING? HERE ARE YOUR NEXT STEPS:

1) WATCH my scikit-learn video series: https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A
2) SUBSCRIBE for more videos: https://www.youtube.com/dataschool?sub_confirmation=1
3) JOIN "Data School Insiders" to access bonus content: https://www.patreon.com/dataschool
4) ENROLL in my Machine Learning course: https://www.dataschool.io/learn/
5) LET'S CONNECT!
- Newsletter: https://www.dataschool.io/subscribe/
- Twitter: https://twitter.com/justmarkham
- Facebook: https://www.facebook.com/DataScienceSchool/
- LinkedIn: https://www.linkedin.com/in/justmarkham/

This video introduces machine learning and its two main categories, then explains at a high level how supervised learning works and why generalization matters.

Key Takeaways
  1. Define machine learning and its categories
  2. Explain supervised and unsupervised learning
  3. Describe the process of supervised learning
  4. Discuss the importance of generalization in supervised learning
💡 Machine learning is the semi-automated extraction of knowledge from data, and supervised learning is a type of machine learning that involves making predictions using labeled data.
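
The generalization takeaway can be made concrete with a small plain-Python sketch: hold some labeled examples back, train on the rest, and compare accuracy on data the model has already seen with accuracy on data it has not. The helper names (`predict_1nn`, `accuracy`), the nearest-neighbor rule, and the made-up dataset are all assumptions for illustration and are not taken from the video.

```python
from math import dist


def predict_1nn(train_X, train_y, features):
    """Return the label of the closest training example (a model that simply memorizes)."""
    nearest = min(range(len(train_X)), key=lambda i: dist(train_X[i], features))
    return train_y[nearest]


def accuracy(train_X, train_y, X, y):
    """Fraction of examples in (X, y) whose predicted label matches the true label."""
    hits = sum(predict_1nn(train_X, train_y, f) == label for f, label in zip(X, y))
    return hits / len(y)


# Made-up labeled data: (embedded links, message length in hundreds of characters).
X = [(0, 3), (7, 2), (1, 5), (9, 1), (0, 8), (6, 3), (2, 4), (5, 6), (1, 7), (8, 2)]
y = ["ham", "spam", "ham", "spam", "ham", "spam", "ham", "ham", "ham", "spam"]

# Hold out the last three examples; the model never sees them during training.
X_train, y_train = X[:7], y[:7]
X_test, y_test = X[7:], y[7:]

train_acc = accuracy(X_train, y_train, X_train, y_train)
test_acc = accuracy(X_train, y_train, X_test, y_test)
print(f"accuracy on data already seen: {train_acc:.2f}")
print(f"accuracy on held-out data:     {test_acc:.2f}")
```

Because this toy model memorizes its training examples, it scores perfectly on them while misclassifying one of the three held-out messages, which is why accuracy on already-seen data says little about how well a model generalizes.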
