What is machine learning, and how does it work?
Key Takeaways
This video opens a series on machine learning with scikit-learn in Python. It defines machine learning, introduces its two main categories (supervised and unsupervised learning), and explains at a high level how supervised learning works, which is the primary focus of the series.
Full Transcript
Have you heard about this concept called machine learning, and you're trying to figure out exactly what that means? Or maybe you've checked out a few machine learning competitions on kaggle.com, but you don't know how to get started? If so, I'm here to help. My name is Kevin Markham, and I'm a data science instructor in Washington, DC. This is my brand new video series about how to use the scikit-learn library in Python for machine learning. This is material that I love to teach, and I can't wait to share it with you.

In this series, I'm going to cover scikit-learn from the basics all the way through advanced techniques. I'm not going to presume any familiarity with machine learning, and in fact, we're going to spend the next few videos talking about machine learning before we write any code. The reason is that there's really no point in using scikit-learn if you don't know how to do proper machine learning. You will need to have at least minimal experience with the Python programming language, but I'll suggest some resources in the next video if you don't yet know Python.

So with that, let's get started. In this video, I'll be covering the following topics: What is machine learning? What are the two main categories of machine learning? What are some examples of machine learning? And how does machine learning work?

So what exactly is machine learning? There's no universal definition, but at a high level, I would define machine learning as the semi-automated extraction of knowledge from data. Let's break that down into three component parts. First, machine learning always starts with data, and your goal is to extract knowledge or insight from that data. You have a question you're trying to answer, and you hypothesize that your question might be answerable using the data. Second, machine learning involves some amount of automation. Rather than trying to gather your insights from the data manually, you're applying some process or algorithm to the data using a computer, so that the computer can help to provide the insight. Third, machine learning is not a fully automated process. As any practitioner can tell you, machine learning requires you to make many smart decisions in order for the process to be successful. We'll cover many of those decisions throughout this video series.

Next, let's talk about the two main categories of machine learning, which are supervised learning and unsupervised learning. Supervised learning, also known as predictive modeling, is the process of making predictions using data. For example, if my data set is a series of email messages, my supervised learning task might be to predict whether each email message is spam or non-spam, which is also known as ham. This is supervised learning because there is a specific outcome we are trying to predict, namely ham or spam.

In contrast, unsupervised learning is the process of extracting structure from data, or learning how to best represent data. For example, if my data set was the characteristics and purchasing behavior of shoppers at a grocery store, my unsupervised learning task might be to segment the shoppers into groups, or clusters, that exhibit similar behaviors. I might find that college students, parents with young children, and older adults have characteristic shopping behaviors that are similar within each group but dissimilar from the other two groups. This is an unsupervised learning task because there is no right or wrong answer about how many clusters can be found in the data, which people belong in which cluster, or even how to describe each cluster.

Let's do a quick quiz. This is the Kaggle website, which is a popular platform for machine learning competitions. This is their well-known Titanic competition, and the goal is to predict which passengers survived the tragic sinking of the Titanic. Is this supervised or unsupervised learning? This is supervised learning, because your goal is to predict a specific outcome, namely survival, for each passenger. In this video series, I'm going to primarily focus on supervised learning, though I may cover unsupervised learning in later videos.

We've talked about what supervised learning is, but we haven't yet talked about how it works. So how does it actually work? At a very high level, here are the two main steps of supervised learning. First, you train a machine learning model using your existing labeled data. Labeled data is data which has been labeled with the outcome, which in the case of the email example is whether each message is ham or spam. This is called model training because the model is learning the relationship between the attributes of the data and the outcome. These attributes might include the message text, the number of embedded links, the length of the message, and so on. Second, you make predictions on new data for which you don't know the true outcome. In other words, when a new email message arrives, you want your trained model to accurately predict whether the email is ham or spam without a human examining it.

To summarize these two steps, you could say that the model is learning from past examples made up of inputs and outputs, and then applying what it has learned to future inputs in order to predict future outputs. Because you're making predictions on unseen data, which is data that was not used to train the model, it is often said that the primary goal of supervised learning is to build models that generalize. In other words, you want to build machine learning models that accurately predict the labels of your future emails, rather than accurately predicting the labels of emails you've already received.

This simplified description of machine learning might raise some questions in your mind, such as: How do I choose which attributes of my data to include in the model? How do I choose which model to use? How do I optimize this model for best performance? How do I ensure that I'm building a model that will generalize to unseen data? Can I estimate how well my model is likely to perform on unseen data? These are excellent questions, and they hint at the complexity of doing effective machine learning. All of these issues will be addressed later in the video series.

If you'd like a more in-depth introduction to machine learning, there are two resources that I recommend, both of which I've linked to below the video. The first resource is my favorite book on machine learning, An Introduction to Statistical Learning, by Trevor Hastie and Rob Tibshirani. It's available as a free PDF download, and section 2.1 introduces machine learning in a thorough yet accessible way. The second resource I recommend is a 13-minute video from Caltech's Learning From Data course, which uses some excellent examples to compare supervised and unsupervised learning, and also introduces another type of machine learning called reinforcement learning.

In the next video in the series, I'll be covering the benefits and the drawbacks of scikit-learn, as well as my recommended way to set up Python for machine learning. In the meantime, I'd love to hear from you in the YouTube comments if you have a question about machine learning, or if you just have a cool example of machine learning that you'd like to share. Please do subscribe on YouTube if you'd like to hear the moment my next video comes out. Thanks for watching, and I'll see you soon.
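The two supervised learning steps described in the transcript (train a model on labeled data, then predict on new data) can be sketched in a few lines of plain Python. This is only an illustration and not code from the video, which contains none. It is also not scikit-learn's API: the `train` and `predict` functions, the toy messages, and the choice of a simple word-count (naive Bayes style) classifier are all assumptions made for this example.

```python
import math
from collections import Counter


def train(messages, labels):
    """Step 1: learn word counts per label from labeled examples."""
    word_counts = {}                 # label -> Counter of words seen with that label
    label_counts = Counter(labels)   # how many training messages carry each label
    for text, label in zip(messages, labels):
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return {
        "word_counts": word_counts,
        "label_counts": label_counts,
        "vocabulary": vocabulary,
    }


def predict(model, text):
    """Step 2: score a new message under each label and return the best label."""
    total_messages = sum(model["label_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_label, best_score = None, -math.inf
    for label, n_messages in model["label_counts"].items():
        counts = model["word_counts"][label]
        n_words = sum(counts.values())
        # Log prior for the label, plus smoothed log likelihood of each word.
        score = math.log(n_messages / total_messages)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / (n_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled data: each message is tagged with its known outcome.
train_messages = [
    "win a free prize now",
    "free money click now",
    "claim your free prize",
    "are we still meeting for lunch",
    "see you at the meeting tomorrow",
    "lunch tomorrow sounds good",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = train(train_messages, train_labels)

# New messages whose true outcome is unknown to the model.
new_messages = ["free prize click now", "lunch meeting tomorrow"]
predictions = [predict(model, message) for message in new_messages]
print(predictions)  # prints ['spam', 'ham']

# Messages held out of training give a rough estimate of how well the
# model generalizes (far too few examples here to mean much in practice).
held_out = [("free money now", "spam"), ("see you at lunch", "ham")]
correct = sum(predict(model, text) == label for text, label in held_out)
accuracy = correct / len(held_out)
print(accuracy)  # prints 1.0
```

The held-out check at the end mirrors the point about generalization: the model is judged on messages it did not see during training, not on the ones it learned from.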
Original Description
Have you heard of "machine learning" and want to figure out exactly what it means? I'll give you my definition, provide some examples of machine learning, and explain at a high level how machine learning "works".
Download the notebook: https://github.com/justmarkham/scikit-learn-videos
An Introduction to Statistical Learning: http://www-bcf.usc.edu/~gareth/ISL/
Learning Paradigms: http://work.caltech.edu/library/014.html
WANT TO GET BETTER AT MACHINE LEARNING? HERE ARE YOUR NEXT STEPS:
1) WATCH my scikit-learn video series:
https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A
2) SUBSCRIBE for more videos:
https://www.youtube.com/dataschool?sub_confirmation=1
3) JOIN "Data School Insiders" to access bonus content:
https://www.patreon.com/dataschool
4) ENROLL in my Machine Learning course:
https://www.dataschool.io/learn/
5) LET'S CONNECT!
- Newsletter: https://www.dataschool.io/subscribe/
- Twitter: https://twitter.com/justmarkham
- Facebook: https://www.facebook.com/DataScienceSchool/
- LinkedIn: https://www.linkedin.com/in/justmarkham/