My top 50 scikit-learn tips
If you already know the basics of scikit-learn, but you want to be more efficient and get up-to-date with the latest features, then THIS is the video for you.
My name is Kevin Markham, and I've been teaching Machine Learning in Python with scikit-learn for more than 8 years. Over the next 3 hours, I'm going to share with you my top 50 scikit-learn tips.
Each tip ranges from 2 to 8 minutes, and you can use the timestamp links below to skip along if you're already familiar with a particular tip.
👩💻 Code: https://github.com/justmarkham/scikit-learn-tips
🤖 Learn ML from me: https://courses.…
Chapters (31)
0:00 Introduction
1:03 1. Transform data with ColumnTransformer
4:19 2. Seven ways to select columns
8:18 3. "fit" vs "transform"
10:53 4. Don't use "fit" on new data!
15:05 5. Don't use pandas for preprocessing!
19:00 6. Encode categorical features
24:07 7. Handle new categories in testing data
27:16 8. Chain steps with Pipeline
30:19 9. Encode "missingness" as a feature
33:12 10. Why set a random state?
35:40 11. Better ways to impute missing values
41:22 12. Pipeline vs make_pipeline
44:08 13. Inspect a Pipeline
47:03 14. Handle missing values automatically
49:47 15. Don't drop the first categorical level
54:15 16. Tune a Pipeline
1:01:09 17. Randomized search vs grid search
1:05:42 18. Examine grid search results
1:08:10 19. Logistic regression tuning parameters
1:12:41 20. Plot a confusion matrix
1:15:37 21. Plot multiple ROC curves
1:17:21 22. Use the correct Pipeline methods
1:18:59 23. Access model coefficients
1:20:11 24. Visualize a decision tree
1:23:57 25. Improve a decision tree by pruning it
1:25:23 26. Use stratified sampling when splitting data
1:29:40 27. Impute missing values for categoricals
1:32:10 28. Save a model or Pipeline
1:33:47 29. Add multiple text columns to a model
1:35:35 30. More ways to inspect a Pip