Python Tutorial: Decision-Tree for Regression
Want to learn more? Take the full course at https://learn.datacamp.com/courses/machine-learning-with-tree-based-models-in-python at your own pace. More than a video, you'll learn hands-on coding & quickly apply skills to your daily work.
---
Welcome back! In this video, you'll learn how to train a decision tree for a regression problem.
Recall that in regression, the target variable is continuous. In other words, the output of your model is a real value.
Let's motivate our discussion of regression by introducing the automobile miles-per-gallon dataset from the UCI Machine Learning Repository.
This dataset consists of 6 features corresponding to the characteristics of a car and a continuous target variable labeled mpg which stands for miles-per-gallon.
Our task is to predict the mpg consumption of a car given these six features.
To simplify the problem, here the analysis is restricted to only one feature corresponding to the displacement of a car. This feature is denoted by displ.
A 2D scatter plot of mpg versus displ shows that the mpg-consumption decreases nonlinearly with displacement. Note that linear models such as linear regression would not be able to capture such a non-linear trend.
Let's see how you can train a decision tree with scikit-learn to solve this regression problem.
Note that the features X and the labels y are already loaded in the environment.
First, import DecisionTreeRegressor from sklearn-dot-tree and the functions train_test_split() from sklearn-dot-model_selection and mean_squared_error as MSE() from sklearn-dot-metrics.
Then, split the data into 80%-train and 20%-test using train_test_split.
You can now instantiate the DecisionTreeRegressor() with a maximum depth of 4 by setting the parameter max_depth to 4. In addition, set the parameter min_sample_leaf to 0-dot-1 to impose a stopping condition in which each leaf has to contain at least 10% of the training data.
Now fit dt to the training set and predict the test set labels.
Watch on YouTube ↗
(saves to browser)
Sign in to unlock AI tutor explanation · ⚡30
Playlist
Uploads from DataCamp · DataCamp · 45 of 60
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
▶
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
SQL Server Tutorial: Date manipulation
DataCamp
R Tutorial: Intermediate Interactive Data Visualization with plotly in R
DataCamp
R Tutorial: Adding aesthetics to represent a variable
DataCamp
R Tutorial: Moving Beyond Simple Interactivity
DataCamp
Python Tutorial: Why use ML for marketing? Strategies and use cases
DataCamp
Python Tutorial: Preparation for modeling
DataCamp
Python Tutorial: Machine Learning modeling steps
DataCamp
R Tutorial: The prior model
DataCamp
R Tutorial: Data & the likelihood
DataCamp
R Tutorial: The posterior model
DataCamp
R Tutorial: An Introduction to plotly
DataCamp
R Tutorial: Plotting a single variable
DataCamp
R Tutorial: Bivariate graphics
DataCamp
Python Tutorial: Customer Segmentation in Python
DataCamp
Python Tutorial: Time cohorts
DataCamp
Python Tutorial: Calculate cohort metrics
DataCamp
Python Tutorial: Cohort analysis visualization
DataCamp
R Tutorial: Building Dashboards with flexdashboard
DataCamp
R Tutorial: Anatomy of a flexdashboard
DataCamp
R Tutorial: Layout basics
DataCamp
R Tutorial: Advanced layouts
DataCamp
Python Tutorial: Time Series Analysis in Python
DataCamp
Python Tutorial: Correlation of Two Time Series
DataCamp
Python Tutorial: Simple Linear Regressions
DataCamp
Python Tutorial: Autocorrelation
DataCamp
R Tutorial: The gapminder dataset
DataCamp
R Tutorial: The filter verb
DataCamp
R Tutorial: The arrange verb
DataCamp
R Tutorial: The mutate verb
DataCamp
R Tutorial: What is cluster analysis?
DataCamp
R Tutorial: Distance between two observations
DataCamp
R Tutorial: The importance of scale
DataCamp
R Tutorial: Measuring distance for categorical data
DataCamp
Python Tutorial: Plotting multiple graphs
DataCamp
Python Tutorial: Customizing axes
DataCamp
Python Tutorial: Legends, annotations, & styles
DataCamp
Python Tutorial: Introduction to iterators
DataCamp
Python Tutorial: Playing with iterators
DataCamp
Python Tutorial: Using iterators to load large files into memory
DataCamp
SQL Tutorial: Introduction to Relational Databases in SQL
DataCamp
SQL Tutorial: Tables: At the core of every database
DataCamp
SQL Tutorial: Update your database as the structure changes
DataCamp
Python Tutorial: Classification-Tree Learning
DataCamp
Python Tutorial: Decision-Tree for Classification
DataCamp
Python Tutorial: Decision-Tree for Regression
DataCamp
Python Tutorial: Census Subject Tables
DataCamp
Python Tutorial: Census Geography
DataCamp
Python Tutorial: Using the Census API
DataCamp
R Tutorial: A/B Testing in R
DataCamp
R Tutorial: Baseline Conversion Rates
DataCamp
R Tutorial: Designing an Experiment - Power Analysis
DataCamp
R Tutorial: Introduction to qualitative data
DataCamp
R Tutorial: Understanding your qualitative variables
DataCamp
R Tutorial: Making Better Plots
DataCamp
SQL Tutorial: OLTP and OLAP
DataCamp
SQL Tutorial: Storing data
DataCamp
SQL Tutorial: Database design
DataCamp
Python Tutorial: Introduction to spaCy
DataCamp
Python Tutorial: Statistical Models
DataCamp
Python Tutorial: Rule-based Matching
DataCamp
Related AI Lessons
⚡
⚡
⚡
⚡
Day 19 Part 2: Hashtag Trends & Discovery + Buffer API Lifecycle Pattern Discovery
Medium · Machine Learning
How to Build a Professional Grade Calculator in C Language [Full Source Code Included]
Dev.to AI
Beyond Overfitting: My Idea of a “Governor AI” That Supervises Learning Systems
Medium · Machine Learning
Mixed Integer Goal Programming for Personalized Meal Optimization with User-Defined Serving Granularity
ArXiv cs.AI
🎓
Tutor Explanation
DeepCamp AI