Entropy & Information Gain Explained (Decision Trees Math + Python) #machinelearning #datascience
Entropy and Information Gain are the core mathematical concepts behind Decision Trees.
They help answer one question:
👉 Which feature should we split on?
Used in:
✔ Decision Trees
✔ Random Forest
✔ Feature selection
✔ Information theory
✔ ML interviews
🔹 1. What is Entropy? (Uncertainty Measure)
Entropy measures how mixed (impure) the class labels in a dataset are.
Formula:
H = − Σ p(x) · log₂(p(x))
Where:
p(x) = probability (fraction of samples) of class x, summed over all classes
Example:
Dataset:
[Yes, Yes, No, No]
Probabilities:
p(Yes) = 2/4 = 0.5
p(No) = 2/4 = 0.5
Entropy:
H = − (0.5 · log₂ 0.5 + 0.5 · log₂ 0.5)
H = − (0.5 · (−1) + 0.5 · (−1)) = 1
👉 Maximum uncertainty for two classes
Another Example:
[Yes, Yes, Yes, Yes]
H = − (1 · log₂ 1) = 0
👉 No uncertainty (pure data)
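The two examples above can be checked with a few lines of Python. This is a minimal sketch, not the code from the video; the function name `entropy` is an assumption chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    # p(x) for each class x that actually occurs in the data
    probabilities = [count / total for count in Counter(labels).values()]
    # Summing -p * log2(p) term by term implements H = -sum(p * log2(p))
    return sum(-p * log2(p) for p in probabilities)


print(entropy(["Yes", "Yes", "No", "No"]))    # 1.0
print(entropy(["Yes", "Yes", "Yes", "Yes"]))  # 0.0
```

Only classes that appear in the data are counted, so the undefined term 0 · log₂ 0 never comes up.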
🔹 2. What is Information Gain?
Information Gain tells us:
👉 How much uncertainty is reduced after a split
Formula:
IG = H(parent) − Σ (weight × H(child))
Where:
weight = fraction of the parent's samples that fall into that child
Example:
Parent entropy = 1
After split:
Left entropy = 0.9
Right entropy = 0.5
Weighted entropy = 0.5 × 0.9 + 0.5 × 0.5 = 0.7 (assuming each child holds half of the samples)
IG = 1 − 0.7 = 0.3
👉 Higher IG = better split
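The arithmetic in this example can be reproduced as a short sketch. The helper name `information_gain` and the equal child weights of 0.5 are assumptions for illustration; in practice the weights depend on how many samples land in each child.

```python
def information_gain(parent_entropy, child_entropies, weights):
    """IG = H(parent) - sum(weight * H(child))."""
    weighted = sum(w * h for w, h in zip(weights, child_entropies))
    return parent_entropy - weighted


# Entropies from the example; equal weights assume each child holds half the samples
ig = information_gain(1.0, child_entropies=[0.9, 0.5], weights=[0.5, 0.5])
print(f"IG = {ig:.2f}")  # IG = 0.30
```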
🔹 3. Why Decision Trees Use Entropy
At each node, the tree tries to:
✔ Maximize Information Gain
✔ Reduce randomness
✔ Create pure groups
Repeating this at every node builds a structured decision path from the root to the leaves.
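As a rough illustration of that greedy choice at a single node, the sketch below scores each candidate feature by its information gain and picks the highest. The toy rows, the feature names `outlook` and `windy`, and the target name `play` are all hypothetical, made up for this example.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, feature, target):
    """Entropy of the parent minus the weighted entropy of the children."""
    parent = [row[target] for row in rows]
    # Group the target labels by the value of the candidate feature
    children = defaultdict(list)
    for row in rows:
        children[row[feature]].append(row[target])
    weighted = sum(len(ch) / len(rows) * entropy(ch) for ch in children.values())
    return entropy(parent) - weighted


# Hypothetical toy data, invented for this sketch
rows = [
    {"outlook": "sunny", "windy": "no", "play": "Yes"},
    {"outlook": "sunny", "windy": "yes", "play": "Yes"},
    {"outlook": "rainy", "windy": "no", "play": "No"},
    {"outlook": "rainy", "windy": "yes", "play": "No"},
]

gains = {f: information_gain(rows, f, "play") for f in ("outlook", "windy")}
best = max(gains, key=gains.get)
print(best)  # outlook
```

Here `outlook` separates the labels into two pure groups (gain of 1 bit), while `windy` leaves both children as mixed as the parent (gain of 0), so the node would split on `outlook`.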
🔹 4. Python Code Explanation
In this code we:
✔ Calculated entropy manually
✔ Used log₂ (important for information theory)
✔ Computed information gain
✔ Simulated dataset splitting
Tools used:
numpy
collections.Counter
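The code itself is not reproduced in this description, so the following is only a sketch of what those steps could look like with `numpy` and `collections.Counter`. The function names, the sample data, and the threshold split `feature <= 3` are assumptions, not the video's actual code.

```python
from collections import Counter

import numpy as np


def entropy(labels):
    """Entropy in bits, computed manually from class counts."""
    counts = np.array(list(Counter(np.asarray(labels).tolist()).values()), dtype=float)
    probs = counts / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))


def information_gain(labels, mask):
    """Gain from splitting `labels` into labels[mask] and labels[~mask]."""
    labels = np.asarray(labels)
    left, right = labels[mask], labels[~mask]
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


# Hypothetical data: one numeric feature and a Yes/No label per sample
feature = np.array([1, 2, 3, 7, 8, 9])
labels = np.array(["Yes", "Yes", "No", "No", "No", "Yes"])

ig = information_gain(labels, feature <= 3)  # simulate the split "feature <= 3"
print(f"Parent entropy: {entropy(labels):.3f}")  # Parent entropy: 1.000
print(f"Information gain: {ig:.3f}")             # Information gain: 0.082
```

Each child here still contains a 2-to-1 mix of labels, so the split removes only about 0.08 bits of uncertainty.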
🔹 5. Real-World Use Cases
Entropy & Information Gain are used in:
✔ Credit risk prediction
✔ Fraud detection
✔ Medical diagnosis
✔ Customer segmentation
✔ Recommendation systems
🔹 6. Key Insight (Very Important)
Entropy:
High → random
Low → predictable
Information Gain:
High → good split
Low → useless split
🎯 INTERVIEW QUESTIONS (WITH ANSWERS)
Q1. What does entropy measure in ML?
A1. The uncertainty or impurity of a dataset.
Q2. Why is log base 2 used in entropy?
A2. To measure information in bits.
Q3. What is the goal of Information Gain?
A3. To measure how much a split reduces entropy, so the tree can pick the split that reduces it the most.
Q4. Wha