Stratified Sampling: The Pollster Who Predicted a Landslide by Accidentally Surveying Only One Neighborhood
📰 Dev.to · Sachin Kr. Rajput
You randomly split your data 80/20. Your training set has 12% fraud cases. Your test set has 2%. Your model looks great in training, disasters in testing. The problem? Random sampling doesn't guarantee representative splits. Stratified sampling does. Here's why it's non-negotiable for imbalanced data.
DeepCamp AI