Precision and recall > .90 on holdout data

📰 Reddit r/datascience

I'm running ML models (XGBoost and elastic net logistic regression) predicting a 0/1 outcome in a post period based on pre period observations in a large unbalanced dataset. I've undersampled from the majority category class to achieve a balanced dataset that fits into memory and doesn't take hours to run. I understand sampling can distort precision or recall metrics. However I'm testing model performance on a raw holdout dataset (no sampling or rebalanc

Published 6 Apr 2026
Read full article → ← Back to Reads