Lightning Talk: Why Logging Isn’t Enough: Making PyTorch Training Regressions Vi... Sahana Venkatesh

PyTorch · Intermediate · 📊 Data Analytics & Business Intelligence · 2w ago
Lightning Talk: Why Logging Isn’t Enough: Making PyTorch Training Regressions Visible in Practice - Sahana Venkatesh, Wayve

PyTorch teams often log rich training metrics, yet still discover training regressions late, after significant developer time and GPU budget have already been spent. In this talk, I’ll share a practical pattern we used to turn PyTorch training metrics into an operational guardrail for large-model training. The approach combines three things: scheduled short and long training runs; standardized performance and stability metrics (throughput, memory, loss, divergence); and simple statistical baselines that automatically surface regressions through alerts, without hard gates or complex infrastructure. I’ll focus on why logging alone is insufficient, how we chose what to monitor, and which tradeoffs we encountered (false positives, alert fatigue, baseline drift). The goal is not a tool demo but a reusable pattern that other PyTorch teams can adapt to catch training regressions earlier and make retraining more predictable.
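
The abstract does not say which statistic or thresholds the team used, so the following is only a minimal sketch of what a "simple statistical baseline" check could look like, not the speaker's implementation. It assumes each scheduled run reports one number per metric, compares the latest value against the mean and standard deviation of recent runs, and returns alert messages instead of failing the run. The names `is_regression` and `check_run`, the z-score threshold of 3, and the 1% relative floor on the standard deviation (a guard against false positives on very stable metrics) are all hypothetical choices.

```python
from statistics import mean, stdev


def is_regression(history, latest, higher_is_better=True,
                  z_threshold=3.0, min_runs=5, min_rel_std=0.01):
    """Return True if `latest` is an outlier in the bad direction.

    history: metric values from recent scheduled runs (the baseline window).
    All defaults are illustrative assumptions, not values from the talk.
    """
    if len(history) < min_runs:
        return False  # too few runs to form a baseline; stay silent

    mu = mean(history)
    # Floor the spread so tiny run-to-run noise does not trigger alerts.
    sigma = max(stdev(history), min_rel_std * abs(mu))
    if sigma == 0:
        return False  # all-zero history: no meaningful scale to compare to

    z = (latest - mu) / sigma
    return z < -z_threshold if higher_is_better else z > z_threshold


def check_run(baselines, latest_run, directions):
    """Compare one run against per-metric baselines; return alert strings.

    baselines:  {metric_name: [values from previous runs]}
    latest_run: {metric_name: value from the newest run}
    directions: {metric_name: True if higher is better, else False}
    """
    alerts = []
    for name, value in latest_run.items():
        history = baselines.get(name, [])
        if is_regression(history, value, directions.get(name, True)):
            alerts.append(
                f"{name}: latest={value} vs baseline mean={mean(history):.2f}"
            )
    return alerts


# Hypothetical numbers for illustration only.
baselines = {
    "throughput_samples_per_s": [1020.0, 1005.0, 998.0, 1011.0, 1003.0, 1008.0],
    "peak_memory_gb": [40.1, 40.3, 40.2, 40.0, 40.2],
}
directions = {"throughput_samples_per_s": True, "peak_memory_gb": False}

healthy_run = {"throughput_samples_per_s": 1000.0, "peak_memory_gb": 40.5}
slow_run = {"throughput_samples_per_s": 900.0, "peak_memory_gb": 45.0}

healthy_alerts = check_run(baselines, healthy_run, directions)
slow_alerts = check_run(baselines, slow_run, directions)
```

Because the check only returns messages, it can feed an alerting channel without blocking merges or runs, which matches the "alerts without hard gates" framing. A rolling baseline window like this one adapts to gradual change, which is also how baseline drift can creep in: a slow regression may be absorbed into the baseline before it ever crosses the threshold.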
