Why Experiment Tracking is Crucial to OpenAI
Key Takeaways
The video discusses the importance of experiment tracking in AI research, specifically in the context of OpenAI's robotics team, using tools like Weights & Biases to track and compare results across the team.
Full Transcript
I work on the robotics team at OpenAI, where we try to build learning-based robots that can eventually do anything that humans should be able to do. I've worked on everything from figuring out the right algorithms to power these robots to building the equivalent of the sensory systems for these robots.

One of the things we've been working on in the project is to get a robotic hand to manipulate real objects, so you can put a block in the hand and we can orient it to any orientation. This is a kind of problem that had eluded the robotics community for decades. What that involves is programming computers such that they learn from the real world or, say, in our case, a lot in simulated worlds, just as humans do, you know, as children and adults. When we want to do new tasks, there's a learning process; you won't get everything right on the first try. And I'm programming robots to have this more human-like, learning-based behavior.

Before we started using Weights & Biases, everybody had kind of their own little setup for how they would get their results and so on. Some people would be using TensorFlow with TensorBoard, some people would be using their own kind of homebrew version of some visualization tool, and so on. So everything was very fragile. If I wanted to share a piece of results with someone else, the best I could usually hope for was a screenshot of my graph, and then paste it and send it to them in some way over Slack or over email.

What has changed now is that, since we have a common place where all our results are, I can take the results of my colleague Lillian, for example. I can take whatever she has trained and compare that with what I trained. We can create a quick report with that. I can download the model that she had trained. I can go in and look at other metrics very easily, since I have all the raw data, rather than having to ask her to make me a new screenshot. It's reduced a lot of the overhead in communication, so we can really focus on the communication that really matters, about what we should work on and what the most important things are now, rather than "what did your results look like two weeks ago?" That's a waste of time.

We use Weights & Biases with continuous integration a lot. It's extremely important to see that your models don't regress. It gives you a kind of sense of the pulse of the team, of how quickly you're moving and so on, but it's also an extremely good way of just having transparency in the work that you're doing with other people. We have like 10 to 20 people working with our code base, so at any point in time somebody could commit a change that breaks something. The worst thing that can happen is that you find out after a few weeks that you have a regression, and then you have like two weeks of commits to go through to figure out what went wrong. Then you easily lose a week or two of work. Thanks to Weights & Biases, I've just saved lots and lots of money.

Just comparing results in general is much faster when I have all the data in one place. In some ways it's kind of like a shared logbook for the team of our progress. We do this a lot in our workflows, comparing against old baselines and so on, so we can keep old runs available and compare against those over and over again.

It's also a very transparent way of seeing how much your utilization is of your resources: do you use 10 percent or 90 percent of your CPU or GPU? We want to be as close to a hundred percent as possible, so it's been a very, very useful tool for us for just saving money. You can call up your friend, like, "Why are you only using 10% of the GPU? You could be running ten times as many experiments."

We're trying to build the robot brain, the brain that could work with any robotic incarnation. I think it's kind of an enormous positive impact on the world to build general-purpose robots. I want to be part of figuring out how to do that.
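Editor's note: the transcript mentions using continuous integration to catch model regressions by comparing new results against old baselines, but it does not show how the team does this. The following is only a minimal sketch of that idea in plain Python. The function name `check_regression`, the metric names, the 2% tolerance, and the example numbers are all hypothetical; in practice the two metric dictionaries would be pulled from wherever the team's runs are stored.

```python
from dataclasses import dataclass


@dataclass
class RegressionReport:
    metric: str
    baseline: float
    candidate: float
    regressed: bool


def check_regression(baseline, candidate, tolerance=0.02, higher_is_better=True):
    """Compare a candidate run's metrics against a baseline run.

    A metric counts as regressed when it is worse than the baseline by more
    than `tolerance`, expressed as a fraction of the baseline value.
    Metrics missing from the candidate run are skipped.
    """
    reports = []
    for name, base_value in baseline.items():
        if name not in candidate:
            continue
        new_value = candidate[name]
        allowed = abs(base_value) * tolerance
        if higher_is_better:
            regressed = new_value < base_value - allowed
        else:
            regressed = new_value > base_value + allowed
        reports.append(RegressionReport(name, base_value, new_value, regressed))
    return reports


# Hypothetical summary metrics for a stored baseline run and a new commit's run.
baseline_run = {"success_rate": 0.80, "mean_reward": 100.0}
candidate_run = {"success_rate": 0.70, "mean_reward": 99.0}

reports = check_regression(baseline_run, candidate_run)
failed = [r.metric for r in reports if r.regressed]
```

A CI job could exit with a non-zero status whenever `failed` is non-empty, so that a breaking commit is flagged the day it lands rather than weeks later.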
Original Description
Peter Welinder from OpenAI Robotics talks about his research training a robotic hand to manipulate objects. He shares some details about his process using Weights & Biases to track his team's massive distributed training runs.
https://www.wandb.com/blog/why-experiment-tracking-is-crucial-to-openai