Why Experiment Tracking is Crucial to OpenAI

Weights & Biases · Advanced · 🏭 MLOps & LLMOps · 7y ago

Skills: Experiment Tracking 90% · ML Pipelines 70% · AI Systems Design 60%

Key Takeaways

The video discusses the importance of experiment tracking in AI research, specifically in the context of OpenAI's robotics team, using tools like Weights & Biases to track and compare results across the team.

Full Transcript

I work on the robotics team at OpenAI, where we try to build learning-based robots that can eventually do anything that humans should be able to do. I've worked on everything from figuring out the right algorithms to power these robots to building the equivalent of the sensory systems for these robots.

One of the things we've been working on in the project is to get a robotic hand to manipulate real objects, so you can put a block in the hand and we can orient it to any orientation. This is the kind of problem that had eluded the robotics community for decades. What that involves is programming computers such that they learn from the real world or, in our case, a lot in simulated worlds, just as humans do, as children and as adults. When we want to do new tasks, there's a learning process; you won't get everything right on the first strike. I'm programming robots to have this more human-like, learning-based behavior.

Before we started using Weights & Biases, everybody had kind of their own little setup for how they would get their results. Some people would be using TensorFlow with TensorBoard, some people would be using their own homebrew version of some visualization tool, and so on. So everything was very fragile. If I wanted to share a piece of results with someone else, the best I could usually hope for was to take a screenshot of my graph, paste it, and send it to them in some way over Slack or over email.

What has changed now is that we have a common place where all our results are. I can take the results of my colleague Lillian, for example: I can take whatever she has trained and compare that with what I trained, and we can create a quick report with that. I can download the model that she trained. I can go in and look at other metrics very easily, since I have all the raw data, rather than asking her to make me a new screenshot. It's reduced a lot of the overhead in communication, so we can really focus on the communication that matters, like what should we work on and what are the most important things now, rather than what did your results look like two weeks ago. That's a waste of time.

We use Weights & Biases with continuous integration a lot. It's extremely important to see that your models don't regress. It gives you a sense of the pulse of the team, of how quickly you're moving, and so on. But it's also an extremely good way of just having transparency in the work that you're doing with other people. We have like 10 to 20 people working with our code base, so at any point in time somebody could commit a change that breaks something. The worst thing that can happen is that you find out after a few weeks that you have a regression, and then you have like two weeks of commits to go through to figure out what went wrong. Then you easily lose a week or two of work.

Thanks to Weights & Biases, I've saved lots and lots of money. Just comparing results in general is much faster when I have all the data in one place. In some ways it's kind of like a shared logbook for the team of our progress. We do this a lot in our workflows, comparing against old baselines and so on, so we can keep old runs available and compare against those over and over again.

It's also a very transparent way of seeing how much you're utilizing your resources: do you use 10 percent or 90 percent of your CPU or GPU? We want to be as close to a hundred percent as possible, so it's been a very useful tool for us for saving money. You can call up your friend and ask, why are you only using 10% of the GPU? You could be running ten times as many experiments.

We're trying to build the robot brain, the brain that could work with any robotic incarnation. I think it would be an enormous positive impact on the world to build general-purpose robots, and I want to be part of figuring out how to do that.

Original Description

Peter Welinder from OpenAI Robotics talks about his research training a robotic hand to manipulate objects. He shares some details about his process using Weights & Biases to track his team's massive distributed training runs. https://www.wandb.com/blog/why-experiment-tracking-is-crucial-to-openai



The video highlights the importance of experiment tracking in AI research, using tools like Weights & Biases to improve collaboration, transparency, and resource utilization. The speaker shares their experience working on OpenAI's robotics team and how experiment tracking has improved their workflow.

Key Takeaways

Set up experiment tracking using Weights & Biases
Integrate with continuous integration
Compare results across the team
Optimize resource utilization
Use transparency to improve collaboration
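
The continuous-integration takeaway can be made concrete with a small regression gate. The sketch below is only an illustration of the idea, not the OpenAI team's setup and not the Weights & Biases API: the `RunSummary` record, the `success_rate` metric, the run IDs, commit hashes, and the 0.02 tolerance are all assumptions made up for the example. It compares each new run against a stored baseline and reports the runs that fell behind.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    """Final metrics of one training run (hypothetical record layout)."""
    run_id: str
    commit: str
    success_rate: float  # higher is better


def find_regressions(baseline, candidates, tolerance=0.02):
    """Return candidates whose success rate is more than `tolerance` below the baseline."""
    floor = baseline.success_rate - tolerance
    return [run for run in candidates if run.success_rate < floor]


# Illustrative data only; a real pipeline would load these from the tracking tool.
baseline = RunSummary("baseline-001", "a1b2c3d", 0.80)
candidates = [
    RunSummary("ci-101", "d4e5f6a", 0.81),
    RunSummary("ci-102", "b7c8d9e", 0.70),
]

regressed = find_regressions(baseline, candidates)
for run in regressed:
    print(f"regression in {run.run_id} at commit {run.commit}: "
          f"{run.success_rate:.2f} vs baseline {baseline.success_rate:.2f}")
```

In a CI job, a non-empty `regressed` list would typically fail the build, so a breaking commit is caught the same day rather than after two weeks of further commits.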

💡 Experiment tracking is crucial for improving collaboration, transparency, and resource utilization in AI research, especially in distributed teams.
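
The resource-utilization point can be sketched in the same spirit. This is a minimal example under stated assumptions: the run names, the sampled GPU utilization percentages, and the 50% threshold are invented for illustration, and the function is plain Python rather than anything provided by Weights & Biases. It averages the utilization samples logged per run and lists the runs that leave most of the GPU idle.

```python
from statistics import mean


def underutilized_runs(samples_by_run, threshold=50.0):
    """Map run name -> mean GPU utilization (%) for runs averaging below `threshold`."""
    averages = {
        name: mean(samples)
        for name, samples in samples_by_run.items()
        if samples  # skip runs with no samples yet
    }
    return {name: avg for name, avg in averages.items() if avg < threshold}


# Hypothetical utilization samples (percent), e.g. one reading per minute.
samples_by_run = {
    "hand-rotation-a": [92.0, 88.0, 95.0],
    "hand-rotation-b": [12.0, 9.0, 10.0],
}

low = underutilized_runs(samples_by_run)
for name, avg in low.items():
    print(f"{name}: average GPU utilization {avg:.1f}%")
```

A run flagged this way is the one worth a quick message to its owner, since the same hardware could be running several more experiments.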

