deeplearning.ai's Heroes of Deep Learning: Pieter Abbeel

DeepLearningAI · Beginner · 📐 ML Fundamentals · 8y ago

Key Takeaways

The video features Pieter Abbeel discussing his work in deep reinforcement learning, including autonomous helicopter flight and robot learning, as well as the field's challenges and open questions, such as exploration, credit assignment, and safety. He recommends self-study and trying out frameworks like TensorFlow and PyTorch to get started in AI.

Full Transcript

So thanks a lot, Pieter, for joining me today. I think a lot of people know you as a well-known machine learning, deep learning, and robotics researcher. I'd like to have people hear a bit about your story. How did you end up doing the work that you do?

Yeah, it's a good question. Actually, if you had asked me as a 14-year-old what I was aspiring to do, it probably would not have been this. In fact, at the time I thought being a professional basketball player would be the right way to go. I don't think I was able to achieve it.

I feel like machine learning lucked out that the basketball didn't work out.

Yeah, that didn't work out. It was a lot of fun playing basketball, but it didn't work out to try to make it into a career. So what I really liked in school was physics and math, and from there it seemed pretty natural to study engineering, which is applying physics and math in the real world. And then after my undergrad in electrical engineering, I actually wasn't so sure what to do, because literally anything in engineering seemed interesting to me. Understanding how anything works seems interesting; trying to build anything is interesting. And in some sense, artificial intelligence won out because it seemed like it could somehow help all disciplines in some way. It also seemed somehow a little more at the core of everything: if you think about how a machine can think, then maybe that's more the core of everything else than picking any specific discipline.

I've been saying AI is the new electricity; it sounds like the 14-year-old you already had an earlier version of that, even. You know, in the past few years you've done a lot of work in deep reinforcement learning. What's happening? Why is deep reinforcement learning suddenly taking off?

Before I worked in deep reinforcement learning, I worked a lot in reinforcement learning, actually with you, Andrew, at Stanford, of course. We worked on autonomous helicopter flight, and then later at Berkeley, with some of my students, we worked on getting a robot to learn to fold laundry. What characterized the work was a combination of learning that enabled things that would not be possible without learning, but also a lot of domain expertise in combination with the learning to get this to work. It was very interesting, because you needed domain expertise, which is fun to acquire, but at the same time it was very time-consuming: for every new application you wanted to succeed at, you needed domain expertise plus machine learning expertise. For me, it was the 2012 ImageNet breakthrough results from Geoff Hinton's group in Toronto, AlexNet, showing that supervised learning could all of a sudden be done with far less engineering for the domain at hand. There was very little vision-specific engineering in AlexNet. That made me think we really should revisit reinforcement learning under the same kind of viewpoint and see if we can get the deep version of reinforcement learning to work and do equally interesting things as had just happened in supervised learning.

So it sounds like you saw the potential of deep reinforcement learning earlier than most people. Now, looking into the future, what do you see next? What's your prediction?

I think there are several ways to go in deep reinforcement learning. What's interesting about deep reinforcement learning is that, in some sense, there are many more questions than in supervised learning. Supervised learning is about learning an input-output mapping. With reinforcement learning, there is the notion of where the data even comes from; that's the exploration problem. When you have data, how do you do credit assignment: how do you figure out which actions you took early on got you the reward later? And then there are issues of safety: when you have a system autonomously collecting data, it's actually rather dangerous in most situations. Imagine a self-driving car company that says, "We're just going to run deep reinforcement learning." It's pretty likely that car will get into a lot of accidents before it does anything useful.

You need negative examples like that, right?

You do need some negative examples somehow, yes. And positive ones, hopefully. So I think there are still a lot of challenges in deep reinforcement learning in terms of working out some of the specifics of how to get these things to work. The deep part is the representation, but the reinforcement learning itself still has a lot of questions. What I feel is that, with the advances in deep learning, one part of the puzzle in reinforcement learning has been largely addressed, which is the representation part. If there is a pattern, we can probably represent it with a deep network and capture that pattern. How to tease apart the pattern is still a big challenge in reinforcement learning. So I think big challenges are how to get systems to reason over long time horizons. Right now, a lot of the successes in deep reinforcement learning are very short-horizon. There are problems where, if you act well over a five-second horizon, you act well over the entire problem, and a five-second skill is something very different from a day-long skill, or from the ability to live a life as a robot or some software agent. So I think there are a lot of challenges there. I think safety has a lot of challenges in terms of how you learn safely, and also how you keep learning once you're already pretty good. To give an example again that a lot of people would be familiar with, take self-driving cars. Human drivers get into bad accidents maybe every three million miles or something, so once you're as good as a human driver, it takes a long time to see the negative data. But you want your self-driving car to be better than a human driver, and at that point data collection becomes really, really difficult: getting the interesting data that makes your system improve. There are a lot of challenges related to exploration that tie into that. But one of the things I'm actually most excited about right now is seeing if we can take a step back and also learn the reinforcement learning algorithm. Reinforcement learning is very complex: credit assignment is very complex, exploration is very complex. So maybe, just like deep learning for supervised learning was able to replace a lot of domain expertise, we can have programs that are learned, reinforcement learning programs that do all of this, instead of us designing the details.

Learning the reward function, or learning the whole program?

This would be learning the entire reinforcement learning program. Imagine you have a reinforcement learning program, whatever it is, and you throw it at some problem, and then you see how long it takes to learn. Then you say, well, that took a while. Now let another program modify this reinforcement learning program. After the modification, see how fast it learns. If it learns more quickly, that was a good modification, and maybe you keep it and improve from there.

Wow, I see. Right. That's quite a direction.

Yeah. I think it has a lot to do with the amount of compute that's becoming available. This would be running reinforcement learning in the inner loop, whereas right now we run reinforcement learning as the final thing. So the more compute we get, the more it becomes possible to run something like reinforcement learning in the inner loop of a bigger algorithm.

So, starting from the 14-year-old, you've worked in AI for maybe some 20-plus years now. Tell me a bit about how your understanding of AI has evolved over this time.

When I started looking at AI, it's interesting, because it really coincided with coming to Stanford to do my master's degree there. There were some icons there, like John McCarthy, who I got to talk with, but who had a very different approach to AI than what most people were doing at the time. I also talked with Daphne Koller, and I think a lot of my initial thinking about AI was shaped by Daphne's thinking: her AI class, her graphical models class, and being really intrigued by how simply having a distribution over many random variables, and then being able to condition on some subsets of variables and draw conclusions about others, could actually give you so much, if you can somehow make it computationally tractable, which was definitely the challenge. And then, when I started my PhD, you arrived at Stanford, and I think you gave me a really good reality check: that the math in your work is not the right metric to evaluate it by, and that you should really try to see the connection from what you're working on to what impact it can really have, what change it can make, rather than what math happened to be in your work.

Wow, I didn't realize that had gotten through.

Yeah, it's actually one of the things I mention most often when people ask me: if I had to cite only one thing that has stuck with me from Andrew's advice, it's making sure you can see the connection to where your work is actually going to do something.

You've had, and you're continuing to have, an amazing career in AI. So for some of the people listening to you on video now, if they want to also enter or pursue a career in AI, what advice do you have for them?

I think it's a really good time to get into artificial intelligence. If you look at the demand for people, it's so high. There are so many job opportunities, so many things you can do, research-wise, building new companies, and so forth. So I would say, yes, it's definitely a smart decision. In terms of actually getting going, a lot of it you can self-study, whether you're in school or not. There are a lot of online courses: your Machine Learning course; there is also, for example, Andrej Karpathy's deep learning course, which has videos online and is a great way to get started; and at Berkeley there is a deep reinforcement learning course with all of the lectures online. Those are all good places to get started. I think a big part of what's important is to make sure you try things yourself: not just read things or watch videos, but try things out with frameworks like TensorFlow, Chainer, Theano, PyTorch, and so forth, whichever is your favorite. It's very easy to get going and get something up and running very quickly, to get the practice of implementing things yourself and seeing what works and what doesn't. This past week there was an article in Mashable about a 16-year-old in the United Kingdom who is one of the leaders in Kaggle competitions. It said he just went out and learned things, found things online, learned everything himself, and never actually took any formal course per se. And there is a 16-year-old being very competitive in Kaggle competitions, so it's definitely possible.

Yeah, we live in good times for people who want to learn.

Absolutely.

One question I bet you get asked sometimes: if someone wants to enter AI, machine learning, or deep learning, should they apply for a PhD program, or should they get a job at a big company?

I think a lot of it has to do with how much mentoring you can get. In a PhD program you're essentially guaranteed it: the job of the professor who is your advisor is to look out for you, to try to do everything they can to shape you and help you become stronger at whatever you want to do, for example AI. So there's a very clear, dedicated person (sometimes you have two advisors), and that's literally their job. That's why they are professors; often most of what they like about being professors is helping shape students to become more capable at things. Now, that doesn't mean it's not possible at companies. Many companies have really good mentors and people who love to help educate and strengthen the people who come in. It's just that it might not be as much of a guarantee and a given compared to actually enrolling in a PhD program, where the crux of the program is that you're going to learn and somebody is there to help you learn.

So it really depends on the company, and it depends on the PhD program.

Absolutely, yeah. But I think the key is that you can learn a lot on your own, but you can learn a lot faster if you have somebody more experienced who is actually taking it up as their responsibility to spend time with you and help accelerate your progress.

So you've been one of the most visible leaders in deep reinforcement learning. What are the things that deep reinforcement learning is already working really well at?

I think if you look at some deep reinforcement learning successes, they're very, very intriguing. For example, learning to play Atari games from pixels: processing these pixels, which are just numbers, and somehow turning them into joystick actions. Then, for example, there is some of the work we did at Berkeley where we have a simulated robot inventing walking, and the reward it's given is as simple as "the further you go north, the better, and the less hard you impact the ground, the better." Somehow it decides that walking/running is the thing to invent, whereas nobody showed it what walking or running is. Or a robot playing with children's toys learned to put them together, to put a block into a matching opening, and so forth. So I think it's really interesting that it can learn from raw sensory inputs all the way to raw controls, for example torques at the motors. At the same time, it's very interesting that you can have a single algorithm, for example trust region policy optimization (TRPO), and have a robot learn to run, have a robot learn to stand up; and if, instead of a two-legged robot, you swap in a four-legged robot and run the same reinforcement learning algorithm, it still learns to run. There's no change to the reinforcement learning algorithm; it's very, very general. Same for the Atari games: DQN was the same DQN for every one of the games. But then it starts hitting the frontier of what's not yet possible. It's nice that it learns from scratch for each one of these tasks, but it would be even nicer if it could reuse things it learned in the past to learn even more quickly for the next task. That's something that's still at the frontier and not yet possible; it always starts from scratch, essentially.

How quickly do you think we'll see deep reinforcement learning get deployed in the robots around us, the robots that are getting deployed in the world today?

I think in practice, the realistic scenario is one where it starts with supervised learning, behavioral cloning: humans do the work. And I think actually a lot of businesses will be built that way, with a human behind the scenes doing a lot of the work. Imagine a Facebook Messenger assistant: an assistant like that could be built with a human behind the curtain doing a lot of the work. Machine learning matches up with what the human does and starts making suggestions to the human, so the human has a small number of options available to just click and select. Then over time, as it gets pretty good, you start infusing some reinforcement learning, where you give it actual objectives, not just matching the human behind the curtain but objectives of achievement: how fast were these two people able to plan their meeting, or how fast were they able to book their flight, or things like that; how long did it take, how happy were they with it. But it would probably be bootstrapped off a lot of behavioral cloning of humans showing how this could be done.

So behavioral cloning is just supervised learning to mimic whatever the person is doing, and then you gradually layer on reinforcement learning to have it think about longer time horizons. Is that a fair summary?

I'd say so, yeah. Straight-up reinforcement learning from scratch is really fun to watch, it's super intriguing, and there are very few things more fun to watch than a reinforcement learning robot starting from nothing and inventing things. But it's just time-consuming, and it's not always safe.

Thank you very much. That was fascinating. I'm really glad we had the chance to chat.

Well, Andrew, thank you for having me. Very much appreciate it.
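Near the end of the interview, Abbeel describes a practical path: start with behavioral cloning (supervised learning that mimics human demonstrations), then gradually layer on reinforcement learning with an actual objective. The sketch below is a minimal illustration under stated assumptions, not his method. The corridor task, the demonstrations, and every name in it (`step`, `behavioral_cloning`, `rl_finetune`, `steps_to_goal`) are hypothetical. The supervised stage is a tabular majority-vote classifier, and the reinforcement learning stage uses tabular Q-learning, a standard technique swapped in here because the interview names no specific algorithm for this step.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy task: a corridor of states 0..N_STATES-1 with the goal at the right end.
N_STATES = 6
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Move one cell (clamped to the corridor); reward 1.0 only on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def behavioral_cloning(demos):
    """Supervised stage: for each state, predict the action the demonstrator chose most often."""
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def rl_finetune(policy, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """RL stage: tabular Q-learning, warm-started so the cloned action is preferred at first."""
    rng = random.Random(seed)
    q = {(s, a): (0.1 if policy.get(s) == a else 0.0) for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(50):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward, done = step(state, action)
            # Q-learning target: reward now plus the discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
            if done:
                break
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)}


def steps_to_goal(policy, max_steps=20):
    """Follow the policy from state 0; return the steps taken, or None if it never arrives."""
    state = 0
    for t in range(1, max_steps + 1):
        state, _, done = step(state, policy[state])
        if done:
            return t
    return None


# Hypothetical demonstrations, including one noisy "left" move at state 2.
demos = [(0, +1), (1, +1), (2, +1), (2, -1), (1, +1), (2, +1), (3, +1), (4, +1)]
cloned = behavioral_cloning(demos)
tuned = rl_finetune(cloned)
print(steps_to_goal(cloned), steps_to_goal(tuned))
```

The warm start only biases the first greedy choices toward the cloned behavior; after that, the Q-learning updates optimize the task reward directly, which loosely mirrors the move from matching a human to pursuing an actual objective.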

This video features Pieter Abbeel discussing his work in deep reinforcement learning and provides an introduction to the field, including its challenges and open questions. It covers topics such as representation, safety, and exploration in reinforcement learning, gives examples of Abbeel's work in autonomous helicopter flight and robot learning, and recommends self-study and trying out frameworks like TensorFlow and PyTorch to get started in AI.

Key Takeaways

Watch the video to understand Pieter Abbeel's work in deep reinforcement learning
Try out frameworks like TensorFlow and PyTorch to get started in AI
Explore the challenges and open questions in deep reinforcement learning, such as exploration, credit assignment, and safety
Learn about the applications of deep reinforcement learning, such as autonomous helicopter flight and robot learning
Consider how behavioral cloning (supervised learning that mimics human demonstrations) can bootstrap a system before reinforcement learning is layered on
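Credit assignment, as Abbeel puts it, is figuring out which actions taken early on earned a reward that arrived later. One standard textbook approach, not something described in the interview, credits each step with the discounted sum of future rewards, G_t = r_t + γ·G_{t+1}. The sketch below assumes an arbitrary discount γ = 0.9; the function name `discounted_returns` is hypothetical. It also hints at the long-horizon difficulty he mentions: credit for a distant reward shrinks geometrically.

```python
def discounted_returns(rewards, gamma=0.9):
    """Assign credit backward in time: G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A single reward at the end of a short episode is credited to every earlier step.
short = discounted_returns([0.0, 0.0, 0.0, 1.0])

# With a 100-step horizon, the credit reaching the first action is tiny (gamma ** 99).
long_horizon = discounted_returns([0.0] * 99 + [1.0])
print(short)
print(long_horizon[0])
```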

💡 Deep reinforcement learning can learn complex behaviors from raw inputs, but it still faces open questions such as exploration, credit assignment, safety, and reasoning over long time horizons; in practice, bootstrapping from behavioral cloning is often more realistic than learning from scratch.
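Abbeel frames exploration as the question of where the data even comes from: an agent only learns about actions it actually tries. A common textbook illustration, swapped in here rather than taken from the interview, is an epsilon-greedy agent on a multi-armed bandit. The arm means, Gaussian reward noise, `epsilon=0.1`, and the function name `epsilon_greedy_bandit` are all hypothetical choices for this sketch.

```python
import random


def epsilon_greedy_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Estimate each arm's mean reward while choosing arms epsilon-greedily."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random arm to gather new data
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit the current best guess
        reward = rng.gauss(true_means[arm], 1.0)  # hypothetical noisy reward
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.0, 0.5, 1.0])
print(counts)
```

Without the random exploratory pulls, the agent could keep exploiting an arm that merely looked good early on and never collect the data that would reveal a better one.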
