Deep Robotic Learning with Sergey Levine - #37

The TWIML AI Podcast with Sam Charrington · Beginner ·🤖 AI Agents & Automation ·8y ago

Key Takeaways

Sergey Levine discusses deep robotic learning, including techniques for allowing machines to autonomously acquire complex behavioral skills. The episode is part of the industrial AI series, sponsored by Bonsai and Wise.io at GE Digital.

Full Transcript

[Music] Hello and welcome to another episode of TWIML Talk, the podcast where I interview interesting people doing interesting things in machine learning and artificial intelligence. I'm your host, Sam Charrington. This week we continue our industrial AI series with Sergey Levine, an assistant professor at UC Berkeley whose research focus is deep robotic learning. Sergey is part of the same research team as a couple of our previous guests in this series, Chelsea Finn and Pieter Abbeel, and if the response we've seen to those shows is any indication, you're going to love this episode. Sergey's research interests, and our discussion, focus on how robotic learning techniques can be used to allow machines to autonomously acquire complex behavioral skills. We really dig into some of the details of how this is done, and I found that my conversation with Sergey filled in a lot of gaps for me from the interviews with Pieter and Chelsea. By the way, this is definitely a nerd alert episode.

Before we jump into the show, I'd like to thank everyone who's taken the time to enter our AI conference giveaway. You all know that one of my favorite things to do is to give away free stuff to listeners, and we've been fortunate to be able to give away tickets to the O'Reilly AI Conference to lucky TWIML listeners since the very first event in the series last year. Well, we've got a couple of exciting updates for those of you who want in on this opportunity. First, we're making it even easier to enter our ticket giveaway for the San Francisco event, and second, we're giving away two tickets now, not just one. To enter the contest in 30 seconds or less, just hit pause right now and visit twimlai.com/aisf right from your phone.

Finally, a quick thank you to our sponsors for the industrial AI series, Bonsai and Wise.io
at GE Digital. By now you know a bit about Bonsai, right? You've heard me mention their AI platform, which lets enterprises build and deploy intelligent systems. Well, I actually spent some time in the Bonsai offices in Berkeley last week, learning more about that platform and recording an interview with their co-founder and CEO, Mark Hammond. It was a great conversation, and I'm really looking forward to getting it on the podcast in a few weeks. In the meantime, I'll reiterate that if you're trying to build AI-powered applications focused on optimizing and controlling the physical systems in your enterprise, whether robots or HVAC systems or supply chains, you should take a look at what they're up to. They've got a unique approach to building AI models that lets you model the real-world concepts in your application; automatically generate, train, and evaluate low-level models for your project using technologies like reinforcement learning; and easily integrate those models into your applications and systems using APIs. You can check them out at bons.ai/twimlai, and definitely let them know you appreciate their support of the podcast and this series.

Last week I announced Wise.io at GE Digital as a sponsor for this series as well. Wise.io was among the first companies I began following in what I call the machine learning platform space, back in 2012, 2013. I've since interviewed co-founder Josh Bloom here on the show and mentioned the company's subsequent acquisition by GE Digital. At GE Digital, the Wise.io team is focused on creating technology and solutions that enable advanced capabilities for the industrial internet of things, making infrastructure more intelligent and advancing the industries critical to the world we live in. I want to give a hearty thanks and shout-out to the team at Wise.io at GE Digital for supporting my industrial AI research and this podcast series. Of course, you can check them out at wise.
And now, on to the show.

Hey everyone, I am on the line with Sergey Levine. Sergey is an assistant professor at UC Berkeley in the EECS department, and I'm super excited to have him on the show. Hi, Sergey. Hello. How are you doing? I'm doing well. Wonderful, wonderful. How about we start by having you introduce yourself and talk a little bit about your background, how you got interested in your current area of research, and what that is?

Sure. So I actually started off in graduate school working on computer graphics, and in particular I was really interested in simulating virtual humans, simulating virtual characters. The trouble is that if you want to simulate very realistic virtual humans, one of the things you have to do is simulate intelligence, because humans are intelligent and machines by default aren't. So a lot of my work turned out to be essentially artificial intelligence work in computer graphics, to get these virtual characters to behave in ways that looked plausible. From there I decided that, well, if I have some methods that work reasonably well in computer graphics, and I can create some plausibly realistic virtual humans, perhaps those methods are also applicable to, for example, robotics. So I did a postdoc afterwards in robotics. It turns out that a lot of the stuff works well for robots as well, and a lot of that led to my current work in reinforcement learning and deep learning.

Fantastic. I noticed on your website that you've got a paper accepted, or you're speaking, at a computer animation conference. Are you still fairly active in the video domain?

Not as much in recent years. I think my last paper there was in 2012. I am giving a guest lecture this summer at SCA, that's the Symposium on Computer Animation, to talk about some of the recent progress in deep reinforcement learning. Since I moved to robotics, a lot of this technology has actually made a big impact in graphics, and that's really right
about now, and this past year that's been registering a lot, so they invited me to come give a talk to them about how some of the stuff is going.

Fantastic, fantastic. So, as you know, we recently had on the show Pieter Abbeel and Chelsea Finn, who are your colleagues there at Berkeley, and the conversations I had with those guys were really, really interesting. Let's maybe take a minute to talk about the research that you're doing in a little bit more detail, and we can dive in deeper.

Sure. So the area that I work in can be broadly categorized as robotic learning. I'm interested in developing algorithms and models that can allow robots to autonomously learn very large and complex repertoires of behaviors, so that they can take on more and more of the functionality that we associate with intelligent human beings, so that they can do all the things that are dangerous, unpleasant, or for other reasons undesirable for people to do themselves. And to me, this problem is not just a problem that has a lot of interesting practical implications; it's also something that I think can serve as a really valuable lens on artificial intelligence, because in the end we have only one proof of existence of true intelligence, and that's human beings, and human beings are embodied. We don't just exist in the ether thinking abstract thoughts; we actually have a body, we interact with the world, and the nature of that interaction is very central to shaping who we are and how we reason about things. So I think that dealing with systems that are embodied, systems like robots, gives us a very valuable perspective in understanding how we might be able to construct artificial intelligence.

Hmm. So more so than some of the non-physical applications of machine learning and AI, including other deep learning applications like game play?

Well, the thing about other applications of AI is that oftentimes, especially in things like computer vision, speech recognition, and so on, we work with just the
perception half of the equation: we think about how we can take in data and produce a particular answer. The nature of intelligence is much more complex than that. It's about taking in information, reasoning about it, making decisions, thinking about the outcomes of those decisions, and so on. Now, you mentioned game playing, which has some elements of this, but one thing that game playing won't let you do is tackle the full complexity and diversity of the real world, because the real world is characterized not just by its sequential nature but also by its diversity, by the sheer number of unexpected things that might happen in a natural interaction. Computer vision has dealt with that diversity for decades, but without handling the decision-making, and game playing handles the decision-making, but without handling so much of the diversity.

So to what degree is your research in robotic learning integrative across all these different fields? Are you specifically focused on pulling together some of the state-of-the-art research from these various fields, or is your domain within robotic learning established, and you're heading down a path that way? I don't know if that question makes any sense, but if you get a sense of where I'm going...

I think I see where you're going. This is actually a very good question, and something that for robotics has been one of these tensions over the years: it's often been very tempting for researchers to think of robotics as fundamentally a systems or integration exercise. So if you have, let's say, a very effective computer vision system and a very effective planning system, well, maybe building an intelligent robot is just a matter of welding those pieces together, connecting up the wires, and seeing it work. A lot of people have hoped for exactly this: that by making progress independently in different domains, we'll get closer and closer to
intelligent robots. Unfortunately, reality hasn't quite panned out that way, and a lot of roboticists will actually lament that if they take the latest ImageNet-trained model, put it on the robot, and try to use it for object detection in the wild, it'll actually do a pretty terrible job, because the biases that are present in the kind of data sets that those models are trained on don't really reflect what a robot will see from its cameras in natural environments. So I actually think that in order to really get this right, we need to draw on the lessons in the state-of-the-art models in game playing, vision, and so on, but at some point we have to do a lot of that ourselves. We have to take the lessons, but not necessarily the technical components themselves. For that reason I've been a really big advocate of end-to-end training for robotic learning, where we set up models that include both perception and control and train them together to perform the particular tasks the robot needs to handle, instead of relying on integration of existing components.

In taking a look at your research, I came across a really interesting example of the effect you're describing. The particular research was where you were training a robot arm, I think it was a Baxter robot, to tie knots in a rope. Some of the comments associated with the research, I think there was a GitHub page about it, were: hey, we trained this system on, I think it was, a red rope, and we're working hard to make it work with a white rope that's also a little bit stiffer; and we trained it on a green background, and we found that that doesn't generalize to other backgrounds. This is a conversation point that came up with Pieter as well, this notion of mastery versus generalization. Can you talk a little bit about that, and how your research is taking that issue on?

Yeah, absolutely. So, the Baxter paper that you're referring to
there, what we did is we actually had a robot practice tying knots. But of course it was one robot, and it was practicing tying knots in one particular rope, so the resulting system could do really well at tying knots in that rope, it could kind of tie knots in ropes that looked a little similar, and it pretty much broke down if you gave it a rope that was too thick or too thin or something like that. But here's the thing: in robotics, oftentimes when we run experiments, the experiment is the entirety of the data collection process. If you imagine an experiment in computer vision, you take all of ImageNet, you train your model on it, and you show its performance. In robotics, an experiment basically amounts to generating an entire new data set, training your model on it, and then observing its performance. So of course, if you're generating an entire data set every time, and you have one robot and just a little bit of time, it's not going to generalize very far. We did actually try to study at one point what would happen if we scaled up this style of technique. We did this in partnership with Google, which has quite a bit more resources as far as deploying large numbers of robots, and we tried to see: if we run data collection at the scale of something like ImageNet, can we actually get robotic skills that generalize effectively?

Mm-hmm.

So what we did there is we set up what we call the arm farm, by analogy to a server farm. We set up a cluster of about 14 robots, and we had them basically working day and night to practice grasping objects. We chose grasping because it's something you can do to pretty much any object, and it's also very important for a lot of other robotic manipulation tasks. We had them running day and night like this, and they collected about 800,000 grasps. Each grasp had maybe 5 to 10 images, so the total size of the data set was on about the same order of magnitude as ImageNet, and
there we did find that the resulting networks that you train on that really large data set do actually generalize effectively to new objects that are completely different from what they've seen before. In fact, when you do learning at this larger scale, you can observe some really interesting emergent behavior. One of the things that we were thinking as we did this work is, well, grasping is a very geometric behavior, so probably the first thing that these systems will learn about is the geometry of objects in the world. They'll learn that you need to put the finger on one side, put the finger on the other side, and so on. What we saw, which surprised us a little bit, is that in the earlier stages of training, when you have maybe 100,000 grasps, before we collected the full data set of a million, the network actually didn't pay as much attention to geometry, but it did pay a lot of attention to material properties. It recognized right away that if something was really soft, then it could pinch it and pick it up easily, but if something was rigid, then it couldn't do that. This is completely different from how conventional, manually designed grasping systems tend to work, because when you manually design a grasping system, you're going to use some sort of geometric motion planning and you're going to completely ignore the material properties. So that was really interesting to us, and it underscored, I think, the value that you get from learning through trial and error, because you actually learn about the patterns that are really present in the world rather than the ones that your analytic model thinks are important.

Hmm. Are there any other emergent behaviors that you observed in that set of experiments?

Let me see. That was the only one that we could pin down, in the sense that we could actually measure it: we could actually put different
objects in front of it and quantify that, yes, it was really employing the strategy. Informally, there were a few things that it did tend to do pretty consistently that I can speculate a little bit about; I just don't have the hard numbers for it. It tended to figure out, for example, that if you have something like a brush, you should pick up the brush by the stiff part rather than the flexible bristles, which is nice. It tended to figure out that centers of mass of objects really matter, especially for awkwardly shaped objects. So those were some of the things that it picked up on. There were also a few mistakes that it made that were kind of amusing. It just so happened that a lot of the soft things in our training objects were brightly colored, because we wanted to buy small items of clothing, and small items of clothing are children's clothing, and children's clothing tends to be brightly colored. So it had this association that things that were brightly colored were soft, and in our test set of objects we had a pink stapler, and that pink stapler was just impossible for it to pick up, because it was just convinced that this pink stapler was a soft, fuzzy thing and it could just pinch it. So that's a good example, actually, of the kind of funny data set biases that you can get that will actually affect you even in real-world tasks like this.

Interesting, interesting. When I hear you describe an example like the pink stapler, it makes me wonder to what extent it is possible to layer the traditional object recognition types of technologies into a model like this. Should it be able to recognize the stapler first and then have some higher-level abstraction that we're also training on, in addition to just the raw pixels? Is that something you looked at?

Yeah, that's a very good question. That's actually something that we've thought about a lot here. So one of the big things that you get out of
traditional approaches to object detection is not actually the models themselves; it's the data. There are very large and extremely diverse labeled data sets of objects, with bounding boxes, segmentation, and so on, and it would be really nice to try to use that. But at the same time, you want to avoid losing the benefit of end-to-end training. If you simply run a bounding box detector on what the robot is seeing and then ask it to pick things up, well, it's not just the bounding box that matters for the grasp; it also has to understand what's in that bounding box. So you don't want to lose the benefit of the end-to-end training, but at the same time you want to somehow get more out of all these auxiliary sources of information. One of the things that we've been working on a little bit, and this isn't out yet but will be released in probably a couple of weeks, is some work on semi-supervised learning of robotic skills, where we combine experience from the robot's point of view, which includes the actions that it took and the observations that it saw, with a weakly labeled image data set. It's weakly labeled in the sense that that data set just tells you whether the image contains the object the robot needs to use. So if the robot is learning, for example, how to put a cap on a bottle, the weak labels might say: does this image contain a bottle or not?

Mm-hmm.

And the idea is that the robot itself, when it's interacting with the world, maybe only gets to interact with a few instances of those objects. It can use those few instances to understand the physics of the behavior, but that's not really enough for it to generalize, to understand what the entire class of objects of this type looks like. So the weakly labeled data is there to basically show it what this skill can be applied to. The important thing when incorporating this weakly labeled data is not to lose the benefit of the end-to-end training. So in this technique that we developed,
we're actually including the weakly labeled data and the robot's own experience at the same time in a joint training procedure, rather than splitting things up into components and then trying to wire them up together, as in the more conventional systems approach. That turns out to work very well. Under the hood, the method has an attentional flavor to it: it basically learns what to pay attention to from the weakly labeled data, and then uses that attentional mechanism to actually perform the task at test time.

How do you express weakness in this model?

Well, when I say weakly labeled, I just mean that the images have a label that only tells you whether the object you care about is present or not.

Okay.

So you can think of this as a person telling the robot: here are the things that you can execute this skill on. Here are lots of pictures of the thing that you can do this task to, and here are all the pictures of things that you cannot do this task to.

Right, right. And is there a general approach to incorporating higher-level abstractions into models like this? Meaning, going back to the stapler example, we could do the object detection and determine that, hey, this is a stapler, but there are also other neural nets that can do geometry detection, orientation detection, and things like that. I guess the question that I'm trying to get at is this: it sounds like the general approach to applying deep learning in this model is, let's just collect a bunch of data and train on it, and if there are important features, the network will figure it out. What I'm curious about is, is there an analytical foundation to that assertion? And if not, are there other ways that folks
are looking at incorporating abstractions or features into these models to help them both generalize and train faster?

So I think there's perhaps a little more to it than that. It used to be that when we thought about the previous generation of machine learning models, the way that we would imagine using them is exactly what you described: we say, okay, we have some edge detector, we have a pose detector, we have some kind of thing that will analyze local geometry, we'll plug that into the downstream module, and so on. The thing about deep learning is that the model itself is good for making predictions, but there's nothing unique or special about it. You can actually have the same model perform multiple tasks, and that's often not that much harder than stapling together two models that each perform one of those tasks. So if you want a model that can segment an image and detect poses of objects, you could train two separate models and then combine their outputs, or you can just train one model that does both of those tasks. The latter is often not that much harder, but it has a substantial benefit: when you train a single model to perform multiple tasks, it can actually learn internal representations that share the knowledge that's contained in those two tasks. So if you were to ask me how I would consider combining, let's say, an object pose detector and a grasping system, I would much rather train a single model that predicts both pose and grasp than take a pose predictor and feed its output into a grasp predictor. The reason for that is that the data already has all the information. There's nothing magical that's contained in the model that's not already contained in the data, and it's possible to train these joint models, so I might as well take both data sets and train one model that will benefit from the shared structure in both of those tasks, rather than
trying to train completely disjoint models and then staple them together afterwards.

Right, right. Have you run into situations where there are pre-existing models trained on inaccessible data? Maybe I'm chasing the tail of the scenario a little bit, but it sounds like there may be some corner case where it makes sense to do that, if you don't have access to the data but you do have access to the model. But I get the point that in general the data is the data, and if you can train one model that can build these internal representations, it's much more efficient than trying to engineer one model that can solve part of the problem and another model that uses that to solve the thing that you're actually trying to do.

Yeah, basically it's a lot easier for us to compose data sets than to compose models.

Right, right. So one of the challenges that you've spent some time looking at is the efficiency of training these deep learning models; sample efficiency in particular is one of the ways you talk about that. Can you talk a little bit about that problem and the things you've done there?

Right. So I assume you're referring specifically to sample efficiency for deep reinforcement learning algorithms.

That's correct.

So deep reinforcement learning algorithms are kind of a funny creature. With standard deep learning with gradient descent, it's a common perception that it's inefficient, and in some sense it is: we can build very good object detectors, but we need maybe millions of images to train them, which might seem like a lot. But if you consider what that model is really doing, it's reasoning about everything from pixels and edges all the way to complex higher-level concepts, and that's actually pretty sophisticated. With deep reinforcement learning, though, things get a lot worse. If you look at the sample complexity for learning to play, let's say, a simple video
game like Pong, there you're going to be looking at millions or even tens of millions of images, for a task with visual diversity that's nowhere near what we see in conventional, let's say, computer vision data sets. So visually it's very simple, physically it's very simple, but you need a lot of samples to learn that task, and those samples involve actively interacting with an environment. Now, it happens to be a simulated environment, so you can run it much faster than real time on a server, but still, something here seems a little out of whack; something here is a lot worse than perhaps it should be.

And what's the intuition for why that is the case?

There are a couple of reasons for it. The short version is that we don't fully understand, but the long version is that there are a few things that are being done that could perhaps be done differently. Now, if I knew exactly the answer to this, then of course I would have a much more efficient algorithm to give you, but it's possible to guess a few things here. One is that reinforcement learning provides a much weaker signal than supervised learning. In reinforcement learning, even though it's gradient-based optimization, you don't really have gradients of the thing that you really care about; you're estimating them in this very peculiar way, depending on the reinforcement learning algorithm that you use. So you essentially get a lot less information from every gradient step. A lot of reinforcement learning algorithms also tightly couple the collection of data in the environment and the updating of the model, which is very different from supervised learning. In supervised learning, you first collect a large data set, and then you take many, many gradient steps on that large data set. In reinforcement learning, you often interleave collection of data and updating the model, because you need to collect data that agrees with your model. So if you're learning a policy, you'd like to
collect the kind of experience that that policy will actually see, and you want to do this iteratively. That means that you're often throwing out lots of data from old policies that you can no longer use because your policy has changed, and that prevents you from reusing old data, which can be very harmful for sample efficiency. In fact, methods like policy gradient, which are very convenient to use in simulation, are often the most inefficient in the real world, because they can't reuse data. So we need to look at methods that can reuse old data; these are sometimes called off-policy algorithms.

Before we go there, can you elaborate on the throwing out of the data? Is this something that the algorithm is doing as part of the way it's constructed, or is this something that we're doing manually? Tell us a little bit more about what we mean by that.

Oh, so that's just how a lot of on-policy policy gradient algorithms work. These algorithms operate as follows: they collect experience from the current policy, they compute a gradient descent direction on that experience, they take that gradient step and update the policy, and now they need more data from the latest policy, which has now been updated. So they have to throw out all the old data and collect a new batch of data. If you want a mental picture of what this looks like: if you have a robot that, let's say, is learning to walk, it'll try to walk a couple of times, update its behavior, try to walk a couple more times, and so on, and that's the reinforcement learning process. But you have to remember that each time it changes behavior like that, it has to basically collect new experience, because it needs to understand how well its current policy is really doing.

Right.

So that can get really expensive in terms of the amount of time it needs to spend collecting experience. If you're running stuff
in a simulator on a server farm somewhere, then it's okay; you can parallelize all that and everything is reasonable. But if that's a real physical system that's actually executing those trials, that can get extremely time-consuming.

Mm-hmm. Okay, and you were about to talk about some of the ways we can get beyond this.

Right. So one of the things we can do is look at off-policy algorithms. These are algorithms that can supplement their training with data from other policies. So what can you learn from other policies? Well, intuitively, one of the things that you can learn about is predicting future events, because the rules of physics and so on will hold true regardless of which policy you're executing. The kind of future events that you can predict can range all the way from very detailed, where you're actually predicting, let's say, the entirety of your future observations, which is sometimes called model-based reinforcement learning, to something fairly abstract, like the future rewards that you will see. That is actually a type of model-free reinforcement learning that's sometimes referred to as value function estimation, and Q-learning also falls into this category. But they're all prediction-style methods: on the one extreme you're predicting the entirety of your future sensory observations, and on the other extreme you're predicting something very abstract, like the rewards that you will see in the future. That tends to be more efficient, because it allows you to incorporate data from other policies, including your own past policies.

And so can you talk a little bit about those policies and how they differ from one another?

Yeah, so I can talk a little bit about model-based reinforcement learning, because I feel like this is something that perhaps hasn't gotten quite as much attention in the research community in recent years, because there's been a lot
of excitement about model-free reinforcement learning the model-based reinforcement learning it's perhaps not as far along because the prediction problem that it's trying to solve is a lot harder but it has a lot of promise for dramatically improving sample efficiency for two reasons the first reason is the one I mentioned that you can use data from other policies but the second reason which is perhaps a little more subtle is that in model-based reinforcement learning every sample has a lot more bits of supervision so if you imagine what you're doing when you're let's say predicting a value function you're predicting one scalar value that's a function of your current observation or state when you're predicting everything that will happen in the future maybe you're predicting future images that you will see there are many more bits of supervision in that prediction problem so every single sample actually carries a lot more bits of supervision and that means that your model can learn a lot more from each of those samples now the flip side of the coin is that your model is now trying to solve a much harder problem it doesn't have to predict just a single scalar value it has to predict an entire image so it's sort of a little bit unclear how that shakes out but potentially the benefit in sample complexity can actually be quite substantial there we've done a little bit of work on model-based reinforcement learning for vision-based tasks actually on real physical robots and this is some work that we did that also involved actually parallelizing data collection across multiple robots but at a much smaller scale so with the grasping I mentioned that we needed about 800,000 grasp attempts for the model-based reinforcement learning we actually trained a video prediction model for pushing objects around on a table with about 50,000 pushes and that was actually effective for generalizing to new objects and pushing them in new directions and so on simply by predicting
what the robot will see in the future and then taking the actions for which that model predicts the kind of outcomes that you want so that was already a lot more efficient and it ran on real physical systems now the downside is that because the prediction problem there is so hard the predictions were very short range so the robot could only execute behaviors maybe with a horizon of two to three seconds so these weren't complex behaviors and that's because the prediction problem is so hard but hopefully as we get better and better video prediction models which is a very active area of research right now these methods will get better and better is that the inverse reinforcement learning problem no this is the model-based reinforcement learning problem so when I look at again going back to the Baxter robot video it talked a little bit about this inverse RL where it sounded like you're doing the same thing you've got your rope in one state you have the human move it to another state and then you're looking at the action that the robot would take to get it from one state to another and producing the inverse of that or that becomes you know the action that the robot takes to move the rope to a position that's required to imitate what the human did so that's the inverse RL what you just described sounded very similar to that so I think what you mean is actually inverse dynamics inverse dynamics okay so when you have a model-based reinforcement learning problem there's actually different ways that you can represent your predictive model the most common way is to build what's called a forward dynamics model so forward dynamics means that you're predicting from the present to the future so you're looking at your current observation your current action and you're predicting what the next observation will look like inverse dynamics means you're predicting from the future to
the action so that means that you're looking at your current observation your future observation and you're predicting what action will get you from one to the other right so I've got the rope in position A I've got the rope in position B what's the action that's required to get it from A to B exactly so it's just another kind of predictive model and they have different pros and cons so with a forward model you can run it forward many steps because you can basically recursively apply it to its own predictions but you have to work a little harder to get the action with the inverse model the action comes right out of the model but it's difficult to chain it together because you don't know what the following observation will be because the model doesn't predict observations it predicts actions so inverse models are perhaps a little easier to use they're a little easier to train but they're a little harder to use for longer-term planning okay okay so in the discussion about sample efficiency one of the things that I came across was mirror descent guided policy search can you talk a little bit about that and where that fits in sure so mirror descent guided policy search is a technique for optimizing very complex policies like deep neural network policies by only using supervised learning to train the policy itself and that sounds a little bit funny because if we're doing reinforcement learning well that's not supervised learning so mirror descent guided policy search sort of plays this game where it tries to figure out what is the supervision that I can give to a supervised learning algorithm such that when it trains some complex policy that policy will do the right thing so it's like if you know that only supervised learning is ever allowed to touch the neural net what can you give to the supervised learning algorithm so that it does the right thing for solving a reinforcement learning problem and the way that the
algorithm works is something like this that you're going to basically have a model-based teacher that's going to generate training data for your supervised learning algorithm so that model-based teacher is a kind of model-based RL method but it's not a deep model-based RL method you can think of it almost like a non-parametric algorithm so it'll look at a few different trajectories that you took figure out how to improve each of those individual trajectories by themselves without reasoning about any policies and then that will generate training data so that your neural network can be trained with supervised learning to do better so instead of reinforcement learning which looks at the parameters of your model and says how do I change these parameters to be better in this mirror descent guided policy search it actually looks at the trajectories that you executed fits a model figures out how those trajectories should be improved and then adds those improvements as training data for regular supervised learning that way the neural net is only ever trained with supervised learning and standard backprop okay but at a very high level the reason that this procedure is efficient has a lot to do with why model-based RL algorithms are efficient because it really is a kind of model-based RL algorithm it's just one that under the hood uses standard supervised learning to train the policy neural network mhm so there's another interesting paper I came across and that was the one on policy sketches can you talk a little bit about that work and what the goals are and what the results were yeah I'd be happy to talk about that so that was work by a student named Jacob Andreas together with Professor Dan Klein who's another professor here at UC Berkeley Jacob and Dan they both work on natural language processing so the premise in that paper is that we'd like to see how symbolic descriptions of tasks you can think of these as
very very simplified natural language how symbolic descriptions of tasks can be used to improve learning and the key ingredient there is that we'd like to basically see how symbolic descriptions can improve learning without assuming that those symbolic descriptions are grounded so without assuming that the agent already understands what the symbols mean so if you let's say go to a foreign country let's say you don't speak French and you go to France and someone tells you in French how to let's say make a piece of furniture out of wood and then they tell you how to make another piece of furniture out of wood and then they tell you how to make a bench out of wood well listening to those descriptions you'll probably notice some common patterns you'll probably notice that some words repeat and if you hear enough of these descriptions and you actually perform those tasks and you kind of understand physically what it means you'll find those patterns even if you don't actually speak the language and eventually when you hear new phrases describing new items that you can construct out of wood for example you might be able to put the pieces together and figure that out more quickly so that was kind of the idea that we were working with so what Jacob did is he constructed this sort of simplified version of the Minecraft video game so it's like a little crafting video game it was simplified because we didn't want to deal with vision we just wanted to deal with kind of simple top-down navigation problems and it had these tasks that were like you know chop down the tree pick up the wood make the chest for example chop down the tree get the wood make a boat or you know grab the coal put it in the oven and so on and there was a long list of these different tasks that the agent could perform that were constructed out of these symbolic verbs essentially and the agent would be given a set of these tasks it would learn them and the
symbolic descriptions would just be given as an additional input so they would result in some decomposition of the neural net but there's actually different ways you could do that but essentially they would be provided as an input to the agent without telling it what those symbols really mean and just by learning the different tasks with the different symbolic descriptions it could actually figure out how to then use new symbolic descriptions to solve new tasks more quickly interesting yeah it's funny when we talk about learning objects object detection in images you know the amount of data that is required to train a neural network to figure out what an object represents seems so large compared to our ability as humans to do it this is an example where I would need at least a million examples of the French sentence so if I didn't know French you know I can imagine needing a ton of training examples for myself to be able to figure out the language and then how to put that together to make some furniture but you know if you spoke Spanish you'd probably figure it out much more quickly and that's not ah this is true and I think that actually there's something to that as far as how the learning-based systems can work better I talked before about multitask learning and one of the things that distinguishes humans from these learned models is that humans are actually always doing multitask learning we're always doing multiple things at once we're looking for things in our environment doing something we're worrying about what we're going to have for dinner we're worrying about some other stuff we're observing some interesting you know car that we see on the road over there we're always doing many many things and perhaps a lot of our efficiency is actually down to this fact that we're never learning anything truly from scratch because we're learning so many things all at once any new thing that we have to do we get a broad
basis of knowledge on which to draw to figure out that new thing so in a sense perhaps what we are is actually an extremely broad kind of multitask learner and maybe that's a big part of how we get that efficiency and what's the relationship between multitask and transfer learning well multitask learning is one of the ways to get transfer learning right so in multitask learning we're learning multiple things in parallel and transfer learning is a broader idea that includes taking pre-trained models and applying them to other things I guess the direction that you know the curiosity that has been piqued is like how do we combine all of these things to you know make our ability to train these models even faster right well so one of the things we've been looking at quite a bit actually is how we can use past experience to accelerate future learning so we've worked on this in the context of reinforcement learning supervised learning and so on there are a number of ways you can approach that problem but they all sort of boil down to some version of looking at your past experience breaking it up into you know little pieces of training data and little pieces of validation data trying to build your model such that when it sees that little training data it'll do well on that little validation data and do this many many many times so that you get a model that's basically good at quickly adapting to small training sets there are different ways that you can construct these types of models that you know many other groups and us have looked at but that's sort of the big picture setup these are sometimes called meta-learning algorithms I think that's actually an extremely promising direction for the future to really take deep learning methods beyond this regime of always relying on really gigantic data sets and I think it goes hand in hand with multitask learning that basically the way
that we can get to the kind of efficiency that we see in humans is by solving many tasks solving those tasks in a meta-learning context so that we're using our past experience of solving old tasks to accelerate the solving of new tasks and then when we encounter new tasks that we haven't seen before we'll generalize and quickly adapt to them mhm and does multitask learning necessarily imply a single network across all of the tasks or are there variations there there are definitely variations so one of the things that we've studied actually as well as several other groups is how we can construct actually modular networks so networks that will have some components that are shared and some components that are distinct between tasks and the nice thing when you build modular networks actually the policy sketches paper you mentioned is an instance of this that also had modular networks when you have modular networks one of the things that you can observe is that there will actually be kind of interfaces that emerge naturally between different modules so in a robotic context let's say you might have a module for perception and maybe you have one module for a color camera and a different module for lidar and then you have a module for actuation for a robot with four links and a different module for actuation for a robot with three links and you can mix and match any combination of these so you can say okay here's a lidar robot with four links here's an RGB camera robot with three links train different combinations of these modules and then you can actually find that you might get generalization to new combinations of sensors and robots and you can figure out that that bottleneck between the two modules actually constitutes a kind of a learned interface because different modules they have to basically adopt a common interface because they don't know who's going to be downstream from them or who's going to be upstream and at the systems level what are the
implications of that is it then easy to take one of these modules and drop it into another system or does it not quite work like that well I think that's part of the hope so I think we haven't seen that yet but in the long run that's I think one of the really interesting things about modular network designs is that perhaps it could actually be possible to use this as a way to combine the benefit of end-to-end learning with the benefit of modularity to be able to actually you know train up some component let's say if you're doing autonomous driving you train up a particular vision component on one car maybe supplemented with ImageNet data and you just drop it into a different car but then that different car has its own modules for let's say you know controlling the acceleration or something like that so I think that that's part of the hope we're not quite there yet the work is still in fairly early stages but I think that that's definitely a really exciting place that this kind of stuff could go the scenarios you just described were all end-to-end trained at least in the initial system they're end-to-end trained as opposed to training a module at a time is that right right so that's the nice thing about modular neural networks as opposed to modular anything else is that neural networks can be composed so if you have a modular neural network you can still train the whole thing a combination of multiple modules end to end now when I say end to end there could be different ends so end to end could mean that your vision system is simultaneously trained on ImageNet recognition and feeding the right visual representation to a downstream control module to perform some test task yeah it's interesting so the general question that I want to get at here is I think the basis that you've laid out for end-to-end robotic learning makes a ton of sense at the same time when I talk to folks in industry about how they're
using neural networks and deep learning and I present this vision of hey we're just going to have this one uber neural network that can figure everything out invariably I get back some reaction that's like no no no we don't do it like that at all it doesn't work it's too hard how do you account for the gap there do you see similar things I g
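
The contrast drawn in the conversation between forward and inverse dynamics models can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not from the episode: a one-dimensional "pushing" system whose hidden physics is x_next = x + 0.5 * a, linear models fit by least squares, and the function names. The forward model predicts the next observation and can be applied recursively to its own predictions; the inverse model returns an action directly but predicts no observations, so it cannot be chained in the same way.

```python
import random

def true_step(x, a):
    # Hypothetical ground-truth physics, unknown to the learner.
    return x + 0.5 * a

def collect(n, seed=0):
    # Gather (observation, action, next observation) triples with a random policy.
    rng = random.Random(seed)
    data, x = [], 0.0
    for _ in range(n):
        a = rng.uniform(-1.0, 1.0)
        x_next = true_step(x, a)
        data.append((x, a, x_next))
        x = x_next
    return data

def fit_forward(data):
    # Forward dynamics: least-squares fit of k in (x_next - x) = k * a.
    num = sum(a * (xn - x) for x, a, xn in data)
    den = sum(a * a for x, a, xn in data)
    k = num / den
    return lambda x, a: x + k * a

def fit_inverse(data):
    # Inverse dynamics: least-squares fit of c in a = c * (x_next - x).
    num = sum(a * (xn - x) for x, a, xn in data)
    den = sum((xn - x) ** 2 for x, a, xn in data)
    c = num / den
    return lambda x, x_goal: c * (x_goal - x)

def rollout(forward, x, actions):
    # Recursively apply the forward model to its own predictions.
    for a in actions:
        x = forward(x, a)
    return x

data = collect(200)
forward = fit_forward(data)
inverse = fit_inverse(data)

# Multi-step prediction with the forward model.
predicted = rollout(forward, 0.0, [1.0, 1.0, -0.5])
# Single-step action straight out of the inverse model.
action = inverse(0.0, 0.25)
```

With the forward model, choosing an action still requires a search over candidate action sequences; with the inverse model the action is immediate, but only for a goal one step away.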

Original Description

This week we continue our Industrial AI series with Sergey Levine, an Assistant Professor at UC Berkeley whose research focus is Deep Robotic Learning. Sergey is part of the same research team as a couple of our previous guests in this series, Chelsea Finn and Pieter Abbeel, and if the response we’ve seen to those shows is any indication, you’re going to love this episode! Sergey’s research interests, and our discussion, focus in on how robotic learning techniques can be used to allow machines to autonomously acquire complex behavioral skills. We really dig into some of the details of how this is done and I found that our conversation filled in a lot of gaps for me from the interviews with Pieter and Chelsea. By the way, this is definitely a nerd alert episode! Notes for this show can be found at twimlai.com/talk/37 Subscribe! iTunes ➙ https://itunes.apple.com/us/podcast/this-week-in-machine-learning/id1116303051?mt=2 Soundcloud ➙ https://soundcloud.com/twiml Google Play ➙ http://bit.ly/2lrWlJZ Stitcher ➙ http://www.stitcher.com/s?fid=92079&refid=stpr RSS ➙ https://twimlai.com/feed Let's Connect! Twimlai.com ➙ https://twimlai.com/contact Twitter ➙ https://twitter.com/twimlai Facebook ➙ https://Facebook.com/Twimlai Medium ➙ https://medium.com/this-week-in-machine-learning-ai


This video discusses deep robotic learning, including techniques for allowing machines to autonomously acquire complex behavioral skills, as part of an industrial AI series. The speaker, Sergey Levine, covers topics such as end-to-end training, reinforcement learning, and model-based reinforcement learning. The video provides a comprehensive overview of the current state of deep robotic learning and its a

Key Takeaways
  1. Train a robot arm to tie knots in a rope using end-to-end training
  2. Collect a large data set of grasps using a cluster of robots
  3. Use model-based reinforcement learning for vision-based tasks on real physical robots
  4. Implement multitask learning and transfer learning for robotic learning
  5. Design modular networks for robotic learning
💡 Deep robotic learning can be used to allow machines to autonomously acquire complex behavioral skills, and techniques such as end-to-end training, reinforcement learning, and model-based reinforcement learning can be used to improve the efficiency and effectiveness of robotic learning.
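
The modular-network idea in takeaway 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the architecture used in the research discussed: the module names, the input sizes, the shared feature size FEATURE_DIM, and the untrained random linear layers are all hypothetical. It shows only the structural point made in the episode, namely that perception and actuation modules which agree on a common interface can be mixed and matched.

```python
import random

FEATURE_DIM = 8  # assumed size of the shared interface between modules

def make_linear(in_dim, out_dim, rng):
    # A tiny untrained linear layer; stands in for a learned module.
    weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def layer(x):
        if len(x) != in_dim:
            raise ValueError("expected input of length %d" % in_dim)
        return [sum(w * v for w, v in zip(row, x)) for row in weights]

    return layer

rng = random.Random(0)

# Perception modules: different sensors, same output size.
perception = {
    "rgb_camera": make_linear(12, FEATURE_DIM, rng),
    "lidar": make_linear(5, FEATURE_DIM, rng),
}

# Actuation modules: same input size, different numbers of joints.
actuation = {
    "three_link": make_linear(FEATURE_DIM, 3, rng),
    "four_link": make_linear(FEATURE_DIM, 4, rng),
}

def make_policy(sensor, robot):
    # Compose any perception module with any actuation module.
    encode, act = perception[sensor], actuation[robot]
    return lambda observation: act(encode(observation))

policy = make_policy("lidar", "four_link")
commands = policy([0.1] * 5)  # one command per joint of the four-link robot
```

In a trained system the composed policy would be optimized end to end across several sensor and robot combinations, so that the shared feature vector becomes the learned interface described in the conversation.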
