Google Deepmind RT 2 - Using LLMs to Build Thinking, Learning Robots

Wes Roth · Beginner ·📄 Research Papers Explained ·2y ago

Skills: LLM Foundations 90% · Prompt Craft 80% · LLM Engineering 80% · Fine-tuning LLMs 70% · Multimodal LLMs 70%

Key Takeaways

Google DeepMind's RT-2 is a vision-language-action (VLA) model: a vision-language model that learns from both web data and RT-1 robot demonstration data (13 robots over 17 months in an office kitchen) and outputs robot actions as text tokens. Chain-of-thought reasoning lets it handle multi-stage semantic instructions. In more than 6,000 robotic trials it matched RT-1 on seen tasks and raised performance on novel, unseen scenarios from 32% to 62%. Stated limitations are that it does not acquire new physical motions and that the computational cost of running it in real time is high.
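
As a rough illustration of the "actions as tokens" idea that the transcript below attributes to RT-2, here is a minimal Python sketch. It is not DeepMind's implementation. It discretizes each continuous action dimension into integer bins and joins the bins into a string that an ordinary text tokenizer could process, then decodes the string back. The bin count of 256, the [-1, 1] value range, and the names `encode_action` and `decode_action` are assumptions made for this example.

```python
from typing import List, Sequence

# Assumption for this sketch: every action dimension is split into 256 bins.
NUM_BINS = 256


def encode_action(action: Sequence[float], low: float = -1.0, high: float = 1.0) -> str:
    """Map each continuous value to an integer bin and join the bins into a string."""
    bins = []
    for value in action:
        clipped = min(max(value, low), high)        # keep the value inside [low, high]
        fraction = (clipped - low) / (high - low)   # rescale to [0, 1]
        bins.append(min(int(fraction * NUM_BINS), NUM_BINS - 1))
    return " ".join(str(b) for b in bins)


def decode_action(text: str, low: float = -1.0, high: float = 1.0) -> List[float]:
    """Turn a string of bin indices back into the centre value of each bin."""
    width = (high - low) / NUM_BINS
    return [low + (int(token) + 0.5) * width for token in text.split()]


if __name__ == "__main__":
    # Hypothetical 3-dimensional action, e.g. a small end-effector displacement.
    action_string = encode_action([-1.0, 0.0, 1.0])
    print(action_string)
    print(decode_action(action_string))
```

Because the action is now plain text, the model's input and output spaces do not need to change, which is the point the transcript makes. The cost is a small rounding error of at most half a bin width per dimension.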

Full Transcript

So Google and DeepMind drop another AI bomb, and you need to see this. Real fast, just ask yourself: when are you going to have a robot butler that takes care of all your tasks around the house, makes your coffee, cleans up after you, puts away the laundry, etc.? Not one specialized in just a few tasks; I mean general purpose, like ChatGPT but for dishes. So just think of a number. Is it 10 years? Is it more than that? Is it less? At what point do you think a decent percent of the population is going to have a robot like that in their house?

There's this repeating pattern in tech that software progress tends to move much, much faster than hardware. There's this famous quote, "We wanted flying cars, but we got 140 characters," and it's a testament to that. We imagine the future as being full of robots and spaceships and flying cars; instead we get online shopping and social media and those weird TikTok videos where people eat the ice cream and say thank you. What is that? But the same thing is happening with AI, it seems. We thought we would see this rapid rise of robots and automation, but large language models like ChatGPT are suggesting that we might see machine superintelligence before we can figure out how to make it walk. It might be possible that humans never figure out how to make autonomous robot butlers. Before you panic: I mean that we aren't the ones that are going to figure it out; the AI will.

Research out of DeepMind, Nvidia and OpenAI seems to suggest that training robots with AI, and also within simulations, is much more effective than just trying to code something that's capable of interacting with the physical world. Most of the computers that you and I interact with, most of the things we know about, are coded by humans: a human wrote out, line by line, exactly what that thing should do. But recently we've seen this rise of AI, of neural nets. Now, they've been around for a while, but a lot of recent developments have really
pushed the progress forward, and we're seeing it applied in more and more places. They're beginning to solve a lot of problems that we as humans have struggled to solve ourselves, and this new thing out of Google DeepMind is going to show exactly what's possible if we rely more on the artificial intelligence than on our own. Let's take a look.

So this is the robot in question. This thing has been haunting various Google kitchens for, I think, a few years now (I think 17 months, they said), and there isn't just one; I think there's over a dozen of them running around. This thing takes basic commands like "bring me a drink," and then it goes and executes that command. The important thing to understand here is that nothing is scripted. It's not given a location of the object, and there isn't some map that it can reference. The AI is learning more and more about its environment and how to accomplish new tasks that it has not done before.

By now you're familiar with the term large language model, or LLM. ChatGPT is the most famous one, but there are many others, including PaLM, which is Google's own LLM. So here, when they use the word "language," just think of something like ChatGPT. It's a language model: something that can understand what you're saying, reason about what you're saying, prioritize certain tasks; something that can read some text and then make some insights or decisions about it. So that's language, a.k.a. large language models.

Next we have high-capacity vision-language models (VLMs) that are trained on web-scale datasets, making these systems remarkably good at recognizing visual or language patterns and operating across different languages. So this is like ChatGPT with vision: it's able to see images and recognize them for what they are. By the way, we were supposed to get vision with GPT-4, and we're still waiting; I will need to speak with a manager about this. So VLMs are like ChatGPT, but they can look at images and make sense of them. So anyway,
a VLM looks at images and makes sense of them. You can think of it as also being able to see the outside world as it's moving around; it's able to see what's happening out there. If it looks at a bunch of objects scattered across the table, it can make sense of which objects it's looking at.

And then on top of this we have this new thing. In our paper (and we're going to look at the paper in just a second) we introduce Robotic Transformer 2, RT-2 (no relation to R2-D2), a novel vision-language-action model, so that's VLA, that learns from both web and robotics data and translates this knowledge into generalized instructions for robotic control while retaining web-scale capabilities. That's the important part right there: VLA, vision-language-action model.

I think a lot of people believed in the past that Google was going to be the number one manufacturer of robots at some point; that's where everything was going. I think this is showing that it's going to be a little bit different. More on that in just a second.

So, vision-language-action. Vision-language is ChatGPT that's able to see. Now obviously I'm saying ChatGPT because that's the thing most people are familiar with; Google has their own version of that, called PaLM, and this is what they use here. But it's this last part, the action part, that's the new and exciting addition. Let's say you tell this thing to go get you an apple. It looks down, it sees an apple, and it makes a plan: pick up the apple and hand it to the human, gently. All of that is the vision-language part: the large language model and the ability to see images and understand what they are. It's weird, but I almost want to say that's the old thing that we all know about, even though it's a few months old at this point. In AI, I feel like if it's been a month, it's old news. But you get what I'm saying: ChatGPT with vision, we kind of understand what that is. My point is that large language models with image recognition, we've heard
of this before. We kind of know about it, we've played around with it, and so it's able to look at the apple and make that plan, all with that technology. But what's the next part? The next part is the action of picking up that apple, and how to move it in the physical environment to hand it to somebody. Where do you place it? Do you shove it in their face, or do you put it out in front of you? That's the action part.

The important thing here is that you can think about this as several different models working together, which is similar to what we're seeing with a lot of other research, like out of Nvidia for example, that we'll get to in a second. But here it says: more specifically, our work used RT-1 robot demonstration data that was collected with 13 robots over 17 months in an office kitchen environment. So basically this announcement is of RT-2, and they used the RT-1 robot, the thing that we saw, probably in a lot of these images, running around Google's kitchens, getting things for people and handing them things. All of that they were recording and collecting as data, and that data was used to improve RT-2. We're going to see how it improved in just a second.

We showed that incorporating chain-of-thought reasoning allows RT-2 to perform multi-stage semantic reasoning, like deciding which object could be used as an improvised hammer (a rock) or which type of drink is best for a tired person (an energy drink). Now, I've been trying to convince various members of my family that as soon as this thing comes out, we need one in the house. They have their concerns. One thing I would like Google and DeepMind to know is that examples like "which object could be used as an improvised hammer? A rock" are not great examples to use from a marketing standpoint. That makes my job harder. Think of things like plucking a flower to bring as a gift, for example; that would be much better.

So this is the action part of it. To control a robot, it must be trained
to output actions. We address this challenge by representing actions as tokens in the model's output, similar to language tokens, and describe actions as strings that can be processed by standard natural language tokenizers, shown here. We'll look at the paper in a second.

One thing to understand is that the reason large language models are able to be so effective at reasoning and at speaking is the way they store these tokens, or words: they kind of store them as clusters in relation to other words. For example, a large language model might understand that for things like cat, dog and pet, there's some relationship between them. If you give it the words big and small, it's going to understand that there's some relationship there. There's a great article called "Large language models, explained with a minimum of math and jargon." I'll link it down below; it's really great if you want to have a basic understanding of how these things work. But you can think of it as understanding all these words through various clusters, and so it understands how they relate to each other. For example, Google's word vectors capture a lot of relationships, like Swiss is to Switzerland as Cambodian is to Cambodia, Paris is to France as Berlin is to Germany, mouse is to mice as dollar is to dollars.

My understanding, as far as I can tell, is that this is something very similar. Just like we know that certain words have certain relationships, a lot of the movements that a robot can do are going to have some relationships. Picking up an orange and picking up an apple are similar to each other; there's going to be a lot of similarities. Picking something off a table that's four feet tall and a table that's five feet tall is going to have a lot of similarities. By doing that, the robot, the AI, is able to generalize a lot. If it sees a table, it doesn't have to be the exact same
dimensions as all the other tables, etc. What that means is that each representation of an action that this robot can take becomes a string. A string is basically a sequence of characters. And so they're saying: we use the same discretized version of robot actions as in RT-1, and show that converting it to a string representation makes it possible to train VLM models on robotic data, as the input and output spaces of such models don't need to be changed.

Now you might be asking yourself, why is this thing working so hard on picking up a ketchup bottle? The Boston Dynamics robots are doing backflips and dancing. This thing is dumb. Maybe Google should have kept Boston Dynamics instead of selling it to SoftBank. SoftBank, by the way, is notorious for losing billions of dollars on various high-tech investments, including WeWork, Uber, Fair.com, Slack, etc.; those are some of the ones I know. And why did Google sell Boston Dynamics to SoftBank in 2017? They haven't sold any companies since 2012, and they're not strapped for cash. We're going to look at the paper in just a second, but I do want to mention this. What I'm going to say is pure conjecture. This is my guess; I don't know if any of this is true, but I think I have an idea of where Google is going with this. So Google sold Boston Dynamics in 2017.
Now, 2017 is right around the time when the "Attention Is All You Need" paper came out, and Google researchers were the ones behind it. That was the paper that really moved the AI field forward, specifically the large language models, with things like attention and Transformers. It was a big breakthrough that allowed these large language models to not forget what they're talking about, so they're able to coherently keep speaking and finishing their sentences; they call it attention. That paper was probably a big part of why we're having this massive LLM explosion right now.

My guess is that Google realized that building robots is going to be hard and expensive, and it's going to be hard to scale globally. So I think they came up with an idea, and I think that's the idea we're seeing here. They wanted to design one AI model, based on LLMs, that would be able to hear any command in any language and then carry it out in any environment and, this is the important part, in any body, meaning its physical body, what it looks like. If this thing looked humanoid, it would not need to have a brand new AI model control it. You could take whatever this learned and plug it into a new body. It would have to relearn some of the skills, but the bulk of it is already in there: it's the large language model plus vision.

So Google does not have to build physical robots. It can provide this as a service. Any company that wants to make robots, or any manufacturing plant that's producing anything, whether they're in construction or military or elder care or manufacturing: whatever your robot looks like, you just plug and play into the Google AI system, and this will run it for some fee. It's robotics as a service. Meanwhile, each one of those robots will collect data and feed it back into Google, which is going to allow them to have more data and make more improvements. The more robots they have, the faster they can
improve them, etc., and each improvement that gets rolled out gets pushed out to all the robots. This means that if Google eventually has, let's say, thousands of robots running around, all learning and collecting data, no one's ever going to be able to replicate that data without having a larger network, without somehow being able to launch even more robots that are running even longer to get more data and improve on it faster. Keep in mind, data is like the most valuable thing when it comes to building the best, most futuristic, most advanced AI models. People that control unique data are likely going to be able to build unique AI models that no one else can replicate. Having a unique set of real-world data gives you an unbeatable advantage. So I wouldn't be surprised if we're going to see these things pop up everywhere, but they're not necessarily going to look like this. The form factor, the physical appearance, could be vastly different. It's going to have some grippers, some claws or some hand; it's going to have to have an ability to see and hear, and that's about it. But all that is the ramblings of a madman. I'm sure none of that is true, although it would explain a lot.

RT-2's ability to transfer information to actions shows promise for robots to more rapidly adapt to novel situations and environments. In testing RT-... I keep wanting to say R2-D2. In testing RT-2 models in more than 6,000 robotic trials, the team found that RT-2 functioned as well as our previous model, RT-1, on tasks in its training data, or "seen" tasks, and it almost doubled its performance on novel, unseen scenarios, to 62% from RT-1's 32%. That's the big thing right there. What that means is that the more robots you have running around gathering data, the better they're getting, and also they're getting better faster. In other words, RT-2 robots are able to learn things like we do, transferring learned concepts to new situations. RT-2 shows that vision-language models (VLMs)
can be transformed into powerful vision-language-action models, which can directly control a robot by combining VLM pre-training with robotic data. It leads to significantly better generalization performance and emergent capabilities. RT-2 is not only a simple and effective modification over existing VLM models, but also shows the promise of building a general-purpose physical robot that can reason, problem-solve and interpret information for performing a diverse range of tasks in the real world.

I'm not going to go over it here, because I've already covered it so many times in previous videos, but there are many other examples of similar things happening, and of people opening their minds up to this. For example, Andrej Karpathy spoke briefly at an event called AGI House, I think probably somewhere in the Bay Area, in San Francisco, and he gave some tips for people that are trying to build autonomous AI agents. I have that video posted on this channel; I'll post it down below if you want to see it. One of the things that was very interesting is that he's been doing this for a long, long time, and he was saying that a lot of the things they were doing with reinforcement learning, and the various autonomous driving vehicles he was working on (he previously worked at Tesla; now he's at OpenAI, of course)... he didn't say it was wrong, and I forget exactly how he put it, but he basically said that that might not have been the correct way to go. What he should have been doing this whole time is building large language models.

And Nvidia is doing the same thing by using GPT-4, a large language model, to create this autonomous agent that runs around Minecraft and is able to learn and code new abilities for itself. So basically what I'm saying is that before, I think we thought each of these things was going to be its own AI agent, its own AI model, that is going to
be unique and separate, and now more and more it's seeming like, no, you just take a large language model, this thing that's able to think, and you plug it into something else that you want it to do, and then it just does that. And yeah, you have to give it some tools, you have to give it some training, you have to let it figure out how to do stuff, but that's the brain, and we're just going to figure out how to hook up all the other things we need to that brain so it can function.

All right, so I'll leave these links below. For those of you that will not read the full paper, it looks like they were pulling a lot of the important things and putting them in the blog post. The other thing that I'm not sure I mentioned is that PaLM is one of the best large language models at being able to talk in different languages. Instead of being really good at one language, like English, and maybe not so good at other ones, it's pretty good at all of them. The spread isn't as big as with some of the other LLMs, where English is number one and then it barely speaks some other language. This is a little bit better, it's more passable at more languages, and it's also able to flip back and forth very easily. I think that's going to be key here, because think about the scale of launching something like this worldwide. Going into China versus the various European countries versus Latin America versus Africa, you don't want to be dealing with a lot of different languages and having to build a team that speaks the languages and train them up so they can train other people how to use the robots. No, it's all in one. You just speak to the robot in whatever language you're comfortable in, in natural language. You tell it "go fetch me a drink" and it goes. If you need to adapt it to a factory, then you might have to explain in detail what you want it to do: you pick up the doll head here and
you put it on the body and then you put it back on a conveyor belt, or whatever. But as soon as this is launched, as soon as this is ready for showtime and we have some sort of robots that are capable of running this, or maybe even just these robots that we're seeing here, there's no other friction in terms of how fast you can scale it globally. There's no language friction, and there's no training friction necessarily, because most of the time you're just going to talk to it. So I think when this launches, it could scale really fast.

While a brute-force approach might entail collecting millions of robotic interaction trials, the most capable language and vision-language models are trained on billions of tokens and images from the web, an amount unlikely to be matched with robot data in the near future. Yeah, I mean, this is the big breakthrough that this thing has. We're really good at the LLMs and vision, not so much at the actual robotic thing, so have the LLM handle more of the robotic interactions. To this end, we explore an approach that is both simple and surprisingly effective, and that is training the language models to create dialogue, to output the low-level robotic actions.

When Elon Musk was talking on Twitter Spaces about launching xAI, his new AI company, he briefly mentioned that there were a lot of things people were doing on their way to AI and AGI, and that, based on new knowledge, it seems like it will be a lot simpler than a lot of people thought. He didn't explain what he meant by that; it was kind of cryptic. But I wouldn't be surprised if he's referencing the same exact thing, if he's saying that a lot of this is just sticking LLMs as pilots in various things and making it work.

For people that are returning to my channel, I think I bring up this study a couple of times per video: Voyager, the Minecraft AI. I apologize if you've seen this before, but I feel like this is going to be the
core to a lot of different things. Here they use GPT-4 as a reasoning engine to make a lot of decisions about how to do stuff, and they use another instance of GPT-4 to write code and test the code, code here being little skills that you can use in Minecraft. The results are incredible: this thing continues to learn, continues to improve, and just keeps going. By the way, this person here, I believe Linxi Fan, that's Dr. Jim Fan on Twitter, is a great follow if you're interested in all of this stuff. That's his project, along with everybody else on here, of course, but he was the one that actually alerted me to this thing: internet-scale text, image and video data can be used to help the robot develop better common sense. No special architecture means the model isn't restricted to specifically formatted robot control data. The full firepower of prompt engineering is now unlocked: chain of thought, vector DB, no-gradient architectures, you name it. The simplest approach almost always performs the best in modern AI. Anyways, great person to follow, well known in the space. I believe he worked with Andrej Karpathy, and I'm excited to see where they're going to take things next. Very exciting.

So this is interesting. For the people that are listening right now, there's an image of this robot completing some tasks. One of them has a bunch of flags on the table, the German flag, the American flag, and the prompt on that one says "move banana to Germany," and so the robot moves the banana to Germany, or on top of the German flag, rather. Another one says "put strawberry into the correct bowl," and there are several bowls, some for apples, it looks like, some for berries, so it puts it in the correct bowl. And there's a bunch of toys on the table; it says "pick robot," and it picks up the robot toy from among various other toys. It's very simple, because this is something that's pretty obvious for large language models, and this was something that was incredibly
difficult to do with robotics. I've been following the space for decades, and I know some people that are researching some of this stuff, and it didn't seem like there was too much momentum. I mean, there's always progress and always improvement, but it was kind of incremental; it was kind of crawling along. Now it almost seems like we had this breakthrough with LLMs, and it's just diffusing across everything and making everything a little bit better.

On this page of the paper they're showing how RT-1 performed versus RT-2, and they're using two different models: PaLM-E with 12 billion parameters, shown here in green, and another one, which is the RT-2 robot with the PaLI-X 55-billion-parameter model. This is showing that almost doubling of how well it does. Once they took all the data from RT-1, they fed it back into RT-2, and they're testing how well it performs with tasks it's seen before: if you ask it to pick up an apple, how well does it perform at picking up an apple? Then they're testing it with unseen objects, meaning new objects, unseen backgrounds and unseen environments, and on average we're seeing that, yeah, those things do about double. I believe they said from 32 to 62 or something like that, but it's a near doubling of its performance on unseen, novel tasks, which is the big metric here. If it's able to just figure out how to do stuff that it's never done before, at first glance that would make it extremely, extremely valuable.

Then they briefly mention some limitations of this program. Basically they're saying that while it's able to pick up a lot of new skills, one thing it doesn't seem to be able to acquire right now is performing new motions, the various motions that it can do with its hand. It doesn't pick up on how to do new motions. An exciting direction for future work is to study how this could be acquired with, for example, videos of humans. So it watches a billion
hours of humans picking stuff up and it's like, oh, you can move your elbow this way, or whatever. Another thing they mention, since of course they're running this in real time, is that the computational cost is very high. But in the future they're hoping that if they're able to explore quantization and distillation techniques, they might be able to run it at higher rates or on lower-cost hardware. With the way things are progressing, I think the price to run stuff like this will keep dropping. And if you think about how valuable something like this would be: would this be worth more than Google is worth right now, more than most tech companies are right now? I mean, if these are able to completely automate all human labor, and not only that but scale it infinitely, because we can just create more of these. I think Elon was talking about that, saying that his bot, if it's released, could be worth more than full self-driving cars, or worth more than Tesla is. I think his point was that they would completely change the GDP calculation, how we calculate the output of a country.

But I'm going to wrap it up right there. The last thing I do want to mention, one thing they didn't talk about here, is that Nvidia, and DeepMind itself in fact, are utilizing simulations to teach these robots how to do stuff, instead of training them in real time. For example, the R2-D2, or whatever, RT-2: that was trained in real time. It walked around, picked this thing up, and that was one piece of data. One thing that Nvidia and OpenAI and DeepMind themselves have done is train some of these robots in a simulation, where time can obviously run a lot faster, so they can train 42 years in a span of minutes, for example; I think that was one of the numbers that I heard with Nvidia. That has yielded some very interesting, very good results, because these robots, you drop them in there, a short time passes
here, on Earth or whatever, and when they come out they're ready to go. They're very effective, they have all these moves that they've learned, and then you take that, you plug it into their physical robot body, and it tends to work pretty well. One thing, since the real world is a little bit more chaotic: there might be little variations here and there. For example, if you think about something like electrical currents, in a simulation, if you just set it to be the same all the time, that might be different from how the real world is. Maybe there's some loose connection that's misfiring, maybe the circuit's a little off, the battery's a little low, maybe there's some wind, maybe there's some friction, etc. But they're even getting around that by randomizing a lot of the factors in the simulation. What they're seeing with that is that these robots are coming out more robust; they're even better able to handle real-world conditions because of those random factors that they've witnessed in the simulation.

Wasn't that the whole thing in Dragon Ball Z, the Hyperbolic Time Chamber? They would go into this Hyperbolic Time Chamber and train there for like 100 years, and only like a day would pass, and then they would come out and be these super strong people. I mean, Dragon Ball, that was in the '90s. They figured this out in the '90s. Anyways, thank you for watching. I'll talk to you next time. My name is Wes Roth. Subscribe.
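
The randomization idea described at the end of the transcript, varying simulation factors such as friction, battery level and wind so that a policy does not overfit to one fixed setting, can be sketched roughly as below. This is a minimal illustration under stated assumptions, not any lab's actual training setup. The parameter names, the value ranges and the helper functions `sample_sim_params` and `randomized_episodes` are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class SimParams:
    """Hypothetical physical factors that differ from one simulated episode to the next."""
    friction: float               # surface friction multiplier
    battery_voltage_scale: float  # 1.0 = full charge, lower = weaker motors
    wind_force: float             # sideways disturbance, arbitrary units


def sample_sim_params(rng: random.Random) -> SimParams:
    """Draw one random configuration. The ranges are illustrative assumptions."""
    return SimParams(
        friction=rng.uniform(0.5, 1.5),
        battery_voltage_scale=rng.uniform(0.8, 1.0),
        wind_force=rng.uniform(-2.0, 2.0),
    )


def randomized_episodes(num_episodes: int, seed: int = 0) -> List[SimParams]:
    """Return a different random configuration for every training episode."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    return [sample_sim_params(rng) for _ in range(num_episodes)]


if __name__ == "__main__":
    # A real pipeline would hand each configuration to a physics simulator
    # before running the policy; here the sampled values are only printed.
    for params in randomized_episodes(3):
        print(params)
```

A policy trained across many such configurations never sees exactly the same physics twice, which is the intuition behind the robustness claim in the transcript.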

Original Description

#deepmind #ai #robotics

🔥 Get my A.I. + Business Newsletter (free): https://natural20.com/

[LINKS]
https://www.deepmind.com/blog/rt-2-new-model-translates-vision-and-language-into-action
https://robotics-transformer2.github.io/assets/rt2.pdf
https://openai.com/research/gpt-4
https://www.understandingai.org/p/large-language-models-explained-with
https://voyager.minedojo.org/assets/documents/voyager.pdf
https://twitter.com/DrJimFan

[MENTIONED VIDEOS]
Minecraft AI - SELF-IMPROVING 🤯 autonomous agent: https://www.youtube.com/watch?v=7yI4yfYftfM
LLMs as Tool Makers [LATM] - GPT-4 *UPGRADES* lower AI Models. https://www.youtube.com/watch?v=qWI1AJ2nSDY
Sam Altman on UBI and Massive Job Losses from AI Automation https://www.youtube.com/watch?v=5Nsqv3FWXio

[TIMELINE]
[00:00] Intro
[02:04] Using LLMs as a "brain"
[05:28] Robotic Control
[09:03] What's the Point?!
[12:40] The Paper
[18:15] Dr Jim Fan
[19:30] Examples and Limitations
[22:59] Training in Simulations

Playlist

Uploads from Wes Roth · Wes Roth · 50 of 60

Which Vanguard index fund to buy? (hint: it's the one Warren Buffett recommends)

Which Vanguard index fund to buy? (hint: it's the one Warren Buffett recommends)

What does PALANTIR do - Palantir Stock, Founder, Controversy Explained Simply (plus why I'm BUYING).

What does PALANTIR do - Palantir Stock, Founder, Controversy Explained Simply (plus why I'm BUYING).

Paypal misinformation fine ($2,500) - Close Your Accounts ASAP!

Paypal misinformation fine ($2,500) - Close Your Accounts ASAP!

China Was Just Sent Back to the Dark Ages | US starts aggressively cutting ties

China Was Just Sent Back to the Dark Ages | US starts aggressively cutting ties

ChatGPT Business Ideas - How I Use ChatGPT to make money

ChatGPT Business Ideas - How I Use ChatGPT to make money

ChatGPT Explained - The AI revolution is happening right now... [ chat gpt ]

ChatGPT Explained - The AI revolution is happening right now... [ chat gpt ]

ChatGPT Banned - New York blocking network access to ChatGPT

ChatGPT Banned - New York blocking network access to ChatGPT

ChatGPT Trading - this [INSANE] tool A.I. built for me

ChatGPT Trading - this [INSANE] tool A.I. built for me

Small Business Grants for ChatGPT and A.I. (similar to PPP and EIDL in 2023) |

Small Business Grants for ChatGPT and A.I. (similar to PPP and EIDL in 2023) |

How to Make Passive Income with ChatGPT AI

OpenAI’s GPT-4 Artificial Intelligence = AGI? TRILLIONS of Parameters Plus THIS

How Nvidia AI Robot Trained 42 Years In 32 Hours And Did THIS | Google DeepMind AlphaCode

John Carmack | AGI by 2030 | Will John Carmack's AI company be the one to make it?

AI Small Business Grants

Elon Musk attacks OpenAI - here's Sam Altman's response

Bill Gates on ChatGPT and OpenAI "The Age of AI has begun"

Sparks of AGI | Microsoft Researchers claim GPT-4 Is showing "Artificial General Intelligence"

Elon Musk and Others Call for Pause on AI as GPT-4 shows signs of AGI.

Comparing GPT-4 and Google's Bard AI - Who is getting closer to AGI?

Sam Altman on UBI, OpenAI to $100 TRILLION and Massive Job Losses from AI Automation

25 ChatGPTs play a videogame...

NVIDIA's new AI: Better Games, Art and... better life?

Google AI Documents Leak about "Google and OpenAI"

PaLM 2 vs GPT-4 | why Google is having a hard time catching up...

How To Access ChatGPT Plugins | They are LIVE! (but hidden)

Sam Altman to Congress "America HAS to lead the world in AI"...

Sam Altman Opening Statement to Congress on AI Regulation

Sam Altman Congress Hearing "AI is the Biggest Threat to Human Race"

Tree of Thoughts - GPT-4 Reasoning is Improved 900%

Governance of Superintelligence | OpenAI proposes measures for safe AI development.

Model Evaluation For Extreme Risks of AI | Google DeepMind and OpenAI Paper

Minecraft AI - NVIDIA uses GPT-4 to create a SELF-IMPROVING 🤯 autonomous agent.

AI Human Extinction Risk - Experts Warn of "Serious Risk"

LoRA - Low-rank Adaption of AI Large Language Models: LoRA and QLoRA Explained Simply

99.3% of ChatGPT Performance with OpenSource AI - [QLoRA paper]

AlphaFold2 Explained | Google's DeepMind Solves Protein Folding

Illumina AI - ChatGPT for your genome...

Text to Video Invasion! Runway AI releases GEN 2 text to video.

LLMs as Tool Makers [LATM] - GPT-4 *UPGRADES* lower AI Models.

AlphaDev - DeepMind AI Discovers Better Algorithms for Foundational Computing

OpenAI GPT-4 Function Calling: *HUGE* Potential

GPT-4 leaked! 🔥 All details exposed 🔥 It is over...

Elon Musk announced XAI - the answer to OpenAI = X.AI

Andrej Karpathy GPT - Advice for building AI agents

TEST TO SEE IF AI CAN MAKE $1,000,000 (modern Turing test)

ChatGPT custom instructions are *POWERFUL* Replace AutoGPT and BabyAGI?

WORLDCOIN LAUNCH is starting! Backed by Sam Altman of OpenAI.

WORLDCOIN ORB - I went to L.A. to get my eye scanned for WorldCoin [my experience]

The Biggest Week of AI News In Months!

Google Deepmind RT 2 - Using LLMs to Build Thinking, Learning Robots

AI News is Getting *WEIRD* Human Brain Matter in Chips. OpenAI tutorial. Amazon unleashed it's AI.

GPT 5 release date 🔥 might be closer than we think | OpenAI applies for GPT-5 Trademark in the US.

AI Agents Simulate a Town 🤯 Generative Agents: Interactive Simulacra of Human Behavior.

Proof that AI Understands? 👀 Andrew Ng on LLMs building mental models, Othello GPT, Geoffrey Hinton

OpenAI acquires Biomes 👀 an open-source MMORPG. ChatGPT plus Minecraft? 🔥

OpenAI announces FINETUNING 👀 for ChatGPT

Autonomous AI Agents - why YOU should be building them... and HOW.

ChatGPT Enterprise - OpenAI launches the next BIG thing

HOODWINKED - AI gets away with MURDER 👀 GPT-4 is an effective killer...

Install Open Interpreter in 2 min | The free, open source CODE INTERPRETER!

This video discusses Google DeepMind's RT-2 model, which uses large language models (LLMs) to help robots learn and carry out tasks they were not explicitly trained for. The model shows promise in adapting quickly to novel situations and environments, with applications in robotics, artificial general intelligence, and multimodal learning. Viewers can learn how LLM-based robots are built and trained, how effective prompts are designed, and how multimodal LLMs are developed.

Key Takeaways

Build a basic LLM
Integrate the LLM with robotics and vision
Train the model with videos of humans
Fine-tune the model for specific tasks
Test and deploy the model in novel environments

💡 The use of LLMs in robotics enables rapid adaptation to novel situations and environments, with potential applications in artificial general intelligence and multimodal learning.
