AI Robot's ChatGPT moment at 2024?

AI Jason · Beginner ·🛠️ AI Tools & Apps ·2y ago

Skills: LLM Foundations 80% · LLM Engineering 80% · Fine-tuning LLMs 70% · Prompt Craft 70% · Agent Foundations 60%

Key Takeaways

The video discusses key robotic AI breakthroughs and predicts 2024 as the year of physical AI agents. It covers tools like GPT-4V, neural networks, Transformers, and large language models, and concepts like the paradigm shift from specialist to generalist robots, physical intelligence, and cognitive intelligence.

Full Transcript

2023 was the year of large language models, with multiple breakthroughs on a weekly basis. It created a paradigm shift: AI can also do creative, innovative work rather than just repetitive, boring tasks. With all those developments, will we be able to see the same level of takeoff in robots and embodied AI? Some of the smartest people in the field are betting big on robotics for 2024. Jim Fan, a senior AI researcher at NVIDIA, predicts we are three years away from the ChatGPT moment for physical AI agents, because in 2023 a huge amount of foundations and platforms was built to enable robotic development: multimodal large models like GPT-4V that can work with ordinary robot arms, algorithms that start to bridge the gap between System 1 and System 2 thinking for robotic AI, and an insane amount of progress on robot hardware.

Brett Adcock, the CEO of a leading robotics company called Figure, also posted a tweet a couple of days ago saying that over the last 90 days he has watched robots perform complex tasks entirely on neural nets, something he didn't think was feasible until the end of the decade. At the beginning of 2022 he believed we would have reliable humanoid hardware well before we had reliable AI that can control the robot to complete household tasks. That thinking has really changed: he now feels strongly that we will likely have reliable AI and hardware at the same time, and that 2024 will be the year of embodied AI.

On the other hand, I'm sure you've all seen the impressive demo from Mobile ALOHA, where they built mobile robot hardware for roughly 30k that can already complete numerous household tasks like washing clothes and cooking egg and shrimp, although most of those tasks are actually controlled by a human rather than done autonomously by the robot. We also saw more autonomous examples from leading AI robot companies like Figure, whose robot can use a coffee machine to make coffee autonomously with voice control, and Tesla, whose robot showcased impressive hand coordination.

So this got me thinking: what have been the key breakthroughs in robotic AI over the past few months, and how far are we actually from deployable humanoid robots? I did a bunch of research and would love to share the findings with you, so let's get into it.

We can break robotic AI down into two buckets. One is physical intelligence, including how the robot does the hand coordination to pick up items, or its ability to balance and walk to a certain destination. The other is cognitive intelligence: how robots reason and make decisions based on the environment and the task. Both aspects have seen loads of breakthroughs in the past few years, especially in terms of generalization, which is how we get general-purpose robots that can do multiple different tasks instead of specializing in one specific task. At the moment robots are great specialists but poor generalists. Typically you have to train a model for each task and environment, and changing even one variable, like the background or environment, can require training from scratch. But as we know, the real-world environment is always dynamic and continuously changing, and it's incredibly inefficient for us to pre-program all the different edge cases. This paradigm shift from specialist to generalist has been driven largely by the development of Transformers and large language models.

Let's take cognitive intelligence as an example. There have been plenty of developments in AI agents that showcase powerful decision-making and planning ability. For example, in the research paper called Voyager, they built an embodied agent in the digital world of Minecraft. The agent can continuously explore the world, learn new skills, and accomplish tasks at an even superhuman pace. All the cognitive abilities developed through those digital AI agents can later be transferred as we develop physical AI agents. On the other side, we already see companies like Boston Dynamics equipping their robot dog Spot with large language models for better cognitive ability and to deliver new experiences for end users:

"May I have the pleasure of knowing your names?" "I'm Matt, and that's [name unclear]." "A pleasure to meet you, Matt and [name unclear]. Shall we commence our journey? The charging stations, where Spot robots rest and recharge, is our first point of interest. Follow me, gentlemen. Let's proceed to the [unclear]."

Similarly, we see lots of breakthroughs in robotic control, which is how the robot learns to complete multiple different physical tasks. Back in 2018, the state of the art in robotic learning was QT-Opt, a method that uses reinforcement learning to learn how to do a specific task well. It was quite efficient compared with previous methods, but with this method we still had to train a new model for each different task, which is not efficient. A couple of years later, in April 2021, a research paper called MT-Opt was introduced, which was a paradigm shift from single-task to multi-task learning. MT-Opt is multi-task robotic reinforcement learning. It enables a single robot to learn and execute multiple tasks by leveraging shared learning principles. This means that instead of learning from scratch for every new task, the robot can apply knowledge from one task to another, much like a chef using skills from making pastry to also bake bread. This shared learning not only makes training more data- and time-efficient, it also results in robots that can adapt to a variety of tasks in dynamic environments. So it was a huge step forward, but still quite limited in many ways.

The real breakthrough happened in December 2022, when Google introduced RT-1. RT-1 adopts a Transformer architecture. It tokenizes both inputs and outputs, which basically means transforming the camera images, the task instruction it was given, and the motor commands into a language that the robot AI can understand. This represents a huge leap in robotic learning, propelling us towards highly generalized robotic intelligence, because now the robot's understanding of the world and its task becomes deeply integrated with language, mirroring how humans perceive and interact with their environment. This allows robots not just to perform tasks they were directly trained on, but to generalize and execute tasks they have never seen before. It's like giving a robot the ability to read a recipe book and cook a meal it has never cooked before. The result is that this type of general-purpose approach outperforms all the previous methods on both specialized tasks and unseen tasks. This is very similar to the phenomenon we saw with large language models: a general-purpose large language model trained on diverse and vast datasets can outperform specialized AI models on the tasks they were specifically trained on.

About six months later, in July 2023, RT-2 was introduced, building on RT-1. RT-2 combines a vision-language model, pre-trained on extensive web-scale internet data like videos and images, with RT-1, which was originally trained specifically on robotic data. This gives RT-2 a nuanced understanding of visual cues and natural language far beyond its original robotic training dataset. With RT-2, robots now have the cognitive ability to interpret complex commands and perform semantic reasoning. They can identify different objects and even use some of the objects as tools to complete tasks, with unseen objects, dynamic backgrounds, and even dynamic environments. Again, at the heart of RT-2 is the transformation of actions into tokenized strings, which aligns robotic control with the model's linguistic capabilities. This means actions are described in a language the robot can understand, enabling it to adapt actions as easily as we form sentences. The result is that the robot can not only excel at tasks it was originally trained on but also generalize to new, unseen environments with remarkable accuracy. In real-world evaluations, RT-2's success rate leaped from RT-1's 32% to 62%, a significant step towards truly intelligent, adaptable robotic AI.

But not only that: just a few months after RT-2 was introduced, in October 2023, Google announced a massive robotic training dataset called the Open X-Embodiment dataset. It is a collaboration across 20 different institutions to gather data from 22 different robot embodiments, demonstrating more than 500 skills and 150,000 tasks across more than 1 million episodes. By contrast, RT-1 was trained on only 700 tasks. So this is a massive training dataset, and more importantly, it comes from 22 different robot embodiments. Google then introduced RT-X, where they trained the RT-1 and RT-2 models on this massive, diverse new dataset. Just by having a bigger and more diverse dataset, it outperformed RT-2 by about 3x in emergent skill evaluations. It is crazy that this level of improvement happened within just a few months. Most importantly, it follows the scaling law, which means that with a bigger, more diverse dataset you get those performance improvements for free, without any fundamental change. This showcases how important training data is for robotic AI progress.

Just as I'm making this video, Google released a few new announcements for robot AI. One of them is called AutoRT, a method that can potentially generate a huge amount of training data from the real world. It basically uses a vision-language model and a large language model to direct different robots to continuously complete different tasks, and uses the results as shared training data. It is still pretty early, but it can be huge if it does generate a large amount of training data.

Last but not least, even though Transformers and large language models made huge progress on both cognitive intelligence and mid-level physical intelligence, they often fell short on truly low-level dexterous skills like hand manipulation. In Q4 2023 there was also a research advancement that showcased the potential for large language models like GPT-4 to design reward functions that train robots in low-level dexterous skills at a superhuman level. It basically uses a large language model like GPT-4 to generate the reward function used to train the robot with reinforcement learning. The reward function is a core part of reinforcement learning: it tells the agent or robot whether it is doing things right or wrong. It's pretty similar to how you train a dog by giving it a little treat when it does something right. So they get GPT-4 to design an initial reward function for certain actions, then simulate the robot's actions thousands of times and send the results back to the large language model as feedback, so that it can iterate on the reward function. The result is stunning: it outperforms expert human engineers at designing those reward functions on 83% of their benchmark, with an average improvement of 52%.

This is the first time it feels like we are finally breaking Moravec's paradox. It is a concept introduced over 30 years ago by Hans Moravec, a leading robotics scientist. His finding is basically that it is comparatively easy to make computers achieve adult-level performance on intelligence tests or playing checkers, but much more difficult, or almost impossible, to give them the skills of a one-year-old when it comes to perception and mobility. In short, it means that problems that feel very easy to humans are actually very hard for computers, like planning, recognizing someone, walking, or grabbing something, while problems that feel really hard to humans are actually easy for computers, like math and logic. The theory behind this paradox is that the easy problems, like walking, running, and hand manipulation, are things humans developed hundreds of thousands of years ago. They have almost become part of our intuition and are very, very hard to translate back to computers. The hard problems, like math and logic, are skills humans developed recently, and they are fairly straightforward to translate to computers. With the recent developments, it finally feels like we are on a path to get away from this paradox that has cursed the robotics industry for the past 30 years.

So those are the key breakthroughs from the past few months. With this pace of accelerated development, I feel really optimistic that robots can possibly have their ChatGPT moment in the next 12 to 24 months. Many leading robotics companies like Figure, Tesla, and Apptronik plan to start deploying robots in real-world scenarios like manufacturing or logistics, and once these robots get deployed in real-world scenarios, we will start collecting a huge amount of training data that can again speed up the learning curve of robots. So it is truly a very exciting time. When do you think the ChatGPT moment for robots will be? Am I way too optimistic or too conservative? Please comment below and let me know. I hope you enjoyed this video. I'll continue posting interesting projects and findings in AI, so please subscribe if you want to keep updated. Thank you, and I'll see you next time.
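
One mechanism the transcript mentions, turning robot actions into tokenized strings, can be illustrated with a small sketch. This is not the RT-1 or RT-2 implementation; it only shows the general idea of discretizing each continuous action value into an integer bin and writing the bins out as text. The bin count of 256, the action range of [-1, 1], and the 7-dimensional example command are all assumptions chosen for illustration.

```python
from typing import List, Sequence

NUM_BINS = 256  # assumption: each action dimension is discretized into 256 bins


def encode_action(action: Sequence[float], low: float = -1.0, high: float = 1.0) -> str:
    """Map each continuous action value to an integer bin and join the bins into a string."""
    tokens = []
    for value in action:
        clipped = min(max(value, low), high)
        # Scale into [0, NUM_BINS - 1] and round to the nearest bin index.
        bin_index = round((clipped - low) / (high - low) * (NUM_BINS - 1))
        tokens.append(str(bin_index))
    return " ".join(tokens)


def decode_action(text: str, low: float = -1.0, high: float = 1.0) -> List[float]:
    """Invert encode_action: turn a string of bin indices back into approximate values."""
    return [low + int(tok) / (NUM_BINS - 1) * (high - low) for tok in text.split()]


# Hypothetical 7-dimensional arm command (position deltas, rotation deltas, gripper).
command = [0.0, 0.5, -0.5, 1.0, -1.0, 0.25, 0.1]
encoded = encode_action(command)
decoded = decode_action(encoded)
print("action as text:", encoded)
print("decoded back:  ", [round(v, 3) for v in decoded])
```

Because the action is now just a short string of integers, a language model can emit it the same way it emits words. The round trip loses a little precision, at most about one bin width per dimension.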

Original Description

What were key Robotic AI breakthroughs? Would 2024 be the year of Physical AI agents? I researched & summarised key Robotic AI breakthrough. 🔗 Links - Follow me on twitter: https://twitter.com/jasonzhou1993 - Join my AI email list: https://crafters.ai/ - My discord: https://discord.gg/eZXprSaCDE 👋🏻 About Me My name is Jason Zhou, a product designer who shares interesting AI experiments & products. Email me if you need help building AI apps! ask@ai-jason.com #gpt4v #robotics #autogen #gpt4 #autogpt #ai #artificialintelligence #tutorial #stepbystep #openai #llm #chatgpt #largelanguagemodels #largelanguagemodel #bestaiagent #chatgpt #agentgpt #agent #babyagi

The video discusses the potential for a 'ChatGPT moment' in robotic AI, with recent breakthroughs and advancements in LLMs, robotic learning, and perception and mobility. Viewers can learn about the key concepts and tools in robotic AI and how they can be applied to real-world scenarios.

Key Takeaways

Research key Robotic AI breakthroughs
Understand the concepts of paradigm shift, physical intelligence, and cognitive intelligence
Learn about the tools and techniques used in robotic AI, such as GPT-4V, neural networks, Transformers, and large language models
Apply LLMs to robotic AI tasks
Fine-tune LLMs for specific tasks
Craft effective prompts for LLMs
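
As one way to picture applying LLMs to a robotic task, the propose, simulate, and feed back loop for reward design described in the transcript can be sketched as a toy program. This is only a sketch under heavy assumptions, not the method from the research: the LLM call is replaced by a hypothetical stub (`propose_reward`) that randomly perturbs the best candidate so far, the "robot" is a point on a 1-D track, and `TARGET`, `train_and_evaluate`, and `design_reward` are names invented for this example.

```python
import random
from typing import Callable, Tuple

TARGET = 5.0  # hypothetical task: end up at position 5.0 on a 1-D track
RewardFn = Callable[[float], float]


def propose_reward(best_center: float, rng: random.Random) -> Tuple[float, RewardFn]:
    """Stub standing in for the LLM: propose a reward peaked near the best guess so far."""
    center = best_center + rng.uniform(-2.0, 2.0)
    return center, lambda x: -abs(x - center)


def train_and_evaluate(reward: RewardFn, steps: int = 40) -> float:
    """Greedy 'policy training' under a candidate reward, scored by the true task metric."""
    x = 0.0
    for _ in range(steps):
        # The policy only sees the candidate reward, never the real target.
        x = max((x - 0.5, x, x + 0.5), key=reward)
    return -abs(x - TARGET)  # task fitness: closer to the target is better


def design_reward(rounds: int = 30, candidates: int = 8, seed: int = 0) -> Tuple[float, float]:
    """Repeat propose -> simulate -> feed back, keeping the best-scoring reward."""
    rng = random.Random(seed)
    best_center = 0.0
    best_fitness = train_and_evaluate(lambda x: -abs(x))
    for _ in range(rounds):
        for _ in range(candidates):
            center, reward = propose_reward(best_center, rng)
            fitness = train_and_evaluate(reward)
            if fitness > best_fitness:  # feedback step: the next proposals start from here
                best_center, best_fitness = center, fitness
    return best_center, best_fitness


center, fitness = design_reward()
print(f"best reward center: {center:.2f}, task fitness: {fitness:.2f}")
```

The point of the structure is that the proposer never needs gradients from the simulator. It only receives a score per candidate, which is the same kind of feedback a language model could be given as text.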
