Smart Navigation - How AI Robots Understand and Explore Environments
Key Takeaways
Facebook researchers developed a language-guided navigation task in which an agent follows natural-language navigation directions given by a user to move realistically through an environment. They tackle the task with two models: a sequence-to-sequence baseline and a cross-modal attention model.
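To make the attention idea behind the second model more concrete, here is a minimal sketch of generic scaled dot-product attention in plain Python. It is not the paper's exact formulation: the function names (`softmax`, `attend`), the toy vectors, and the idea of using an instruction encoding as the query over a handful of visual feature vectors are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Generic scaled dot-product attention (illustrative, not the paper's code).

    query  : one vector, e.g. a hypothetical instruction encoding
    keys   : list of vectors, e.g. hypothetical visual feature locations
    values : list of vectors, same length as keys
    Returns the weighted average of values and the attention weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights


if __name__ == "__main__":
    # Toy example: the query matches the first key best, so the first
    # value receives the largest weight.
    query = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    context, weights = attend(query, keys, keys)
    print("weights:", weights)
    print("context:", context)
```

The point of the sketch is only that the model "attends" to the parts of its input that best match the current query, which is what lets an instruction like "to the left of the table" pick out the relevant visual features.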
Full Transcript
Facebook researchers developed a language-guided navigation task where the agent follows language navigation directions given by a user in order to realistically move in the environment. They even made a public release version of the code available for everyone on GitHub. Let's see how they achieved that, along with some amazing results.
[Music]
This is What's AI, and I share artificial intelligence news every week. If you are new to the channel and want to stay up to date, please consider subscribing to not miss any further news.
Here's another paper presented at ECCV 2020. It is about language-guided navigation in 3D environments, from the Facebook AI Research team. As the name says, they developed a language-guided navigation task for 3D environments where the agent follows language navigation directions given by a user in order to realistically move in the environment. In short, the agent is given a first-person view, which they call egocentric, and a human-generated instruction such as "Leave the bedroom and enter the kitchen. Walk forward and take a left at the couch. Stop in front of the window." Then, using this input, the agent must take a series of simple control actions, like moving forward 0.25 meters or turning left 15 degrees, to navigate to the goal. By using such simple actions, VLN-CE lifts assumptions of the original VLN task and aims to bring simulated agents closer to reality. Just to give a comparison, current state-of-the-art approaches move between panoramas and cover an average of 2.25 meters, including avoiding obstacles, in a single action.
They developed two different models in order to achieve this task. The first one, (a), is a simple sequence-to-sequence baseline. The second one, (b), is a more powerful cross-modal attention model. We can see both in this picture.
The first model takes a visual representation of the observation, containing depth and RGB features, and the instructions at each time step. Then, using this information and the instructions given by the user, it predicts a series of actions to take, denoted as a_t in this image. The RGB frames and depth are respectively encoded using two ResNet-50 architectures, one pre-trained on ImageNet and the other one trained to perform point-goal navigation. Then it uses an LSTM to encode the instructions from the user. LSTM is short for long short-term memory, which is a recurrent neural network architecture widely used in natural language processing applications due to its memory capabilities, allowing it to use information from previous words as well. These actions, a, are then fed into the second model.
The goal of the second model is to compensate for the lack of visual reasoning in the first model, which is super important for this kind of navigation application. For example, you need good spatial visual reasoning in order to understand an instruction such as "to the left of the table": your agent first needs to know where the table is and then go to the left of that table. This is done using attention. Attention is based on the commonsensical intuition that we attend to a certain part when processing a large amount of information. More specifically, it is done using two recurrent networks, as you can see in this image. One network tracks observations using the same RGB and depth input as the first model, while the other network's role is to make decisions based on the user's instructions and the visual features. This time, the user's instructions are encoded using a bi-directional LSTM. Then they compute a list of simple instructions, which is used to extract both visual and depth features. Following that, the second recurrent network uses a concatenation of all the features discussed, including an action encoding, as inputs and predicts a final action.
Now let's just watch some amazing examples where the agent is following the instructions written below. I invite you to check out the public release version of the code on GitHub, which I linked in the description.
Of course, this was just a simple overview of this new paper. I strongly recommend reading the paper linked in the description for more information. Please leave a like if you went this far in the video, and since over 90% of you guys watching are not subscribed yet, consider subscribing to the channel to not miss any further news, clearly explained. If you would like to start or improve with machine learning, I've linked all the best online courses in the description. Thank you for watching.
[Music]
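The low-level control actions mentioned in the transcript (move forward 0.25 meters, turn 15 degrees) can be illustrated with a small sketch. This is not the simulator or the authors' code: the `Pose` class, the `step` and `rollout` helpers, the action names, and the flat 2D world with no obstacles are all assumptions made for illustration. Only the step size and turn angle come from the transcript.

```python
import math
from dataclasses import dataclass

FORWARD_STEP_M = 0.25   # forward step size quoted in the transcript
TURN_ANGLE_DEG = 15.0   # turn angle quoted in the transcript


@dataclass(frozen=True)
class Pose:
    """Hypothetical 2D agent pose: position in meters, heading in degrees."""
    x: float
    y: float
    heading_deg: float  # 0 faces +x, counter-clockwise is positive


def step(pose, action):
    """Apply one discrete action in an obstacle-free plane (an assumption)."""
    if action == "forward":
        rad = math.radians(pose.heading_deg)
        return Pose(pose.x + FORWARD_STEP_M * math.cos(rad),
                    pose.y + FORWARD_STEP_M * math.sin(rad),
                    pose.heading_deg)
    if action == "left":
        return Pose(pose.x, pose.y, (pose.heading_deg + TURN_ANGLE_DEG) % 360)
    if action == "right":
        return Pose(pose.x, pose.y, (pose.heading_deg - TURN_ANGLE_DEG) % 360)
    if action == "stop":
        return pose
    raise ValueError(f"unknown action: {action}")


def rollout(start, actions):
    """Apply actions in order, ending the episode at the first 'stop'."""
    pose = start
    for action in actions:
        if action == "stop":
            break
        pose = step(pose, action)
    return pose


if __name__ == "__main__":
    # Walk one meter, turn 90 degrees to the left, take one more step, stop.
    plan = ["forward"] * 4 + ["left"] * 6 + ["forward", "stop"]
    print(rollout(Pose(0.0, 0.0, 0.0), plan))
```

For scale, the 2.25 meters that the transcript quotes for a single panorama-to-panorama move corresponds to nine consecutive 0.25 meter forward steps in a straight line, which hints at how much longer action sequences become with these low-level controls.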
Original Description
This week my interest was directed towards Language-Guided Navigation. Share any questions or remarks you have in the comments; I will gladly answer everything!
Subscribe to not miss any AI news and terms clearly explained! Share this with someone who needs to learn more about Artificial Intelligence! Spread knowledge, not germs!
The paper: https://arxiv.org/pdf/2004.02857.pdf
The project: https://jacobkrantz.github.io/vlnce/?fbclid=IwAR2VO1jwjaq4Uydz2O25ZaLXVFjoD46QirYnW1zNeNAJyNkleA0KS_PDBrE
GitHub with code: https://github.com/jacobkrantz/VLN-CE
Follow me for more AI content:
Instagram: https://www.instagram.com/whats_ai/
LinkedIn: https://www.linkedin.com/in/whats-ai/
Twitter: https://twitter.com/Whats_AI
Facebook: https://www.facebook.com/whats.artificial.intelligence/
Medium: https://medium.com/@whats_ai
The best courses to start and progress in AI:
https://www.omologapps.com/whats-ai
Join Our Discord channel, Learn AI Together:
https://discord.gg/SVse4Sr
Support me on patreon:
https://www.patreon.com/whatsai
Chapters:
0:00 Introduction
0:40 Paper explanation
4:26 Examples
5:10 Conclusion
Song credit: https://soundcloud.com/mattis-rodrigue/sans-titre
#deeplearning #eccv #eccv2020