El Meme que entrenó a una IA | BITS
Key Takeaways
The video discusses Google's solution to predict the depth of a 2D image using a deep learning model trained on data from the 'mannequin challenge' trend, which provides a unique dataset for inferring three-dimensionality in scenes with moving cameras and people.
Full Transcript
[Music] Problem solving. Imagine you're looking at the following scene and I ask you, "Hey, can you tell me which elements are closer to you and which are farther away?" Surely you'd have no problem telling me that this person here is closer than this building here, and that in between there's a path that recedes from the person to the building. No problem. In fact, I could ask you to color each point in the scene, assigning an intensity to each point according to its distance, resulting in an image like this: a depth map.
Okay, how did you manage to do this? Or better yet, how can we get a computer to learn to do this? To infer the three-dimensionality of a scene, we can opt for the biological option: the separation between your eyes. Stereoscopic vision lets you observe a scene from two different viewpoints, and the two images can be combined to triangulate the position of each object. This way we obtain an interpretation of the scene's depth. Easy.
And what if we didn't have two cameras and were working with a monocular vision system? How many miles away do you think it would be? A trillion. Luckily, our friend Einstein would agree that we could use the temporal dimension as if it were another spatial dimension. That is, if we need two different perspectives of the same object and we only have one camera, we could move the camera over time to obtain captures from different angles. This is interesting because it's actually a simpler setup, similar to what we encounter in our daily lives when we use, for example, our mobile phone camera. So, perfect, problem solved. That's all for today's video. Subscribe!
But wait, there's a problem: the method we just discussed has to meet one more restriction. The objects in the scene we're observing must remain static over time as we move the camera. In other words, the points observed must be the same from here to here in order to correctly infer the three-dimensionality of the scene.
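The triangulation step can be sketched numerically. This is a minimal illustration, not something shown in the video: it assumes an idealized, rectified pair of views from pinhole cameras, where depth follows from Z = f * B / d (focal length f in pixels, baseline B in meters, disparity d in pixels). The function names and the sample numbers are hypothetical.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in meters of a point seen in two rectified views: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


def depths_to_intensity(depths):
    """Map depths to 0..255 grey levels, nearest = brightest (an arbitrary convention)."""
    nearest, farthest = min(depths), max(depths)
    if farthest == nearest:
        return [255 for _ in depths]
    span = farthest - nearest
    return [round(255 * (farthest - z) / span) for z in depths]


# Hypothetical numbers: 800 px focal length, 0.5 m between the two viewpoints.
person = depth_from_disparity(800, 0.5, 80)     # large shift between views -> near
building = depth_from_disparity(800, 0.5, 10)   # small shift between views -> far
print(person, building)                         # 5.0 40.0
print(depths_to_intensity([person, building]))  # [255, 0]
```

The same relation applies to a single camera that moves between two captures, with the camera's displacement playing the role of the baseline B, but only as long as the observed points stay in the same place between the two captures.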
Otherwise, the whole thing falls apart. Okay, this is a problem because, for example, how could we use such a system to infer depth in a shot where the protagonist is a person who, well, is usually in motion?
Let's go back to the starting point, because it's true that the two methods I've explained can be very useful for inferring three-dimensionality, but when I first showed you this image, you were able to solve it without stereoscopic vision. Even though you have two eyes, what you see on your monitor is a flat, two-dimensional image, and the camera isn't moving, since the image is static. So how did you do it? The reality is that a lifetime of observing the world with stereoscopic vision and learning the three-dimensional shape of each object has let you build a mental model that you can work with even in situations with little information. In other words, you already have prior knowledge encoded in your mind about how the world is structured in three dimensions and how objects are distributed in a scene, information that is useful for solving this problem. So what we can do is train a deep learning model to learn to encode this prior knowledge of how to estimate the depth of a scene with a person.
Okay, okay, I know what you're thinking: "Carlos, you're confusing me; we're going around in circles. It's a vicious cycle." Because how do we train this deep learning system? We'll need data pairs, so that for each video we have its equivalent depth map. And how do we get that map if the subject and the camera are moving and we can't triangulate? Do we need to use three-dimensional cameras like the Kinect? Well, it's an option, but these are usually limited to indoor use and would offer very little variety of environments to train our system. Oh my god, everything's wrong! At this point, I hope you understand the whole context surrounding the problem we're dealing with.
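To make the idea of "data pairs" concrete, here is a minimal sketch of how a predicted depth map could be scored against a reference depth map for the same frame. The transcript does not say which loss the Google model uses; the mean absolute log-depth error below is just one simple choice, and the function name and toy values are hypothetical.

```python
import math


def log_depth_error(predicted, reference):
    """Mean |log(pred) - log(ref)| over pixels that have a reference depth.

    Both arguments are 2D lists of depths in meters. A reference value
    of 0 marks a pixel with no known depth and is skipped.
    Predicted depths are assumed to be strictly positive.
    """
    total, count = 0.0, 0
    for pred_row, ref_row in zip(predicted, reference):
        for pred, ref in zip(pred_row, ref_row):
            if ref > 0:
                total += abs(math.log(pred) - math.log(ref))
                count += 1
    if count == 0:
        raise ValueError("reference depth map has no valid pixels")
    return total / count


# Toy 2x2 "frame": the reference has one pixel with unknown depth (0).
reference = [[2.0, 4.0], [0.0, 8.0]]
perfect = [[2.0, 4.0], [5.0, 8.0]]
doubled = [[4.0, 8.0], [5.0, 16.0]]
print(log_depth_error(perfect, reference))  # 0.0
print(log_depth_error(doubled, reference))  # about 0.693, i.e. log(2)
```

Training would adjust the model so that an error like this shrinks across many pairs of video frames and depth maps, which is why obtaining the reference depth maps is the crux of the problem.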
We're missing data: scenes of people in varied environments from which we can infer three-dimensionality to train our deep learning model. Well, pay attention, because the answer given by a Google Research team is simply brilliant, accurate, and ingenious. So much so that last week it earned an honorable mention at the prestigious CVPR 2019 conference. What Google has done is connect the need of the problem with an answer they found by going back to the year 2016. Remember the 'mannequin challenge'? If we jog our memories, we recall a time when the trend was to film oneself in increasingly absurd situations, with the camera moving around the three-dimensional space of a scene while everyone remained motionless, waiting to complete the desired dataset. Well, for them it was just a matter of hanging around like mannequins while the camera moved slowly.
But fast forward to 2019. This trend has been key to creating the perfect dataset to solve the problem we were discussing. It was available online, waiting to be used. You see Google's offices in this image? Well, up here is the office of the person who came up with this simply brilliant idea. With this data, a deep learning model has been trained which, in combination with other techniques like optical flow computation, is capable of accurately predicting depth maps with moving cameras and people. This could have direct applications in augmented reality tools, similar to what Apple has achieved with the third version of ARKit, which also manages to estimate depth and segment users in real time, and which is leaving us with numerous examples of quite spectacular augmented reality applications. Both the dataset of 2,000 videos of people performing the mannequin challenge and the trained models have been made available to the public (both links are in the description).
However, the moral I want you to take away from this video is this: in a sector where data collection is one of the most costly phases in terms of time and resources, ingenuity is the element that can give you the ideal solution to your problem, and that is worth valuing. Unless, of course, all this was orchestrated by Google, and the mannequin challenge was a challenge launched by the company in 2016 with the sole purpose of getting users, unaware of their actual task, to work on creating this dataset. Do you really think this is possible? In that case, I recommend you go directly to this video here, where we talk about 'challenge' data and where I explain some of the ingenious ways in which companies like Google take advantage of this to obtain your data without your knowledge. For my part, the only data I'm going to ask you for is your feedback on whether you liked this video, and that you eagerly await the next video about artificial intelligence, which, as you know, you'll find here. So, USA, go!
Original Description
Do you know what Google's ingenious solution was to the problem of predicting the depth of a 2D image?
--- DATA AND TRAINED MODELS ---
https://google.github.io/mannequinchallenge/www/index.html