Using PyTorch for Monocular Depth Estimation Webinar

PyTorch · Beginner · Computer Vision · 1y ago

Key Takeaways

This video webinar demonstrates the use of PyTorch for monocular depth estimation, using the MiDaS-based DPT BEiT Large 512 model to infer relative depth from a single image and remove background clutter. The webinar also covers applications of monocular depth estimation in fields such as robotics and self-driving cars.

Full Transcript

Using PyTorch for monocular depth estimation. My name is Susan Kaylor, and I work in AI technical product marketing at Intel. Today I have the pleasure of introducing our speaker, Bob Chesebrough. Bob is a Senior Solutions Architect at Intel. Bob's industry experience is in software development and AI solution engineering for Fortune 100 companies and national laboratories for over three decades. He is also a hobbyist who has logged over 800 miles and 1,000 hours in the field finding dinosaur bones. He and his sons discovered the only known crocodilian fossil from the Jurassic period in New Mexico. They have also discovered and logged over 200 bone localities and even described a new mass bone bed in New Mexico. Over to you, Bob.

Susan, thank you for that intro, that was awesome. I'm really excited to be here with you guys. I want to show you one of the projects I accomplished with this monocular depth estimation model from Hugging Face, using PyTorch. To get started, if you want to play with this at home and follow along with your bingo card, you can do that. I've highlighted the Git repo here in the URL; it's called dinosaur depth map clipping. I'll show you the QR code in case that's your favorite way of doing it. My QR code here that Susan provided me has a dinosaur in the middle, so you'll see the dinosaur connection directly. I've also written a Medium article; if you want to read more about what I did here, you can just follow this link.

The idea, though, is I wanted to describe what we are trying to do with this monocular depth estimation. Well, first of all, what is monocular depth estimation? Mono means one, and ocular has to do with images or vision, so monocular means single image. Monocular depth estimation is inferring relative depth from a single image, so you don't need stereo pairs of images. The idea here is to look at each of these rows: these are dinosaurs that I took pictures of at the New Mexico
Museum of Natural History in Albuquerque. These are actual dinosaurs I took images of, and then there are various visual elements from the different angles and perspectives that I took of each dinosaur. I just wanted to see if I could use monocular depth estimation to pull out the dinosaur, because I'm more into looking at the comparative anatomy and seeing if a bone that we found matches something on the Stegosaurus. I might be looking at a particular rib in the neck, or a femur or humerus. It's just part of my hobbyist nature to want to be able to pull out, first of all, the dinosaur.

So what I've done is use monocular depth estimation, and I'll show you where you can grab this code on Hugging Face and so forth. It takes just a minimal number of lines of code to do monocular depth estimation. I feed it one of these images. Let's take the one down on the bottom: it's a Stegosaurus at the museum, and in the background there's a mural with a painting of a Stegosaurus, an Allosaurus, and a riverbed. But the visual elements that I care about are the things closest to the camera. What monocular depth estimation allows you to do is have an algorithm paint in, with numbers 0 through 255, the relative depth from the camera: 255, the bright things, are closest to the camera, and the things that are black are furthest away from the camera.

What's amazing about monocular depth estimation is that it's not fooled, let's say, by the size of the Stegosaurus on the mural down in the lower left. It's not fooled by any of those pixels. It knows that the dinosaur up front has the dominant pixels, the ones that comprise the object. So what I can do is use monocular depth estimation to have that mapping from 0 to 255 of the
representation of the dinosaur in the image, and then it's just a simple matter of using a clipping algorithm to take a threshold of some grayscale value, 66 for example, and say anything greater than 66 are the pixels that I'll keep. Then I can apply a clipping mask to actually pull the dinosaur out, as I've done on the far right over there. So that's the overview of what we're going to be talking about.

Let me go into some of the details. Here are some other images I did with just some pottery. In the notebook, you'll see that these are commented out, so you may want to do the Control-slash to uncomment those, and then you would run these cells to do the pip installs for Transformers and so forth. Here's our dependency chain right here, so this is the setup, the way we start.

Just a little bit more about monocular depth estimation; I articulated it somewhat extemporaneously as we were going through. The monocular depth estimation model that I'm talking about comes from something called MiDaS, which the Intel Labs team posted to Hugging Face; it stands for Multiple Depth Estimation Accuracy with Single Network. The specific model that's built on MiDaS is DPT BEiT Large 512, so this is the depth estimation algorithm. We have multiple sizes of this algorithm, including 384 and 256. The 512 uses an internal resolution of 512 by 512, so it's a little bit slower, but it's a little bit more accurate and more precise in doing the depth estimation, whereas the smaller models like the 256 run quicker, closer to real time, if that's your need for doing depth estimation in videos. I just wanted to give attribution where it's due: Reiner Birkl, Diana Wofk, and Matthias Müller from Intel created this model and put it on Hugging Face, and
you can get the code as I've shown you: you just go to where the model card lives, and you can download any of these, the large 384 or the Swin V2 tiny 256, but these all do the same thing. There are also some videos I wanted to call your attention to, both for MiDaS in general, so you can click on the link to this YouTube video, and for an application of this technology to something called L-MAGIC, which is Language Model Assisted Generation of Images with Coherence. This is also by Intel Labs, and I'll be doing some hand waving, showing you what is possible by combining this monocular depth estimation with other techniques. You can follow their code, but what they did is combine that technique with Stable Diffusion to create a virtual panorama given a single image. So the applications of this are much larger than what I'm going to show you; I'm just going to get you started with the baby steps.

Those baby steps include importing torch and Transformers. PIL, the Pillow library, is what I'm using for images, and I also use some NumPy. What I do is read the original image, so this is all fairly straightforward stuff. Here's the original image that I'm going to be consuming: it's a Stegosaurus with that mural. I want to convert that image to this image of the 3D dinosaur with the mural removed. You'll see that there are a few little artifacts. I could probably tweak the thresholds and fiddle with that and make it maybe slightly better, but this is really great for me, because in one fell swoop, in a snap of my fingers, I can create these cleaned-up images, and then I can start using those for comparative anatomy and whatnot.

So this is the secret sauce. Basically, you use these classes from the DPT library: you use DPTImageProcessor, and then you use DPTForDepthEstimation, and then you use the
method called from_pretrained, so we're going to use the pre-trained weights, and we specify which of those models we want. In this case I'm going to choose the large 512 version of that model from Intel. So I have both a processor and a model object that I can call methods against. Now it's really just a matter of reading my image into the inputs. Then I make sure that I'm using torch with no gradients, and I call the model on those inputs to get an output tensor. Then I get my predicted depth using the predicted_depth attribute of the outputs.

So this is the way we do it. We use PyTorch functional interpolation to smooth out the images that we're getting, so this is just standard image processing type stuff. We get the inputs from the processor, and again we turn the gradients off, apply the model to the inputs, get an output, and then get the predicted depth from that output. So it's very simple. Again, we just smooth it, making sure everything is interpolated. Then I convert this so that all my pixels are numbered between 0 and 255, interpreting them as unsigned 8-bit integers (uint8), and then I display it. That's really how I get the depth estimation.

From that, I create a clipping mask. The way I do that is I set a threshold. In this case I've played with it a little bit and found that for this particular image 66 was a really good threshold. Then I simply go through the array
and say: wherever the tensor is greater than the threshold, give me back the original pixels, otherwise put a zero there. Just by doing this, I can convert that depth map to this clipping mask that you see here in black and white. Then I can use that clipping mask to do an image composite. I take the image and convert it to RGB, convert the black image to RGB just to be consistent, and then I use my mask with the "1" parameter to basically say, even though I'm using RGB, just clip everything, yes or no. Then I display the image, and this is what I generate.

So this is just a high-level, simple use case for depth estimation, but there are more clever ways to use this technology. The Intel Labs team put together this model called L-MAGIC. I'll share with you the location where you can go to their Git link, look at their project, and really dig into this if you want. I've shown you the simple, get-started-quickly kind of approach. What they did is combine Stable Diffusion and monocular depth estimation to do some really cool stuff. There's a link to the video here. I'm not going to play the video for you, but this is what the thumbnail of the video looks like, and I've extracted the high points from the video.

The modalities that you can use for L-MAGIC are text to panorama, image to panorama, and then some other modalities. I'll show you what happens; this is right from the video. I'd really encourage you to watch the video. It's really fascinating to see that they're generating an entire scene from one image or just some text. They can paint an entire panoramic scene for inside a house, for example. There are two examples here, and if you play
that video, or if you go to their code and play with it, you'll see exactly how to do this, and you can just turn it on and play with it. In this particular case, they used Stable Diffusion and depth estimation to capture the details of the tables and the couches, and that certain things are further away on the walls, so they build a much more realistic panorama. When they spin this around in the video, it's just really amazing. Here's a case where they took just a single image and did the same thing, based on an image as the input. Here it's an outdoor scene, and they actually generated this panorama right here. Now, there are other technologies that do this, Text2Room and different ones, but L-MAGIC has been really cool, and I would really encourage you to play with it and see what you can do.

Here are some of the other modalities that you can apply, so this is putting depth estimation into practice in some really compelling ways. In this case, you can actually take the depth map itself as an input and generate rooms, or whatever panorama, from that. You can take a sketch and convert it into a panorama of a room, or you can even use these things for outdoor scenes. Here's a case where they have a color script, and they say this is the important thing right here, so they generate a panorama built outdoors around an object. I'm going to leave you with the QR codes for L-MAGIC as well, so this is the application of monocular depth estimation: both the project page and the code, you can click there.

Then I just wanted to encourage you that you can play with all these things on our Intel Tiber Developer Cloud. And there are other application areas: you can imagine using this for self-driving cars, and you can imagine it for robotics applications. In robotics, a lot of times when you have constrained robots in a
controlled environment, you know all the coordinates of your end effector, which is basically the gripper; think of it as your fingers, or whatever tool you're using. But in unconstrained situations, such as in the real world, where you have a robot interacting with the outside world in an uncontrolled environment, being able to estimate the position of the end effector with respect to objects that you want to grip or manipulate becomes important. Let's say you want to do some work on a nacelle or on wind turbines, so you build a robot to drill and patch wind turbine blades in dangerous situations. If you have an end effector to do the drilling and the patching, then having monocular depth estimation to know where you are relative to that defect with respect to your tool could be really important.

So these are just some of the ideas of things you can do, and I just encourage you to play with this on our Intel Tiber Developer Cloud. You can sign up for free, and then you can play with the code right from the GitHub repo that I shared with you. Just real quickly, I'm always required to explain the machine details of what I ran on, so I'll just throw this on there as a disclaimer. Aside from that, I'm going to leave you with the QR code here for playing with the code yourself. You can go to the Intel Tiber Developer Cloud and begin to play. And with that, I think it's a wrap.
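
The 0 to 255 depth map and the threshold step described in the talk can be sketched with NumPy alone. This is a minimal illustration, not the speaker's notebook code: the function names are hypothetical, and scaling by the maximum value is an assumption about how the raw depth is mapped into the 0 to 255 range. The default threshold of 66 is the value the speaker reports for his Stegosaurus image.

```python
import numpy as np


def depth_to_uint8(depth):
    """Scale a non-negative relative depth map into 0..255 (brightest = nearest).

    Assumption: values are divided by the maximum; other normalizations
    (for example min-max) would shift which threshold works best.
    """
    depth = np.asarray(depth, dtype=np.float64)
    peak = depth.max()
    if peak <= 0:
        # Degenerate map: nothing is nearer than anything else.
        return np.zeros(depth.shape, dtype=np.uint8)
    return np.clip(depth * 255.0 / peak, 0, 255).astype(np.uint8)


def clip_mask(depth_u8, threshold=66):
    """Return 255 where the depth value exceeds the threshold, 0 elsewhere."""
    return np.where(depth_u8 > threshold, 255, 0).astype(np.uint8)
```

The threshold is image-specific, so in practice it would be tuned by eye for each photo, as the speaker notes when he mentions the leftover artifacts.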

Original Description

In this webinar, Bob Chesebrough of Intel guides you through the steps he took to create a clipped image with background clutter removed from the image. He accomplished this using monocular depth estimation with PyTorch. This could potentially be used to automate structure from motion and other image-related tasks where you want to highlight or focus on a single portion of an image, particularly for identifying parts of the image that were closest to the camera. Specifically, he used depth estimation on a couple of images that he took at a natural history museum to capture just the dinosaur in the foreground, eliminating the background murals, lights, and building structure. The cool thing about this algorithm is that it creates a depth estimate from a single image!
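
The clipped image described above can be produced with a Pillow composite once a black-and-white mask is available. The sketch below is an illustration under assumptions: the helper name `clip_foreground` is hypothetical, and converting the mask to Pillow's 1-bit mode `"1"` is one reading of the talk's "one parameter" remark rather than a confirmed detail of the original notebook.

```python
import numpy as np
from PIL import Image


def clip_foreground(image, mask_u8):
    """Keep the original pixels where the mask is 255 and paint black elsewhere.

    image   : PIL image of size (width, height)
    mask_u8 : uint8 array of shape (height, width) holding only 0 or 255
    """
    rgb = image.convert("RGB")
    black = Image.new("RGB", rgb.size, (0, 0, 0))
    # Mode "1" is a 1-bit mask: each pixel is simply kept or dropped.
    mask = Image.fromarray(mask_u8).convert("1")
    return Image.composite(rgb, black, mask)
```

Because the mask is binary, edges will be hard; a grayscale mask in mode `"L"` would blend instead, which may or may not be wanted for anatomy comparisons.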

This video webinar teaches the application of PyTorch for monocular depth estimation, covering the use of pre-trained models and the removal of background clutter from images. The webinar also explores the potential applications of monocular depth estimation in fields such as robotics and self-driving cars.

Key Takeaways
  1. Feed an image to the monocular depth estimation model
  2. Use the model to infer relative depth from the single image
  3. Import torch and Transformers
  4. Read original image
  5. Isolate the dinosaur and remove the mural by applying a clipping mask built from the depth map
  6. Call pre-trained model on input to get output
  7. Get predicted depth from output
  8. Smooth out images using PyTorch functional interpolation
💡 Monocular depth estimation can be used to infer relative depth from a single image, allowing for the removal of background clutter and the isolation of objects, with potential applications in fields such as robotics and self-driving cars.
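
The inference steps listed above can be sketched roughly as follows. This is a hedged illustration, not the speaker's exact notebook: the model id `Intel/dpt-beit-large-512` is assumed to be the Hugging Face name of the large 512 variant, the bicubic interpolation mode is an assumption (the talk only says "functional interpolation"), and `estimate_depth` and `pil_size_to_hw` are hypothetical helper names.

```python
MODEL_ID = "Intel/dpt-beit-large-512"  # assumed Hugging Face id for the large 512 variant


def pil_size_to_hw(size):
    """PIL reports (width, height); torch's interpolate expects (height, width)."""
    width, height = size
    return (height, width)


def estimate_depth(image, model_id=MODEL_ID):
    """Return a float relative-depth map of shape (height, width) for a PIL image."""
    # Heavy dependencies are imported lazily so the helper above stays importable
    # on machines without torch or transformers installed.
    import torch
    from transformers import DPTForDepthEstimation, DPTImageProcessor

    # Load the pre-trained processor and model weights.
    processor = DPTImageProcessor.from_pretrained(model_id)
    model = DPTForDepthEstimation.from_pretrained(model_id)
    model.eval()

    # Preprocess the image and run the model without tracking gradients.
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # predicted_depth has shape (batch, h, w); resize it to the original image size.
    predicted_depth = outputs.predicted_depth
    resized = torch.nn.functional.interpolate(
        predicted_depth.unsqueeze(1),
        size=pil_size_to_hw(image.size),
        mode="bicubic",
        align_corners=False,
    )
    return resized.squeeze().cpu().numpy()
```

A call such as `estimate_depth(Image.open("stegosaurus.jpg").convert("RGB"))`, where the filename is a placeholder, would download the weights on first use, so the first run can be slow.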
