Using PyTorch for Monocular Depth Estimation Webinar

PyTorch · Beginner · Computer Vision

Key Takeaways

This video webinar demonstrates how to use PyTorch for monocular depth estimation, using Intel's MiDaS model DPT-BEiT-Large-512 (via Hugging Face Transformers) to infer relative depth from a single image and remove background clutter. The webinar also covers applications of monocular depth estimation in fields such as robotics and self-driving cars.

Full Transcript

Using PyTorch for monocular depth estimation. My name is Susan Kaylor, and I work in AI technical product marketing at Intel. Today I have the pleasure of introducing our speaker, Bob Chesebrough. Bob is a Senior Solutions Architect at Intel. His industry experience, spanning over three decades, is in software development and AI solution engineering for Fortune 100 companies and national laboratories. He is also a hobbyist who has logged over 800 miles and 1,000 hours in the field finding dinosaur bones. He and his sons discovered the only known crocodilian fossil from the Jurassic period in New Mexico. They have also discovered and logged over 200 bone localities, and even described a new mass bone bed in New Mexico. Over to you, Bob.

Susan, thank you for that intro; that was awesome. I'm really excited to be here with you. I want to show you one of the projects I accomplished with a monocular depth estimation model from Hugging Face using PyTorch. If you want to play along at home and follow along with your bingo card, you can: I've highlighted the Git repo in the URL here. It's called dinosaur depth map clipping, and I'll also show you the QR code in case that's your favorite way of doing it. The QR code Susan provided me has a dinosaur in the middle, so you'll see the dinosaur connection directly. I've also written a Medium article, so if you want to read more about what I did here, you can just follow this link.

First, I want to describe what we're trying to do. What is monocular depth estimation? "Mono" means one, and "ocular" has to do with the eye, or vision, so "monocular" means a single image. Monocular depth estimation is inferring relative depth from a single image, so you don't need stereo pairs of images.

Look at each of these rows. These are dinosaurs I photographed at the New Mexico Museum of Natural History in Albuquerque, so these are actual dinosaurs, and there are various visual elements from the different angles and perspectives I shot them from. I wanted to see if I could use monocular depth estimation to pull out the dinosaur, because I'm more into comparative anatomy: seeing whether a bone we found matches something on the Stegosaurus. I might be looking at a particular rib in the neck, a femur, or a humerus. It's just part of my hobbyist nature to want to pull out the dinosaur first.

So I've used monocular depth estimation (I'll show you where you can grab this code on Hugging Face and so forth), and it takes just a few lines of code. I feed it one of these images. Take the one on the bottom: it's a Stegosaurus at the museum, and in the background there's a mural with a painting of a Stegosaurus, an Allosaurus, and a riverbed. But the visual elements I care about are the things closest to the camera. Monocular depth estimation gives you an algorithm that paints the relative depth from the camera as numbers from 0 to 255, with 255 (the bright pixels) being closest to the camera and black pixels being farthest away.

What's amazing about monocular depth estimation is that it's not fooled by, say, the size of the Stegosaurus on the mural in the lower left. It's not fooled by any of those pixels; it knows that the dinosaur up front contributes the dominant pixels, the ones that make up the object. So I can use monocular depth estimation to get that 0-to-255 mapping of the dinosaur in the image, and then it's a simple matter of using a clipping algorithm with a threshold, say 66 on the grayscale, and keeping only the pixels greater than 66. Then I can apply a clipping mask to actually pull the dinosaur out, as I've done on the far right. That's the overview of what we're going to talk about.

Let me go into some of the details. Here are some other images I did with some pottery. In the notebook, you'll see that the pip install cells are commented out, so you may want to uncomment them (Ctrl+/) and run them to install Transformers and so forth. That's our dependency chain, the setup we start with.

A little bit more about monocular depth estimation, since I described it extemporaneously as we were going through. The model I'm talking about comes from something called MiDaS, which the Intel Labs team posted to Hugging Face; it's called Multiple Depth Estimation Accuracy with Single Network. The specific model built on MiDaS is DPT-BEiT-Large-512, and this is the depth estimation algorithm. There are multiple sizes of it, such as 384 and 256. The 512 model uses an internal resolution of 512 × 512, so it's a little slower but a little more accurate and precise. The smaller models, like 256, run faster, closer to real time, if that's your need, for example for depth estimation in videos.

I want to give attribution where it's due: Reiner Birkl, Diana Wofk, and Matthias Müller from Intel created this model and put it on Hugging Face. You can get the code as I've shown you: go to where the model card lives, and you can download any of these, such as Large 384 or Swin V2 Tiny 256. They all do the same thing.

There are also some videos I want to call your attention to: one on MiDaS in general (you can click the link to this YouTube video), and one on an application of this technology called L-MAGIC, which stands for Language Model Assisted Generation of Images with Coherence. It's also by Intel Labs. I'll be hand-waving a bit here, showing you what's possible with monocular depth estimation, and you can follow their code, but what they did is combine depth estimation with Stable Diffusion to create a virtual panorama from a single image. So the applications of this are much larger than what I'm going to show you; I'm just going to get you started with the baby steps.

Those baby steps include importing torch and Transformers. I'm using PIL (Pillow), and I also use some NumPy. First I read the original image, which is fairly straightforward. Here's the original image I'm consuming: it's a Stegosaurus with that mural, and I want to convert it to this image of the 3D dinosaur with the mural removed. You'll see there are a few little artifacts; I could probably tweak the thresholds and fiddle with it to make it slightly better, but this is really great for me, because in one fell swoop, in a snap of my fingers, I can create these cleaned-up images and start using them for comparative anatomy and whatnot.

Here's the secret sauce. You use two DPT classes from Transformers, DPTImageProcessor and DPTForDepthEstimation, and on each you call the from_pretrained method, so we use the pretrained weights and specify which model we want. In this case I choose the Large 512 version from Intel. That gives me both a processor and a model that I can call methods against.

Then it's really just a matter of reading my image into inputs with the processor. I make sure I'm running torch with no gradients, call the model on those inputs to get an output tensor, and read the predicted depth from the output's predicted_depth attribute. Then we use PyTorch's functional interpolation to smooth out the depth map and bring it back to the image size; this is standard image-processing stuff. To recap: we process the inputs, turn gradients off, apply the model to get an output, take the predicted depth from that output, and interpolate it. It's very simple.

Next, I convert the result so that all my pixels are numbered between 0 and 255, interpreting them as unsigned 8-bit integers, and display it. That's how I get the depth estimate.

From that I create a clipping mask. I set a threshold; I played with it a little and found that for this particular image, 66 was a really good threshold. Then I go through the array and say: wherever the tensor is greater than the threshold, give me back the original pixel; otherwise, put a zero there. Just by doing this I convert the depth map to the clipping mask you see here in black and white. Then I use that clipping mask to do an image composite: I convert the image to RGB, convert the black image to RGB so the two are consistent, and use my mask with the "1" parameter so it's a simple yes/no clip of everything. Then I display the image, and this is what I generate.

So at a high level that's a simple use case for depth estimation, but there are more clever ways to use this technology. The Intel Labs team put together the L-MAGIC model, and I'll share the location of their Git link so you can look at their project and really dig in. I've shown you the simple, get-started-quickly approach; what they did is combine Stable Diffusion and monocular depth estimation to do some really cool stuff. There's a link to the video here. I'm not going to play it, but this is what the thumbnail looks like, and I've extracted the highlights from it.

The modalities you can use with L-MAGIC include text to panorama, image to panorama, and some others. This is right from the video, and I'd really encourage you to watch it: it's fascinating to see them generate an entire scene from one image or just some text. They can paint an entire panoramic scene inside a house, for example, and there are two examples here. If you play the video, or go to their code and play with it, you'll see exactly how to do this. In this case they used Stable Diffusion and depth estimation to capture the details of the tables and couches, and the fact that certain things on the walls are farther away, so they build a much more realistic panorama when they spin it around. When you watch the video, it's really amazing.

Here's a case where they took just a single image and did the same thing with an image as the input: it's an outdoor scene, and they generated this panorama. There are other technologies that do this, such as Text2Room, but L-MAGIC has been really cool, and I'd encourage you to play with it and see what you can do. Here are some of the other modalities, which put depth estimation into practice in really compelling ways. You can take the depth map itself as input and generate a room or other panorama from it; you can take a sketch and convert it into a panorama of a room; and you can even use these for outdoor scenes. Here's a case where they have a color script, they mark the important thing, and they generate a panorama built outdoors around that object.

I'll leave you with the QR codes for L-MAGIC as well, both the project page and the code, as an application of monocular depth estimation. I also want to encourage you to play with all of these things on our Intel Tiber Developer Cloud.

There are other application areas you can imagine, such as self-driving cars or robotics. In robotics, when you have constrained robots in a controlled environment, you know all the coordinates of your end effector, which is basically the gripper: think of it as your fingers, or whatever tool you're using. But in unconstrained situations, such as a robot interacting with the real world in an uncontrolled environment, you need to estimate the position of the end effector with respect to the objects you want to grip or manipulate. Say you want to do some work on a nacelle or on wind turbines, so you build a robot that can drill and patch wind turbine blades in dangerous situations. If you have an end effector doing the drilling and patching, then monocular depth estimation, to know where your tool is relative to the defect, could be really important. These are just some of the ideas of what you can do.

I encourage you to play with this on our Intel Tiber Developer Cloud; you can sign up for free and then play with the code right from the GitHub repo I shared with you. Real quickly, I'm always required to explain the machine details of what I ran on, so I'll throw that up as a disclaimer. Aside from that, I'll leave you with the QR code for playing with the code yourself on the Intel Tiber Developer Cloud. And with that, I think it's a wrap.
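
The inference steps described in the talk (DPTImageProcessor and DPTForDepthEstimation loaded with from_pretrained, a forward pass under torch.no_grad(), reading predicted_depth, then interpolating back to the image size) can be sketched as follows. This is a minimal sketch, not the code from Bob's repository: the Hub ID "Intel/dpt-beit-large-512", the bicubic interpolation mode, and the helper names load_depth_model and estimate_depth are assumptions for illustration. It needs torch, transformers, and Pillow installed, and it downloads the weights on first use.

```python
from functools import lru_cache

import numpy as np

# Assumed Hugging Face Hub ID for the "Large 512" MiDaS checkpoint from Intel.
MODEL_ID = "Intel/dpt-beit-large-512"


@lru_cache(maxsize=None)
def load_depth_model(model_id: str = MODEL_ID):
    """Load (once, then cache) the image processor and depth model.

    Hypothetical helper. Heavy imports are deferred so this module can be
    imported without torch/transformers installed.
    """
    from transformers import DPTForDepthEstimation, DPTImageProcessor

    processor = DPTImageProcessor.from_pretrained(model_id)
    model = DPTForDepthEstimation.from_pretrained(model_id)
    model.eval()  # make sure dropout and similar layers are in inference mode
    return processor, model


def estimate_depth(image, model_id: str = MODEL_ID) -> np.ndarray:
    """Return a float32 relative-depth map of shape (height, width) for a PIL image.

    Hypothetical helper. MiDaS-style models predict relative inverse depth,
    so larger values mean closer to the camera.
    """
    import torch

    processor, model = load_depth_model(model_id)

    # Resize and normalize the image into the tensor layout the model expects.
    inputs = processor(images=image, return_tensors="pt")

    # Inference only: no gradients needed.
    with torch.no_grad():
        outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth  # (batch, h, w) at model resolution

    # Resize (and smooth) back to the original image size. PIL's image.size is
    # (width, height), while interpolate expects (height, width).
    prediction = torch.nn.functional.interpolate(
        predicted_depth.unsqueeze(1),  # add a channel dim: (batch, 1, h, w)
        size=image.size[::-1],
        mode="bicubic",
        align_corners=False,
    )
    return prediction.squeeze().cpu().numpy().astype(np.float32)
```

Usage, with a hypothetical file name: after from PIL import Image, call depth = estimate_depth(Image.open("stegosaurus.jpg").convert("RGB")). Caching the processor and model with lru_cache avoids reloading the large checkpoint for every image in a batch of museum photos.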

Original Description

In this webinar, Bob Chesebrough of Intel guides you through the steps he took to create a clipped image with the background clutter removed. He accomplished this using monocular depth estimation with PyTorch. This could potentially be used to automate structure from motion and other image-related tasks where you want to highlight or focus on a single portion of an image, particularly for identifying the parts of the image that are closest to the camera. Specifically, he used depth estimation on a couple of images he took at a natural history museum to capture just the dinosaur in the foreground, eliminating the background murals, lights, and building structure. The cool thing about this algorithm is that it creates a depth estimate from a single image!
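
Identifying the parts of the image closest to the camera relies on turning the model's output into a 0 to 255 grayscale map where 255 (brightest) is nearest. The webinar doesn't show the exact formula, so this NumPy sketch assumes the common approach of scaling by 255 divided by the maximum value; the function name depth_to_uint8 is hypothetical.

```python
import numpy as np


def depth_to_uint8(depth) -> np.ndarray:
    """Scale a relative-depth map to uint8 so that 255 = closest, 0 = farthest.

    Hypothetical helper. Assumes larger input values mean closer to the camera
    (as with MiDaS relative inverse depth).
    """
    depth = np.asarray(depth, dtype=np.float64)
    peak = depth.max()
    if peak <= 0:
        # Degenerate map: nothing to scale, so return an all-black image.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = depth * 255.0 / peak
    # Bicubic upsampling can overshoot slightly below 0; clip before casting
    # so out-of-range values can't turn into bright pixels.
    return np.clip(scaled, 0, 255).astype(np.uint8)
```

Clipping before the cast matters because converting out-of-range floats to uint8 in NumPy is not guaranteed to saturate, so a slightly negative value could otherwise become a bright, "near" pixel.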

This video webinar teaches the application of PyTorch for monocular depth estimation, covering the use of pre-trained models and the removal of background clutter from images. The webinar also explores the potential applications of monocular depth estimation in fields such as robotics and self-driving cars.

Key Takeaways

Import torch and Transformers
Read the original image
Feed the image to the pre-trained monocular depth estimation model (DPT-BEiT-Large-512)
Call the pre-trained model on the processed input to get the output
Get the predicted relative depth from the output
Smooth and resize the depth map using PyTorch functional interpolation
Scale the depth map to 0 to 255 (uint8), with the brightest pixels closest to the camera
Apply a depth-threshold clipping mask to isolate the dinosaur and remove the background mural
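
The clipping step in the list above, keeping only pixels whose 0-to-255 depth value exceeds a threshold and blacking out the rest, can be sketched in NumPy. The default of 66 is the value that worked for the Stegosaurus photo in the webinar; it is image-specific. The webinar does this with Pillow's Image.composite against a black RGB image; np.where with a binary mask and a zero background is an equivalent swap used here to keep the example dependency-light. The function name clip_foreground is hypothetical.

```python
from __future__ import annotations

import numpy as np


def clip_foreground(rgb, depth_u8, threshold: int = 66) -> tuple[np.ndarray, np.ndarray]:
    """Black out every pixel whose depth value is at or below `threshold`.

    Hypothetical helper.
    rgb:      (H, W, 3) image array.
    depth_u8: (H, W) depth map scaled to 0..255, where 255 = closest to the camera.
    Returns (mask, clipped); mask is True for the kept (near) pixels.
    """
    rgb = np.asarray(rgb)
    depth_u8 = np.asarray(depth_u8)
    if rgb.shape[:2] != depth_u8.shape:
        raise ValueError("image and depth map must have the same height and width")

    mask = depth_u8 > threshold  # True where the pixel is close enough to keep
    # Broadcast the (H, W) mask across the color channels; far pixels become 0 (black).
    clipped = np.where(mask[..., None], rgb, 0).astype(rgb.dtype)
    return mask, clipped
```

To save the results with Pillow, Image.fromarray(clipped) gives the cut-out image and Image.fromarray(mask.astype(np.uint8) * 255) gives the black-and-white mask.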

💡 Monocular depth estimation can be used to infer relative depth from a single image, allowing for the removal of background clutter and the isolation of objects, with potential applications in fields such as robotics and self-driving cars.
