3D Deep Learning with PyTorch3D

PyTorch · Beginner ·📰 AI News & Updates ·5y ago

Key Takeaways

PyTorch3D is a library of optimized, efficient, reusable components in PyTorch for state-of-the-art 3D deep learning tasks, providing methods for loading meshes, applying transforms, and using differentiable rendering to learn scene properties.

Full Transcript

Hi everyone, welcome to this tutorial on 3D deep learning with PyTorch3D. My name is Nikhila Ravi, and I'm a research engineer in the Facebook AI Research team working on computer vision and 3D understanding. In this tutorial I'll give you an overview of the PyTorch3D library and then walk you through how to use several components, including code examples. In particular, I'll cover the data structures, common operations such as data loading and transformations, loss functions, and differentiable rendering.

Firstly, what is PyTorch3D? It's a library of reusable components for state-of-the-art 3D deep learning research tasks. The goals of PyTorch3D are to combine the features of a good deep learning library with the features needed for working with 3D data. A key focus throughout is efficiency, modularity, and differentiability. Several components have custom CUDA implementations for fast performance. In addition, most operators natively support heterogeneous batching of 3D data, such as batching meshes of different sizes.

PyTorch3D has pre-built packages for Anaconda and can be easily installed with a few commands. It has few external dependencies. You can find detailed installation instructions on the PyTorch3D GitHub repository.

This is an overview of the main components in the code base. The foundation layer consists of data structures for 3D data, data loading utilities, and composable transforms. The data structures in particular enable the operators and loss functions in the second layer to efficiently support heterogeneous batching.

To start, let's look at the data structures for 3D data. We found that batching meshes and point clouds requires different batching strategies and the flexibility to be able to move from one representation to the other. Meshes takes as input the vertices and faces for a batch of meshes. You can start by defining a batch of meshes as a list of tensors. We can then easily switch to a packed representation, which is just a different view on the same data. With this representation we need some auxiliary information, for example the first indices into the packed tensor for each batch element. The packed representation is useful for operations like graph convolution. We might then need to reshape the vertices to add back in the batch dimension, and this involves padding the vertices based on the number of vertices of the largest mesh in the batch. The padded representation is useful for other operators like vertex alignment.

We can see why this flexibility is important by looking at the architecture diagram of Mesh R-CNN, a paper from ICCV 2019, which is built using PyTorch3D. The Meshes data structure is used throughout, and the representation of the vertices and faces in the batch is interchanged multiple times during the end-to-end loop.

Here's a quick code example of how you can use the Meshes data structure, easily switch between different views, and also access other properties of the mesh. We start by importing Meshes from the structures module. We can initialize the vertices and faces of all the meshes in the batch as a list of tensors. We can then initialize the Meshes class by calling the constructor with the list representations. We can switch to a different representation, such as the packed representation, by calling the appropriate method, and we can access the auxiliary tensors by calling their respective methods. Finally, we can access other computed properties of the mesh, such as the edges.

Another set of common functions are loading utilities for 3D data and composable 3D transforms. A common task for almost all projects is loading data from file, for example loading meshes. PyTorch3D provides methods for loading meshes from OBJ files. Here we load the vertices, faces, and auxiliary information. The faces and aux variables are in fact named tuples which contain a number of different variables. We can get the face indices using the verts_idx key. The normals and texture information can be retrieved from the aux tuple. In many cases you will use the data from load_obj to construct a Meshes object. In this case you can use the load_objs_as_meshes function to directly load a mesh from file into a Meshes object. The batched mesh is of type Meshes and in this example contains a batch of three meshes.

Transforming 3D data is another common task. PyTorch3D has a general-purpose Transform3d class with subclasses to support different types of transforms. We can create separate translate and rotate transforms, both of which can be independently applied to a tensor of x, y, z points, or they can be composed to create one combined transform. You can also use the transform methods directly on the Transform3d class. For example, here we have an xyz scaling followed by an xyz translation.

Next, let's look at some of the optimized operators in PyTorch3D. K-nearest neighbors is a function that's used frequently with point clouds. Here we have two point clouds, P and Q. For a given point in cloud P, the goal is to find the K closest points in cloud Q, for example K equals five. In PyTorch3D we implement exact KNN with custom CUDA kernels that natively handle heterogeneous batches. Here's a quick code example. We import knn_points from the ops module. We can then initialize two random tensors and call the knn_points method with the points and the desired value of K.

Another operator which is used frequently with meshes is graph convolution. Each vertex in the mesh can have an associated feature vector f_i. Graph convolution computes new feature vectors for each vertex by propagating information along the edges of the mesh. For one particular node this involves two steps: one, gathering the features of all the adjacent nodes and summing them, and two, adding them back to the node's own feature vector. The GraphConv class is available in the ops module of PyTorch3D. It can be initialized using the input and output dimensions, the method of initialization for the weight tensors, and whether the graph is assumed to be directed or undirected. GraphConv is then called with the verts and edges of the mesh.

Next, let's look at some of the loss functions available in PyTorch3D. Chamfer loss is a method of comparing two point clouds. For example, these points might be samples from the surface of a mesh. Chamfer loss is used as a loss function in many 3D deep learning research tasks. For each point in set 1 we need to find the nearest neighbor in set 2, and then vice versa. Here is a quick example. We first import the chamfer distance function along with two helper functions: one to create a sphere mesh, and another to differentiably sample a point cloud from the surface of a mesh. We then initialize two spheres of different topologies and sample 5000 points from the surface of each of these meshes. Finally, we use these points to calculate the chamfer loss.

Lastly, let's look at the differentiable rendering module in PyTorch3D. What does having a differentiable rendering step in a training loop mean? A 3D scene can be composed of a number of different components, including a mesh with textures, light sources, and a camera, which is the viewpoint from which the image is generated. Now, how do all these scene properties come into play in differentiable rendering? Each of these properties could be a variable which we want to learn, for example the position of the camera, the intensity of the light, or the position of the mesh vertices. In the forward pass we transform a mesh and pass it through a renderer to generate an image. The image might then be used as part of a loss function. We then want to propagate gradients back through the whole system and update the scene properties. This is where the renderer needs to be differentiable, so we can learn the scene properties in an end-to-end way.

The PyTorch3D renderer is split into two parts, a rasterizer and a shader. It can take as input a heterogeneous batch of meshes and associated textures. The first step inside the rasterizer is to use a camera to transform and project the input batch of meshes onto the 2D plane. The next step is the rasterization, from which we output four intermediate variables for each pixel, which we call the fragment data. This includes the z-buffer, the 2D Euclidean distance, the barycentric coordinates, and the face indices. We also output not just the closest value but the top K values for each of these variables. In the shader we continue to keep the top K values while applying shading and texturing, and finally in the blending step we aggregate across the top K values. The rasterization step is in CUDA for efficiency, but the rest of the pipeline is in PyTorch for increased modularity and ease of experimentation.

Here is a quick example of how to set up a renderer with PyTorch3D. We have more detailed examples in the tutorial section of the PyTorch3D GitHub code base. First, import the necessary components from the renderer module. Next we need to initialize a camera, and here we use a perspective camera and the look-at transform to determine the rotate and translate transforms. Next we can initialize the rasterization settings, which include the faces per pixel. This corresponds to the K parameter, so it determines the top K values which are returned from the rasterizer. For a full explanation of the parameters, please refer to the PyTorch3D documentation. Next we initialize a renderer by composing a rasterizer and a shader. There can be many different types of shaders, and it's also very easy to create your own.

If the mesh or any of the scene properties has tensors with requires_grad equal to True, i.e. we want to learn this parameter, we can easily backpropagate through the entire system. For example, given a ground truth output image, we can calculate the loss and then directly call backward on the loss. The tutorials have more detailed examples of learning using the renderer.

In the blending step, where we aggregate across the top K values, it's very easy to try different blending functions in PyTorch. The blending for this cube uses a softmax blending formulation from Soft Rasterizer, which can be written in a few lines of code in PyTorch.

We have three different types of mesh texturing options: vertex textures, vertex UV coordinates with a texture map, and a texture atlas where each face has its own small R x R texture map. The texture type can be chosen based on your use case. Vertex textures are the simplest to implement. UV coordinates and texture maps enable more detailed textures but are limited to one texture map per mesh. Finally, a texture atlas allows representation of complex mesh textures, such as ShapeNet meshes, which have multiple texture maps per mesh.

I want to conclude by highlighting how you can get started with PyTorch3D. On the GitHub repository we have several tutorials which take you step by step through some example use cases. These tutorials can also be run with Google Colab, so you can try the code without having to download or install anything. The tutorials include 3D shape prediction, bundle adjustment, pose optimization, and textured mesh rendering from multiple viewpoints.

Thanks a lot for listening. You can find the code on GitHub or via the PyTorch3D website, and there you can also find links to the documentation and tutorials. We hope you found this tutorial useful, and we look forward to seeing the projects you build for the hackathon.
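
The list, packed, and padded views of a mesh batch described in the transcript can be illustrated without the library. The sketch below is a minimal one, assuming plain Python lists in place of tensors; the helper names `to_packed` and `to_padded` are hypothetical and are not PyTorch3D's API.

```python
# Illustration only: plain Python lists stand in for tensors, and the
# helper names below are hypothetical, not part of PyTorch3D.

def to_packed(verts_list):
    """Concatenate per-mesh vertices and record where each mesh starts."""
    packed, first_idx = [], []
    for verts in verts_list:
        first_idx.append(len(packed))  # index of this mesh's first vertex
        packed.extend(verts)
    return packed, first_idx

def to_padded(verts_list, pad=(0.0, 0.0, 0.0)):
    """Pad every mesh to the vertex count of the largest mesh in the batch."""
    max_verts = max(len(verts) for verts in verts_list)
    return [list(verts) + [pad] * (max_verts - len(verts)) for verts in verts_list]

# A heterogeneous batch: one mesh with 3 vertices, one with 4.
verts_list = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
]

packed, first_idx = to_packed(verts_list)  # 7 vertices, meshes start at 0 and 3
padded = to_padded(verts_list)             # 2 meshes, each padded to 4 vertices
```

The packed view drops the batch dimension, which suits per-vertex operations, while the padded view restores a regular batch shape at the cost of some wasted entries.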

Original Description

Facebook AI Research Engineer Nikhila Ravi presents an informative overview of PyTorch3D, a library of optimized, efficient, reusable components in PyTorch for state-of-the-art 3D deep learning tasks. Efficiency, modularity, and differentiability are the key elements of PyTorch3D that bring faster performance to any 3D Deep Learning project. Subscribe to this page to get the latest news, updates, and weekly tutorials planned for the full duration of the Hackathon. Haven't signed up yet? Get involved, and learn how you could build with the community and also have a chance to win up to $25,000: https://bit.ly/2ZwLYKX

PyTorch3D is a library for state-of-the-art 3D deep learning tasks, providing methods for loading meshes, applying transforms, and using differentiable rendering to learn scene properties. This library is useful for tasks such as 3D shape prediction, bundle adjustment, and pose optimization.
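
As a rough illustration of composable transforms, the sketch below applies a scale and a translation to a single x, y, z point in plain Python. The helpers `scale`, `translate`, and `compose` are hypothetical names chosen for this example and do not reproduce the library's transform classes; the point is only that the order of composition changes the result.

```python
# Illustration only: hypothetical helpers, not the library's transform classes.

def scale(sx, sy, sz):
    """Return a function that scales a point along each axis."""
    return lambda p: (p[0] * sx, p[1] * sy, p[2] * sz)

def translate(tx, ty, tz):
    """Return a function that shifts a point by a fixed offset."""
    return lambda p: (p[0] + tx, p[1] + ty, p[2] + tz)

def compose(*transforms):
    """Combine transforms into one; they are applied left to right."""
    def apply(point):
        for transform in transforms:
            point = transform(point)
        return point
    return apply

point = (1.0, 1.0, 1.0)

scale_then_translate = compose(scale(2.0, 2.0, 2.0), translate(1.0, 0.0, 0.0))
translate_then_scale = compose(translate(1.0, 0.0, 0.0), scale(2.0, 2.0, 2.0))

a = scale_then_translate(point)  # (3.0, 2.0, 2.0)
b = translate_then_scale(point)  # (4.0, 2.0, 2.0)
```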

Key Takeaways

Load a mesh from an OBJ file using the load_obj function, or load it directly into a Meshes object with load_objs_as_meshes
Create a Meshes object from a batch of meshes
Apply a transform to a tensor of x, y, z points
Use the k-nearest neighbors function to find the k closest points in a point cloud
Compute new feature vectors for each vertex using graph convolution
Initialize a camera and a rasterizer
Transform and project meshes onto a 2D plane
Output intermediate variables for each pixel
Apply shading and texturing
Aggregate across top k values
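
The nearest-neighbor search in the takeaways above, and the chamfer loss that builds on it, can be sketched with a brute-force search in plain Python. This is not the library's CUDA implementation; the function names are hypothetical, and the chamfer definition used here (mean squared nearest-neighbor distance, summed over both directions) is one common convention that is assumed for illustration.

```python
# Illustration only: brute-force search with hypothetical function names.

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn(point, cloud, k):
    """Return (squared distance, index) pairs for the k nearest points, nearest first."""
    ranked = sorted((sq_dist(point, q), i) for i, q in enumerate(cloud))
    return ranked[:k]

def chamfer(cloud_p, cloud_q):
    """Mean squared nearest-neighbor distance from P to Q plus the same from Q to P."""
    p_to_q = sum(knn(p, cloud_q, 1)[0][0] for p in cloud_p) / len(cloud_p)
    q_to_p = sum(knn(q, cloud_p, 1)[0][0] for q in cloud_q) / len(cloud_q)
    return p_to_q + q_to_p

cloud_p = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
cloud_q = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]

neighbors = knn(cloud_p[0], cloud_q, k=2)  # the two points of Q closest to P[0]
loss = chamfer(cloud_p, cloud_q)           # 0 only when the clouds coincide
```

The brute-force search compares every pair of points, which is fine for a handful of points but is exactly the cost that optimized kernels are meant to reduce for large clouds.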

💡 PyTorch3D provides a differentiable renderer to learn scene properties in an end-to-end way, which is useful for tasks such as 3D shape prediction, bundle adjustment, and pose optimization.
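
Aggregating across the top K faces of a pixel can be sketched for a single pixel in plain Python. This is a simplified illustration in the spirit of the softmax blending from Soft Rasterizer mentioned in the transcript; the function name `softmax_blend`, the parameters `sigma`, `gamma`, and `eps`, and the fragment layout are all assumptions made for this example, not the library's API.

```python
import math

# Illustration only: a single-pixel, pure-Python sketch with assumed names
# and parameters; it is not the library's blending implementation.

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def softmax_blend(fragments, background, sigma=1e-2, gamma=1e-2, eps=1e-10):
    """Blend the top-K fragments of one pixel into an RGB color.

    Each fragment is (color, dist, z_inv):
      color: (r, g, b) of the face at this pixel
      dist:  signed 2D distance from the pixel to the face (negative inside)
      z_inv: inverse depth normalized to [0, 1]; closer faces have larger values
    """
    # Soft coverage: near 1 inside the face, decaying smoothly outside it.
    probs = [sigmoid(-dist / sigma) for _, dist, _ in fragments]
    # Subtract the largest z_inv before exponentiating for numerical stability.
    z_max = max([z for _, _, z in fragments] + [eps])
    weights = [p * math.exp((z - z_max) / gamma)
               for p, (_, _, z) in zip(probs, fragments)]
    # The background acts as one extra, very distant fragment.
    bg_weight = max(math.exp((eps - z_max) / gamma), eps)
    denom = sum(weights) + bg_weight
    return tuple(
        (sum(w * color[ch] for w, (color, _, _) in zip(weights, fragments))
         + bg_weight * background[ch]) / denom
        for ch in range(3)
    )

# Two faces cover the pixel: a near red face and a farther blue face.
fragments = [((1.0, 0.0, 0.0), -0.05, 0.9), ((0.0, 0.0, 1.0), -0.05, 0.5)]
pixel = softmax_blend(fragments, background=(1.0, 1.0, 1.0))  # dominated by red
empty = softmax_blend([], background=(1.0, 1.0, 1.0))         # background only
```

Because every step is a smooth function of the distances and depths, a version of this written with tensors would let gradients flow from the pixel color back to all K faces rather than only the closest one.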
