Torchvision Transforms

Key Takeaways

Torchvision Transforms for computer vision tasks

Full Transcript

Hey everyone, my name is Philip, and I'm a software engineer at Quansight and a torchvision maintainer. Today I'm going to talk to you about the extension of torchvision transforms to object detection, segmentation, and video tasks. Of course, a major extension like this is not a solo effort: apart from myself, Victor and Vasilis helped bring this to you.

Before we dive into what improvements we actually made, let's have a quick look at the status quo. In the snippet on the left, you see a minimal augmentation pipeline suited for image classification. This use case is already handled well by our current API. In the result on the bottom right, you see that the image is flipped, the hue has changed, and it's rotated a little bit. So far, so good.

But what happens if you want to go beyond image classification? Let's imagine we want to do object detection instead, meaning that instead of classifying only the whole image, we now want to detect and classify individual objects in the image. With the current API, you are now stuck: the transforms do not support bounding boxes, and even if they did, they do not support the joint transformation of multiple inputs that we need for this.

This is where the work I'm presenting comes in. By importing the transforms from the prototype namespace, you can reuse the same pipeline without any additional modifications. Running the snippet yields the following result: the image looks exactly like the one on the previous slide, but in addition, the bounding boxes and the labels are handled as well.

If you're a keen observer, maybe you caught another difference in the code snippet on the last slide that I didn't mention: of course, we also need to pass the bounding boxes to the transform for them to be handled. Then the question becomes: how do I do that? How do I pass my inputs to the transform? The answer is: it doesn't matter. With our extension, you can use whatever input structure you prefer. On this slide you see a few examples, and the comment on the last example is true: the input structure is actually arbitrary.

Still, the type information of each input has to be communicated somehow, so how does it work? The answer is already hidden in the question: the type information is communicated through the actual type of the input. We introduced tensor subclasses that are thin wrappers around plain tensors. They are zero-copy abstractions and look and feel like the regular tensors you are used to. In addition, they allow us to store metadata, like the color space of an image or the format of a bounding box, on the object itself rather than externally. The API currently supports images, videos, bounding boxes, masks, labels, and one-hot labels.

Now that we have the 10,000-foot overview, let's dive a little into the details. The API we designed comprises three levels, ranging from high-level to low-level functionality.

The highest abstraction is the transform objects that we have already seen in the examples on the previous slides. As mentioned, they support arbitrary input structures. Each transform knows what kinds of input it can handle and returns everything else unchanged. For example, you can safely pass the path of an image through alongside the other inputs, which can be very helpful if something goes wrong down the line. Plain tensors are treated as images or, where applicable, as videos, to mimic the behavior of the old transforms. In addition, the transforms are now joint by design: random parameters are sampled only once per call and applied to all inputs within that call. While the interface is fully backwards compatible, TorchScript unfortunately does not allow arbitrary inputs or tensor subclassing, and thus the transforms are no longer scriptable.

The middle level of the API comprises the dispatchers; in the current transforms, this is the functional API. They only support a single input, but it can be any of the previously mentioned tensor subclasses. Metadata like the color space of an image or the format of a bounding box is passed implicitly as attributes on the object. The dispatchers have the same fallback for plain tensors as the transforms, and for this use case they remain fully scriptable.

The lowest level of the API is the kernels, which are also located inside the functional API. They were already present in the previous API but were considered private; this extension promotes them to regular functionality. They work with plain tensors and are thus decoupled from all the previously introduced abstractions. This means that the metadata has to be passed explicitly, but also that they are fully scriptable.

Although I haven't mentioned it at any level, PIL images are still supported. The transforms and dispatchers handle them the same way they handle the tensor subclasses, and there are dedicated kernels just for them.

Since we've already looked at some examples for the transforms, let's also have a look at an example of the functional API. The top half of the snippet shows the kernel use case: apart from the values inside the tensor, you also have to pass the format as well as the spatial size of the image. When you use the bounding box subclass, this metadata is stored on the tensor, so you don't have to pass it explicitly to the dispatcher. Ultimately, of course, the resulting values are the same.

With all of this extra functionality, there's still one question looming in the background: will the performance be worse? I'm happy to report that the answer is no. In fact, we're actually a little bit faster than before. We made quite an effort to improve the performance of the API, in most cases without compromising functionality. I'm going to refrain from announcing heavily aggregated numbers, since there's too much nuance to fit into this talk, and focus on general trends instead. On the next slide there's a link to a detailed report if you want to take a deep dive.

Looking at the individual aspects of our API, we see a marginal improvement for the PIL backend. For the tensor backend, there are a number of kernels we have improved significantly, with improvements in the double-digit percentages. The remaining kernels are basically thin wrappers around single PyTorch operators, and thus we can't optimize them further from the torchvision side. Still, we're actively working with the PyTorch core team to improve them as well.

With this in mind, we can now look at how this affects an actual training run. We used the torchvision image classification recipe for benchmarking, since it touches most parts of the API. As expected, the performance with the PIL backend is basically the same. For the tensor backend, we measured an improvement of 80 [inaudible], which translates to a couple of hours on the hardware we used. Again, for the full benchmark in all its glory, see the link on the next slide.

The only thing left to say is that we would love to hear your thoughts about this. You can reach us through the repository, in particular through the issues displayed here. Thank you for listening, and we hope to hear from you soon.
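
The talk describes transforms that accept an arbitrary input structure, transform only the types they know, pass everything else through unchanged, and sample random parameters once per call. (The prototype discussed in the talk later shipped in torchvision as `torchvision.transforms.v2`.) The following is a minimal pure-Python sketch of that idea, not torchvision's implementation: `Image` and `Mask` are hypothetical stand-ins for the tensor subclasses, using nested lists instead of tensors, and `RandomCrop` is written from scratch for this illustration. It assumes the crop size fits inside the image and that masks have the same size as the image.

```python
import random
from dataclasses import dataclass


# Hypothetical stand-ins for the tensor subclasses: the wrapper type tells the
# transform how to treat the data. Nested lists replace real tensors here.
@dataclass
class Image:
    rows: list  # H x W pixel values


@dataclass
class Mask:
    rows: list  # H x W class ids


def _crop(rows, top, left, height, width):
    return [row[left:left + width] for row in rows[top:top + height]]


class RandomCrop:
    """Hypothetical joint transform: one crop window per call, shared by all inputs."""

    def __init__(self, size, rng=None):
        self.height, self.width = size
        self.rng = rng if rng is not None else random.Random()

    def __call__(self, sample):
        image = self._find_image(sample)
        if image is None:
            raise TypeError("sample contains no Image")
        h, w = len(image.rows), len(image.rows[0])
        # Random parameters are sampled exactly once per call ...
        top = self.rng.randint(0, h - self.height)
        left = self.rng.randint(0, w - self.width)
        # ... and then applied to every supported input in the structure.
        return self._apply(sample, top, left)

    def _find_image(self, obj):
        # Depth-first search for the first Image in an arbitrary structure.
        if isinstance(obj, Image):
            return obj
        if isinstance(obj, dict):
            obj = list(obj.values())
        if isinstance(obj, (list, tuple)):
            for child in obj:
                found = self._find_image(child)
                if found is not None:
                    return found
        return None

    def _apply(self, obj, top, left):
        # Rebuild dicts, lists, and tuples with the same structure.
        if isinstance(obj, dict):
            return {k: self._apply(v, top, left) for k, v in obj.items()}
        if isinstance(obj, (list, tuple)):
            return type(obj)(self._apply(v, top, left) for v in obj)
        if isinstance(obj, (Image, Mask)):
            return type(obj)(_crop(obj.rows, top, left, self.height, self.width))
        return obj  # unsupported types (labels, paths, ...) pass through unchanged


image = Image([[10 * r + c for c in range(4)] for r in range(3)])
mask = Mask([[10 * r + c for c in range(4)] for r in range(3)])
sample = {"image": image, "mask": mask, "label": 5, "path": "img_0001.jpg"}
out = RandomCrop((2, 2), rng=random.Random(0))(sample)
print(out["image"].rows == out["mask"].rows)  # True: both cut with the same window
print(out["label"], out["path"])  # 5 img_0001.jpg
```

Giving the image and the mask identical contents makes the joint behavior easy to check: if each input sampled its own window, the two crops would generally differ and the mask would no longer line up with the image.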
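
The talk notes that the bounding-box subclass stores its format as metadata on the object. This matters because the same four numbers describe different boxes depending on the convention. The sketch below converts between the three conventions named after torchvision's `BoundingBoxFormat` values (`XYXY`, `XYWH`, `CXCYWH`); the `convert_box` helper itself is hypothetical and works on plain Python lists.

```python
def convert_box(box, src, dst):
    """Hypothetical helper: convert one box between XYXY, XYWH and CXCYWH."""
    # Normalize the input to corner coordinates (x1, y1, x2, y2).
    if src == "XYXY":
        x1, y1, x2, y2 = box
    elif src == "XYWH":  # top-left corner plus width and height
        x, y, w, h = box
        x1, y1, x2, y2 = x, y, x + w, y + h
    elif src == "CXCYWH":  # center plus width and height
        cx, cy, w, h = box
        x1, y1, x2, y2 = cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2
    else:
        raise ValueError(f"unknown format: {src}")

    # Express the corners in the requested target format.
    if dst == "XYXY":
        return [x1, y1, x2, y2]
    if dst == "XYWH":
        return [x1, y1, x2 - x1, y2 - y1]
    if dst == "CXCYWH":
        return [(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1]
    raise ValueError(f"unknown format: {dst}")


box = [10, 20, 30, 40]
print(convert_box(box, "XYXY", "XYWH"))    # [10, 20, 20, 20]
print(convert_box(box, "XYXY", "CXCYWH"))  # [20.0, 30.0, 20, 20]
print(convert_box(box, "XYWH", "XYXY"))    # [10, 20, 40, 60]
```

The last line shows what goes wrong without the metadata: reading the XYXY numbers as XYWH silently produces a different, larger box, which is why carrying the format on the object is safer than tracking it externally.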
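
The talk's functional-API example contrasts a kernel, which takes raw values plus explicit metadata (the format and the spatial size), with a dispatcher, which reads that metadata from the bounding-box object and yields the same values. Below is a pure-Python sketch of that split for a horizontal flip. All names are hypothetical and mirror the talk's terminology rather than torchvision's actual signatures, nested lists stand in for tensors, and only the XYXY format is handled.

```python
from dataclasses import dataclass


@dataclass
class BoundingBoxes:
    """Hypothetical stand-in for the bounding-box tensor subclass."""
    boxes: list          # one [x1, y1, x2, y2] list per box
    format: str          # metadata stored on the object
    spatial_size: tuple  # (height, width) of the underlying image


# Lowest level: a kernel works on raw values; metadata is passed explicitly.
def horizontal_flip_bounding_boxes(boxes, box_format, spatial_size):
    if box_format != "XYXY":
        raise ValueError(f"this sketch only handles XYXY, got {box_format!r}")
    _, width = spatial_size
    # Mirroring around the vertical center line swaps the roles of x1 and x2.
    return [[width - x2, y1, width - x1, y2] for x1, y1, x2, y2 in boxes]


def horizontal_flip_image(rows):
    # Kernel for images stored as H x W nested lists.
    return [list(reversed(row)) for row in rows]


# Middle level: a dispatcher takes one input, picks the kernel from its type,
# reads the metadata from its attributes, and re-wraps the result.
def horizontal_flip(inpt):
    if isinstance(inpt, BoundingBoxes):
        flipped = horizontal_flip_bounding_boxes(inpt.boxes, inpt.format, inpt.spatial_size)
        return BoundingBoxes(flipped, inpt.format, inpt.spatial_size)
    # Fallback similar to the one described in the talk: other inputs are treated as images.
    return horizontal_flip_image(inpt)


raw = [[10, 20, 30, 40]]
via_kernel = horizontal_flip_bounding_boxes(raw, "XYXY", spatial_size=(50, 100))
via_dispatcher = horizontal_flip(BoundingBoxes(raw, "XYXY", spatial_size=(50, 100)))
print(via_kernel)            # [[70, 20, 90, 40]]
print(via_dispatcher.boxes)  # [[70, 20, 90, 40]]
```

Keeping the kernel free of any wrapper type is what lets it stay simple and, in the real library, scriptable; the dispatcher adds convenience on top without changing the numbers.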

Original Description

Philip Meier from Quansight presents "Torchvision Transforms" at PyTorch Conference 2022. TorchVision is extending its Transforms API! This talk previews the current prototype that is no longer limited to Image classification, but can also natively handle Object Detection, Instance and Semantic Segmentation, and Video classification.
