Torchvision Transforms

Key Takeaways

The torchvision transforms are being extended beyond image classification to object detection, segmentation, and video tasks.

Full Transcript

Hey everyone, my name is Philip. I'm a software engineer at Quansight and a torchvision maintainer. Today I'm going to talk to you about the extension of the torchvision transforms to object detection, segmentation, and video tasks. Of course, a major extension like this is not a solo effort: apart from myself, Victor and Vasilis helped bring this to you.

Before we dive into what improvements we actually made, let's have a quick look at the status quo. In the snippet on the left you see a minimal augmentation pipeline suited for image classification. This use case is already handled well by our current API. In the result on the bottom right you see that the image is flipped, the hue has changed, and it is rotated a little bit. So far, so good.

But what happens if you want to go beyond image classification tasks? Let's imagine we want to do object detection instead, meaning that instead of classifying only the whole image, we now want to detect and classify individual objects in the image. With the current API you are now stuck: the transforms do not support bounding boxes, and even if they did, they do not support the joint transformation of multiple inputs that we need for this.

This is where the work I'm presenting comes in. By importing the transforms from the prototype namespace, you can reuse the same pipeline without any additional modifications. Running the snippet yields the following result: the image you see looks exactly like the one on the slide before, but in addition the bounding boxes and the labels are handled as well.

If you're a keen observer, maybe you caught another difference in the code snippet on the last slide that I didn't mention. Of course, we also need to pass the bounding boxes to the transform for them to be handled. Then the question becomes: how do I do that? How do I pass my input to the transform? The answer is that it doesn't matter. With our extension you can use whatever input structure you prefer. On this slide you see a few examples, but the comment on the last example is true: the input structure is actually arbitrary.

The type information of each input has to be communicated somehow, so how does it work? The answer is already somewhat in the question: the type information is communicated through the actual type of the input. We introduced tensor subclasses that are thin wrappers around plain tensors. They are a zero-copy abstraction and look and feel like the regular tensors that you are used to. In addition, they allow us to store metadata, like the color space of an image or the format of a bounding box, on the actual object rather than externally. The API currently supports images, videos, bounding boxes, masks, labels, and one-hot labels.

Now that we have the 10,000-foot overview, let's dive a little into the details. The API that we designed comprises three levels, ranging from high-level to low-level functionality.

The highest abstraction is the transform objects that we have already seen in the examples on the previous slides. As mentioned, they support arbitrary input structures. Each transform knows what kind of input it can handle and returns everything else unchanged. For example, you can safely pass the path to an image alongside the other inputs, which can be very helpful if something goes wrong down the line. Plain tensors are treated as images or, where applicable, as videos to mimic the behavior of the old transforms. In addition, the transforms are now joint by design: random parameters are sampled only once per call and applied to all inputs within the same call. While the interface is fully backwards compatible, TorchScript unfortunately does not allow arbitrary inputs or tensor subclassing, and thus the transforms are no longer JIT scriptable.

The medium level of the API comprises the dispatchers. In the current transforms, this is the functional API. They only support a single input, but it can be any of the previously mentioned tensor subclasses. Metadata, like the color space of an image or the format of a bounding box, is passed implicitly as attributes on the object. The dispatchers have the same fallback for plain tensors as the transforms have. For this use case they remain fully JIT scriptable.

The lowest level of the API is the kernels, which are also located inside the functional API. They were already present in the previous API but were considered private. This extension promotes them to regular functionality. They work with plain tensors and are thus decoupled from all the previously introduced abstractions. This means the metadata has to be passed explicitly, but also that they are fully JIT scriptable.

Although I haven't mentioned it on any level, Pillow images are still supported. The transforms and dispatchers handle them the same way they do the tensor subclasses, and there are specific kernels just for them.

Since we've already looked at some examples for the transforms, let's also have a look at an example of the functional API. In the top half of the snippet the kernel use case is shown: apart from the values inside the tensor, you also have to pass the format as well as the spatial size of the image. By using the bounding box subclass, this metadata is stored on the tensor, so you don't have to pass it explicitly to the dispatcher. Ultimately, of course, the resulting values are the same.

With all of this extra functionality, there's still one question looming in the background: will the performance be worse? I'm happy to report that the answer is no. In fact, we're actually a little bit faster than before. We made quite an effort to improve the performance of the API without compromising functionality in most cases. I'm going to refrain from announcing heavily aggregated numbers, since there's too much nuance to fit into this talk. I'm going to focus on general trends instead. On the next slide there's a link to a detailed report if you want to take a deep dive.

Looking at the individual aspects of our API, we see a marginal improvement for the PIL backend. For the tensor backend there are a number of kernels we have improved significantly, with improvements in double-digit percentages. The remaining kernels are basically thin wrappers around a single PyTorch operator, and thus we can't optimize them further from torchvision. Still, we're actively working with the PyTorch core team to improve them as well.

With this in mind, we can now also look at how this affects an actual training run. We used the torchvision image classification recipe for benchmarking, since it touches most parts of the API. As expected, the performance with the PIL backend is basically the same. For the tensor backend we measured an 80% improvement, which translates to a couple of hours on the hardware we used. Again, for the full benchmark in all of its glory, see the link on the next slide.

The only thing that is left to say is that we would love to hear your thoughts about this. You can reach us through the repository or, in particular, through the issues displayed here. Thank you for listening, and we hope to hear from you soon.
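The joint-transform behaviour described in the talk can be illustrated without torchvision at all. What follows is a minimal standard-library sketch, not torchvision's implementation: `Image`, `BoundingBox`, and `RandomHorizontalFlip` are hypothetical stand-ins written only for this example. It tries to show the three properties the talk names: the random parameter is sampled once per call, it is applied to every supported input inside an arbitrary nested structure, and unsupported inputs are returned unchanged.

```python
import random
from dataclasses import dataclass
from typing import Any, List, Optional


# Hypothetical stand-ins for the tensor subclasses mentioned in the talk.
@dataclass
class Image:
    pixels: List[List[int]]  # rows of pixel values


@dataclass
class BoundingBox:
    xyxy: List[float]  # [x1, y1, x2, y2]
    image_width: int   # metadata stored on the object itself


class RandomHorizontalFlip:
    """Illustrative joint transform (assumed design, not the library code)."""

    def __init__(self, p: float = 0.5, seed: Optional[int] = None):
        self.p = p
        self.rng = random.Random(seed)

    def __call__(self, sample: Any) -> Any:
        # Sample the random parameter once and share it across all inputs.
        flip = self.rng.random() < self.p
        return self._apply(sample, flip)

    def _apply(self, obj: Any, flip: bool) -> Any:
        # Recurse through dicts, lists, and tuples of any nesting depth.
        if isinstance(obj, dict):
            return {key: self._apply(value, flip) for key, value in obj.items()}
        if isinstance(obj, (list, tuple)):
            return type(obj)(self._apply(value, flip) for value in obj)
        if not flip:
            return obj
        if isinstance(obj, Image):
            return Image([row[::-1] for row in obj.pixels])
        if isinstance(obj, BoundingBox):
            x1, y1, x2, y2 = obj.xyxy
            width = obj.image_width
            return BoundingBox([width - x2, y1, width - x1, y2], width)
        # Anything the transform does not know (here: the path) passes through.
        return obj


sample = {
    "image": Image([[1, 2, 3, 4]]),
    "boxes": [BoundingBox([0.0, 0.0, 1.0, 1.0], image_width=4)],
    "path": "cat.jpg",
}
flipped = RandomHorizontalFlip(p=1.0)(sample)
print(flipped["boxes"][0].xyxy)  # [3.0, 0.0, 4.0, 1.0]
```

Because the flip decision is made in `__call__` rather than per input, the image and its boxes can never end up transformed inconsistently, which is the point of the "joint by design" remark.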
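The difference between a kernel and a dispatcher, as the talk describes it, is where the metadata comes from. The sketch below is again plain Python with hypothetical names (`horizontal_flip_bounding_box`, `horizontal_flip`, and this `BoundingBox` class are assumptions made for illustration and may not match the names or signatures in torchvision). The kernel takes raw values plus explicit format and spatial size; the dispatcher reads the same metadata from the object and forwards it.

```python
from dataclasses import dataclass
from typing import List, Tuple


def horizontal_flip_bounding_box(
    box: List[float], fmt: str, spatial_size: Tuple[int, int]
) -> List[float]:
    """Kernel: plain values in, plain values out, metadata passed explicitly."""
    _, width = spatial_size  # spatial_size is assumed to be (height, width)
    if fmt == "XYXY":
        x1, y1, x2, y2 = box
        return [width - x2, y1, width - x1, y2]
    if fmt == "XYWH":
        x, y, w, h = box
        return [width - (x + w), y, w, h]
    raise ValueError(f"unsupported format: {fmt}")


@dataclass
class BoundingBox:
    """Hypothetical wrapper that carries its own metadata."""
    values: List[float]
    fmt: str
    spatial_size: Tuple[int, int]


def horizontal_flip(inpt: BoundingBox) -> BoundingBox:
    """Dispatcher: picks the kernel by input type and fills in the metadata."""
    if isinstance(inpt, BoundingBox):
        out = horizontal_flip_bounding_box(inpt.values, inpt.fmt, inpt.spatial_size)
        return BoundingBox(out, inpt.fmt, inpt.spatial_size)
    raise TypeError(f"unsupported input type: {type(inpt).__name__}")


# Kernel use case: format and spatial size are passed by hand.
from_kernel = horizontal_flip_bounding_box([10, 20, 30, 40], "XYXY", (100, 200))

# Dispatcher use case: the same metadata travels with the object.
from_dispatcher = horizontal_flip(BoundingBox([10, 20, 30, 40], "XYXY", (100, 200)))

print(from_kernel == from_dispatcher.values)  # True
```

As in the talk's snippet, both paths produce the same values; the dispatcher only saves the caller from repeating the metadata.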

Original Description

Philip Meier from Quansight presents "Torchvision Transforms" at PyTorch Conference 2022. TorchVision is extending its Transforms API! This talk previews the current prototype that is no longer limited to Image classification, but can also natively handle Object Detection, Instance and Semantic Segmentation, and Video classification.