TorchServe CPP Backend


Key Takeaways

The TorchServe CPP backend is a C++ backend for TorchServe, the PyTorch model serving solution.

Full Transcript

I'm a senior software engineer at AWS. In this session I will introduce the TorchServe CPP backend. Today I will cover these topics: first I will introduce our team, then explain why we built the CPP backend and what changed in the architecture, and then deep dive into the CPP backend internals, give a demo, and show the benchmark results.

TorchServe is an open source project jointly developed by AWS and Meta. These are the members of the CPP backend project.

Why a CPP backend? There is high production demand for lower model loading latency, reduced GPU memory usage, and improved GPU utilization and concurrency, and there are even requests to embed TorchServe in edge devices. However, it is difficult to support these requirements in the Python backend due to Python language limitations, so we decided to build the CPP backend to better serve the TorchServe community.

Now let's see what changed in the TorchServe architecture with the introduction of the CPP backend. The picture on the left side is the original TorchServe architecture: it has a Java frontend and Python backend worker processes. In the picture on the right side, we can see the backend extended to support C++ worker processes. The Java frontend is able to connect not only with a Python worker process but also with a CPP worker process.

Let's deep dive into the CPP backend internals. Same as the Python backend, the CPP backend is divided into multiple layers. The top layer is the socket used to communicate with the Java frontend. The second layer is service management, which is responsible for request and response encoding and decoding, and for dispatching requests. The third layer is the common backend API, which supports different machine learning platform plugins. By default, TorchServe provides the TorchScript backend. The bottom layer is the machine learning platform handler API for custom model loading, preprocessing, postprocessing, and inference. The bottom two layers are the ones most relevant to TorchServe users, same as in the Python backend, so I will explain them one by one.

The common backend API layer defines three virtual functions for different machine learning platforms to plug in. The function initialize is for machine learning platform initialization. The function load model internal is used to implement model loading for a given machine learning platform. The function predict is used to implement prediction for a given machine learning platform. The TorchScript backend is the default CPP backend.

Same as the Python backend handler, the CPP backend has a default base handler implementation, but users can override it. The function initialize is used for model initialization. The function load model is used for model loading. The functions preprocess and postprocess are used for model pre- and post-processing. The function inference is used to call the model to run prediction.

The CPP backend maintains the same behavior as the Python backend and allows users to build a model archive (MAR) file by using the model archiver command. For example, users can use the TorchScript base handler to build the MAR file. Users can also build a custom handler dynamic library and then wrap it in the MAR file by using the model archiver command. Finally, users are able to load the model and run prediction by using the same REST API or gRPC API as with the Python backend.

Let's take a look at a demo. This is the CPP backend user manual page. To save time, I have already installed the code. These are the links to the CPP backend common API, and these are the links to the TorchScript base handler API. Let's see two use cases for building a MAR file. First, I am using the TorchScript base handler to build a MAR file and then moving the MAR file into the model store. Second, I am going to use a custom handler dynamic library, wrap it into a MAR file, and move it into the model store. Let's start TorchServe. We can see the two models are loaded, and then we run model inference. This is a text classification model that uses a scripted tokenizer.

Benchmark results: the model size is about 1.9 gigabytes. The first picture shows that model loading latency is reduced by about 25 percent. The P90 and P99 inference latencies are also reduced. When the batch size is 1, the P99 latency is reduced by about 10 percent. As the batch size increases, the gap between the CPP and Python backends narrows.

These are the GitHub links for TorchServe and the CPP backend. Please try it and give us feedback. Thank you.
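
The two bottom layers described in the talk (a common backend API with three virtual functions, and a base handler whose steps users can override) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual TorchServe C++ API: the class names `Backend`, `SketchBackend`, `BaseHandler`, and `LowercaseHandler`, the helper `RunDemo`, every signature, the use of plain strings in place of tensors, and the way the backend delegates to the handler are all hypothetical.

```cpp
// Illustrative sketch only: every class and function name below is
// hypothetical and is NOT the real TorchServe C++ API.
#include <cassert>
#include <cctype>
#include <memory>
#include <string>
#include <utility>

// Handler layer: default steps that a user handler may override.
class BaseHandler {
 public:
  virtual ~BaseHandler() = default;
  virtual void Initialize(const std::string& model_dir) { model_dir_ = model_dir; }
  virtual void LoadModel() { loaded_ = true; }  // stand-in for real model loading
  virtual std::string Preprocess(const std::string& request) { return request; }
  virtual std::string Inference(const std::string& input) {
    assert(loaded_ && "LoadModel must run before Inference");
    return "prediction(" + input + ")";  // stand-in for a model forward pass
  }
  virtual std::string Postprocess(const std::string& output) { return output; }

  // Runs the per-request steps in order: preprocess, inference, postprocess.
  std::string Handle(const std::string& request) {
    return Postprocess(Inference(Preprocess(request)));
  }

 protected:
  std::string model_dir_;
  bool loaded_ = false;
};

// A custom handler that overrides only preprocessing (lower-cases the text).
class LowercaseHandler : public BaseHandler {
 public:
  std::string Preprocess(const std::string& request) override {
    std::string out = request;
    for (char& c : out) {
      c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return out;
  }
};

// Common backend layer: three virtual functions per machine learning platform.
class Backend {
 public:
  virtual ~Backend() = default;
  virtual bool Initialize(const std::string& model_dir) = 0;
  virtual bool LoadModelInternal() = 0;
  virtual std::string Predict(const std::string& request) = 0;
};

// Stand-in for a default backend; delegating to a handler is an assumption.
class SketchBackend : public Backend {
 public:
  explicit SketchBackend(std::unique_ptr<BaseHandler> handler)
      : handler_(std::move(handler)) {}
  bool Initialize(const std::string& model_dir) override {
    handler_->Initialize(model_dir);
    return true;
  }
  bool LoadModelInternal() override {
    handler_->LoadModel();
    return true;
  }
  std::string Predict(const std::string& request) override {
    return handler_->Handle(request);
  }

 private:
  std::unique_ptr<BaseHandler> handler_;
};

// Drives the layers end to end and returns the response for one request.
// The model directory path is a placeholder.
std::string RunDemo(const std::string& request) {
  SketchBackend backend(std::make_unique<LowercaseHandler>());
  backend.Initialize("/tmp/model");
  backend.LoadModelInternal();
  return backend.Predict(request);
}
```

Calling `RunDemo("Hello TorchServe")` from a `main` function walks through initialization, model loading, and prediction in the order the talk describes. Overriding a single virtual step, as `LowercaseHandler` does, mirrors the idea that users keep the default handler behavior and replace only the parts they need.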

Original Description

Watch Li Ning from AWS present her talk "TorchServe CPP Backend" at PyTorch Conference 2022. TorchServe is a PyTorch model serving solution. Internally, it is divided into two parts: a frontend for model management, and a backend for model loading and prediction. TorchServe's default Python backend allows users to easily plug in a model's pre- and post-processing, and also serves PyTorch's eager mode and TorchScripted models. However, this backend limits further performance optimization of TorchServe due to Python's restrictions. TorchServe's CPP backend is a new feature implemented in C++. It not only allows users to plug in model pre- and post-processing as the Python backend does, but also builds the foundation for GPU utilization and concurrency optimization, even providing the flexibility to be embedded in an edge device. Visit our website: https://pytorch.org/ Read our blog: https://pytorch.org/blog/ Follow us on Twitter: https://twitter.com/PyTorch Follow us on LinkedIn: https://www.linkedin.com/company/pyto... Follow us on Facebook: https://www.facebook.com/pytorch #PyTorch #ArtificialIntelligence #MachineLearning

