TorchServe CPP Backend
Key Takeaways
TorchServe CPP backend: a C++ backend for TorchServe, the PyTorch model serving solution
Full Transcript
I'm a senior software engineer at AWS. In this session I will introduce the TorchServe CPP backend. Today I will cover these topics: first I will introduce our team, then explain why we built the CPP backend and what changes in the architecture, then deep dive into the CPP backend internals, give a demo, and show the benchmark results.

TorchServe is an open source project jointly developed by AWS and Meta. These are the members of the CPP backend project.

Why a CPP backend? There is high production demand for lower model loading latency, reduced GPU memory usage, improved GPU utilization and concurrency, and even requests to embed TorchServe in edge devices. However, it is difficult to support these requirements in the Python backend due to Python language limitations, so we decided to build the CPP backend to better serve the TorchServe community.

Now let's see what changes in the TorchServe architecture are introduced by the CPP backend. The picture on the left side is TorchServe's original architecture: a Java frontend and Python backend worker processes. In the picture on the right side, we can see the backend extended to support C++ worker processes. The Java frontend is able to connect not only with Python worker processes but also with CPP worker processes.

Let's deep dive into the CPP backend internals. Like the Python backend, the CPP backend is divided into multiple layers. The top layer is the socket used to communicate with the Java frontend. The second layer is service management, which is responsible for request and response encoding, decoding, and dispatching the requests. The third layer is the common backend API, which supports different machine learning platform plugins; by default, TorchServe provides the TorchScript backend. The bottom layer is the machine learning platform handler API for custom model loading, preprocessing, postprocessing, and inference. The bottom two layers are the ones most relevant to TorchServe users, as in the Python backend, so I will explain them one by one.

The common backend API layer defines three virtual functions for different machine learning platforms to plug in. The function initialize is for machine learning platform initialization. The function load model internal implements model loading for a given machine learning platform. The function predict implements prediction for a given machine learning platform. The TorchScript backend is the default CPP backend.

Like the Python backend handler, the CPP backend has a default base handler implementation, but users can override it. The function initialize is used for model initialization. The function load model is used for model loading. The functions preprocess and postprocess are used for model pre- and postprocessing. The function inference is used to call the model to run prediction.

The CPP backend maintains the same behavior as the Python backend: users build a model archive (MAR) file by using the model archiver command. For example, users can use the TorchScript base handler to build the MAR file. Users can also build a custom handler as a dynamic library and then wrap it in the MAR file by using the model archiver command. Finally, users can load the model and run prediction by using the same REST API or gRPC API as with the Python backend.

Let's take a look at a demo. This is the CPP backend user manual page. To save time, I have already installed the code. This is the link to the CPP backend common API. This is the link to the TorchScript base handler API. Let's see two use cases for building a MAR file. First, I am using the TorchScript base handler to build a MAR file and then moving it into the model store. Second, I am using a custom handler dynamic library, wrapping it into a MAR file, and moving it into the model store. Let's start TorchServe. We can see the two models are loaded. Then we run model inference. This is a text classification model that uses a scripted tokenizer.

Benchmark results: the model size is about 1.9 gigabytes. The first picture shows that model loading latency is reduced by about 25 percent. The P90 and P99 inference latencies are also reduced. When the batch size is 1, the P99 latency is reduced by about 10 percent. As the batch size increases, the gap between the CPP backend and the Python backend narrows.

These are the GitHub links for TorchServe and the CPP backend. Please try it and give us feedback. Thank you.
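The two bottom layers described in the talk (a common backend API with three virtual functions, and a base handler whose steps users can override) can be illustrated with a small C++ sketch. This is not the TorchServe source: every class name, signature, and type below (ModelInstance, Batch, BaseHandler, LowercaseHandler, Backend, SketchBackend) is a hypothetical stand-in chosen for illustration, and the "model" is a placeholder that only tags its input rather than a TorchScript module.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a loaded model; not a TorchServe type.
struct ModelInstance {
  std::string name;
};

using Batch = std::vector<std::string>;

// Handler layer: default behavior that users may override step by step.
class BaseHandler {
 public:
  virtual ~BaseHandler() = default;

  // Model initialization (here: just remember the model directory).
  virtual void Initialize(const std::string& model_dir) { model_dir_ = model_dir; }

  // Model loading (here: create a named placeholder instead of reading a file).
  virtual std::unique_ptr<ModelInstance> LoadModel(const std::string& model_name) {
    return std::make_unique<ModelInstance>(ModelInstance{model_name});
  }

  // Default preprocessing passes requests through unchanged.
  virtual Batch Preprocess(const Batch& requests) { return requests; }

  // Placeholder inference: tag each input with the model name.
  virtual Batch Inference(const ModelInstance& model, const Batch& inputs) {
    Batch outputs;
    outputs.reserve(inputs.size());
    for (const std::string& input : inputs) outputs.push_back(model.name + ":" + input);
    return outputs;
  }

  // Default postprocessing passes outputs through unchanged.
  virtual Batch Postprocess(const Batch& outputs) { return outputs; }

 protected:
  std::string model_dir_;
};

// A custom handler that overrides only preprocessing (lowercases the text).
class LowercaseHandler : public BaseHandler {
 public:
  Batch Preprocess(const Batch& requests) override {
    Batch result = requests;
    for (std::string& text : result) {
      std::transform(text.begin(), text.end(), text.begin(),
                     [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    }
    return result;
  }
};

// Common backend API: three virtual functions a platform plugin implements.
class Backend {
 public:
  virtual ~Backend() = default;
  virtual bool Initialize(const std::string& model_dir) = 0;
  virtual bool LoadModelInternal(const std::string& model_name) = 0;
  virtual Batch Predict(const Batch& requests) = 0;
};

// One possible plugin: delegates every step to the handler it was given.
class SketchBackend : public Backend {
 public:
  explicit SketchBackend(std::unique_ptr<BaseHandler> handler)
      : handler_(std::move(handler)) {}

  bool Initialize(const std::string& model_dir) override {
    handler_->Initialize(model_dir);
    return true;
  }

  bool LoadModelInternal(const std::string& model_name) override {
    model_ = handler_->LoadModel(model_name);
    return model_ != nullptr;
  }

  Batch Predict(const Batch& requests) override {
    assert(model_ != nullptr && "LoadModelInternal must succeed before Predict");
    const Batch inputs = handler_->Preprocess(requests);
    const Batch outputs = handler_->Inference(*model_, inputs);
    return handler_->Postprocess(outputs);
  }

 private:
  std::unique_ptr<BaseHandler> handler_;
  std::unique_ptr<ModelInstance> model_;
};
```

Under these assumptions, a caller would construct SketchBackend with a LowercaseHandler, call Initialize and LoadModelInternal once, and then call Predict for each batch. Keeping the handler behind its own interface is what lets a custom handler be swapped in without touching the backend plugin, which mirrors the custom handler dynamic library use case shown in the demo.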
Original Description
Watch Li Ning from AWS present her talk "TorchServe CPP Backend" at PyTorch Conference 2022.
TorchServe is a PyTorch model serving solution. Internally, it is divided into two parts: a frontend for model management, and a backend for model loading and prediction. TorchServe’s default Python backend allows users to easily plug in a model’s pre- and postprocessing, and serves both PyTorch eager mode and TorchScripted models. However, this backend limits further performance optimization of TorchServe because of Python’s restrictions. TorchServe’s CPP backend is a new feature implemented in C++. It not only allows users to plug in model pre- and postprocessing as the Python backend does, but also builds the foundation for GPU utilization and concurrency optimization, and even provides the flexibility to be embedded in an edge device.