TorchServe CPP Backend

PyTorch · Intermediate · Frontend Engineering · 3y ago

Key Takeaways

The TorchServe CPP backend is a C++ implementation of the backend for TorchServe, the PyTorch model serving solution.

Full Transcript

I'm a senior software engineer at AWS. In this session I will introduce the TorchServe CPP backend. Today I will cover these topics: first, introduce our team and why we built the CPP backend; next, what changes in the architecture; and then deep dive into the CPP backend internals, give a demo, and show the benchmark results.

TorchServe is an open source project jointly developed by AWS and Meta. These are the members of the CPP backend project.

Why a CPP backend? There is high production demand for lower model loading latency, reduced GPU memory usage, improved GPU utilization and concurrency, and even requests to embed TorchServe in edge devices. However, it is difficult to support these requirements in the Python backend due to Python language limitations. So we decided to build the CPP backend to better serve the TorchServe community.

Now let's see what changes in the TorchServe architecture with the introduction of the CPP backend. The picture on the left is TorchServe's original architecture: a Java frontend and a Python worker process. In the picture on the right, we can see the backend extended to support a C++ worker process. The Java frontend is not only able to connect with the Python worker process but also able to connect with the CPP worker process.

Let's deep dive into the CPP backend internals. The same as the Python backend, the CPP backend is divided into multiple layers. The top layer is the socket, which communicates with the Java frontend. The second layer is service management, which is responsible for request and response encoding, decoding, and dispatching the requests. The third layer is the common backend API, which supports different machine learning platform plugins; by default, TorchServe provides the TorchScript backend. The bottom layer is the machine learning platform handler API for custom model loading, pre-processing, post-processing, and inference. The bottom two layers are highly related to TorchServe users, the same as in the Python backend. I will explain them one by one.

The common backend API layer defines three virtual functions for different machine learning platforms to plug in. The initialize function is for machine learning platform initialization. The load model internal function is used to implement model loading on different machine learning platforms. The predict function is used to implement prediction on different machine learning platforms. The TorchScript backend is the default CPP backend.

The same as the Python backend handler, the CPP backend has a default base handler implementation, but users can override it. The initialize function is used for model initialization. The load model function is used for model loading. The preprocess and postprocess functions are used for model pre- and post-processing. The inference function is used to call the model to run prediction.

The CPP backend maintains the same behavior as the Python backend, allowing users to build a model MAR file by using the model archiver command. For example, users are able to use the TorchScript base handler to build the model MAR file. Users are also able to build a custom handler dynamic library and then wrap it in the model MAR file by using the model archiver command. Finally, users are able to load the model and run prediction by using the same REST API or gRPC API as the Python backend.

Let's take a look at a demo. This is the CPP backend user manual page. To save time, I have already installed the code. This is the CPP backend common API link. This is the TorchScript base handler API link. Let's see two use cases for building a model MAR file. First, I'm using the TorchScript base handler to build a model MAR file and then moving the MAR file into the model store. Second, I'm going to use a custom handler dynamic library, wrap it into a model MAR file, and then move it into the model store. Let's start TorchServe. We can see the two models are loaded, and then we run the model inference. This is a text classification model using a scripted tokenizer.

Benchmark results: the model size is about 1.9 gigabytes. The first picture shows that model loading latency is reduced by about 25 percent. The P90 and P99 inference latencies are also reduced. When the batch size is 1, the P99 latency is reduced by about 10 percent. As the batch size increases, the gap between the CPP and Python backends is reduced.

These are the GitHub links for TorchServe and the CPP backend. Please try it and give us feedback. Thank you.
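
The two user-facing layers described in the talk (a common backend API with three virtual functions, and a base handler whose steps users can override) might look roughly like the sketch below. This is only an illustration of the layering, not TorchServe's actual headers: every class name, signature, and the toy "model" that sums its inputs are assumptions made for the example.

```cpp
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// All names and signatures below are hypothetical, chosen to mirror the
// layering described in the talk. They are not TorchServe's real API.

// Common backend API layer: three virtual functions that each machine
// learning platform plugin implements.
class Backend {
 public:
  virtual ~Backend() = default;
  virtual bool Initialize(const std::string& model_dir) = 0;
  virtual bool LoadModelInternal(const std::string& model_name) = 0;
  virtual std::vector<float> Predict(const std::vector<float>& input) = 0;
};

// Toy platform plugin standing in for a real one such as TorchScript.
// Its "model" simply sums the input features.
class SumBackend : public Backend {
 public:
  bool Initialize(const std::string& model_dir) override {
    model_dir_ = model_dir;
    return !model_dir_.empty();
  }
  bool LoadModelInternal(const std::string& model_name) override {
    loaded_ = !model_dir_.empty() && !model_name.empty();
    return loaded_;
  }
  std::vector<float> Predict(const std::vector<float>& input) override {
    assert(loaded_ && "LoadModelInternal must succeed before Predict");
    return {std::accumulate(input.begin(), input.end(), 0.0f)};
  }

 private:
  std::string model_dir_;
  bool loaded_ = false;
};

// Handler layer: default steps that a user handler may override.
class BaseHandler {
 public:
  explicit BaseHandler(Backend& backend) : backend_(backend) {}
  virtual ~BaseHandler() = default;

  // Default pre-processing: one feature per character, holding its digit value.
  virtual std::vector<float> Preprocess(const std::string& request) {
    std::vector<float> features;
    for (char c : request) features.push_back(static_cast<float>(c - '0'));
    return features;
  }
  // Default inference: delegate to the platform backend.
  virtual std::vector<float> Inference(const std::vector<float>& features) {
    return backend_.Predict(features);
  }
  // Default post-processing: render the first output value as an integer.
  virtual std::string Postprocess(const std::vector<float>& output) {
    return std::to_string(static_cast<int>(output.front()));
  }
  // Runs pre-processing, inference, and post-processing for one request.
  std::string Handle(const std::string& request) {
    return Postprocess(Inference(Preprocess(request)));
  }

 private:
  Backend& backend_;
};

// Custom handler that overrides only the post-processing step.
class LabelHandler : public BaseHandler {
 public:
  using BaseHandler::BaseHandler;
  std::string Postprocess(const std::vector<float>& output) override {
    return output.front() >= 10.0f ? "high" : "low";
  }
};
```

With a `SumBackend` that has been initialized and loaded, `BaseHandler(backend).Handle("123")` returns `"6"`, while `LabelHandler` maps the same request to `"low"`. Keeping the platform calls behind `Backend` and the request steps behind `BaseHandler` is what lets a custom handler change one step without touching model loading.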

Original Description

Watch Li Ning from AWS present her talk "TorchServe CPP Backend" at PyTorch Conference 2022. TorchServe is a PyTorch model serving solution. Internally, it is divided into two parts: a frontend for model management, and a backend for model loading and prediction. TorchServe's default Python backend allows users to easily plug in a model's pre- and post-processing, and also serves PyTorch's eager mode and TorchScripted models. However, Python's restrictions prevent this backend from supporting further performance optimizations in TorchServe. TorchServe's CPP backend is a new feature implemented in C++. It not only allows users to plug in model pre- and post-processing as the Python backend does, but also builds the foundation for GPU utilization and concurrency optimization, even providing the flexibility to be embedded in an edge device. Visit our website: https://pytorch.org/ Read our blog: https://pytorch.org/blog/ Follow us on Twitter: https://twitter.com/PyTorch Follow us on LinkedIn: https://www.linkedin.com/company/pyto... Follow us on Facebook: https://www.facebook.com/pytorch #PyTorch #ArtificialIntelligence #MachineLearning
