Tour De Force: LLM Inference Optimization From Simple To Sophisticated - Christin Pohl, Microsoft

PyTorch · Beginner ·🏭 MLOps & LLMOps ·1mo ago
Making your GPUs go brrr is complex. Efficient LLM inference requires navigating a maze of optimization techniques, each with different trade-offs. This session provides a practical journey through inference optimizations, clearly categorized by implementation effort.

We'll explore techniques across three levels:
- Model choices (start here): Model selection, quantization, smart routing
- Library-level improvements (using PyTorch-based frameworks like vLLM, SGLang, TensorRT-LLM): Continuous batching, KV-cache management, tensor parallelism
- Custom implementations: Speculative decoding with custom draft heads, disaggregated inference, fine-tuning smaller models

The session covers practical trade-offs and key metrics: time to first token, inter-token latency, throughput, and cost per token. Whether deploying your first model or optimizing at scale, this talk delivers actionable insights into which techniques to prioritize for deeper investigation.
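As a rough illustration of the four metrics named in the abstract, the sketch below derives them from the arrival timestamps of one streamed response. It is not taken from the talk: the names `RequestMetrics`, `summarize_request`, and `gpu_cost_per_hour` are hypothetical, and the cost figure assumes a single request occupies the whole GPU, which overstates cost on any server that batches requests.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestMetrics:
    ttft_s: float          # time to first token, in seconds
    mean_itl_s: float      # mean inter-token latency, in seconds
    tokens_per_s: float    # output throughput for this one request
    cost_per_token: float  # in the same currency as gpu_cost_per_hour


def summarize_request(request_start: float,
                      token_times: List[float],
                      gpu_cost_per_hour: float) -> RequestMetrics:
    """Summarize one streamed response (hypothetical helper).

    request_start and token_times are timestamps in seconds on the same
    clock; token_times holds the arrival time of each generated token.
    """
    if not token_times:
        raise ValueError("need at least one generated token")
    total_s = token_times[-1] - request_start
    if total_s <= 0:
        raise ValueError("last token must arrive after request_start")

    # Time to first token: wait between sending the request and token 1.
    ttft_s = token_times[0] - request_start

    # Inter-token latency: gaps between consecutive tokens after the first.
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    mean_itl_s = sum(gaps) / len(gaps) if gaps else 0.0

    # Throughput: generated tokens divided by end-to-end time.
    n_tokens = len(token_times)
    tokens_per_s = n_tokens / total_s

    # Cost per token: assumes this request had the GPU to itself.
    cost_per_token = gpu_cost_per_hour / 3600.0 * total_s / n_tokens

    return RequestMetrics(ttft_s, mean_itl_s, tokens_per_s, cost_per_token)
```

For example, `summarize_request(0.0, [0.5, 0.75, 1.0], gpu_cost_per_hour=3.6)` describes a request whose first token arrived after half a second and whose remaining tokens arrived a quarter of a second apart. On a real serving stack, throughput and cost are usually measured across all concurrent requests rather than per request.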
