How to Engineer AI Inference Systems [Philip Kiely] - 766

The TWIML AI Podcast with Sam Charrington · Advanced · ML Fundamentals
In this episode, Philip Kiely, head of AI education at Baseten, joins us to unpack the fast-evolving discipline of inference engineering. We explore why inference has become the stickiest and most critical workload in AI, how it blends GPU programming, applied research, and large-scale distributed systems, and where the line sits between inference and model serving.

Philip shares how research-to-production can move in hours, not months, and why understanding “the knobs” of inference (batching, quantization, speculation, and KV cache reuse) lets teams design better products and SLAs. We trace the inference maturity journey from closed APIs to dedicated deployments and in-house platforms, discuss GPU lifecycles, and survey today’s runtime landscape, including vLLM, SGLang, and TensorRT LLM. Finally, we look ahead to agents and multimodality, making the case for specialized, workload-specific runtimes when performance and efficiency matter most.

For the full list of resources for this episode, visit the show notes page: https://twimlai.com/go/766.

Links & Resources
Inference Engineering Book - https://www.baseten.co/inference-engineering/
Baseten - https://www.baseten.co/
