Efficient Memory Management for LLM Serving

West Coast Machine Learning · Advanced · 📐 ML Fundamentals · 6mo ago
In this meetup, Neha led our discussion of the paper Efficient Memory Management for Large Language Model Serving with PagedAttention. Our Meetup: https://www.meetup.com/East-Bay-Tri-Valley-Machine-Learning-Meetup/

*Content*
00:00 Intro
09:48 Memory usage
21:50 Cache mgmt
32:11 Challenges
36:00 Paged attention
47:46 Sampling
49:24 Beam search
53:00 Memory mgmt.
58:00 Kernel opt

============================
😊About Us
West Coast Machine Learning is a channel dedicated to exploring the exciting world of machine learning and AI! Our group of techies is passionate about AI, deep learning, neural networks, computer vision, tiny ML, and other cool geeky machine learning topics. We love to dive deep into the technical details and stay up to date with the latest research developments.

Our Meetup group and YouTube channel are the perfect place to connect with other like-minded individuals who share your love of machine learning. We offer a mix of research paper discussions, coding reviews, and other data science topics. So, if you're looking to stay up to date with the latest developments in machine learning, connect with other techies, and learn something new, be sure to subscribe to our channel and join our Meetup community today!
=============================
#llms #llm-memory-mgmt #llm-memory-usage #llm-serving
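The paper's central idea, PagedAttention, stores each request's KV cache in fixed-size blocks that need not be contiguous in memory. A per-sequence block table maps logical blocks to physical blocks, much like pages in OS virtual memory. Blocks are allocated on demand as tokens are generated, and they can be shared across sequences (as in parallel sampling and beam search) using reference counts and copy-on-write. The Python sketch that follows is a toy illustration of that bookkeeping under these assumptions, not vLLM's implementation. `BlockAllocator`, `Sequence`, and `kv_cache_bytes_per_token` are hypothetical names, and only block IDs are tracked, not real key/value tensors. The byte-count helper reproduces the paper's estimate for OPT-13B: 2 (key and value) × 40 layers × 5120 hidden size × 2 bytes (FP16), or about 800 KB per token.

```python
# Toy sketch of PagedAttention-style KV-cache bookkeeping (hypothetical names, not vLLM's API).
# Only block IDs are tracked; the actual key/value tensors are omitted.


def kv_cache_bytes_per_token(num_layers: int, hidden_size: int, bytes_per_elem: int = 2) -> int:
    """KV-cache size for one token: a key and a value vector per layer (FP16 by default)."""
    return 2 * num_layers * hidden_size * bytes_per_elem


class BlockAllocator:
    """Pool of fixed-size physical KV blocks with per-block reference counts."""

    def __init__(self, num_blocks: int) -> None:
        self.free = list(range(num_blocks))
        self.ref_count = [0] * num_blocks

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("out of KV-cache blocks")
        block = self.free.pop()
        self.ref_count[block] = 1
        return block

    def incref(self, block: int) -> None:
        self.ref_count[block] += 1

    def decref(self, block: int) -> None:
        # A block returns to the free pool once no sequence references it.
        self.ref_count[block] -= 1
        if self.ref_count[block] == 0:
            self.free.append(block)


class Sequence:
    """A request whose logical KV blocks map to physical blocks via a block table."""

    def __init__(self, allocator: BlockAllocator, block_size: int) -> None:
        self.allocator = allocator
        self.block_size = block_size
        self.block_table: list[int] = []  # logical block index -> physical block ID
        self.num_tokens = 0

    def append_token(self) -> None:
        if self.num_tokens % self.block_size == 0:
            # Last block is full (or none exists yet): allocate one block on demand.
            self.block_table.append(self.allocator.allocate())
        else:
            last = self.block_table[-1]
            if self.allocator.ref_count[last] > 1:
                # Copy-on-write: the partially filled last block is shared, so this
                # sequence takes a private block (copying K/V data is omitted here).
                private = self.allocator.allocate()
                self.allocator.decref(last)
                self.block_table[-1] = private
        self.num_tokens += 1

    def fork(self) -> "Sequence":
        # Share all existing blocks with the child (e.g. parallel sampling, beam search).
        child = Sequence(self.allocator, self.block_size)
        child.block_table = list(self.block_table)
        child.num_tokens = self.num_tokens
        for block in self.block_table:
            self.allocator.incref(block)
        return child

    def release(self) -> None:
        # Drop this sequence's references; unreferenced blocks go back to the pool.
        for block in self.block_table:
            self.allocator.decref(block)
        self.block_table = []
        self.num_tokens = 0


print(kv_cache_bytes_per_token(num_layers=40, hidden_size=5120))  # 819200 bytes, about 800 KB

alloc = BlockAllocator(num_blocks=8)
parent = Sequence(alloc, block_size=4)
for _ in range(6):
    parent.append_token()  # 6 tokens occupy 2 blocks
child = parent.fork()      # child shares both blocks
child.append_token()       # shared partial block triggers copy-on-write
print(parent.block_table, child.block_table)  # [7, 6] [7, 5]
```

In this toy run, the full first block (ID 7) stays shared by both sequences, and only the partially filled last block is copied. This is how block-level sharing cuts memory waste when several outputs come from the same prompt.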