Efficient Memory Management for LLM Serving

West Coast Machine Learning · Advanced · 📐 ML Fundamentals · 4mo ago
In this meetup, Neha led our discussion of the paper Efficient Memory Management for LLM Serving.

Our Meetup: https://www.meetup.com/East-Bay-Tri-Valley-Machine-Learning-Meetup/

About Us: West Coast Machine Learning is a channel dedicated to exploring the exciting world of machine learning and AI! Our group of techies is passionate about AI, deep learning, neural networks, computer vision, tiny M…

Chapters (9)

0:00 Intro
9:48 Memory usage
21:50 Cache mgmt
32:11 Challenges
36:00 Paged attention
47:46 Sampling
49:24 Beam search
53:00 Memory mgmt.
58:00 Kernel opt