Efficient Memory Management for LLM Serving
In this meetup, Neha led our discussion of the paper "Efficient Memory Management for LLM Serving."
Our Meetup: https://www.meetup.com/East-Bay-Tri-Valley-Machine-Learning-Meetup/
*Content*
00:00 Intro
09:48 Memory usage
21:50 Cache management
32:11 Challenges
36:00 Paged attention
47:46 Sampling
49:24 Beam search
53:00 Memory management
58:00 Kernel optimization
============================
😊About Us
West Coast Machine Learning is a channel dedicated to exploring the exciting world of machine learning and AI! Our group of techies is passionate about AI, deep learning, neural networks, computer vision, tiny M…