Efficient Memory Management for LLM Serving
Skills: LLM Engineering (80%)
Key Takeaways
A paper discussion covering memory usage in LLM serving, cache management and its challenges, paged attention, sampling, beam search, and kernel optimization.
Original Description
In this meetup, Neha led our discussion of the paper "Efficient Memory Management for LLM Serving".
Our Meetup: https://www.meetup.com/East-Bay-Tri-Valley-Machine-Learning-Meetup/
*Content*
00:00 Intro
09:48 Memory usage
21:50 Cache mgmt
32:11 Challenges
36:00 Paged attention
47:46 Sampling
49:24 Beam search
53:00 Memory mgmt.
58:00 Kernel opt
============================
😊About Us
West Coast Machine Learning is a channel dedicated to exploring the exciting world of machine learning and AI! Our group of techies is passionate about AI, deep learning, neural networks, computer vision, tiny ML, and other cool geeky machine learning topics. We love to dive deep into the technical details and stay up to date with the latest research developments.
Our Meetup group and YouTube channel are the perfect place to connect with like-minded people who share your love of machine learning. We offer a mix of research paper discussions, code reviews, and other data science topics. If you want to keep up with the latest developments in machine learning, connect with other techies, and learn something new, subscribe to our channel and join our Meetup community today!
Meetup: https://www.meetup.com/east-bay-tri-valley-machine-learning-meetup/
=============================
#llms #llm-memory-mgmt #llm-memory-usage #llm-serving