The Ultimate Transformer Course for Working Engineers
Large language models can feel opaque, especially when you’re dealing with slow inference, hallucinations, memory bottlenecks, or output you can’t fully explain.
Today, we’re launching Transformers in Practice, a course taught by Sharon Zhou, VP of Engineering & AI at AMD.
The course focuses on understanding what’s actually happening inside transformer-based models so you can reason about their behavior, debug issues more effectively, and make better deployment decisions.
You’ll learn:
- How transformers generate text one token at a time, and how sampling affects output
- What attention, positional encoding, and transformer layers are actually doing
- Why hallucinations happen and how techniques like RAG and constrained generation help
- How optimizations like quantization, KV caching, FlashAttention, and speculative decoding improve inference efficiency on GPUs
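To give a flavor of the first topic, here is a minimal sketch of how sampling shapes token-by-token generation. This is illustrative only, not course material: `sample_next_token` is a hypothetical helper, and real models produce logits over tens of thousands of tokens rather than a toy list.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None):
    """Pick one token id from raw logits, the way an LLM decoder does
    at each generation step. (Toy sketch, not production code.)"""
    # Temperature rescales the logits: <1.0 sharpens the distribution
    # toward the most likely token, >1.0 flattens it toward randomness.
    scaled = [l / temperature for l in logits]
    # Optional top-k truncation: keep only the k highest-scoring
    # candidates and rule out the long tail entirely.
    if top_k is not None:
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [s if s >= cutoff else float("-inf") for s in scaled]
    # Softmax (with max-subtraction for numerical stability) turns the
    # surviving logits into a probability distribution.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token id according to those probabilities.
    return random.choices(range(len(probs)), weights=probs, k=1)[0]
```

With a very low temperature the draw collapses to the argmax (greedy decoding), while higher temperatures let lower-probability tokens through, which is why the same prompt can yield different completions run to run.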
Throughout the course, interactive visualizations help build intuition for concepts that are often difficult to grasp through theory alone.
This course will give you a practical understanding of transformers from both the model and systems perspectives.
Enroll now: https://bit.ly/4tts8MQ