Learn how ChatGPT and DeepSeek models work: How Transformer LLMs Work [Free Course]

Jay Alammar · Advanced ·🧠 Large Language Models ·1y ago

Key Takeaways

This video course teaches how Transformer LLMs work, covering the modern Transformer architecture, tokenizers, embeddings, and mixture-of-expert models, with a focus on ChatGPT and DeepSeek models.

Full Transcript

I'm delighted to introduce how Transformer alums work built with Jay Alma and Martin honas the authors of the beautifully illustrated book Hands-On lunch language models Jay is director of engineering fellow at coher and Martin is senior clinical data scientist at the Netherlands Comprehensive Cancer Institute in this course you learn at a deep technical level about the inner workings of how the Transformer Network architecture the P alms works this is the architecture that revolutionize generative AI in fact the GPT in chat GPT stands for generative pre-train Transformer so you build an intuition on how L's process text and you also work with code examples to illustrate the key components of the transform architecture so what you learn is things like what's an attention mechanism and different flavors like self attention and what is a KV cache and so on and if these terms don't yet make sense to you they will after this School the original Transformer was introduced in the 2017 paper attention is all you need by Ashish vaswani and others as a highly scalable model for machine translation TOS variants of this architecture Now power most of today's OMS from open AI anthropic Google coher and meta in 2018 Jay pioneered the efforts of explaining the Transformer architecture in a well-known article The Illustrated Transformer Jay created these wonderful visualizations of the Transformer that help many people understand how it works he also Illustrated other models like gpd2 Bert and staple diffusion thank you Andrew it's great to be here Martin and I think that illustrating complex Concepts such as Transformers creates a fun and easy learning process in our book we worked on an updated version of The Illustrated Transformer as well as describing how to prompt use and train with Hands-On coding examples and we're excited to share some of those ideas with you thank you Jay and Andrew in this course you will see an overview of how language models evolved into the Transformer architecture focusing on language representation you will see early representations where large sparse vectors simply Mark the presence of a word to the smaller dense contextual embeddings that represent the meaning of a word in the context of the sentences they are in you will also learn the meaning of this mysterious and much overused word embedding you will then explore how llm inputs are broken down into tokens which represent words word pieces before they are sent to the language model there are several popular tokenizers and you will see how they differ you will also learn how llms map each token to an embedding Vector you'll then take a closer look at the components of the llm architecture and learn how decoder only llms generate outputs you will learn the details of the Transformer block and how it has evolved in the years since the original paper was released you'll explore an implementation of recent models in the huging face Transformers library and after finishing this course you understand how LMS work in great depth and you have intuitions to help how you approach building applications with LMS I hope you enjoy the course [Music]

Original Description

Enroll for free now: https://bit.ly/4aRnn7Z Github Repo: https://github.com/HandsOnLLM/Hands-On-Large-Language-Models We're ecstatic to bring you "How Transformer LLMs Work" -- a free course with ~90 minutes of video, code, and crisp visuals and animations that explain the modern Transformer architecture, tokenizers, embeddings, and mixture-of-expert models. @MaartenGrootendorst and I have developed a lot of the visual language over the last several years (tens of thousands of iterations for hundreds of figures) for the book. This was informed by many incredible colleagues at Cohere, C4AI, and the open source and open science ML community. But to have an opportunity to collaborate with the legendary Andrew Ng and the team at @Deeplearningai we took them to the next level with animations and a concise narrative meant to enable technical learners to pick up an ML paper and understand the architecture description. In this course, you'll learn how a transformer network architecture that powers LLMs works. You'll build the intuition of how LLMs process text and work with code examples that illustrate the key components of the transformer architecture. Key topics covered in this course include: The evolution of how language has been represented numerically, from the Bag-of-Words model through Word2Vec embeddings to the transformer architecture that captures word meanings in full context. How LLM inputs are broken down into tokens, which represent words or pieces before they are sent to the language model. The details of a transformer and the three main stages, consisting of tokenization and embedding, the stack of transformer blocks, and the language model head. The details of the transformer block, including attention, which calculates relevance scores followed by the feedforward layer, which incorporates stored information learned in training. How cached calculations make transformers faster, how the transformer block has evolved over the years since the original p
Watch on YouTube ↗ (saves to browser)
Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from Jay Alammar · Jay Alammar · 38 of 38

← Previous Next →
1 Jay's Visual Intro to AI
Jay's Visual Intro to AI
Jay Alammar
2 Making Money from AI by Predicting Sales - Jay's Intro to AI Part 2
Making Money from AI by Predicting Sales - Jay's Intro to AI Part 2
Jay Alammar
3 How GPT3 Works - Easily Explained with Animations
How GPT3 Works - Easily Explained with Animations
Jay Alammar
4 The Narrated Transformer Language Model
The Narrated Transformer Language Model
Jay Alammar
5 My Visualization Tools (my Apple Keynote setup for visualizations and animations)
My Visualization Tools (my Apple Keynote setup for visualizations and animations)
Jay Alammar
6 Explainable AI Cheat Sheet - Five Key Categories
Explainable AI Cheat Sheet - Five Key Categories
Jay Alammar
7 The Unreasonable Effectiveness of RNNs (Article and Visualization Commentary) [2015 article]
The Unreasonable Effectiveness of RNNs (Article and Visualization Commentary) [2015 article]
Jay Alammar
8 Neural Activations & Dataset Examples
Neural Activations & Dataset Examples
Jay Alammar
9 Up and Down the Ladder of Abstraction [interactive article by Bret Victor, 2011]
Up and Down the Ladder of Abstraction [interactive article by Bret Victor, 2011]
Jay Alammar
10 Probing Classifiers: A Gentle Intro (Explainable AI for Deep Learning)
Probing Classifiers: A Gentle Intro (Explainable AI for Deep Learning)
Jay Alammar
11 Inspecting Neural Networks with CCA - A Gentle Intro (Explainable AI for Deep Learning)
Inspecting Neural Networks with CCA - A Gentle Intro (Explainable AI for Deep Learning)
Jay Alammar
12 Language Processing with BERT: The 3 Minute Intro (Deep learning for NLP)
Language Processing with BERT: The 3 Minute Intro (Deep learning for NLP)
Jay Alammar
13 Behavioral Testing of ML Models (Unit tests for machine learning)
Behavioral Testing of ML Models (Unit tests for machine learning)
Jay Alammar
14 Favorite AI/ML Books: Intro to ML with Python (Book Review)
Favorite AI/ML Books: Intro to ML with Python (Book Review)
Jay Alammar
15 Favorite Python Books: Effective Python
Favorite Python Books: Effective Python
Jay Alammar
16 Favorite Stats Books: Seven Pillars of Statistical Wisdom
Favorite Stats Books: Seven Pillars of Statistical Wisdom
Jay Alammar
17 Understanding Animal Languages - Seeing Voices 2
Understanding Animal Languages - Seeing Voices 2
Jay Alammar
18 How digital assistants like Siri work #shorts
How digital assistants like Siri work #shorts
Jay Alammar
19 Writing Code in Jupyter Notebooks #shorts
Writing Code in Jupyter Notebooks #shorts
Jay Alammar
20 Experience Grounds Language: Improving language models beyond the world of text
Experience Grounds Language: Improving language models beyond the world of text
Jay Alammar
21 pandas for data science in python #shorts
pandas for data science in python #shorts
Jay Alammar
22 The Illustrated Retrieval Transformer
The Illustrated Retrieval Transformer
Jay Alammar
23 AI Image Generation is MIND BLOWING! #shorts
AI Image Generation is MIND BLOWING! #shorts
Jay Alammar
24 A Generalist Agent (Gato) - DeepMind's single model learns 600 tasks
A Generalist Agent (Gato) - DeepMind's single model learns 600 tasks
Jay Alammar
25 The Illustrated Word2vec - A Gentle Intro to Word Embeddings in Machine Learning
The Illustrated Word2vec - A Gentle Intro to Word Embeddings in Machine Learning
Jay Alammar
26 AI Art Explained: How AI Generates Images (Stable Diffusion, Midjourney, and DALLE)
AI Art Explained: How AI Generates Images (Stable Diffusion, Midjourney, and DALLE)
Jay Alammar
27 What is Generative AI? 4 Important Things to Know (about ChatGPT, MidJourney, Cohere & future AIs)
What is Generative AI? 4 Important Things to Know (about ChatGPT, MidJourney, Cohere & future AIs)
Jay Alammar
28 AI is Eating The World - This is Where YOU Can Use it to Compete (AI Product Moats)
AI is Eating The World - This is Where YOU Can Use it to Compete (AI Product Moats)
Jay Alammar
29 What is LangChain? Where does it fit with LLMs like ChatGPT and Cohere? #shorts
What is LangChain? Where does it fit with LLMs like ChatGPT and Cohere? #shorts
Jay Alammar
30 Are language models with more parameters better? #shorts #chatgpt
Are language models with more parameters better? #shorts #chatgpt
Jay Alammar
31 How to manage LLM prompts with tools like LangChain #languagemodels #chatgpt
How to manage LLM prompts with tools like LangChain #languagemodels #chatgpt
Jay Alammar
32 What is Llama Index? how does it help in building LLM applications? #languagemodels #chatgpt
What is Llama Index? how does it help in building LLM applications? #languagemodels #chatgpt
Jay Alammar
33 prompt chains are important for building large language model applications
prompt chains are important for building large language model applications
Jay Alammar
34 ChatGPT has Never Seen a SINGLE Word (Despite Reading Most of The Internet). Meet LLM Tokenizers.
ChatGPT has Never Seen a SINGLE Word (Despite Reading Most of The Internet). Meet LLM Tokenizers.
Jay Alammar
35 What makes LLM tokenizers different from each other? GPT4 vs. FlanT5 Vs. Starcoder Vs. BERT and more
What makes LLM tokenizers different from each other? GPT4 vs. FlanT5 Vs. Starcoder Vs. BERT and more
Jay Alammar
36 Building LLM Agents with Tool Use
Building LLM Agents with Tool Use
Jay Alammar
37 SWE-Bench authors reflect on the state of LLM agents at Neurips 2024
SWE-Bench authors reflect on the state of LLM agents at Neurips 2024
Jay Alammar
Learn how ChatGPT and DeepSeek models work: How Transformer LLMs Work [Free Course]
Learn how ChatGPT and DeepSeek models work: How Transformer LLMs Work [Free Course]
Jay Alammar

This course teaches the fundamentals of Transformer LLMs, including tokenization, embeddings, and the transformer architecture, with a focus on building intuition and working with code examples.

Key Takeaways
  1. Enroll in the free course
  2. Watch the video lectures
  3. Work with code examples of transformer architecture
  4. Build intuition of how LLMs process text
  5. Design and implement transformer-based LLMs
💡 The transformer architecture is a key component of modern LLMs, and understanding how it works is crucial for building and optimizing these models.

Related Reads

📰
How I Stopped Fighting Hallucinations in LLM Data Extraction
Learn to stop fighting hallucinations in LLM data extraction and improve your data quality
Dev.to · zhongqiyue
📰
Anthropic’s Claude Sonnet 5 Is “Near-Opus Intelligence” For All Plans via @sejournal, @martinibuster
Anthropic's Claude Sonnet 5 model offers near-opus intelligence for all plans, including the free tier, with introductory pricing on tokens
Search Engine Journal
📰
Understanding How LLMs Work: From Text to Tokens, Embeddings, Transformers, and Predictions
Learn how Large Language Models (LLMs) process text into tokens, embeddings, and predictions, and why understanding their inner workings matters for AI applications
Dev.to · Klinsmann R
📰
How ChatGPT Understands Your Questions: A Beginner-Friendly Guide
Learn how ChatGPT understands your questions and improves its responses with fine-tuning and context understanding
Dev.to · Shreyas Rasaikar
Up next
5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems
Dave Ebbelaar (LLM Eng)
Watch →