Let's build GPT: from scratch, in code, spelled out.

Andrej Karpathy · Advanced ·🧠 Large Language Models ·3y ago
We build a Generatively Pretrained Transformer (GPT), following the paper "Attention is All You Need" and OpenAI's GPT-2 / GPT-3. We talk about connections to ChatGPT, which has taken the world by storm. We watch GitHub Copilot, itself a GPT, help us write a GPT (meta :D!) . I recommend people watch the earlier makemore videos to get comfortable with the autoregressive language modeling framework and basics of tensors and PyTorch nn, which we take for granted in this video. Links: - Google colab for the video: https://colab.research.google.com/drive/1JMLa53HDuA-i7ZBmqV7ZnA3c_fvtXnx-?usp=sharing - GitHub repo for the video: https://github.com/karpathy/ng-video-lecture - Playlist of the whole Zero to Hero series so far: https://www.youtube.com/watch?v=VMj-3S1tku0&list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ - nanoGPT repo: https://github.com/karpathy/nanoGPT - my website: https://karpathy.ai - my twitter: https://twitter.com/karpathy - our Discord channel: https://discord.gg/3zy8kqD9Cp Supplementary links: - Attention is All You Need paper: https://arxiv.org/abs/1706.03762 - OpenAI GPT-3 paper: https://arxiv.org/abs/2005.14165 - OpenAI ChatGPT blog post: https://openai.com/blog/chatgpt/ - The GPU I'm training the model on is from Lambda GPU Cloud, I think the best and easiest way to spin up an on-demand GPU instance in the cloud that you can ssh to: https://lambdalabs.com . If you prefer to work in notebooks, I think the easiest path today is Google Colab. Suggested exercises: - EX1: The n-dimensional tensor mastery challenge: Combine the `Head` and `MultiHeadAttention` into one class that processes all the heads in parallel, treating the heads as another batch dimension (answer is in nanoGPT). - EX2: Train the GPT on your own dataset of choice! What other data could be fun to blabber on about? (A fun advanced suggestion if you like: train a GPT to do addition of two numbers, i.e. a+b=c. You may find it helpful to predict the digits of c in reverse order, as the typica
Watch on YouTube ↗ (saves to browser)
Sign in to unlock AI tutor explanation · ⚡30

Related AI Lessons

Stop Evaluating LLMs with “Vibe Checks”
Learn to evaluate LLMs effectively by building a decision-grade scorecard, moving beyond subjective 'vibe checks'
Towards Data Science
I gave the OpenAI SDK live web search by changing one line
Enable live web search in OpenAI SDK by modifying a single line of code, unlocking new possibilities for web-based applications
Dev.to · mv7
How I Made My Android App Discoverable on 4 LLMs in 24 Hours (llms.txt, IndexNow, JSON-LD, the Bing Cycle)
Make your Android app discoverable on 4 LLMs in 24 hours using llms.txt, IndexNow, JSON-LD, and the Bing Cycle
Dev.to · TAMSIV
What LLMs Can Actually Do for Your Business
Discover how LLMs can revolutionize your business by automating written content generation, improving email management, and enhancing overall productivity
Medium · AI
Up next
5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems
Dave Ebbelaar (LLM Eng)
Watch →