LLM Foundations: Understanding Tokenization & Training: Chapter 4
Key Takeaways
The video covers LLM foundations, focusing on tokenization and training phases, including pre-training and supervised instruction tuning, with examples from the GPT-4 and LLaMA models.
Full Transcript
Welcome to module 2. In this module, we want to understand how large language models work. But first, let's check out some use cases that LLMs enable. LLMs can be used to generate text, like marketing copy or emails. They can answer questions, translate documents, and determine the sentiment of a text. LLMs can summarize long documents, and they can act as personal assistants or chatbots. We can use them to query tabular data, interact with APIs, or even evaluate other language models.

But what happens behind the scenes? Understanding LLM architecture isn't necessary for building applications. It's like driving a car: you don't need to know how the engine works to drive. Still, some technical details can be helpful. Looking at the GPT-4 technical report, we can read that GPT-4, one of the best-known LLMs, is a Transformer-based model pre-trained to predict the next token in a document. We won't dive in and try to understand the Transformer architecture; that's not necessary for building LLM applications. Instead, we want to focus on the second part of the statement, which is predicting the next token in a document.

So here's how this works. We start with some input text, in our case "Weights & Biases is". Then we tokenize the text: we split it into tokens, represented by numbers, that we feed into the black box, which is the LLM. As the output of the LLM, we get a probability distribution over the entire vocabulary, that is, all of the tokens available to our model. Each of these tokens comes with a probability that it is the next token in the sequence. Based on those probabilities, we sample one of the tokens to continue the sequence. In this case, we select the token "the" because it has a high output probability. Then we append this token to our input sequence and repeat the process: we tokenize it, feed it into the LLM, and again get a distribution of probabilities across our vocabulary. Again we pick a token with high probability; in this case, let's pick "machine". Finally, we repeat this whole process once more and sample the token "learning". If we continue with this process, we can sample the text "Weights & Biases is the machine learning platform".

Companies like OpenAI, Cohere, Mosaic, or Meta have already trained models for us, and we use them behind APIs, which means we do not need to train these models to use them in our applications. However, knowing how they were trained can provide useful insights. There are two main steps in training LLMs.

The first is pre-training, where the model learns from a massive dataset built from web-scale sources such as Common Crawl, C4, GitHub, Wikipedia, books, ArXiv (academic papers), and Stack Exchange (a set of questions and answers). This pre-training dataset composition was published by Meta, which trained the LLaMA model. We don't know exactly which pre-training dataset was used for GPT-4, but we can imagine it must have been something similar. A model that has gone through this phase is pretty good at predicting text like that found in this dataset: on the internet, on GitHub, on Wikipedia, and so on.

But this may not be enough. We actually want the model to follow our instructions and respond to our questions. This is where the second step, supervised instruction tuning, can be helpful. In this step, the model is further trained with expert-generated question-answer pairs, which helps align the model with user expectations and teaches it to follow instructions. Some LLMs, like GPT-4, undergo an additional phase: reinforcement learning from human feedback. Here the model is trained to optimize for higher-quality answers preferred by human judges.

Understanding these training phases can be helpful. It can give us intuitions, for example, about how to formulate a prompt in order to get the expected output from the model. In the next video, we'll experiment with this concept in a Jupyter notebook with code.
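
The next-token loop described in the transcript (tokenize, get a probability distribution over the vocabulary, sample a token, append it, repeat) can be sketched in a few lines of Python. This is a toy illustration and not how GPT-4 or any real model is implemented: the nine-word vocabulary, the whitespace tokenizer, the hand-written probability table, and the names `tokenize`, `toy_model`, and `generate` are all hypothetical stand-ins. Real LLMs use subword tokenizers and a neural network in place of the lookup table.

```python
import random

# Hypothetical toy vocabulary; real LLM vocabularies hold tens of thousands of tokens.
VOCAB = ["weights", "and", "biases", "is", "the",
         "machine", "learning", "platform", "<eos>"]
TOKEN_TO_ID = {token: i for i, token in enumerate(VOCAB)}

# Hypothetical stand-in for the LLM "black box": hand-written next-token
# probabilities that depend only on the last token of the input.
NEXT_TOKEN_PROBS = {
    "is": {"the": 0.7, "machine": 0.2, "platform": 0.1},
    "the": {"machine": 0.8, "platform": 0.2},
    "machine": {"learning": 0.9, "platform": 0.1},
    "learning": {"platform": 0.9, "<eos>": 0.1},
    "platform": {"<eos>": 1.0},
}


def tokenize(text):
    """Split on whitespace and map each word to its integer token id."""
    return [TOKEN_TO_ID[word] for word in text.lower().split()]


def detokenize(token_ids):
    """Map token ids back to words and join them into a string."""
    return " ".join(VOCAB[i] for i in token_ids)


def toy_model(token_ids):
    """Return one probability per vocabulary entry for the next token."""
    last_token = VOCAB[token_ids[-1]]
    probs = NEXT_TOKEN_PROBS.get(last_token, {"<eos>": 1.0})
    return [probs.get(token, 0.0) for token in VOCAB]


def generate(prompt, max_new_tokens=10, greedy=False, seed=0):
    """Repeatedly pick a next token and append it to the sequence."""
    rng = random.Random(seed)
    token_ids = tokenize(prompt)
    eos_id = TOKEN_TO_ID["<eos>"]
    for _ in range(max_new_tokens):
        probs = toy_model(token_ids)
        if greedy:
            # Always take the most probable token.
            next_id = max(range(len(VOCAB)), key=lambda i: probs[i])
        else:
            # Sample a token in proportion to its probability.
            next_id = rng.choices(range(len(VOCAB)), weights=probs, k=1)[0]
        if next_id == eos_id:
            break
        token_ids.append(next_id)
    return detokenize(token_ids)


print(generate("weights and biases is", greedy=True))
# prints: weights and biases is the machine learning platform
```

With `greedy=False`, the loop samples from the distribution instead of always taking the most likely token, so different seeds can produce different continuations, which is closer to the sampling step described above.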
Original Description
🤖 Discover LLM Core Techniques in Chapter 4: Join Darek Kleczek to explore tokenization, training phases, and real-world use cases of LLMs.
🧑🏾🎓 *Full course with certification and class materials available free at http://wandb.me/building-llm-powered-apps*
🏆 *Daily swag draw* and grand prize AirPods draw from Dec 1 to 31, 2023. Details at http://wandb.me/llm-apps-contest
🗣️ Join the course conversation on our Discord channel at http://wandb.me/course-discord
🏫 This is chapter 4 of 27 in the Building LLM-Powered Apps course.
*Episode Description*
Embark on a journey to understand the intricacies of Large Language Models (LLMs) in Chapter 4 and the beginning of the second module of our free course, "Building LLM-Powered Apps," presented by Weights & Biases. Join our knowledgeable machine learning engineer, Darek Kleczek, as he sheds light on the inner workings and diverse applications of LLMs.
🌟 *Chapter Highlights*
-Unveiling LLM Use Cases: Explore the myriad of applications where algorithms and LLMs can generate text.
-Peek Behind the Scenes: While you don't need to be an LLM expert to build applications, Darek provides a glimpse behind the scenes.
-Tokenization in Action: Learn the crucial process of tokenization, breaking down input text into numerical tokens that can be processed by the LLM.
-Training Phases: Gain insights into LLMs' two main training phases – pre-training and supervised instruction tuning.
-Reinforcement Learning: Explore additional phases, like reinforcement learning from human feedback, where LLMs like GPT-4 optimize for higher quality answers preferred by human judges.
🎓 *Enroll for Free:* Join us on this educational journey to master the art of building LLM-powered applications. Enroll at http://wandb.me/building-llm-powered-apps.
👉 *Next Chapter Sneak Peek:* Get ready for a hands-on experience in our next chapter, where we'll delve into practical experiments with LLMs in a Jupyter Notebook.