LLM Foundations: Understanding Tokenization & Training: Chapter 4

Weights & Biases · Intermediate ·🧠 Large Language Models ·2y ago

Key Takeaways

The video covers LLM foundations, focusing on tokenization and training phases, including pre-training and supervised instruction tuning, with examples from GPT-4 and LLaMA models.

Full Transcript

[Music]

Welcome to module 2. In this module we want to understand how large language models work, but first let's check out some use cases that LLMs enable. LLMs can be used to generate text like marketing copy or emails. They can answer questions, translate documents, and determine the sentiment of a text. LLMs can summarize long documents. They can act as personal assistants or chatbots. We can use them to query tabular data, interact with APIs, or even evaluate other language models.

But what happens behind the scenes? Understanding LLM architecture isn't necessary for building applications. It's like driving a car: you don't need to know how the engine works to drive. Still, some technical details can be helpful. Looking at the GPT-4 technical report, we can read that GPT-4, one of the best-known LLMs, is a Transformer-based model pre-trained to predict the next token in a document. We won't dive in and try to understand the Transformer architecture; that's not necessary for building LLM applications. Instead, we want to focus on the second part of the statement, which is predicting the next token in a document.

So here's how this works. We start with some input text, in our case "Weights & Biases is". Then we tokenize the text: we split it into tokens that are represented by numbers, which we feed into the black box, which is the LLM. Then, as an output of the LLM, we have a distribution of probabilities over the entire vocabulary, that is, all of the tokens that we have available for our model. Each of these tokens comes with a probability that it is the next token in the sequence. Based on those probabilities, we sample one of the tokens to continue the sequence. In this case we select the token "the" because it has a high output probability. Then we append this token to our input sequence and repeat the process: we tokenize it, we feed it into the LLM, and again we get a distribution of probabilities across our vocabulary. Again we pick a token with high probability; in this case let's pick "machine". Finally, we repeat this whole process once more and sample the token "learning". If we continue with this process, we can sample the text "Weights & Biases is the machine learning platform".

Companies like OpenAI, Cohere, Mosaic, or Meta have already trained models for us, and we use them behind APIs, which means we do not need to train these models to use them in our applications. However, knowing how they were trained can provide useful insights.

There are two main steps in training LLMs. The first is pre-training, where the model learns from a massive dataset drawn from sources across the internet, such as Common Crawl, C4, GitHub, Wikipedia, books, arXiv, which is academic papers, and Stack Exchange, which is a set of questions and answers. This pre-training dataset mix has been published by Meta, which trained the LLaMA model. We don't know exactly the pre-training dataset used for training GPT-4, but we can imagine it must have been something similar. A model that has gone through this phase is pretty good at predicting text such as that found in this dataset: on the internet, on GitHub, on Wikipedia, and so on.

But this may not be enough. We actually want this model to follow our instructions and respond to our questions, and this is where the second step, supervised instruction tuning, can be helpful. In this step the model is further trained with expert-generated question-answer pairs, which helps align the model with user expectations and teaches it to follow instructions. Some LLMs, like GPT-4, undergo an additional phase: reinforcement learning from human feedback. Here the model is trained to optimize for higher-quality answers preferred by human judges.

Understanding these training phases can be helpful. It can give us intuitions, for example, about how to formulate a prompt in order to get the expected output from the model. In the next video we'll experiment with this concept in a Jupyter notebook with code.
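
The tokenization step described in the transcript (splitting text into tokens that are represented by numbers) can be illustrated with a minimal Python sketch. This is not how GPT-4 or LLaMA tokenize text: production tokenizers use subword schemes and far larger vocabularies. The vocabulary `VOCAB` and the helpers `encode` and `decode` are hypothetical names made up for this illustration.

```python
# Toy word-level tokenizer. Real LLM tokenizers use subword schemes and much
# larger vocabularies; this hypothetical vocabulary exists only for illustration.
VOCAB = ["weights", "&", "biases", "is", "the", "machine", "learning", "platform"]
TOKEN_TO_ID = {token: i for i, token in enumerate(VOCAB)}


def encode(text):
    """Split text on whitespace and map each lowercased word to its integer ID."""
    return [TOKEN_TO_ID[word] for word in text.lower().split()]


def decode(token_ids):
    """Map integer IDs back to words and join them with spaces."""
    return " ".join(VOCAB[i] for i in token_ids)


ids = encode("Weights & Biases is")
print(ids)          # [0, 1, 2, 3]
print(decode(ids))  # weights & biases is
```

The integer list is what the model actually receives. Any word outside this tiny vocabulary raises a `KeyError`, which is one reason real tokenizers fall back to smaller subword pieces.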

Original Description

🤖 Discover LLM Core Techniques in Chapter 4: Join Darek Kleczek to explore tokenization, training phases, and real-world use cases of LLMs.

🧑🏾‍🎓 *Full course with certification and class materials available free at http://wandb.me/building-llm-powered-apps*
🏆 *Daily swag draw* and grand prize Airpods draw from Dec 1 and 31, 2023. Details at http://wandb.me/llm-apps-contest
🗣️ Join the course conversation on our Discord channel at http://wandb.me/course-discord
🏫 This is chapter 4 of 27 in the Building LLM-Powered Apps course.

*Episode Description*
Embark on a journey to understand the intricacies of Large Language Models (LLMs) in Chapter 4 and the beginning of the second module of our free course, "Building LLM-Powered Apps," presented by Weights & Biases. Join our knowledgeable machine learning engineer, Darek Kleczek, as he sheds light on the inner workings and diverse applications of LLMs.

🌟 *Chapter Highlights*
- Unveiling LLM Use Cases: Explore the myriad of applications where algorithms and LLMs can generate text.
- Peek Behind the Scenes: While you don't need to be an LLM expert to build applications, Darek provides a glimpse behind the scenes.
- Tokenization in Action: Learn the crucial process of tokenization, breaking down input text into numerical tokens that can be processed by the LLM.
- Training Phases: Gain insights into LLMs' two main training phases – pre-training and supervised instruction tuning.
- Reinforcement Learning: Explore additional phases, like reinforcement learning from human feedback, where LLMs like GPT-4 optimize for higher quality answers preferred by human judges.

🎓 *Enroll for Free:* Join us on this educational journey to master the art of building LLM-powered applications. Enroll at http://wandb.me/building-llm-powered-apps.
👉 *Next Chapter Sneak Peek:* Get ready for a hands-on experience in our next chapter, where we'll delve into practical experiments with LLMs in a Jupyter Notebook.

This video teaches the fundamentals of LLMs, including tokenization and training phases, and how they can be applied in real-world scenarios. Understanding these concepts is crucial for building effective LLM-powered applications. The video also provides insights into the training phases of LLMs, including pre-training and supervised instruction tuning.

Key Takeaways
  1. Tokenize input text
  2. Feed the tokens into the LLM
  3. Get a distribution of probabilities over the vocabulary
  4. Sample a token with high probability
  5. Append the token to the input sequence
  6. Repeat the process
💡 Understanding the training phases of LLMs can provide useful insights into how to formulate effective prompts and optimize prompt engineering.
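
The six-step loop in the takeaways above can be sketched in Python. This is a minimal illustration under stated assumptions, not an actual LLM: `TOY_MODEL` is a hypothetical hand-written table that looks only at the last token, whereas a real model conditions on the whole sequence and scores every token in its vocabulary. The names `next_token_distribution` and `generate`, and all probabilities, are invented for the example.

```python
import random

# Hypothetical stand-in for an LLM: maps the last token to next-token
# probabilities. The numbers are invented for illustration only.
TOY_MODEL = {
    "is": {"the": 0.7, "a": 0.2, "machine": 0.1},
    "the": {"machine": 0.6, "platform": 0.3, "learning": 0.1},
    "machine": {"learning": 0.9, "platform": 0.1},
    "learning": {"platform": 0.8, "the": 0.2},
}


def next_token_distribution(tokens):
    """Return a probability distribution over candidate next tokens."""
    return TOY_MODEL[tokens[-1]]


def generate(prompt, max_new_tokens, rng):
    tokens = prompt.split()                      # 1. tokenize (whitespace split here)
    for _ in range(max_new_tokens):              # 6. repeat the process
        probs = next_token_distribution(tokens)  # 2-3. feed tokens in, get probabilities
        candidates = list(probs)
        weights = list(probs.values())
        # 4. sample one token in proportion to its probability
        chosen = rng.choices(candidates, weights=weights, k=1)[0]
        tokens.append(chosen)                    # 5. append it to the input sequence
        if chosen not in TOY_MODEL:              # the toy table has no continuation
            break
    return " ".join(tokens)


print(generate("Weights & Biases is", 3, random.Random(0)))
```

Because the next token is sampled rather than always taken as the single most likely one, different seeds can produce different continuations from the same prompt.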
