LLM Foundations: Understanding Tokenization & Training: Chapter 4

Weights & Biases · Intermediate ·🧠 Large Language Models ·2y ago


Key Takeaways

The video covers LLM foundations, focusing on tokenization and the training phases (pre-training and supervised instruction tuning), with examples from the GPT-4 and LLaMA models.

Full Transcript

[Music] Welcome to module 2. In this module we want to understand how large language models work. But first, let's check out some use cases that LLMs enable. LLMs can be used to generate text like marketing copy or emails. They can answer questions, translate documents, and determine the sentiment of a text. LLMs can summarize long documents, and they can act as personal assistants or chatbots. We can use them to query tabular data, interact with APIs, or even evaluate other language models.

But what happens behind the scenes? Understanding LLM architecture isn't necessary for building applications. It's like driving a car: you don't need to know how the engine works to drive. Still, some technical details can be helpful. Looking at the GPT-4 technical report, we can read that GPT-4, one of the best-known LLMs, is a Transformer-based model pre-trained to predict the next token in a document. We won't dive in and try to understand the Transformer architecture; that's not necessary for building LLM applications. Instead, we want to focus on the second part of the statement, which is predicting the next token in a document.

So here's how this works. We start with some input text, in our case "Weights and Biases is". Then we tokenize the text: we split it into tokens that are represented by numbers, which we feed into the black box, which is the LLM. Then, as the output of the LLM, we have a distribution of probabilities over the entire vocabulary, all of the tokens that we have available for our model. Each of these tokens comes with a probability that it comes next in the sequence. Based on those probabilities, we sample one of the tokens to continue the sequence. In this case we select the token "the" because it has a high output probability. Then we append this token to our input sequence and repeat the process: we tokenize it, we feed it into the LLM, and again we get a distribution of probabilities across our vocabulary. Again we pick a token with high probability; in this case let's pick "machine". Finally, we repeat this whole process once more and sample the token "learning". If we continue with this process, we can sample the text "Weights and Biases is the machine learning platform".

Companies like OpenAI, Cohere, Mosaic, or Meta have already trained models for us, and we use them behind APIs, which means we do not need to train these models to use them in our applications. However, knowing how they were trained can provide useful insights. There are two main steps in training LLMs.

The first is pre-training, where the model learns from a massive dataset with sources like web-scale crawls of the internet, such as Common Crawl and C4, along with GitHub, Wikipedia, books, arXiv (academic papers), and Stack Exchange (a set of questions and answers). This pre-training data mix was published by Meta, which trained the LLaMA model. We don't know exactly what pre-training dataset was used for GPT-4, but we can imagine it must have been something similar. A model that has gone through this phase is pretty good at predicting text such as that found in this dataset: on the internet, on GitHub, on Wikipedia, and so on.

But this may not be enough. We actually want this model to follow our instructions and respond to our questions, and this is where the second step, supervised instruction tuning, can be helpful. In this step the model is further trained with expert-generated question-answer pairs, which helps align the model with user expectations and teaches it to follow instructions. Some LLMs, like GPT-4, undergo an additional phase: reinforcement learning from human feedback. Here the model is trained to optimize for higher-quality answers preferred by human judges.

Understanding these training phases can be helpful. It can give us intuitions, for example, about how to formulate a prompt in order to get the expected output from the model. In the next video we'll experiment with this concept in a Jupyter notebook with code.
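
As a rough illustration of what "pre-trained to predict the next token" means for the training data, the sketch below turns one piece of text into (context, next token) pairs. It is not taken from the video: the whitespace `tokenize` function and the helper `next_token_pairs` are hypothetical stand-ins, and real LLMs use subword tokenizers and integer token IDs rather than whole words.

```python
# Toy illustration: one token sequence yields many next-token training pairs.
# The whitespace tokenizer is a hypothetical stand-in for a real subword tokenizer.

def tokenize(text: str) -> list[str]:
    """Split text into word-level tokens (an assumption made for readability)."""
    return text.split()

def next_token_pairs(tokens: list[str]) -> list[tuple[list[str], str]]:
    """Return (context, target) pairs: every prefix is asked to predict the token after it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

tokens = tokenize("weights and biases is the machine learning platform")
pairs = next_token_pairs(tokens)

for context, target in pairs:
    print(" ".join(context), "->", target)
```

Each printed line is one prediction task: given the context on the left, the model is trained to assign high probability to the token on the right.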

Original Description

🤖 Discover LLM Core Techniques in Chapter 4: Join Darek Kleczek to explore tokenization, training phases, and real-world use cases of LLMs.

🧑🏾‍🎓 *Full course with certification and class materials available free at http://wandb.me/building-llm-powered-apps*
🏆 *Daily swag draw* and grand prize Airpods draw from Dec 1 and 31, 2023. Details at http://wandb.me/llm-apps-contest
🗣️ Join the course conversation on our Discord channel at http://wandb.me/course-discord
🏫 This is chapter 4 of 27 in the Building LLM-Powered Apps course.

*Episode Description*
Embark on a journey to understand the intricacies of Large Language Models (LLMs) in Chapter 4 and the beginning of the second module of our free course, "Building LLM-Powered Apps," presented by Weights & Biases. Join our knowledgeable machine learning engineer, Darek Kleczek, as he sheds light on the inner workings and diverse applications of LLMs.

🌟 *Chapter Highlights*
- Unveiling LLM Use Cases: Explore the myriad of applications where algorithms and LLMs can generate text.
- Peek Behind the Scenes: While you don't need to be an LLM expert to build applications, Darek provides a glimpse behind the scenes.
- Tokenization in Action: Learn the crucial process of tokenization, breaking down input text into numerical tokens that can be processed by the LLM.
- Training Phases: Gain insights into LLMs' two main training phases – pre-training and supervised instruction tuning.
- Reinforcement Learning: Explore additional phases, like reinforcement learning from human feedback, where LLMs like GPT-4 optimize for higher quality answers preferred by human judges.

🎓 *Enroll for Free:* Join us on this educational journey to master the art of building LLM-powered applications. Enroll at http://wandb.me/building-llm-powered-apps.

👉 *Next Chapter Sneak Peek:* Get ready for a hands-on experience in our next chapter, where we'll delve into practical experiments with LLMs in a Jupyter Notebook.

This video teaches the fundamentals of LLMs, including tokenization and training phases, and how they can be applied in real-world scenarios. Understanding these concepts is crucial for building effective LLM-powered applications. The video also provides insights into the training phases of LLMs, including pre-training and supervised instruction tuning.

Key Takeaways

Tokenize input text
Feed the tokens into the LLM
Get a distribution of probabilities over the vocabulary
Sample a token with high probability
Append the token to the input sequence
Repeat the process
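
The six steps above can be sketched as a small loop. This is a minimal toy, not the code used in the course: `VOCAB`, `NEXT`, `toy_model`, and `generate` are hypothetical names, and the "model" is a hand-written lookup table that looks only at the last token, whereas a real LLM conditions on the whole sequence and has a vocabulary of many thousands of subword tokens.

```python
import random

# Hypothetical toy vocabulary; each token's ID is its position in this list.
VOCAB = ["weights", "and", "biases", "is", "the", "a", "machine", "learning", "platform"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}

# Hypothetical stand-in for the LLM: next-token probabilities keyed by the last token.
NEXT = {
    "weights": {"and": 1.0},
    "and": {"biases": 1.0},
    "biases": {"is": 1.0},
    "is": {"the": 0.7, "a": 0.3},
    "the": {"machine": 0.8, "platform": 0.2},
    "a": {"machine": 0.6, "platform": 0.4},
    "machine": {"learning": 0.9, "platform": 0.1},
    "learning": {"platform": 1.0},
    "platform": {"is": 1.0},
}

def tokenize(text):
    """Step 1: split on whitespace and map each token to its integer ID."""
    return [TOKEN_TO_ID[tok] for tok in text.lower().split()]

def toy_model(token_ids):
    """Steps 2-3: return one probability per vocabulary entry for the next token."""
    last = VOCAB[token_ids[-1]]
    return [NEXT[last].get(tok, 0.0) for tok in VOCAB]

def generate(prompt, steps, rng=None):
    """Steps 4-6: pick a token, append it, repeat.

    With rng=None the most likely token is chosen; otherwise a token is
    sampled in proportion to its probability.
    """
    ids = tokenize(prompt)
    for _ in range(steps):
        probs = toy_model(ids)
        if rng is None:
            next_id = max(range(len(VOCAB)), key=lambda i: probs[i])
        else:
            next_id = rng.choices(range(len(VOCAB)), weights=probs, k=1)[0]
        ids.append(next_id)
    return " ".join(VOCAB[i] for i in ids)

print(generate("weights and biases is", steps=4))
print(generate("weights and biases is", steps=4, rng=random.Random(0)))
```

Always taking the most likely token reproduces the sentence from the video, while passing a random generator shows that sampling can lead to a different continuation from the same prompt.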

💡 Understanding the training phases of LLMs can provide useful insights into how to formulate effective prompts and optimize prompt engineering.
