WizardLM: Evolving Instruction Datasets to Create a Better Model

Sam Witteveen · Intermediate ·📄 Research Papers Explained ·3y ago

Skills: LLM Engineering (90%), Fine-tuning LLMs (85%), Reading ML Papers (80%)

Key Takeaways

The WizardLM project goes beyond releasing a new model: it introduces a way to distill a fine-tuning dataset by evolving simple instructions into progressively harder ones. The current release is a 7 billion parameter model trained on 70,000 evolved instructions. The video covers the paper, the GitHub repository, and a Colab notebook for trying the model.

Full Transcript

All right, in this video I'm going to be looking at WizardLM. This is a new academic paper that comes with a model, which is what most people are talking about, and also with a dataset. For me, the model itself is probably not the most interesting thing about this project. It comes out of Microsoft; I think it's from Microsoft Research Asia, though I'm not 100% sure about that.

What they've done is a really interesting idea that builds on where Alpaca was going. Alpaca started out with 175 human-written examples and then distilled from GPT-3 to expand that to 52,000. It turned out that the dataset had lots of errors and quite a number of bad responses, which is why people went through it manually and created the cleaned Alpaca dataset. This also led to people training more of these models on distilled datasets.

A distilled dataset, if you're not familiar with the term, is one that has been generated by another large language model rather than created by humans. Usually people are using OpenAI models, either GPT-3 or ChatGPT, and I think we're starting to see some people use GPT-4 for this. There's a whole controversy around whether or when that is legal to do, and there are moral issues too: some people say that since they scraped the internet, whatever they've got is hard to defend legally. I don't want to get involved in that.

What I want to look at here is how we distill a better dataset than the one that came from Alpaca, and how distilling a better dataset relates to training a better model. So if you're just here for the model, look at
the bottom; you can skip along to the Colab, and I'll walk you through the model in a bit.

The idea in the paper is really interesting. They've released a dataset and a model: the current model is a 7 billion parameter model trained on 70,000 examples of what they're calling evolved instructions. The cool thing is that they're working on another version with 300,000 instructions, so my guess is that pretty soon we will see either a better WizardLM 2 at 7 billion or perhaps even a 13 billion or bigger model.

So let's look at the paper and what it actually does. With the Alpaca idea, they took human-written instructions and spun variations out of them. Here they take a core dataset, a similar kind of idea, but then they evolve these instructions. They start out with a very simple instruction. One of the key things they point out about datasets like ShareGPT and Alpaca is that the instructions are, on the whole, very simple, and because you're fine-tuning only on simple instructions, you never give the model enough hard instructions, so it can't handle them at inference time.

Their goal is to start with a simple instruction and then use a random process through the prompt, which we'll look at in a minute, to develop that simple initial instruction into something much more complicated. That can go in a variety of different directions, as they show in this diagram. When they get something more complicated, they store and keep it, but they will often also make a more complicated version of the complicated version. So they raise these up
through degrees of difficulty as they go along. The idea is that they get everything from simple instructions right through to very complicated ones, and when they do the actual fine-tuning they want a nice mix, so that you're not just training on 80% really simple instructions and 20% really hard instructions. You've got simple, medium, and so on up; I think they go through 10 levels of difficulty.

The way they talk about this is as evolving, or mutation. They have checks in there so that if an instruction evolves in a way that doesn't make sense, they don't accept it. They're just going for the idea of it evolving into something more complicated.

If we look at the diagram, they start with an initial instruction and then generate different instructions from it. I guess this is what the wizard is supposed to be doing, and we'll see in a sec that if that is what the wizard is doing, it's just a prompt. They have an instruction eliminator: if they find something wrong with an instruction they take it out, otherwise they pass it through and store it in the instruction pool.

So let's look at how they actually do this. They say that the primary purpose of in-depth evolving is to make the given instructions more complex and increase their difficulty level. Looking at the prompts, they're very careful not to make things too complicated too quickly. They limit each evolving step to be a bit harder and restrict it to adding a maximum of 10 to 20 words. The idea is that each instruction should be slightly harder but not too hard, so they get these degrees of difficulty as they go through. The prompt template is as
follows: they have this prompt template, and then they randomly change out parts of it. You can see the idea: "I want you to act as a prompt rewriter. Your objective is to rewrite a given prompt into a more complex version to make those famous AI systems, i.e. ChatGPT and GPT-4, a bit harder to handle. But the rewritten prompt must be reasonable and must be understood and responded by humans." They've got a whole set of prompts going on here, and you can see the parts where, in some cases, they add a constraint or a requirement, and in others they have different ways to mutate and change the instruction.

This idea of evolving and mutating things and then picking the best ones is interesting. I haven't seen it applied to prompts directly before, but it's certainly something that's been used in machine learning in the past.

From this they go on to train the model. They've got all their prompts in the paper; it's a really nice paper for seeing what they've done and how they explain it. Like I said, they've released a dataset, and you could go and train your own model on it already. They're already working on approximately 300K examples, which is going to be the full set of evolved instructions, and they're planning to train another model on that. So that's something interesting to look out for.

They've also released a demo, so if you want to try this out without having to run it yourself, come in here and see how it goes for you. I'll just give you the TL;DR on the model: it is definitely very good. It's perhaps not as good as the StableVicuna model that just came out, but that model is almost twice as big as this one, so I wouldn't expect this to necessarily contend with it, but
you could imagine that a version of WizardLM, a LLaMA model with 13 billion parameters trained on this 300K dataset, is going to be pretty good, and perhaps even better than the StableVicuna model with its RLHF. So this is definitely a nice, interesting alternative to using RLHF.

I've got a Colab here, set up the same as my other ones. I've put in some filtering stuff that I was using for StableVicuna, and we can come through and look at the standard outputs. On the whole I think the standard outputs from this model are very good. When we ask it the questions about llamas and vicunas, it's on top of that. It does make mistakes, and I think a lot of them can be attributed to the model size; perhaps a bigger model would do better. Here we've asked it about GPT-4 and it's replying about GPT-3, so that's not a good sign. With simple questions like "What is the capital of England?", it certainly knows that; there are no problems with facts like that.

For writing stories, I feel it's perhaps not as good as some of the other models that have come along, like Koala. But again, that one was trained with extra datasets that relate to stories and poems, so I think it benefited from that.

Unfortunately, I know a lot of people are going to find that if you don't like the "as an AI language model" phrasing, it tends to say that quite a bit. In this case the StableVicuna model was doing much better in its answers, in that we didn't get "as an AI language model" there. This is probably because their instructions have been distilled from ChatGPT, which says this a lot. So you could imagine that once the 300K version is out, you could filter that dataset to remove all the "as an AI language model" responses and get a
better, more open dataset for fine-tuning something like this.

Its logic and reasoning were not very good here. This is the same question we asked before: you've got 23 apples, you use 20, you buy six more, so the answer should be nine (23 - 20 + 6 = 9). Unfortunately it says three. Again, I think this is partly due to the size of the model, although it could be that they've just not fine-tuned on this. My guess is that in the evolved instructions you're going to find quite a number of errors, just from the way they've been generated. I don't know that; I'm just guessing. But when you're spinning up and distilling large datasets like this, you tend to find later on that there were mistakes in them, the Alpaca dataset being the classic example.

That said, this was a question that StableVicuna got wrong: "Can I write a haiku in a single tweet?" This one says yes, you can, and actually goes ahead and writes one. I think that's a haiku; I'm not sure about the number of syllables.

When we ask it hypothetical questions, it doesn't do a great job. With "Can Geoffrey Hinton have a conversation with George Washington?", it really doesn't get the concept of the question. It answers along the lines of "I can provide information... however, it's not possible to physically bring together two individuals who are not alive." It would be interesting to see whether anything like this is in the dataset. This kind of dataset is very academic, so it would be interesting to see something like WizardLM or StableVicuna trained with some of the academic material as well as the more distilled datasets or RLHF datasets; I think those are going to do very well.

If we ask it "Can Geoffrey Hinton have dinner with Harry Potter?", it doesn't really get this at all: it says it appears to be a hypothetical or fan fiction type of
question that is not based in reality. That's true, I guess; in many ways its answer is actually accurate, but it doesn't propose a nice, simple answer.

When I ask it some facts about Marcus Aurelius, which I also did with Vicuna, it does quite a good job. On the first one, asking for three facts, it gets his son right, where StableVicuna got this wrong, and it also gets the other questions about him right. So it does a good job on both of those.

Final question: since it's WizardLM, I asked it "Tell me about Harry Potter and studying at Hogwarts", and we got "As an AI assistant, I can provide you with information on the fictional world of Harry Potter", and then it gives us some material on that, which is actually not bad.

Anyway, have a play with the model. The dataset, I think, is very interesting. My guess is that in the not too distant future we're going to see a WizardLM 2, which is probably going to be better than this, and we may even see a 13 billion version, which is going to be a lot better. So stay tuned for those. As always, if you've got questions, please put them in the comments below. If you found the video useful, please click like and subscribe. I will talk to you in the next video. Bye for now.
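The evolve, check, and store loop described in the transcript can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the names `evolve_pool`, `keep`, `build_prompt`, and `fake_rewrite` are hypothetical, the prompt text is paraphrased from the wording read out in the video, only the "add a constraint or requirement" mutation is mentioned there (the other two are assumed for illustration), and the elimination check is a simple stand-in for whatever checks the paper uses. In a real pipeline `rewrite` would call an LLM; here a stub keeps the example runnable.

```python
import random
from typing import Callable, List, Tuple

# Rewriting prompt, paraphrased from the wording read out in the video.
BASE_PROMPT = (
    "I want you to act as a prompt rewriter. Your objective is to rewrite a given "
    "prompt into a more complex version to make those famous AI systems a bit "
    "harder to handle. The rewritten prompt must be reasonable and must be "
    "understood and responded to by humans. {method} "
    "You may add at most 10 to 20 words to the given prompt.\n\n"
    "Given prompt:\n{instruction}\n\nRewritten prompt:\n"
)

# Only the first mutation is mentioned in the video; the rest are assumptions.
METHODS = [
    "Add one more constraint or requirement to the given prompt.",
    "Replace a general concept with a more specific one.",
    "Require more explicit reasoning steps.",
]


def build_prompt(instruction: str, rng: random.Random) -> str:
    """Fill the template with the instruction and one randomly chosen mutation."""
    return BASE_PROMPT.format(method=rng.choice(METHODS), instruction=instruction)


def keep(original: str, evolved: str) -> bool:
    """Hypothetical eliminator: drop empty, unchanged, or overly long rewrites."""
    evolved = evolved.strip()
    if not evolved or evolved == original.strip():
        return False
    added_words = len(evolved.split()) - len(original.split())
    return added_words <= 20


def evolve_pool(
    seeds: List[str],
    rewrite: Callable[[str], str],
    rounds: int,
    seed: int = 0,
) -> List[Tuple[int, str]]:
    """Evolve each seed for several rounds, storing (difficulty level, instruction)."""
    rng = random.Random(seed)
    pool = [(0, s) for s in seeds]  # level 0 = original simple instructions
    frontier = list(seeds)
    for level in range(1, rounds + 1):
        next_frontier = []
        for instruction in frontier:
            evolved = rewrite(build_prompt(instruction, rng)).strip()
            if keep(instruction, evolved):
                pool.append((level, evolved))
                next_frontier.append(evolved)  # evolve the evolved version next round
        frontier = next_frontier
    return pool


def fake_rewrite(prompt: str) -> str:
    """Stand-in for an LLM call: echoes the instruction with one extra requirement."""
    body = prompt.split("Given prompt:\n", 1)[1]
    instruction = body.split("\n\nRewritten prompt:", 1)[0]
    return instruction + " Explain your reasoning."


if __name__ == "__main__":
    for level, text in evolve_pool(["Name a fruit."], fake_rewrite, rounds=3):
        print(level, text)
```

Keeping the level alongside each instruction makes it straightforward to sample a balanced mix of difficulties at fine-tuning time, which is the property the transcript emphasizes.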

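The dataset filtering idea mentioned near the end of the transcript, removing responses that contain "as an AI language model", could look something like this. It is a sketch under assumptions: the record layout with `instruction` and `output` keys is hypothetical and may not match the released dataset, and `filter_records` and `BLOCKED_PHRASES` are illustrative names.

```python
from typing import Dict, Iterable, List, Sequence

# Phrases heard in the model's answers in the video; extend as needed.
BLOCKED_PHRASES = ("as an ai language model", "as an ai assistant")


def filter_records(
    records: Iterable[Dict[str, str]],
    phrases: Sequence[str] = BLOCKED_PHRASES,
    field: str = "output",
) -> List[Dict[str, str]]:
    """Keep only records whose response contains none of the blocked phrases."""
    kept = []
    for record in records:
        text = record.get(field, "").lower()  # case-insensitive match
        if not any(phrase in text for phrase in phrases):
            kept.append(record)
    return kept
```

A plain substring match like this is deliberately crude; it would also drop answers that merely quote the phrase, so it is worth spot-checking what gets removed before fine-tuning on the result.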
Original Description

Colab: https://colab.research.google.com/drive/1H308Mj11PTMCUm_TxTj8DF189ujDG_1w?usp=sharing
Demo: https://261f01fdd31bfe1ca0.gradio.live/
Github: https://github.com/nlpxucan/WizardLM/tree/main
Paper: https://arxiv.org/abs/2304.12244

In this video I look at the WizardLM project, which goes beyond just a new model and introduces a new way to distill a dataset for fine-tuning.

For more tutorials on using LLMs and building Agents, check out my Patreon:
Patreon: https://www.patreon.com/SamWitteveen
Twitter: https://twitter.com/Sam_Witteveen

My Links:
Linkedin: https://www.linkedin.com/in/samwitteveen/
Github:
https://github.com/samwit/langchain-tutorials
https://github.com/samwit/llm-tutorials

00:00 Intro
02:57 Paper
08:32 Colab Walkthrough

The WizardLM project introduces a new approach to fine-tuning LLMs by distilling instruction datasets. This video provides an overview of the project, including a walkthrough of the accompanying Colab notebook and GitHub repository.

Key Takeaways

Read the WizardLM paper
Explore the Colab notebook
Clone the GitHub repository
Fine-tune an LLM using the WizardLM approach
Evaluate the performance of the fine-tuned model

💡 The WizardLM project suggests that the quality and difficulty mix of a distilled instruction dataset matters at least as much as the model itself when fine-tuning LLMs.

Chapters (3)

Intro

2:57 Paper

8:32 Colab Walkthrough
