Vicuna - 90% of ChatGPT quality by using a new dataset?

Sam Witteveen · Intermediate ·🧠 Large Language Models ·3y ago

Key Takeaways

The video discusses Vicuna, a chatbot its authors describe as open source and report as reaching about 90% of ChatGPT's quality by fine-tuning on a new dataset. It also explores the controversy over reports that Google used the same data to train Bard, which Google denies. Vicuna is a LLaMA 13B model fine-tuned on ChatGPT conversations collected from ShareGPT, and it is benchmarked by having GPT-4 score its answers against those of other models.

Full Transcript

So another new model that got released towards the end of last week was Vicuna. The words "open source" are used very loosely at the moment with these models, but they're calling it "an open-source chatbot impressing GPT-4 with 90% ChatGPT quality", so we'll talk about what that actually means in a second. This comes from a group of people from a number of prestigious institutions in America who've been working on these kinds of things. I'm going to talk a little bit about the model, we'll have a look at the model, and we'll look at a scandal around the dataset for the model as well.

The first thing to note is that the model is just like so many of the other ones that we've looked at already: it's basically just fine-tuning a LLaMA model. They take a LLaMA model and then fine-tune it. In this case they've taken the bigger 13-billion-parameter LLaMA model and fine-tuned it, and what they've fine-tuned it on is pretty interesting. They've fine-tuned it on a dataset of conversations taken from ChatGPT, and actually they're taken from a site called ShareGPT. If we look up what ShareGPT is, this is a site where people can post different conversations, and you can see what the ChatGPT conversation actually was. It turns out that up until recently this site had a huge number of conversations that were easy to access and that you could search. I think they had an explore page, all this kind of stuff. Now that's all gone, and we'll talk in a minute about why it's gone.

But let's get back to Vicuna. A vicuña, I think, is an animal that's kind of similar to an alpaca or a llama. Pretty soon they're going to run out of animals for names for these models. Anyway, this is another one. Like I said, it's tuned from LLaMA, and they've benchmarked it in an interesting kind of way. What they've done is basically taken a question and done a generation, and they do the generation with
a LLaMA model, with an Alpaca model, with Bard, and with ChatGPT, and they feed this into GPT-4 with this prompt and get it to generate a score for each of these. Not surprisingly, LLaMA comes out lowest by itself, because there's no real fine-tuning for instructions done with that model, or very little. They find that Alpaca does well, but then Vicuna does way better and gets very close to Bard in their tests. And of course ChatGPT is going to score 100, because that's the reference they're trying to match.

So it's an interesting way of benchmarking. They even make a good point here that it's not yet a rigorous approach, and they also make a nice point that building an evaluation system for chatbots remains an open question requiring further research. This is definitely true at the moment: how do you benchmark one of these language models against another? If you just run pure evaluations on them, you can find that one model will respond to one way of prompting and give you a really good result, while another model with that same style of prompting will give you a really bad result, but with a different style of prompting will give you a better result than the first model. So it comes down to how you work out the best prompting strategies for the different models. That's also an interesting area of research that people are looking at at the moment.

They have a bunch of examples in here comparing the models. If we look at these, comparing Alpaca to Vicuna, the Vicuna one is definitely generating longer text, and I'm not sure if that's always going to be a good thing or a bad thing. We know that ChatGPT tended to be very verbose with its generation, to the point where a lot of people wanted it to be shorter, and I think that's one of the things they were aiming for more with the GPT-4 stuff. We can see here we can just basically run different questions through, so you
can look at some different questions, see the results that come out of these, and compare them. We can look at different models too. We can look at the LLaMA output and see that it's really short, and we can look at Bard's output if we open these up, and we can get a sense of how they compare, at least through our own eyes. Which do we prefer is really the way of going about it.

One of the things that we haven't seen people do with language models as such, but that people do with speech models, is MOS scores, which are mean opinion scores. With speech it's probably quite easy: they would basically just ask, out of these two examples, which one was spoken by a human and which one wasn't, and they would build up scores that way. That's a bit of a simplification, but it gives you an idea. Maybe that's going to be one of the ways that people can do this with language models as well going forward.

Anyway, they've got this way of benchmarking, and they've got a nice write-up of the overview of how they did it. They talk about how it's basically just fine-tuning, and the key thing here is the dataset. If we scroll down we can look at the different datasets. We know LLaMA is the base model. Again, why are people using the LLaMA model? It seems to do so well because it's been trained on a trillion tokens. When we compare it to other open-source models out there that are trained on 300 billion tokens, even if they're bigger than the equivalent LLaMA model, they just don't do as well. So this is definitely one of the key factors.

Anyway, we've got Alpaca trained on LLaMA and Vicuna trained on LLaMA; obviously Bard and ChatGPT are not trained on that. Alpaca is trained with 52,000 samples in the self-instruct style, whereas Vicuna is trained on 70,000 samples of conversations, and you could even think of these as being
longer than 70,000 too, because they actually expanded the sequence length. Here they talk about how they expanded out from 512, which is the sequence length Alpaca was trained on, to 2048, and they do that by having multi-round conversations in there. So it could be: I ask a question, I get the answer, I ask a follow-up question, I get the answer, I ask another follow-up question, I get the answer, and that's all in one span. So it's definitely a lot more data that they're training on compared with Alpaca.

They show a little bit about their assessment of how they're doing this, and then they talk a little bit about the costing too. They've come up with a nice way of being able to use spot instances, so that they can swap instances out, I guess, and that obviously reduces the cost. Anyway, you can go through and have a more in-depth look.

The other interesting thing that came out around the same time, and we'll have a play with the model in a second, is all this controversy around ShareGPT. Last week ShareGPT tweeted out on their account that they were taking down their explore page. One of the reasons they were taking it down is that they believed Google had been using the data on their site to train the Bard model. So much so that an online site called The Information actually reported a story about one of Google's researchers quitting over this. This is a guy by the name of Jacob Devlin, and supposedly he quit after sharing concerns with Sundar Pichai, the CEO, Jeff Dean, and other senior managers on the Bard team about the fact that they were using this data from ShareGPT, which was really data from OpenAI's ChatGPT. One of the things that makes the story kind of interesting, and that perhaps hasn't been reported as much, is who
is this guy Jacob Devlin? Well, it turns out he is a very good researcher. He's the first author on the BERT paper. Many of you will have heard of BERT; this was one of Google's star models. It came out in 2018, has been used in the search engine and in many different ways across Google and across industry in general, and it set the tone for RoBERTa and a lot of the other models that came along after it. So it's interesting if he's deciding to quit because of this. We only know what people are reporting, and I'm not claiming any inside knowledge of this.

In fact, later on Google denied that Bard was trained with ChatGPT data. It could be, and for sure there are multiple versions of Bard, that perhaps one of the early ones used that data and the final ones didn't. I'm not sure. But they've come out very strongly and denied this, and basically said that Bard is not trained on any data from ShareGPT or ChatGPT.

Anyway, in regards to this, ShareGPT took a lot of their data down, and this is what Vicuna was trained on. Vicuna obviously had a lot of this data, and they do talk in here about how they basically converted it to markdown as a way to use it. It does seem like that data turns out to be very useful. Again, though, this totally makes these models not usable for commercial use: we've got LLaMA, which is not allowed to be used commercially, and we've got ChatGPT data, which is not allowed to be used commercially. So they're nice to play with, and hopefully sooner or later we will get an open-source version of this that we can use commercially, but for now that's not the case.

Another interesting thing in this article, just quickly, is that they talk about DeepMind being pulled in to help with a project they're calling Gemini, which is trying to take on some of these
language models. Now that could be very interesting, because DeepMind has actually done some really interesting work with language models already. They built their own system called Sparrow, which again has never been released publicly, and one of the things they were trying to incorporate with it was citations. Here's a blog post to go along with the paper that talks about what Sparrow is and some of the key things they were trying to do with it. Some of these things are really interesting, because everyone's talking about RLHF, reinforcement learning from human feedback. I will make a video about that going forward, but a lot of those ideas originally came out of DeepMind and then got picked up at OpenAI, who ran with them probably better than a lot of other places did. So if some of the ideas from DeepMind's Sparrow come out and get used, that could also make something really interesting.

All right, let's have a play with the model. As always, I've got the links in the description. Here is the model; you can actually play with it, they're actually serving it, and it seems to run quite quickly. I've done a few different versions of playing with this. One of the ones I did, and I think I did this for GPT4All as well, was asking it to write some limericks, and it still didn't do that great. This definitely does seem to do a bit better at it, though not always: sometimes it does well and sometimes it doesn't for this kind of thing. Another one that we've used a lot was "write an email explaining why GPT-4 should be open source", and you'll see that it's generating pretty quickly. I would say that some of these outputs are definitely better than those of some of the other models that we've looked at. Now, the Raven one is interesting, in that it would be great to see the Raven model trained
on this ShareGPT dataset. I'll come back to this in a second.

Unfortunately, one of the things they acknowledge here is that in their release they have released code for training, serving, and evaluating, and they're planning to try to release the weights in some way, but there is no plan to release the dataset. So the magic dataset that they scraped before everyone else, unfortunately, looks like it's not going to come out anytime soon.

Anyway, you can see that this has done quite a nice email. Go along, have a play with this site, and see for yourself what you think about it. Unfortunately the model is not out yet, so when the model comes out I'll look at how we could serve it ourselves and do stuff with it ourselves, but until then have a play with it like this.

As always, if you've got any questions, please put them in the comments below. I will try my best to go through and answer the comments in the first day or two after the video comes out. If you found this useful, please click like and subscribe. I will see you in the next video. Bye for now.
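The transcript describes Vicuna's training examples as multi-round conversations held in one span, with the sequence length raised from 512 to 2048. The sketch below illustrates that idea only; it is not Vicuna's training code. The `USER:`/`ASSISTANT:` labels, the function names, and counting tokens by splitting on whitespace are all assumptions made for illustration, and a real pipeline would count tokens with the model's own tokenizer.

```python
from typing import List, Tuple

# One turn is (speaker, text); the speaker names are illustrative only.
Turn = Tuple[str, str]


def render_turn(speaker: str, text: str) -> str:
    """Render a single turn as one labelled line (hypothetical format)."""
    return f"{speaker.upper()}: {text}"


def flatten_conversation(turns: List[Turn]) -> str:
    """Join every turn of a conversation into a single training span."""
    return "\n".join(render_turn(speaker, text) for speaker, text in turns)


def truncate_to_budget(turns: List[Turn], max_tokens: int = 2048) -> List[Turn]:
    """Keep whole turns from the start until the budget would be exceeded.

    Tokens are approximated by whitespace-separated words (an assumption).
    """
    kept: List[Turn] = []
    used = 0
    for speaker, text in turns:
        cost = len(render_turn(speaker, text).split())
        if used + cost > max_tokens:
            break
        kept.append((speaker, text))
        used += cost
    return kept


conversation = [
    ("user", "What is a vicuna?"),
    ("assistant", "A relative of the llama."),
    ("user", "Where does it live?"),
    ("assistant", "In the Andes."),
]

span = flatten_conversation(truncate_to_budget(conversation, max_tokens=2048))
print(span)
```

With a 2048-token budget the whole four-turn exchange fits in one span, whereas a much smaller budget would cut the conversation off after an earlier turn, which is the limitation the longer sequence length is meant to relax.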

Original Description

Vicuna Demo: https://chat.lmsys.org/
In this video, I go through the new LLaMa finetuning called Vicuna and how it uses a new dataset to supposedly get to 90% quality of ChatGPT. We also look at the scandal of whether Google used that dataset.
Vicuna post: https://vicuna.lmsys.org/
ShareGPT: https://sharegpt.com/
ShareGPT twitter: https://twitter.com/sharegpt
Google denies using ShareGPT: https://www.theverge.com/2023/3/29/23662621/google-bard-chatgpt-sharegpt-training-denies
Sparrow: https://www.deepmind.com/blog/building-safer-dialogue-agents
For more tutorials on using LLMs and building Agents, check out my Patreon:
Patreon: https://www.patreon.com/SamWitteveen
Twitter: https://twitter.com/Sam_Witteveen
My Links:
Linkedin: https://www.linkedin.com/in/samwitteveen/
Github: https://github.com/samwit/langchain-tutorials
https://github.com/samwit/llm-tutorials

Playlist

Uploads from Sam Witteveen · Sam Witteveen · 29 of 60

1 LangChain Basics Tutorial #1 - LLMs & PromptTemplates with Colab
2 LangChain Basics Tutorial #2 Tools and Chains
3 ChatGPT API Announcement & Code Walkthrough with LangChain
4 Trying Out Flan 20B with UL2 - Working in Colab with 8Bit Inference
5 LangChain - Conversations with Memory (explanation & code walkthrough)
6 LangChain Chat with Flan20B
7 LangChain - Using Hugging Face Models locally (code walkthrough)
8 PAL : Program-aided Language Models with LangChain code
9 Building a Summarization System with LangChain and GPT-3 - Part 1
10 Building a Summarization System with LangChain and GPT-3 - Part 2
11 Microsoft's Visual ChatGPT using LangChain
12 Building a Summarization System with LangChain - Part 3 Using ChatGPT Turbo
13 LangChain Agents - Joining Tools and Chains with Decisions
14 Investigating Alpaca 7B - Finetuned LLaMa LLM
15 Comparing LLMs with LangChain
16 Running Alpaca7B in Colab
17 How to finetune your own Alpaca 7B
18 How to make a custom dataset like Alpaca7B
19 Understanding Constitutional AI - the paper and key concepts
20 Using Constitutional AI in LangChain
21 Talking to Alpaca with LangChain - Creating an Alpaca Chatbot
22 Text-to-video-synthesis with Diffusers and Colab
23 Meet Dolly the new Alpaca model
24 Checking out the Cerebras-GPT family of models
25 A Step-by-Step Guide to Fine-Tuning Your Dolly Model (tutorial)
26 Is GPT4All your new personal ChatGPT?
27 Raven - RWKV-7B RNN's LLM Strikes Back
28 Talk to your CSV & Excel with LangChain
29 Vicuna - 90% of ChatGPT quality by using a new dataset?
30 Koala Revealed: The ChatGPT Alternative You Need to Know! 🔍
31 Running Koala for free in Colab. Your own personal ChatGPT? (tutorial)
32 BabyAGI: Discover the Power of Task-Driven Autonomous Agents!
33 Auto-GPT - How to Automate a Task Based AI with GPT-4
34 Improve your BabyAGI with LangChain
35 Generative Agents - Deep Dive and GPT-4 Recreation
36 GPT4ALLv2: The Improvements and Drawbacks You Need to Know!
37 Dolly 2.0 by Databricks: Open for Business but is it Ready to Impress!
38 Red Pajama - Operation: Freeing LLaMA
39 Investigating Open Assistant - Models, Datasets and Addons
40 Investigating MiniGPT-4 - The Secret behind GPT-V?
41 Stable LM 3B - The new tiny kid on the block.
42 Bard can now code and put that code in Colab for you.
43 Checking out Bark: a Text to Speech system by Suno AI
44 Fine-tuning LLMs with PEFT and LoRA
45 Master PDF Chat with LangChain - Your essential guide to queries on documents
46 Using LangChain with DuckDuckGO Wikipedia & PythonREPL Tools
47 Building Custom Tools and Agents with LangChain (gpt-3.5-turbo)
48 StableVicuna: The New King of Open ChatGPTs?
49 WizardLM: Evolving Instruction Datasets to Create a Better Model
50 LaMini-LM - Mini Models Maxi Data!
51 Finding the Best Free ChatGPT
52 MPT-7B - The First Commercially Usable Fully Trained LLaMA Style Model
53 LangChain Retrieval QA Over Multiple Files with ChromaDB
54 LangChain Retrieval QA with Instructor Embeddings & ChromaDB for PDFs
55 LangChain + Retrieval Local LLMs for Retrieval QA - No OpenAI!!!
56 Transformers Agent - Is this Hugging Face's LangChain Competitor?
57 StarCoder - The LLM to make you a coding star?
58 Testing Starcoder for Reasoning with PAL
59 The New Wizards - Unfiltered & Unaligned
60 Camel + LangChain for Synthetic Data & Market Research

The video discusses Vicuna, a LLaMA 13B model fine-tuned on ChatGPT conversations collected from ShareGPT, which its authors report reaches about 90% of ChatGPT's quality as scored by GPT-4. It also covers the reported controversy over whether Google used ShareGPT data to train Bard, which Google denies, and explains why the model cannot be used commercially.

Key Takeaways
  1. Vicuna is a LLaMA 13B model fine-tuned on about 70,000 ChatGPT conversations collected from ShareGPT
  2. Training uses multi-round conversations, with the sequence length extended from 512 to 2048
  3. Evaluation asks GPT-4 to score each model's answers, an approach the authors themselves say is not yet rigorous
  4. Google denies reports that Bard was trained on ShareGPT or ChatGPT data
  5. The LLaMA license and the ChatGPT-derived data mean the model is not usable commercially
💡 Fine-tuning on a dataset of real ChatGPT conversations appears to account for much of the gain: the authors report about 90% of ChatGPT's quality as scored by GPT-4.
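To make the "90% of ChatGPT's quality" figure more concrete, here is a minimal sketch of judge-based scoring as the transcript describes it: each model's answer goes to a judge model, which returns a score, and the scores are then compared. The prompt wording, the function names, and the idea that the headline percentage is the ratio of summed scores are assumptions for illustration, not the authors' exact method. The judge is passed in as a plain callable, so no particular API is implied; the stub used here stands in for a real GPT-4 call.

```python
import re
from typing import Callable, List, Tuple


def build_judge_prompt(question: str, baseline: str, candidate: str) -> str:
    """Compose a grading prompt (hypothetical wording, not the authors' prompt)."""
    return (
        f"Question: {question}\n\n"
        f"Assistant 1 answer: {baseline}\n\n"
        f"Assistant 2 answer: {candidate}\n\n"
        "Rate each assistant from 1 to 10. Reply with the two scores first, "
        "separated by a space, then explain."
    )


def parse_scores(judge_reply: str) -> Tuple[float, float]:
    """Read the first two numbers in the judge's reply as the two scores."""
    numbers = re.findall(r"\d+(?:\.\d+)?", judge_reply)
    if len(numbers) < 2:
        raise ValueError("judge reply did not contain two scores")
    return float(numbers[0]), float(numbers[1])


def relative_quality(
    questions: List[str],
    baseline_answers: List[str],
    candidate_answers: List[str],
    judge: Callable[[str], str],
) -> float:
    """Return the candidate's total score divided by the baseline's total score."""
    baseline_total = 0.0
    candidate_total = 0.0
    for question, base, cand in zip(questions, baseline_answers, candidate_answers):
        reply = judge(build_judge_prompt(question, base, cand))
        base_score, cand_score = parse_scores(reply)
        baseline_total += base_score
        candidate_total += cand_score
    if baseline_total == 0:
        raise ValueError("baseline total is zero, ratio is undefined")
    return candidate_total / baseline_total


def stub_judge(prompt: str) -> str:
    """Stand-in for a real judge model; always returns the same scores."""
    return "10 9\nAssistant 1 was slightly more detailed."


ratio = relative_quality(
    ["What is a vicuna?", "Write a limerick."],
    ["baseline answer 1", "baseline answer 2"],
    ["candidate answer 1", "candidate answer 2"],
    stub_judge,
)
print(f"candidate reaches {ratio:.0%} of the baseline's total score")
```

Because the result depends entirely on the judge and on how each model is prompted, a ratio like this is best read as a rough comparison, which matches the authors' own caveat that the approach is not yet rigorous.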
