Vicuna - 90% of ChatGPT quality by using a new dataset?
Key Takeaways
The video discusses Vicuna, a chatbot its authors describe as open source, which is reported to reach 90% of ChatGPT's quality by fine-tuning the 13-billion-parameter LLaMA model on a new dataset. It also covers the controversy over allegations, which Google denies, that data from the same source was used to train Bard. Vicuna is fine-tuned on about 70,000 ChatGPT conversations collected from ShareGPT, and it is benchmarked by having GPT-4 score its answers alongside those of LLaMA, Alpaca, Bard, and ChatGPT.
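As a rough illustration of the benchmark idea summarized above, the sketch below assumes a judge model has already assigned a numeric score to each answer, and it reports each model's total as a percentage of a baseline model's total. The function name `relative_quality`, the model keys, and the scores are all hypothetical; the video does not spell out how the 90% figure was aggregated, so treat the ratio-of-totals rule as an assumption.

```python
from typing import Dict, List


def relative_quality(scores: Dict[str, List[float]], baseline: str) -> Dict[str, float]:
    """Express each model's summed judge score as a percentage of the baseline's sum.

    Assumption: every model was scored on the same questions by the same judge.
    """
    base_total = sum(scores[baseline])
    if base_total <= 0:
        raise ValueError("baseline total must be positive")
    return {name: 100.0 * sum(vals) / base_total for name, vals in scores.items()}


# Made-up judge scores for four questions (illustration only, not real results).
example_scores = {
    "chatgpt": [10, 10, 10, 10],
    "vicuna": [9, 9, 9, 9],
}
print(relative_quality(example_scores, baseline="chatgpt"))
```

With a scheme like this the baseline always lands at 100, which matches the remark in the transcript that ChatGPT scores 100 because it is the reference.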
Full Transcript
So another new model that got released towards the end of last week was Vicuna. The words "open source" are used very loosely at the moment with these models, but they're calling it "an open-source chatbot impressing GPT-4 with 90% ChatGPT quality", and we'll talk about what that actually means in a second. This comes from a group of people from a number of prestigious institutions in America who've been working on these kinds of things. I'm going to talk a little bit about the model, we'll have a look at the model, and we'll look at the scandal around the dataset for the model as well.

The first thing to note is that the model is just like so many of the other ones we've looked at already: it's basically a fine-tuned LLaMA model. In this case they've taken the bigger 13-billion-parameter LLaMA model and fine-tuned it, and what they've fine-tuned it on is pretty interesting. It's a dataset of ChatGPT conversations taken from a site called ShareGPT. If we look up what ShareGPT is, it's a site where people can post different conversations, and you can see what the actual ChatGPT conversation was. It turns out that up until recently this site had a huge number of conversations that were easy to access and that you could search; I think they had an Explore page, all this kind of stuff. Now that's all gone, and we'll talk in a minute about why it's gone.

But let's get back to Vicuna. A vicuna, I think, is an animal that's kind of similar to an alpaca or a llama. Pretty soon they're going to run out of animal names for these models. Anyway, like I said, it's tuned from LLaMA, and they've benchmarked it in an interesting way. What they've done is take a question and generate an answer, and they do the generation with
a LLaMA model, with an Alpaca model, with Bard, and with ChatGPT, and they feed these into GPT-4 with this prompt and get it to generate a score for each of them. Not surprisingly, LLaMA by itself comes in lowest, because there's no real fine-tuning for instructions done with that model, or very little. They find that Alpaca does well, but then Vicuna does way better and gets very close to Bard in their tests. And of course ChatGPT is going to score 100, because that's the reference they're trying to match.

So it's an interesting way of benchmarking. They even make a good point here that it's not yet a rigorous approach, and another nice point that building an evaluation system for chatbots remains an open question requiring further research. This is definitely true at the moment: how do you benchmark one of these language models against another? If you just run pure evaluations on them, you can find that one model responds to one way of prompting and gives you a really good result, while another model gives you a really bad result with that same style of prompting but, with a different style, gives you a better result than the first model. So it comes down to how you work out the best prompting strategies for the different models, and that's also an interesting area of research that people are looking at.

They have a bunch of examples in here comparing the models. If we look at these, comparing Alpaca to Vicuna, the Vicuna one is definitely generating longer text, and I'm not sure if that's always going to be a good thing. We know that ChatGPT tended to be very verbose in its generations, to the point where a lot of people wanted it to be shorter, and I think that's one of the things they were aiming for more with the GPT-4 stuff. We can run different questions through here, so you
can look at different questions, see the results that come out, and compare them. We can look at different models too: we can look at the LLaMA output and see that it's really short, and we can look at Bard's output if we open these up. We can get a sense of how they compare, at least through our own eyes; which one we prefer is really the way of going about it.

One of the things that we haven't seen people do with language models as such, but that people do with speech models, is compute MOS, or mean opinion scores. With speech it's probably quite easy: they would basically just ask, out of these two examples, which one was spoken by a human and which one wasn't, and they would build up scores that way. That's a bit of a simplification, but it gives you an idea. Maybe that's going to be one of the ways people can do this with language models as well going forward.

Anyway, they've got this way of benchmarking, and they've got a nice write-up of the overview of how they did it. They say that it's basically just fine-tuning, and the key thing here is the dataset. If we scroll down, we can look at the different datasets. We know LLaMA is the base model. Again, why are people using the LLaMA model? It seems to do so well because it's been trained on a trillion tokens. When we compare it to other open-source models out there that are trained on 300 billion tokens, even if they're bigger than the equivalent LLaMA model, they just don't do as well. So this is definitely one of the key factors. Anyway, we've got Alpaca trained on LLaMA and Vicuna trained on LLaMA; obviously Bard and ChatGPT are not. Alpaca is trained on 52,000 samples in the self-instruct style, whereas Vicuna is trained on 70,000 samples of conversations, and you could even think of these as being
larger than 70,000 too, because, somewhere here, yeah, they talk about how they expanded the sequence length from 512, which is what Alpaca was trained on, out to 2048, and they do that by having multi-round conversations in there. So it could be: I ask a question, I get the answer; I ask a follow-up question, I get the answer; I ask another follow-up question, I get the answer. That's all in one span. So it's definitely a lot more data that they're training on compared with Alpaca.

They show a little bit of their assessment of how they're doing this, and then they talk a little bit about the cost of doing this too. They've come up with a nice way of using spot instances, so that they can swap instances out, I guess, and that obviously reduces the cost. Anyway, you can go through and have a more in-depth look.

The other interesting thing that came out around the same time (and we'll have a play with the model in a second) is all this controversy around ShareGPT. Last week ShareGPT tweeted from their account that they were taking down their Explore page. One of the reasons they were taking it down is that they believed Google had been using the data on their site to train the Bard model. So much so that an online site called The Information reported a story about one of Google's researchers quitting over this. This is a guy by the name of Jacob Devlin, and supposedly he quit after sharing concerns with Sundar Pichai, the CEO, with Jeff Dean, and with other senior managers on the Bard team about the fact that they were using this data from ShareGPT, which was really data from OpenAI's ChatGPT. One of the things that makes the story kind of interesting, and that perhaps hasn't been reported as much, is: who
is this guy Jacob Devlin? Well, it turns out he is a very good researcher. He's the first author on the BERT paper. Many of you will have heard of BERT: it was one of Google's star models, it came out in 2018, and it has been used in search and in many different ways across Google and across industry in general. It set the tone for RoBERTa and for a lot of the other models that came along after it. So it's interesting if he's deciding to quit because of this. We only know what people are reporting, and I'm not claiming any inside knowledge of this.

In fact, later on Google denied that Bard was trained with ChatGPT data. It could be, and actually for sure there are multiple versions of Bard, that perhaps one of the early ones used that data and the final ones didn't; I'm not sure. But they've come out very strongly and denied this, and basically said that Bard is not trained on any data from ShareGPT or ChatGPT.

Anyway, in response to this, ShareGPT took a lot of their data down, and this is what Vicuna was trained on. Vicuna obviously had a lot of this data, which they do talk about in here; they basically converted it to Markdown as a way to use it, and stuff like that. And it does seem like that data turns out to be very useful. Again, though, this makes these models just not usable for commercial purposes: we've got LLaMA, which is not allowed to be used commercially, and we've got ChatGPT data, which is not allowed to be used commercially. So they're nice to play with, and hopefully sooner or later we will get an open-source version of this that we can use commercially, but for now that's not the case.

Another interesting thing in this article, just quickly, is that they talk about DeepMind being pulled in to help with this project they're calling Gemini, which is trying to take on some of these
language models. Now that could be very interesting, because DeepMind has actually done some really interesting work with language models already. They built their own system called Sparrow, which again has never been released publicly, and one of the things they were trying to incorporate with it was citations. Here's a blog post that goes along with the paper and talks about what Sparrow is and some of the key things they were trying to do. Some of these things are really interesting, because everyone's talking about RLHF, reinforcement learning from human feedback. I will make a video about that going forward, but a lot of those ideas originally came out of DeepMind and then got picked up at OpenAI, who ran with them probably better than a lot of other places did. So if some of the ideas from DeepMind's Sparrow come out and get used, that could also make something really interesting.

All right, let's have a play with the model. As always, I've got the links in the description. Here is the actual model; you can come along and play with it. They're actually serving it, and it seems to run quite quickly. I've done a few different versions of playing with this. One of the ones I did, and I think I did this for GPT4All as well, was asking it to write some limericks, where it still didn't do that great. This definitely does seem to do a bit better, though not always: sometimes it does well and sometimes it doesn't for this kind of thing.

Another one that we've used a lot is "write an email explaining why GPT-4 should be open source". You'll see that it's generating pretty quickly, and I would say that some of these results are definitely better than some of the other models that we've looked at. Now the Raven one is interesting, in that it would be great to see the Raven model trained
on this ShareGPT dataset. I'll come back to this in a second. Unfortunately, one of the things they acknowledge here is that, while they have released code for training, serving, and evaluating, and they're planning to try to release the weights in some way, there is no plan to release the dataset. So the magic dataset that they scraped before everyone else, unfortunately, looks like it's not going to come out anytime soon.

Anyway, you can see that this has done quite a nice email. Go along, have a play with this site, and see for yourself what you think about it. Unfortunately the model is not out yet. When the model comes out, I'll look at how we could serve it ourselves and do stuff with it ourselves, but until then have a play with it like this.

As always, if you've got any questions, please put them in the comments below. I will try my best to go through and answer the comments in the first day or two after the video comes out. If you found this useful, please click like and subscribe. I will see you in the next video. Bye for now.
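The multi-round training spans mentioned in the transcript (question, answer, follow-up, answer, all in one sequence of up to 2048 tokens) can be sketched roughly as follows. This is a minimal illustration, not the Vicuna training code: the names `pack_conversation` and `count_tokens` are hypothetical, the whitespace word count is only a stand-in for a real tokenizer, and splitting at turn boundaries is an assumption about how over-long conversations might be handled.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (speaker, text), e.g. ("USER", "What is a vicuna?")


def count_tokens(text: str) -> int:
    # Stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


def pack_conversation(turns: List[Turn], max_tokens: int = 2048) -> List[str]:
    """Join consecutive turns into segments that stay within max_tokens.

    A new segment is started only at a turn boundary. A single turn longer
    than max_tokens is kept whole in its own segment (a real pipeline would
    need to truncate or drop it).
    """
    segments: List[str] = []
    current: List[str] = []
    used = 0
    for speaker, text in turns:
        line = f"{speaker}: {text}"
        n = count_tokens(line)
        if current and used + n > max_tokens:
            segments.append("\n".join(current))
            current, used = [], 0
        current.append(line)
        used += n
    if current:
        segments.append("\n".join(current))
    return segments


# Made-up three-turn conversation packed into one training span.
chat = [
    ("USER", "What is a vicuna?"),
    ("ASSISTANT", "An animal related to the llama and the alpaca."),
    ("USER", "Where does it live?"),
]
print(pack_conversation(chat))
```

The point of the sketch is that one training example can carry several question and answer rounds, which is why a 2048-token window holds much more text per sample than a 512-token one.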
Original Description
Vicuna Demo: https://chat.lmsys.org/
In this video, I go through the new LLaMA fine-tune called Vicuna and how it uses a new dataset to supposedly reach 90% of ChatGPT's quality. We also look at the scandal over whether Google used that dataset.
Vicuna post: https://vicuna.lmsys.org/
ShareGPT: https://sharegpt.com/
ShareGPT twitter: https://twitter.com/sharegpt
Google denies using ShareGPT: https://www.theverge.com/2023/3/29/23662621/google-bard-chatgpt-sharegpt-training-denies
Sparrow: https://www.deepmind.com/blog/building-safer-dialogue-agents
For more tutorials on using LLMs and building Agents, check out my Patreon:
Patreon: https://www.patreon.com/SamWitteveen
Twitter: https://twitter.com/Sam_Witteveen
My Links:
Linkedin: https://www.linkedin.com/in/samwitteveen/
Github:
https://github.com/samwit/langchain-tutorials
https://github.com/samwit/llm-tutorials