Vicuna - 90% of ChatGPT quality by using a new dataset?

Sam Witteveen · Intermediate ·🧠 Large Language Models ·3y ago

Key Takeaways

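The video discusses Vicuna, a chatbot its authors describe as open source, which is reported to reach 90% of ChatGPT's quality by fine-tuning on a new dataset. It also explores the controversy over allegations, denied by Google, that data from the same source was used to train Bard. Vicuna is fine-tuned from the 13-billion-parameter LLaMA model on roughly 70,000 ChatGPT conversations that users shared on ShareGPT, and it is benchmarked by having GPT-4 score its answers against those of other models, an approach the authors themselves call not yet rigorous.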

Full Transcript

So another new model that got released towards the end of last week was Vicuna. The words "open source" are used very loosely at the moment with these models, but they're calling it "an open-source chatbot impressing GPT-4 with 90% ChatGPT quality", so we'll talk about what that actually means in a second. This comes out of a group of people from a number of prestigious institutions in America who've been working on these kinds of things. I'm going to talk a little bit about the model, we'll have a look at the model, and we'll look at a scandal around the dataset for the model as well.

The first thing to note is that the model is just like so many of the other ones that we've looked at already: it's basically just fine-tuning a LLaMA model. In this case they've taken the bigger 13 billion LLaMA model and fine-tuned it, and what they've fine-tuned it on is pretty interesting. They've fine-tuned it on a dataset of conversations that are taken from ChatGPT, and actually they're taken from a site called ShareGPT. If we look up what ShareGPT is, it's a site where people can post different conversations, and you can see what the ChatGPT conversation actually was, etc. It turns out that up until recently this site had a huge number of conversations that were easy to access and that you could search; I think they had an explore page, all this kind of stuff. Now that's all gone, and we'll talk in a minute about why it's gone.

But let's get back to Vicuna. A vicuña, I think, is an animal that's kind of similar to an alpaca or a llama. Pretty soon they're going to run out of animals for names for these models. Anyway, like I said, it's tuned from LLaMA, and they've benchmarked it in an interesting way. What they've done is basically take a question and do a generation, and they do the generation with
a LLaMA model, with an Alpaca model, with Bard, and with ChatGPT, and they feed the outputs into GPT-4 with this prompt and get it to generate a score for each of them. Not surprisingly, LLaMA comes out lowest by itself, because it's not really tuned; there's no real fine-tuning for instructions with that model, or very little. They find that Alpaca does well, but then Vicuna does way better and gets very close to Bard in their tests. And then of course ChatGPT is going to score 100, because that's the reference they're trying to match.

So it's an interesting way of benchmarking. They even make a good point here that it's not yet a rigorous approach, and another nice point that building an evaluation system for chatbots remains an open question requiring further research. This is definitely true at the moment: how do you benchmark one of these language models against another? If you just run pure evaluations on them, you can find that one model will respond to one way of prompting and give you a really good result, while another model with that same style of prompting will give you a really bad result, but with a different style of prompting will give you a better result than the first model. So it comes down to how you work out the best prompting strategies for the different models, and that's also an interesting area of research at the moment.

They have a bunch of examples in here comparing the models. If we look at these, comparing Alpaca to Vicuna, the Vicuna one is definitely generating longer text, and I'm not sure if that's always going to be a good thing or a bad thing. We know that ChatGPT tended to be very verbose with its generation, to the point where a lot of people wanted it to be shorter, and I think that's one of the things they were aiming for more with the GPT-4 stuff. We can see here that we can basically run different questions through, so you
can look at some different questions, see the results that come out, and compare them. We can look at different models too: we can look at the LLaMA output and see that's really short, and we can look at Bard's output if we open these up. So we can get a sense of how they compare, at least through our own eyes; which do we prefer is really the way of going about it.

One of the things that we haven't seen people do with language models as such, but that people do with speech models, is MOS scores, which are mean opinion scores. With speech it's probably quite easy: they would basically just say, out of these two examples, which one was spoken by a human and which one wasn't, and they would build up scores that way. That's a bit of a simplification, but it gives you an idea. Maybe that's going to be one of the ways that people can do this with language models as well going forward.

Anyway, they've got this way of benchmarking, and they've got a nice write-up of the overview of how they did it. They talk about how it's basically just fine-tuning, and the key thing here is the dataset. If we scroll down we can look at the different datasets. We know LLaMA is the base model. Again, why are people using the LLaMA model? They're using it because it seems to do so well, and that's because it's been trained on a trillion tokens. When we compare it to other open-source models out there that are trained on 300 billion tokens, even if they're bigger than the equivalent LLaMA model, they just don't do as well. So this is definitely one of the key factors.

Anyway, we've got Alpaca trained on LLaMA and Vicuna trained on LLaMA; obviously Bard and ChatGPT are not trained on that one. Alpaca is trained with 52,000 samples of the self-instruct style, whereas Vicuna is trained on 70,000 samples of conversations, and you could even think of these as being
more than 70,000 too, because, somewhere here, yeah, they talk about how they actually expanded the sequence length from 512, which is what Alpaca was trained on, out to 2048. They do that by having multi-round conversations in there. So it could be: I ask a question, I get the answer, I ask a follow-up question, I get the answer, I ask another follow-up question, I get the answer, and that's all in one span. So it's definitely a lot more data that they're training on compared with Alpaca.

They show a little bit about their assessment and how they're doing it, and then they talk a little bit about the cost of doing this too. They've come up with a nice way of using spot instances, so that they can swap instances out, I guess, and that obviously reduces the cost. Anyway, you can go through and have a more in-depth look.

The other interesting thing that came out around the same time, and we'll have a play with the model in a second, is all this controversy around ShareGPT. Last week ShareGPT tweeted out on their account that they were taking down their explore page. One of the reasons they were taking it down is that they believed Google had been using the data on their site to train the Bard model. So much so that an online site called The Information actually reported a story about one of Google's researchers quitting over this. This is a guy by the name of Jacob Devlin, and supposedly he quit after sharing concerns with Sundar Pichai, the CEO, Jeff Dean, and other senior managers on the Bard team about the fact that they were using this data from ShareGPT, which was really data from OpenAI's ChatGPT. One of the things that makes the story kind of interesting, and that perhaps hasn't been reported as much, is who
is this guy Jacob Devlin? Well, it turns out he is a very good researcher. He's the first author on the BERT paper. Many of you will have heard of BERT; it was one of Google's star models. It came out in 2018, has been used in search and in many different ways across Google and across industry in general, and set the tone for RoBERTa and a lot of the other models that came along after it. So it's interesting if he's deciding to quit because of this. We only know what people are reporting, right? I'm not claiming any inside knowledge of this. And in fact, later on Google denied that Bard was trained with ChatGPT data. It could be, actually for sure there are multiple versions of Bard, so perhaps one of the early ones used that and the final ones didn't. I'm not sure. But they've come out very strongly and denied this, and basically said that Bard is not trained on any data from ShareGPT or ChatGPT.

Anyway, in regards to this, ShareGPT took a lot of their data down, and this is what Vicuna was trained on. So Vicuna had a lot of this data, which they do talk about in here: they basically converted it to markdown as a way to use it, and stuff like that. And it does seem like that data turns out to be very useful. Again, though, this makes these models just not usable for commercial use: we've got LLaMA, which is not allowed to be used commercially, and we've got ChatGPT data, which is not allowed to be used commercially. So they're nice to play with, and hopefully sooner or later we will get an open-source version of this that we can use commercially, but for now that's not the case.

Another interesting thing in this article, just quickly, is that they talk about DeepMind being pulled in to help with this project they're calling Gemini, which is trying to take on some of these
language models. That could be very interesting, because DeepMind has actually done some really interesting work with language models already. They built their own system called Sparrow, which again has never been released publicly, and one of the things they were trying to incorporate with it was citations. Here's a blog post that goes along with the paper and talks about what Sparrow is and some of the key things that they were trying to do with it. Some of these things are really interesting, because everyone's talking about RLHF, right, reinforcement learning from human feedback. I will make a video about that going forward, but a lot of those ideas originally came out of DeepMind and then got picked up at OpenAI, who ran with them probably better than a lot of other places did. So if some of the ideas from DeepMind's Sparrow come out and get used, that could also make something really interesting.

All right, let's have a play with the model. As always, I've got the links in the description. Here is the model; you can come along and actually play with it. They're serving it, and it seems to run quite quickly. I've done a few different versions of playing with this. One of the ones I did, and I think I did this for GPT4All as well, was where we asked it to write some limericks, and it still didn't do that great. This definitely does seem to do a bit better, though not always; sometimes it does well and sometimes it doesn't for this kind of thing. Another one that we've used a lot is "write an email explaining why GPT-4 should be open source", and you'll see that it's generating pretty quickly. I would say that some of these outputs are definitely better than some of the other models that we've looked at. Now, the Raven one is interesting, in that it would be great to see the Raven model trained
on this ShareGPT dataset. I'll come back to this in a second. Unfortunately, one of the things that they acknowledge here is that in their release they have released code for training, serving, and evaluating, and they're planning to try and release the weights in some way, but there is no plan to release the dataset. So the magic dataset that they scraped before everyone else, unfortunately it looks like it's not going to come out anytime soon.

Anyway, you can see that this has done quite a nice email. Go along, have a play with this site, and see for yourself what you think about it. Unfortunately the model is not out yet. When the model comes out, I'll look at how we could serve it ourselves and do stuff with it ourselves, but until then have a play with it like this.

As always, if you've got any questions, please put them in the comments below. I will try my best to go through and answer the comments in the first day or two after the video comes out. And if you found this useful, please click like and subscribe. I will see you in the next video. Bye for now.
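
The GPT-4 scoring scheme described in the transcript can be sketched roughly as follows. This is a minimal illustration, not the Vicuna authors' code: the prompt wording, the `build_judge_prompt`, `parse_scores`, and `relative_quality` helpers, and the sample judge replies are all hypothetical, and no real API is called. It assumes the judge puts two numeric scores on the first line of its reply, and it reports a model's quality as its total score relative to the reference model's total, which is one way a figure like "90% of ChatGPT" can arise.

```python
from typing import List, Tuple

# Hypothetical judge prompt; the wording is an assumption, not the original.
JUDGE_TEMPLATE = (
    "Question: {question}\n\n"
    "Assistant 1 answer:\n{answer_1}\n\n"
    "Assistant 2 answer:\n{answer_2}\n\n"
    "Rate each assistant from 1 to 10 for helpfulness, relevance, accuracy "
    "and detail. Reply with the two scores on the first line, separated by "
    "a space, then explain your reasoning."
)


def build_judge_prompt(question: str, answer_1: str, answer_2: str) -> str:
    """Fill the template with one question and the two answers to compare."""
    return JUDGE_TEMPLATE.format(
        question=question, answer_1=answer_1, answer_2=answer_2
    )


def parse_scores(judge_reply: str) -> Tuple[float, float]:
    """Read the two scores from the first line of the judge's reply."""
    first_line = judge_reply.strip().splitlines()[0]
    score_1, score_2 = first_line.split()[:2]
    return float(score_1), float(score_2)


def relative_quality(score_pairs: List[Tuple[float, float]]) -> float:
    """Candidate total as a percentage of the reference total."""
    candidate_total = sum(candidate for candidate, _ in score_pairs)
    reference_total = sum(reference for _, reference in score_pairs)
    return 100.0 * candidate_total / reference_total


# Made-up judge replies for three questions (candidate first, reference second).
sample_replies = [
    "8 9\nAssistant 2 gives more detail.",
    "9 9\nBoth answers are accurate.",
    "7 8\nAssistant 1 misses one step.",
]
pairs = [parse_scores(reply) for reply in sample_replies]
print(f"relative quality: {relative_quality(pairs):.1f}%")
```

In this scheme the reference model's own total is the denominator, so the reference sits at 100% by construction, which matches the remark in the transcript that ChatGPT "is going to score 100".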

Original Description

Vicuna Demo: https://chat.lmsys.org/

In this video, I go through the new LLaMa finetuning called Vicuna and how it uses a new dataset to supposedly get to 90% quality of ChatGPT. We also look at the scandal of whether Google used that dataset.

Vicuna post: https://vicuna.lmsys.org/
ShareGPT: https://sharegpt.com/
ShareGPT twitter: https://twitter.com/sharegpt
Google denies using ShareGPT: https://www.theverge.com/2023/3/29/23662621/google-bard-chatgpt-sharegpt-training-denies
Sparrow: https://www.deepmind.com/blog/building-safer-dialogue-agents

For more tutorials on using LLMs and building Agents, check out my Patreon:
Patreon: https://www.patreon.com/SamWitteveen
Twitter: https://twitter.com/Sam_Witteveen

My Links:
Linkedin: https://www.linkedin.com/in/samwitteveen/
Github: https://github.com/samwit/langchain-tutorials
https://github.com/samwit/llm-tutorials

Viewers see how Vicuna was fine-tuned from LLaMA on ShareGPT conversations, how it was evaluated with GPT-4 as the judge, and why the LLaMA license and the ChatGPT-derived training data keep it from commercial use.

Key Takeaways

Fine-tune a language model on a new conversational dataset
Extend training examples to multi-round conversations with a longer sequence length
Compare chatbot answers side by side across several models
Use GPT-4 as a judge to benchmark a chatbot, keeping in mind that the approach is not yet rigorous

💡 Training data matters: by the authors' own GPT-4-based evaluation, fine-tuning LLaMA on ShareGPT conversations brings Vicuna to about 90% of ChatGPT's quality, though that evaluation is not yet rigorous.
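
The multi-round training examples mentioned in the transcript, where several question-and-answer turns share one span of up to 2048 tokens, can be sketched like this. It is an illustrative assumption rather than the actual Vicuna pipeline: whitespace splitting stands in for a real tokenizer, the `USER:` and `ASSISTANT:` role tags and the `pack_conversation` helper are hypothetical, and the loss mask (training only on assistant tokens) is a common choice that the transcript does not describe.

```python
from typing import List, Tuple

MAX_LEN = 2048  # sequence length mentioned in the transcript


def pack_conversation(
    turns: List[Tuple[str, str]], max_len: int = MAX_LEN
) -> Tuple[List[str], List[int]]:
    """Flatten (role, text) turns into one token span plus a loss mask.

    Whitespace splitting is a stand-in for a real tokenizer (assumption).
    Mask value 1 marks assistant tokens; role tags and user tokens get 0.
    """
    tokens: List[str] = []
    loss_mask: List[int] = []
    for role, text in turns:
        words = text.split()
        flag = 1 if role == "assistant" else 0
        tokens.append(f"{role.upper()}:")  # hypothetical role tag
        loss_mask.append(0)
        tokens.extend(words)
        loss_mask.extend([flag] * len(words))
    # Truncate anything that does not fit in the training window.
    return tokens[:max_len], loss_mask[:max_len]


conversation = [
    ("user", "What is a vicuna?"),
    ("assistant", "A relative of the llama."),
    ("user", "Where does it live?"),
    ("assistant", "In the Andes."),
]
tokens, loss_mask = pack_conversation(conversation)
print(len(tokens), sum(loss_mask))
```

Keeping all turns in one span lets each follow-up answer be trained with the earlier turns as context, which is the benefit the transcript attributes to the longer sequence length.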
