Finding the Best Free ChatGPT
Key Takeaways
The video demonstrates Chatbot Arena (linked below as ChatArena), a platform from LMSYS, the team behind the Vicuna model. It lets users test and benchmark open-source chatbot models that are positioned as free ChatGPT alternatives, and it provides an Elo-based leaderboard and head-to-head statistics for comparing their performance. Users can access the underlying data, run their own prompts, and contribute votes to the anonymous Chatbot Arena battles.
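The head-to-head statistics mentioned above (how often each model beat each opponent, and how many times each pairing was voted on) can be tallied from raw vote records. Below is a minimal sketch, assuming each vote is stored as a hypothetical (model_a, model_b, winner) tuple where winner is "model_a", "model_b" or "tie". The record format, the function name matchup_stats, and the sample votes are illustrative assumptions, not the LMSYS data schema or real Arena results.

```python
from collections import Counter

# Invented sample votes in a hypothetical (model_a, model_b, winner) format;
# winner is "model_a", "model_b" or "tie". Not real Chatbot Arena data.
BATTLES = [
    ("model_x", "model_y", "model_a"),
    ("model_y", "model_z", "tie"),
    ("model_z", "model_x", "model_b"),
    ("model_x", "model_y", "model_b"),
]

VALID_WINNERS = {"model_a", "model_b", "tie"}


def matchup_stats(battles, exclude_ties=False):
    """Tally head-to-head results for every ordered (model, opponent) pair.

    Returns (battle_counts, win_fraction). win_fraction[(m, o)] is the share of
    m's battles against o that m won outright. Ties count as battles but not
    as wins, unless exclude_ties=True drops them entirely.
    """
    counts = Counter()
    wins = Counter()
    for model_a, model_b, winner in battles:
        if winner not in VALID_WINNERS:
            raise ValueError(f"unknown winner label: {winner!r}")
        if winner == "tie" and exclude_ties:
            continue
        # Record the battle from both models' points of view.
        counts[(model_a, model_b)] += 1
        counts[(model_b, model_a)] += 1
        if winner == "model_a":
            wins[(model_a, model_b)] += 1
        elif winner == "model_b":
            wins[(model_b, model_a)] += 1
    win_fraction = {pair: wins[pair] / n for pair, n in counts.items()}
    return dict(counts), win_fraction


if __name__ == "__main__":
    counts, fractions = matchup_stats(BATTLES)
    for (model, opponent), n in sorted(counts.items()):
        print(f"{model} vs {opponent}: {n} battles, {fractions[(model, opponent)]:.0%} won")
```

Whether ties belong in the denominator is a convention choice: the default here counts them as played but not won, while exclude_ties=True reports a win rate among decisive votes only.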
Full Transcript
Okay, this week the people who created Vicuna released a really nice tool that I thought I'd make a quick post about: the Chatbot Arena. This is the team that made Vicuna. They've got a nice blog talking about the different projects they're doing, and here they've created this Chatbot Arena for benchmarking LLMs. It's really aimed at benchmarking LLMs in the wild, so rather than using traditional academic tests and benchmarks, it's just putting the models up against each other and letting people decide which is better.
If we click into the Chatbot Arena, you'll see there are a number of different things in here. One of the first things you can do is select two models and compare them yourself. Let's say you've got a set of prompts for a particular domain and you want to know which model is best for it; this is really worth checking out. Here I've selected the Koala model and the Vicuna model, and I've asked them to tell me a story about a cat called Max who was shrewd and cunning. It sends the prompt to both models at the same time, they each write a story, and then I can read both stories and decide which one I think is better. In this case I think the Koala one is probably better at storytelling, so I'm going to register a vote for that one.
Then I can go on and benchmark other models. If I come up here, I could benchmark Koala against the Open Assistant model. Unfortunately they don't have every model that's out there: this is the Open Assistant Pythia model, and they don't have the 30 billion LLaMA version of that, but they do have Alpaca 13 billion, again built on LLaMA. The original LLaMA model itself is there for you to play with and see what it's like. There's the StableLM model; remember, this is not a finished trained model, and it's still got a long way to go. We've got the Dolly model. So it lets us compare a lot of these different models and see what's going on, and this is a fantastic sort of thing.
Now, voting like this, where you can see which model is which, is probably not a great way to benchmark. So the next thing they've got is the Chatbot Arena battle, where the models are anonymous. Here I've asked them to describe how we get energy at a cellular level. I put my prompt in here, click Send, and we can see one of the models said that, at a cellular level, energy is provided by a breakdown of molecules to glucose, which releases ATP. Okay, that is quite nice and succinct. Here we've got a much longer answer. Which is better depends on whether I want a lot of information or something succinct. My guess is this one is probably a Vicuna or a Koala model; it has definitely given a more detailed answer, so we'll go for this one. Let's say B is better. Once we vote, it tells us which models they were: that was the original Alpaca model, and this was the Open Assistant Pythia model. So that's kind of interesting to look at, and then we could put in another prompt and go through it again.
The next thing you can look at is the leaderboard. They're calculating an Elo rating, which is similar to how chess ratings are calculated. We can see the top models and their ratings, and I'm hoping they're going to update this over time. So I would encourage all of you to come in here and do a number of these Chatbot Arena battles. If you could do five or ten of them, that's going to give them a lot more data to work out what's really leading.
At the moment Vicuna is the leader, and they're the guys who created Vicuna, so I'm not sure whether this is biased in any way or not. I notice Koala is number two; Koala is certainly one of the models I thought was the best out there, and Koala and Vicuna have been really good. Unfortunately they don't have things like StableVicuna here, and they don't have some of the newer models like WizardLM or some of the more unusual fine-tunings of these models. But at least it gives us a way to benchmark exactly how much better some of these models are. It's interesting to see that we know this model and this model cannot be used commercially, but this model can, so in some ways this is also telling us which model is the current leader for a commercially usable chatbot, a free, open-source ChatGPT alternative. In this case there's also ChatGLM, a model I like quite a bit as well. I've never really looked at FastChat before, so after this I'll definitely go and look at this FastChat-T5 model. It's quite small, and if it performs pretty well, that could be very useful.
We can just come in here and pick the models we want to test. Let's put FastChat on that side and Vicuna on this side; actually, let's pick Open Assistant on this side. Then we'll ask it what the main events of World War II were. You can see it has started processing, and our prompt has gone to both models. It's a great tool for testing out the different models and seeing which ones you think are best and which are worth following up on. We can also see the speed of some of these. Now, I don't think World War I was really a main event of World War II, so that one hasn't done a great job, although it got some of the key parts. I think I'm going to give it a tie: while this one had some inaccuracies, or gave some not-so-useful information at the start, it made up for it a lot more than this one, which just gave us a list of things. This is all preference, though. You could argue that this one is actually more concise and to the point; it really comes down to personal opinion. That's why I'd say come in and do this kind of thing: the more people who do it, the better the scores and the leaderboard will reflect what is actually useful out there.
So let's go back to the leaderboard. As well as the Elo scores, you can scroll down and see the matchups of certain models against other models, which is nice for seeing how well a model did against the others. We can also see how many times each model has actually been benchmarked, and those numbers are still pretty low from what I'm seeing. So it would be great to get a lot of people coming in here and testing, and hopefully in the next week or so they update all these scores so we can get a good sense of how the models actually do over time.
They also give you a notebook, so you can go into it and calculate the results yourself; you can get access to the data and have a play with it.
Overall, I just wanted to say: come in and test out some of these models. It's a great place for trying models out, but also contribute to the actual Chatbot Arena battle by running some of your favorite prompts and seeing which models you find to be best. You could also keep running the same prompt against different models. The battle mode is fully double-blind, whereas the side-by-side comparison is not, so the battle is probably the one you want to try out more. Anyway, as always, if you've got any questions or comments, please put them in the comments below. If you liked this video, please click like and subscribe. I will talk to you in the next video. Bye for now.
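The leaderboard described above ranks models by an Elo rating, as in chess. The sketch below shows the standard sequential Elo update applied to pairwise votes, with a tie scored as half a win for each side (matching the tie vote in the transcript). The (model_a, model_b, winner) record format, the K-factor of 32, and the starting rating of 1000 are illustrative assumptions; the LMSYS notebook may use different parameters or a different fitting procedure, and the sample battles are invented.

```python
from collections import defaultdict

# Invented sample votes in a hypothetical (model_a, model_b, winner) format;
# winner is "model_a", "model_b" or "tie". Not real Chatbot Arena data.
BATTLES = [
    ("model_x", "model_y", "model_a"),
    ("model_y", "model_z", "tie"),
    ("model_z", "model_x", "model_b"),
    ("model_x", "model_y", "model_b"),
]


def compute_elo(battles, k=32.0, scale=400.0, base=10.0, initial=1000.0):
    """Apply sequential Elo updates over the battles in order.

    A win scores 1, a loss 0 and a tie 0.5. k, scale, base and initial are
    illustrative defaults, not necessarily the values LMSYS used.
    """
    ratings = defaultdict(lambda: initial)
    for model_a, model_b, winner in battles:
        ra, rb = ratings[model_a], ratings[model_b]
        # Expected score of model_a against model_b under the logistic Elo model.
        expected_a = 1.0 / (1.0 + base ** ((rb - ra) / scale))
        if winner == "model_a":
            score_a = 1.0
        elif winner == "model_b":
            score_a = 0.0
        elif winner == "tie":
            score_a = 0.5
        else:
            raise ValueError(f"unknown winner label: {winner!r}")
        # Both models move by equal and opposite amounts.
        ratings[model_a] = ra + k * (score_a - expected_a)
        ratings[model_b] = rb + k * ((1.0 - score_a) - (1.0 - expected_a))
    return dict(ratings)


if __name__ == "__main__":
    leaderboard = sorted(compute_elo(BATTLES).items(), key=lambda kv: kv[1], reverse=True)
    for rank, (model, rating) in enumerate(leaderboard, start=1):
        print(f"{rank}. {model}: {rating:.1f}")
```

Because each update is zero-sum, the total rating across models is conserved. Sequential Elo also depends on the order in which votes are processed, so replaying the same votes in a different order can shift the final ratings slightly; this is one reason more votes, as the video encourages, make the leaderboard more stable.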
Original Description
ChatArena: https://chat.lmsys.org/?arena
Blog post: https://lmsys.org/blog/2023-05-03-arena/
In this video I look at a new site by the creators of the Vicuna model which allows you to test and benchmark a wide variety of models to see which are best. They also have a leaderboard and stats about how each model does.
For more tutorials on using LLMs and building Agents, check out my Patreon:
Patreon: https://www.patreon.com/SamWitteveen
Twitter: https://twitter.com/Sam_Witteveen
My Links:
Linkedin: https://www.linkedin.com/in/samwitteveen/
00:00 Intro
00:41 ChatBot Arena
03:52 Leaderboard
07:25 More Statistics Shown
Github:
https://github.com/samwit/langchain-tutorials
https://github.com/samwit/llm-tutorials