Finding the Best Free ChatGPT

Sam Witteveen · Beginner · 🧠 Large Language Models · 3y ago

Key Takeaways

The video demonstrates Chatbot Arena, a platform from the creators of the Vicuna model for testing and benchmarking open-source chat LLMs that aim to be ChatGPT alternatives. It offers side-by-side comparisons of named models, anonymous head-to-head battles, and a leaderboard with Elo ratings and matchup statistics. Users can also access the underlying data through a notebook, run their own prompts, and contribute votes to the Chatbot Arena battle.

Full Transcript

Okay, this week the people who created Vicuna released a really nice tool that I thought I'd just make a quick post about, and this is the Chatbot Arena that they've got here. This is the team that made Vicuna. They've got a nice blog talking about the different projects they're doing, and here they've created this Chatbot Arena for benchmarking LLMs. Really it's aimed at benchmarking the results of LLMs in the wild, so not using traditional academic tests and benchmarks. It's just putting the models up against each other, and then people can decide between them.

If we click into the Chatbot Arena, you'll see that there's a number of different things in here. One of the first things you can do is select two models and compare them yourself. Let's say you've got a set of prompts for a particular domain and you want to know which model is best for it; this is something that's really interesting to check out. Here I've selected the Koala model and the Vicuna model, and what I've asked them to do is tell me a story about a cat called Max who was shrewd and cunning. It sends the prompt off to both models at the same time, they each write a story, and then I can read both stories and decide which one I think is better. In this case I think the Koala one is probably better at the storytelling, so I'm going to register a vote for that one.

Then I can benchmark other models. If I come up here, I could benchmark Koala against the Open Assistant model. Unfortunately they don't have every model that's out there. This is the Open Assistant Pythia model; they don't have the 30 billion LLaMA version of that. But they do have the Alpaca 13 billion, again from LLaMA. The original LLaMA model itself is good for you to play with, just to see what it's like. There's the StableLM model; remember this is not a finished trained model, it's still got a long way to go from where they were. We've got the Dolly model. So it allows us to compare a lot of these different models and see what's going on, and this is a fantastic sort of thing.

Now, voting like this is probably not a great thing, so the next thing they've got is a chatbot arena where the models are anonymous. Here I've asked them to describe how we get energy at a cellular level. I just come down here, put my prompt in, and click Send. We can see one of the models said, "At a cellular level energy is provided by a breakdown of molecules to glucose which releases ATP." Okay, that is quite nice and succinct. Here we've got a much longer answer. Now, this will depend: do I want a lot of information, or do I want something that's succinct? In this case my guess is this is probably a Vicuna model or a Koala model. This has definitely given a more detailed answer, so we'll go for this one. Let's say B is better. In this case it then tells us that the first was the original Alpaca model and this one was the Open Assistant Pythia model, so that's kind of interesting to look at. Then we could do another one by putting in another prompt and going through it again.

The next thing is the leaderboard. They've got a leaderboard where they're calculating an Elo rating, which is similar to how people calculate chess ratings. We can see what the top models are and what their ratings are, and I'm hoping that they're going to update this over time. So I would encourage all of you to come in here and do a number of these Chatbot Arena battles. If you could do five or ten of these, that's going to give them a lot of data to work out what's really the leader.

At the moment Vicuna is the leader, and since these are the guys who created Vicuna, I'm not sure if this is biased in any way or not. I notice Koala is number two. Koala is certainly one of the models that I thought was the best out there; Koala and Vicuna have been really good. Unfortunately they don't have things like StableVicuna here, and they don't have some of the newer models like WizardLM or some of the more unusual fine-tunings. But at least it gives us a way to benchmark how much better some of these models are. It's also very interesting because we know that this model and this model cannot be used commercially, but this model can. So in some ways this is also telling us that this model is the current leader as a commercial-use chatbot, or a free, open-source ChatGPT alternative. Also ChatGLM: this is a model that I like quite a bit as well.

I've never really looked at FastChat before, so after this I will definitely go and look at this FastChat-T5 model. It's quite nice and small, if it performs pretty well. We could even just come in here and do a test with it. If I pick the ones I want to test with, say FastChat on that side and Vicuna on this side (actually, let's pick Open Assistant on this side), then we'll ask what were the main events of World War II. You see that it's gone off and started processing, and we can see that our prompt has gone to both of these. It's a great tool for letting you test out the different models and see which ones you think are best and which ones are worth following up with. We can also see the speed of some of these.

World War I, I don't think, really was a main event of World War II, so that hasn't done a great job there, although it's got some parts of the key things. All right, I think I'm going to give it a tie, because while it had some inaccuracies, or gave us some not very useful information at the start, it certainly made up for it a lot more than this one, which just gave us a list of things. Now, this is all preference, right? You could argue that actually this one is more concise and to the point, and it really comes down to personal opinion. That's why I would say, for this kind of thing, come in and do it. The more people that do it, the more we will get a score and a leaderboard of what is really useful out there.

So let's go back to the leaderboard. We've got the Elo scores. You can also come down here and see the matchups of certain models versus other models, which is quite nice for seeing how well a model did against the others. We can see how many times they've actually been benchmarked and so on. The numbers here are still pretty low from what I'm seeing, so it would be great to get a lot of people coming in and testing this. Hopefully in the next week or so they update all these scores so that we can see how the models actually do over time. That would be a good way to get a sense of things.

They also give you a notebook, so you can go into the notebook and calculate what's going on. You can actually get access to the data yourself and have a play with it.

Overall, I just wanted to say: come in and test out some of these models. It's a great place for testing models and trying things out. But then contribute to the actual Chatbot Arena battle by running some of your favorite prompts and seeing which models you find to be the best. You could also just keep running the same prompt with different models. This one is fully double blind, whereas the other one is not, so this is probably the one you want to go and try out a bit more.

Anyway, as always, if you've got any questions or comments, please put them in the comments below. If you like this video, please click like and subscribe. I will talk to you in the next video. Bye for now.
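The transcript describes the leaderboard as using an Elo rating, the same idea used for chess ratings. As a rough illustration only, here is a minimal sketch of the standard Elo update applied to pairwise votes. The `battles` list, the model names in it, the K-factor of 32, and the starting rating of 1000 are all assumptions made for this example; they are not the arena's actual data or settings.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b, scale=400.0):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def compute_elo(battles, k=32.0, initial=1000.0):
    """Replay votes in order and return the final rating per model.

    battles: iterable of (model_a, model_b, winner), where winner is
    "a", "b", or "tie". k and initial are illustrative assumptions.
    """
    ratings = defaultdict(lambda: initial)
    outcome_for_a = {"a": 1.0, "b": 0.0, "tie": 0.5}
    for model_a, model_b, winner in battles:
        ra, rb = ratings[model_a], ratings[model_b]
        ea = expected_score(ra, rb)
        score_a = outcome_for_a[winner]
        # Each model moves by k times (actual result - expected result).
        ratings[model_a] = ra + k * (score_a - ea)
        ratings[model_b] = rb + k * ((1.0 - score_a) - (1.0 - ea))
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical votes, invented for the example (not real arena results).
    battles = [
        ("model-x", "model-y", "a"),
        ("model-y", "model-z", "tie"),
        ("model-x", "model-z", "b"),
    ]
    for name, rating in sorted(compute_elo(battles).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

Because each update depends on the current ratings, the result of this sequential version depends on the order of the votes, which is one reason ratings computed from only a few battles should be read with caution.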

Original Description

ChatArena: https://chat.lmsys.org/?arena
Blog post: https://lmsys.org/blog/2023-05-03-arena/

In this video I look at a new site by the creators of the Vicuna model which allows you to test and benchmark a wide variety of models to see which are best. They also have a leaderboard and stats about how each model does.

For more tutorials on using LLMs and building Agents, check out my Patreon:
Patreon: https://www.patreon.com/SamWitteveen
Twitter: https://twitter.com/Sam_Witteveen

My Links:
Linkedin: https://www.linkedin.com/in/samwitteveen/

00:00 Intro
00:41 ChatBot Arena
03:52 Leaderboard
07:25 More Statistics Shown

Github:
https://github.com/samwit/langchain-tutorials
https://github.com/samwit/llm-tutorials

This video shows how to use Chatbot Arena to test and compare open-source chat LLMs that aim to be ChatGPT alternatives, and how to contribute to the Chatbot Arena battle by running your own prompts and voting. The anonymous battle mode gives a way to compare models on real prompts without knowing which model produced which answer.

Key Takeaways
  1. Access ChatArena
  2. Explore the leaderboard
  3. Run custom prompts
  4. Evaluate model responses
  5. Contribute to the chatbot Arena Battle
💡 The anonymous arena mode is double blind: model names are revealed only after you vote. The side-by-side mode, where you pick the two models yourself, is not blind.
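
The transcript also mentions that the leaderboard page shows how each model did against each other model. A head-to-head table like that can be derived from the same vote records. The sketch below is one simple way to do it, under assumptions: the `(model_a, model_b, winner)` tuple format and the choice to count a tie as half a win for each side are made up for this example and may differ from what the arena's notebook does.

```python
from collections import defaultdict


def win_rates(battles):
    """Return {(model, opponent): win rate of model against opponent}.

    battles: iterable of (model_a, model_b, winner), where winner is
    "a", "b", or "tie". A tie counts as half a win for each side
    (an assumption for this sketch).
    """
    wins = defaultdict(float)
    games = defaultdict(int)
    for model_a, model_b, winner in battles:
        games[(model_a, model_b)] += 1
        games[(model_b, model_a)] += 1
        if winner == "a":
            wins[(model_a, model_b)] += 1.0
        elif winner == "b":
            wins[(model_b, model_a)] += 1.0
        else:
            wins[(model_a, model_b)] += 0.5
            wins[(model_b, model_a)] += 0.5
    return {pair: wins[pair] / count for pair, count in games.items()}


if __name__ == "__main__":
    # Hypothetical votes, invented for the example (not real arena results).
    battles = [
        ("model-x", "model-y", "a"),
        ("model-x", "model-y", "a"),
        ("model-x", "model-y", "b"),
        ("model-x", "model-y", "tie"),
    ]
    for (model, opponent), rate in sorted(win_rates(battles).items()):
        print(f"{model} vs {opponent}: {rate:.3f}")
```

With only a handful of battles per pair, these rates are noisy, which matches the point in the video that the battle counts were still low at the time.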

Chapters (4)

0:00 Intro
0:41 ChatBot Arena
3:52 Leaderboard
7:25 More Statistics Shown