Finding the Best Free ChatGPT

Sam Witteveen · Beginner ·🧠 Large Language Models ·3y ago

Key Takeaways

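The video demonstrates Chatbot Arena, a platform from LMSYS, the team behind the Vicuna model, for testing and benchmarking chat LLMs that are pitched as open ChatGPT alternatives. It offers a side-by-side comparison of two models you choose, anonymous battles where you vote on the better answer, and a leaderboard with Elo ratings and matchup statistics. Users can also access the underlying data through a notebook, run their own prompts, and contribute votes to the Chatbot Arena battle.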

Full Transcript

Okay, this week the people who created Vicuna released a really nice tool that I thought I'd just make a quick post about, and this is the Chatbot Arena that they've got here. This is the team that made Vicuna. They've got a nice blog talking about the different projects they're doing, and here they've created this Chatbot Arena for benchmarking LLMs. It's really aimed at benchmarking the results of LLMs in the wild, so not using traditional academic tests and benchmarks. It's really just putting the models up against each other, and then people can decide which is better.

If we click into the Chatbot Arena, you'll see that there are a number of different things in here. One of the first things you can do is select two models and compare them yourself. Let's say you've got a set of prompts for a particular domain and you want to know which model is best for it; this is really interesting to check out. Here I've selected the Koala model and the Vicuna model, and what I've asked them to do is tell me a story about a cat called Max who was shrewd and cunning. It sends the prompt off to both models at the same time, they each write a story, and then I can read both stories and decide which one I think is better. In this case I think the Koala one is probably better at the storytelling, so I'm going to register a vote for that one.

Then I can benchmark other models. If I come up here, I could benchmark Koala against the Open Assistant model. Unfortunately they don't have every model that's out there. This is the Open Assistant Pythia model; they don't have the 30 billion LLaMA version of that. But they do have Alpaca 13 billion, again from LLaMA. The original LLaMA model itself is good for you to play with, just to see what it's like. There's the StableLM model; remember, this is not a fully trained model, it's still got a long way to go. And we've got the Dolly model. So it allows us to compare a lot of these different models and see what's going on, and this is a fantastic sort of thing.

Now, voting like this, where you can see the model names, is probably not a great thing, so the next thing they've got is the Chatbot Arena battle, where the models are anonymous. Here I've asked them to describe how we get energy at a cellular level. I just come down here, put my prompt in, and click Send. One of the models said, "At a cellular level, energy is provided by a breakdown of molecules to glucose, which releases ATP." Okay, that is quite nice and succinct. Here we've got a much longer answer. Now, this will depend: do I want a lot of information, or do I want something succinct? My guess is this is probably a Vicuna model or a Koala model. It has definitely given a more detailed answer, so we'll go for this one. Let's say B is better. In this case it then tells us that that was the original Alpaca model and this was the Open Assistant Pythia model, so that's kind of interesting to look at. Then we could do another one by putting in another prompt and going through it again.

The next thing is the leaderboard. They're calculating an Elo rating, which is similar to how people calculate chess ratings. We can see what the top models are and what their ratings are, and I'm hoping that they're going to update this over time. So I would encourage all of you to come in here and do a number of these Chatbot Arena battles. If you could do five or ten of these, that's going to give them a lot of data to work out what's really the leader.

At the moment Vicuna is the leader, and since this was done by the guys who created Vicuna, I'm not sure if this is biased in any way or not. I notice Koala is number two. Koala is certainly one of the models that I thought was the best out there; Koala and Vicuna have been really good. Unfortunately they don't have things like StableVicuna here, and they don't have some of the newer models like WizardLM or some of the more unusual fine-tunings. But at least it gives us a way to benchmark exactly how much better some of these models are.

It's also very interesting to see that, as we know, this model and this model cannot be used commercially, but this model can. So in some ways this is also telling us that this model is the current leader for a commercially usable chatbot, a free, open-source ChatGPT alternative. There's also ChatGLM, a model that I like quite a bit as well. I've never really looked at FastChat before, so after this I will definitely go and look at this FastChat-T5 model. It's quite nice and small.

If that performs pretty well, we could even just come in here and choose to do a test with it. Let me pick the ones I want to test: FastChat on that side, and let's pick Vicuna on this side. Actually, let's pick Open Assistant on this side. Then we'll ask it, "What were the main events of World War II?" You can see that it's gone off and started processing, and that our prompt has gone into both of these. It's a great tool for letting you test out the different models and see which ones you think are best and which ones are worth following up with. We can also see the speed of some of these.

World War I, I don't think, really was a main event of World War II, so that one hasn't done a great job there, although it's got some of the key things. All right, I think I'm going to give it a tie, because while it had some inaccuracies, or gave us some not very useful information at the start, it certainly made up for it a lot more than this one, which just gave us a list of things. Now, this is all preference, right? You could argue that actually, no, this one is more concise and to the point. It really comes down to personal opinion. That's why I would say, for this kind of thing, come in and do it. The more people that do it, the better the score and leaderboard we'll get of what is actually useful out there.

Let's go back to the leaderboard. We've got the Elo scores, and you can also come down here and see the matchups of certain models against other models. That's quite nice for seeing how well a model did against the others. We can also see how many times they've actually been benchmarked. The numbers here are still pretty low from what I'm seeing, so it would be great to get a lot of people coming in here and testing this. Hopefully in the next week or so they'll update all these scores so that we can see how the models actually do over time.

They also give you a notebook. You can go into the notebook and calculate what's going on, so you can get access to the data yourself and have a play with it. Overall, I just wanted to say: come in and test out some of these models. It's a great place for testing models and trying things out. But then contribute to the actual Chatbot Arena battle by running some of your favorite prompts and seeing which models you find to be the best. You could also just keep running the same prompt with different models. This one is fully double-blind, whereas the other one is not, so this is probably the one you want to try out a bit more.

Anyway, as always, if you've got any questions or comments, please put them in the comments below. If you liked this video, please click like and subscribe. I will talk to you in the next video. Bye for now.
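
The Elo rating mentioned in the leaderboard part of the transcript can be illustrated with a small sketch. This is not the Arena's own notebook code: the function names, the K-factor of 32, the starting rating of 1000, and the three battle records are all assumptions made up for illustration. It only shows the standard chess-style update, where each vote shifts both ratings by how surprising the result was.

```python
from collections import defaultdict

K = 32          # assumed update step size (not taken from the video)
BASE = 1000.0   # assumed starting rating for every model


def expected_score(r_a, r_b):
    """Standard Elo expectation: probability that A beats B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def compute_elo(battles, k=K, base=BASE):
    """Replay battles in order; each is (model_a, model_b, winner),
    where winner is "a", "b", or "tie"."""
    ratings = defaultdict(lambda: base)
    for model_a, model_b, winner in battles:
        r_a, r_b = ratings[model_a], ratings[model_b]
        e_a = expected_score(r_a, r_b)
        s_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
        # Whatever A gains, B loses, so the total rating is conserved.
        ratings[model_a] = r_a + k * (s_a - e_a)
        ratings[model_b] = r_b - k * (s_a - e_a)
    return dict(ratings)


# Hypothetical votes, invented for this example only.
battles = [
    ("vicuna-13b", "koala-13b", "a"),
    ("koala-13b", "alpaca-13b", "a"),
    ("vicuna-13b", "alpaca-13b", "tie"),
]

ratings = compute_elo(battles)
for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Because each update depends on the ratings at that moment, the order of the votes matters in this simple version, and a handful of votes moves the numbers a lot. That is one reason the transcript asks for more people to vote.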

Original Description

ChatArena: https://chat.lmsys.org/?arena
Blog post: https://lmsys.org/blog/2023-05-03-arena/

In this video I look at a new site by the creators of the Vicuna model which allows you to test and benchmark a wide variety of models to see which are best. They also have a leaderboard and stats about how each model does.

For more tutorials on using LLMs and building Agents, check out my Patreon:
Patreon: https://www.patreon.com/SamWitteveen
Twitter: https://twitter.com/Sam_Witteveen

My Links:
Linkedin: https://www.linkedin.com/in/samwitteveen/

00:00 Intro
00:41 ChatBot Arena
03:52 Leaderboard
07:25 More Statistics Shown

Github:
https://github.com/samwit/langchain-tutorials
https://github.com/samwit/llm-tutorials

This video shows how to use Chatbot Arena to test and compare open chat LLMs such as Vicuna, Koala, and Alpaca, and how to contribute votes to the Chatbot Arena battle by running your own prompts. Because the leaderboard is built from crowd votes on head-to-head answers, it becomes more reliable as more people take part.

Key Takeaways

Access Chatbot Arena
Explore the leaderboard
Run custom prompts
Evaluate model responses
Contribute to the chatbot Arena Battle
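
The matchup statistics shown under the leaderboard, how often one model beat another, can be tallied from vote records along these lines. This is a minimal sketch: the record layout `(model_a, model_b, winner)`, the function name `win_rates`, and the sample votes are assumptions for illustration, not the format of the Arena's published data.

```python
from collections import defaultdict


def win_rates(battles):
    """Return {(x, y): fraction of x-vs-y battles won by x}.
    A tie counts as half a win for each side."""
    wins = defaultdict(float)
    games = defaultdict(int)
    for model_a, model_b, winner in battles:
        games[(model_a, model_b)] += 1
        games[(model_b, model_a)] += 1
        if winner == "a":
            wins[(model_a, model_b)] += 1.0
        elif winner == "b":
            wins[(model_b, model_a)] += 1.0
        else:  # tie
            wins[(model_a, model_b)] += 0.5
            wins[(model_b, model_a)] += 0.5
    return {pair: wins[pair] / count for pair, count in games.items()}


# Hypothetical votes, invented for this example only.
sample_battles = [
    ("koala-13b", "vicuna-13b", "a"),
    ("vicuna-13b", "koala-13b", "a"),
    ("koala-13b", "vicuna-13b", "tie"),
    ("koala-13b", "oasst-pythia-12b", "a"),
]

rates = win_rates(sample_battles)
for (model, opponent), rate in sorted(rates.items()):
    print(f"{model} vs {opponent}: {rate:.2f}")
```

With so few battles per pair the fractions are noisy, which matches the transcript's remark that the counts on the leaderboard were still low.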

💡 Chatbot Arena's battle mode is double-blind: model names stay hidden until you vote, which keeps the comparison fair. The side-by-side mode, where you pick the two models yourself, is not blind.
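
The blind setup can be pictured with a short sketch: two models are drawn at random, shown only as "A" and "B", and their names are revealed once a vote is recorded. The function names, the model list, and the vote labels here are hypothetical, and this is an illustration of the idea rather than how the site is implemented.

```python
import random


def start_battle(models, rng=random):
    """Pick two distinct models at random and hide them behind labels."""
    left, right = rng.sample(models, 2)
    return {"A": left, "B": right}


def reveal_after_vote(assignment, vote):
    """Record the vote first, then disclose which model was which."""
    if vote not in ("A", "B", "tie"):
        raise ValueError("vote must be 'A', 'B', or 'tie'")
    return {
        "vote": vote,
        "model_a": assignment["A"],
        "model_b": assignment["B"],
    }


# Hypothetical model pool, for illustration only.
pool = ["vicuna-13b", "koala-13b", "alpaca-13b", "oasst-pythia-12b"]

assignment = start_battle(pool, random.Random(0))
# The voter would only ever see the labels "A" and "B" at this point.
result = reveal_after_vote(assignment, "B")
print(result)
```

Keeping the names out of view until the vote is in stops a voter's prior opinion of a model from influencing the choice.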

Chapters (4)

Intro

0:41 ChatBot Arena

3:52 Leaderboard

7:25 More Statistics Shown
