Understanding LLM Settings

Elvis Saravia · Beginner · 🧠 Large Language Models · 2y ago

Key Takeaways

The video explains LLM settings such as temperature, top P, max length, stop sequences, frequency penalty, and presence penalty. It shows how to adjust them for different use cases, including fact-based question answering and creative tasks like email or lyrics generation, using tools like the OpenAI Playground and LLM provider APIs.

Full Transcript

Hi everyone. In this video I want to talk about LLM settings. The idea of this section in our prompting guide is to tell you a little bit about how to use these settings. When you're exploring, experimenting with, and prompting these models, there are a couple of settings that you can tune to get the results you want.

If you are coming from the world of ChatGPT, the conversational chatbot from OpenAI, you may not know that these models are actually using some specific fixed settings. You don't see them and you cannot configure them. But if you come from the world of APIs, you do have access to certain settings that you can adjust to get the results you want. This is very popular among developers, so it only applies to you if you're using some type of LLM API. This could be any provider: OpenAI or any of the other LLM providers.

What I want to do in this video is go through a few of these settings and explain, with some examples, how you can leverage them. A few settings stand out when using large language models via APIs. If you go to the playground, you pretty much get an idea of what the important settings are. For instance, in the OpenAI Playground you have temperature, maximum length, stop sequences, top P, frequency penalty, and presence penalty, and what we have done in our guide is provide explanations of what these are. In this video I want to quickly go over these ideas and explain how you can leverage them when you're developing with these models. I must say that we often don't really talk about temperature or top P or most of these settings, but they're actually quite important and useful. It really depends on what you're aiming to achieve. So let's go through some of these.
I'll start with temperature. Temperature is a value, and you can see here in the playground that it ranges from zero all the way to two. The default is one; this is the default that the OpenAI Playground has set for you. Sometimes when we are doing the examples in the playground we don't even look at this, but it's there for you, and you can see the definition here: it controls randomness.

What does that mean? The way I understand temperature is that increasing or decreasing it decreases or increases the confidence the model has in its most likely response. If you look at our definition for it here, you can see that you're essentially increasing the weights of the other possible tokens when you increase the temperature value.

Why is this useful? It really depends on the task. Let's say we are dealing with some kind of fact-based question answering task or application. We want to encourage the model to be more factual and less random, or less diverse, in what it outputs. At the end of the day it's outputting a sequence of tokens, and we want those tokens to be the ones the model is confident in generating. If we want that, we decrease the temperature. The closer it is to zero, the less random those outputs are going to be. So for fact-based question answering it's pretty useful to use temperature values that are lower, closer to zero.

If you're doing something like email generation, poem generation, or generating lyrics, something more on the creative side, it is beneficial to experiment with increasing the temperature value. However, do note that as you increase the
temperature value, and by that I mean you can increase it all the way to two, something we have seen in our experiments is that the outputs become so random that the model is basically producing gibberish, a nonsensical sequence of tokens. So be very careful when you're setting these temperature values really high. When you set it low this is less of a problem, because the output is less random. But when you set it above 1 or 1.5 or so, be very careful, and do a lot of experimentation to see what the model is outputting for your application. Hopefully that makes sense.

I think temperature is one of the more important LLM settings. There are other configurations as well, like top P, and I see this with all the language model providers, so it's really good to be familiar with these concepts. You could consider top P a sampling technique, almost like an alternative to temperature. The reason I say that is because it is a very similar concept to temperature. If you look at OpenAI's documentation, you can see that they recommend that if you're using top P, don't use temperature, and if you're using temperature, don't use top P. So do not use both at the same time; just set one and that should be fine. That tells you that it's basically an alternative sampling technique to temperature.

The way I understand top P is that a high top P value enables the model to look at more possible words, including the ones that are less likely, which leads to more diverse output. So it has a very similar effect to temperature, although you may obviously get different results when you use temperature compared to when you use top P. So if
you're experimenting with temperature and you're not getting the desired results, then maybe you can leave temperature at its default value and experiment with top P instead. That's how I generally use it; I never use both at the same time. In fact, these days I focus a lot more on prompt engineering, optimizing the prompt, as opposed to messing around with the temperature or top P values. That's just something to note here. You can read the full definition here. There's a lot of good content that goes into the technical details of these configurations, but I think what I've explained is good enough for the intuition of it and for when you may or may not want to use it.

You can see here that the general recommendation is to alter temperature or top P, but not both. I think this applies to most of the LLM providers, so if you're using something like Fireworks, Cohere, Claude, Gemini, or whatever that may be, you might consider this recommendation. I've read in some forums that some developers actually combine both and get good quality responses from these models, but that's an exception. I rarely see this to be the case, and we rarely use it this way.

There are other settings, like max length, stop sequences, frequency penalty, presence penalty, and so on. I'll go briefly through each of these. We use them less; it really depends on the circumstances and our use cases. With max length, let's say we are trying to prevent some irrelevant responses, which I would say is less of a problem now with these models. However, there is the problem of cost. Models are getting cheaper to use, so you can make an argument that this is less important. However, when we started with these language models they were really expensive, and it was really nice to be able to control how many tokens, you
know, how many tokens the model can generate, so that you can control cost. The model can go on and on generating text without finishing, and the next thing you know you have a really high bill. So try to use this setting; again, it really depends on the use case and your needs.

Stop sequence is another interesting one. Basically, you define a string that stops the model from generating tokens. For instance, in the OpenAI Playground there is a stop sequence field here, and they even explain what it is. Here you can provide whatever sequence you are expecting the model to output as the final token. Again, we rarely use this one. I think it's very niche and really applies only to some types of tasks. We have used it, for instance, when generating code. It's really useful in that setting because we want the model to just output the code without explaining it, and we know what the stop sequences are going to be.

Now we have frequency penalty and presence penalty. If you are familiar with language models and go back a few years, you will know that these models used to generate a lot of repeated text, and that was a very common issue. Today it's less of a problem, I would say. If you are still facing that problem with some of these language models, where the model is repeating certain tokens or using certain words a lot in its response, you can control for that with the frequency penalty. It's available right in the playground. The more you increase it, the more it penalizes the model and discourages it from repeating certain words. So that's the idea of
the frequency penalty. The presence penalty is very similar: it also discourages the model from repeating phrases too often in its response. Unlike the frequency penalty, though, the penalty is the same for all repeated tokens, no matter how many times they have appeared. So this is a good way to keep the model from repeating certain sequences or phrases too often.

That would be it for the explanation here. Hopefully it was a bit more clear and the intuition is there for you, because it's important to be aware of these settings when you are developing with language models. Today, in my experience, we use them less. We still use temperature, sometimes we experiment with top P, and sometimes max length to control cost. Stop sequences are more specific to some use cases like code generation, and the penalties we use less because these models have fewer issues with generating repeated tokens or words.

Hopefully that was useful. If you have any questions, please leave a comment on the YouTube page. I'll be looking at those, and I'll try to provide more guidance if there's a need or point you to a link with a more technical explanation. If you're interested in that, just let me know, and I'll see you in the next one.
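
The intuition for temperature and top P described in the transcript can be illustrated with a small toy sketch. This is not how any particular provider implements sampling; the function names and the four example scores are made up for illustration, and the sketch assumes the common formulation where scores are divided by the temperature before a softmax, and where top P keeps the smallest set of most likely tokens whose cumulative probability reaches the threshold.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a temperature > 0."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, then renormalize over the tokens that were kept."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


# Hypothetical scores for four candidate next tokens (illustrative only).
logits = [4.0, 2.0, 1.0, 0.5]

# Low temperature concentrates probability on the top token;
# high temperature shifts weight toward the other candidates.
low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 1.5)

# A low top P keeps only the most likely token here;
# a high top P lets the less likely ones back in.
base = softmax_with_temperature(logits, 1.0)
narrow = top_p_filter(base, 0.5)
wide = top_p_filter(base, 0.99)

print(round(low[0], 3), round(high[0], 3))
print(sorted(narrow), sorted(wide))
```

A temperature of exactly zero would divide by zero in this sketch; APIs that accept zero typically treat it as always picking the most likely token.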

Original Description

To learn how to build with LLMs, check out my new courses here: https://dair-ai.thinkific.com/ Use code YOUTUBE20 to get an extra 20% off. The discount is limited to the first 500 students so make sure to enroll early. --- An explainer for understanding various LLM settings such as temperature, top_p, frequency penalty, stop sequence, and more. More in our guide: https://www.promptingguide.ai/introduction/settings #llms #ai #chatgpt #machinelearning #programming

The video teaches how to understand and use various LLM settings to get desirable results in different use cases, including fact-based question answering and creative tasks like email or lyrics generation. It explains how to use tools like the OpenAI Playground and LLM provider APIs to experiment with different settings. By watching this video, viewers can learn how to tune these settings when building with LLMs.

Key Takeaways

Experiment with different temperature values to control randomness and the model's confidence in its most likely response
Use top P as an alternative to temperature: a higher value lets the model consider more possible words and increases diversity in the output; alter one or the other, not both
Adjust max length, stop sequences, frequency penalty, and presence penalty depending on the use case
Use tools like the OpenAI Playground, or an LLM provider's API, to test and refine settings
Match settings to the task: lower temperature for fact-based question answering, higher for creative tasks like email or lyrics generation
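
The difference between the frequency penalty and the presence penalty can be made concrete with a toy sketch. It assumes the commonly documented formulation in which a token's score is reduced by the frequency penalty times the number of times it has already appeared, plus a flat presence penalty if it has appeared at all. The function name, the candidate scores, and the token history are hypothetical.

```python
from collections import Counter


def apply_penalties(logits, generated_tokens,
                    frequency_penalty=0.0, presence_penalty=0.0):
    """Return adjusted scores for each candidate token.

    logits: dict mapping candidate token -> raw score
    generated_tokens: list of tokens produced so far
    """
    counts = Counter(generated_tokens)
    adjusted = {}
    for token, score in logits.items():
        count = counts[token]
        # Frequency penalty grows with every repeat of the token.
        score -= frequency_penalty * count
        # Presence penalty is the same flat amount once the token has appeared.
        score -= presence_penalty * (1 if count > 0 else 0)
        adjusted[token] = score
    return adjusted


# Hypothetical candidate scores and generation history (illustrative only).
logits = {"the": 3.0, "cat": 2.0, "dog": 2.0}
history = ["the", "cat", "the", "the"]

by_frequency = apply_penalties(logits, history, frequency_penalty=0.5)
by_presence = apply_penalties(logits, history, presence_penalty=0.5)

print(by_frequency)  # "the" is penalized three times as much as "cat"
print(by_presence)   # "the" and "cat" are penalized equally
```

In this sketch a token that has not appeared yet ("dog") keeps its original score under both penalties.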

💡 The key to getting desirable results from LLMs is to experiment with different settings to find the combination that works for the specific task at hand.
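
One way to organize that experimentation is to keep a small set of starting presets per task type. The sketch below only builds plain dictionaries and makes no network call. The parameter names follow those commonly used by OpenAI-style chat completion APIs (other providers may name them differently), while the `build_settings` helper, the task labels, the specific values, and the `</code>` stop marker are assumptions for illustration, not recommendations from the video.

```python
def build_settings(task):
    """Return illustrative starting settings for a kind of task.

    All values are assumptions meant as a starting point for experiments.
    """
    # Shared defaults: top_p stays at 1.0 so that only temperature is varied,
    # in line with the advice to alter temperature or top P, not both.
    common = {
        "max_tokens": 256,          # cap on generated tokens, to control cost
        "top_p": 1.0,
        "frequency_penalty": 0.0,
        "presence_penalty": 0.0,
    }
    if task == "factual_qa":
        # Low temperature for less random, more confident answers.
        return {**common, "temperature": 0.0}
    if task == "creative":
        # Higher temperature and more room for emails, poems, or lyrics.
        return {**common, "temperature": 1.0, "max_tokens": 512}
    if task == "code":
        # A stop sequence ends generation once the code is finished;
        # the marker below is a hypothetical one the prompt would ask for.
        return {**common, "temperature": 0.2, "stop": ["</code>"]}
    raise ValueError(f"unknown task: {task}")


for name in ("factual_qa", "creative", "code"):
    print(name, build_settings(name))
```

These dictionaries could then be passed as keyword arguments to whichever client library is in use, after checking that library's own parameter names.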
