Understanding LLM Settings
Key Takeaways
The video explains LLM settings such as temperature, top P, max length, stop sequences, frequency penalty, and presence penalty. It shows how to use them through the OpenAI Playground and LLM provider APIs to get the results you want in different use cases, from fact-based question answering to creative tasks like email or lyrics generation.
Full Transcript
Hi everyone, in this video I want to talk about LLM settings. The idea of this section of our prompting guide is to show you how to use these settings: when you're exploring, experimenting with, and prompting these models, there are a couple of settings you can tune to get the results you want. If you are coming from the world of ChatGPT, the conversational chatbot from OpenAI, you may not know that it actually uses some specific, fixed settings. You don't see them and you cannot configure them. If you come from the world of APIs, however, you do have access to settings that you can configure and adjust to get the results you want, which is why this is very popular among developers. So this only applies to you if you're using some type of LLM API, whether from OpenAI or any of the other LLM providers. In this video I want to go through a few of these settings and explain, with some examples, how you can leverage them. A few settings stand out when using large language models via APIs, and if you go to the playground you get a good idea of which ones matter. In the OpenAI Playground, for instance, you have temperature, maximum length, stop sequences, top P, frequency penalty, and presence penalty, and in our guide we provide explanations of what these are. In this video I want to quickly go over these ideas and explain how you can leverage them when you're developing with these models. I must say that we often don't talk much about temperature, top P, or most of these settings, but they're actually quite important and useful; it really depends on what you're aiming to achieve. So let's go through some of these.
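As a rough illustration of the settings listed above, here is a minimal sketch of a settings object a developer might keep alongside their prompts. The class `LLMSettings`, its method, and the two presets are hypothetical; the field names follow OpenAI-style request parameters and the ranges follow OpenAI's API reference, so other providers may name or bound them differently. The preset values are illustrative assumptions, not recommendations from the video.

```python
import warnings
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class LLMSettings:
    """Hypothetical container for the settings exposed by LLM APIs.

    Names follow OpenAI-style request parameters; ranges follow OpenAI's
    API reference. Other providers may use different names or ranges.
    """

    temperature: float = 1.0            # 0 to 2; lower = less random
    top_p: float = 1.0                  # 0 to 1; nucleus sampling cutoff
    max_tokens: Optional[int] = None    # "maximum length": caps output (and cost)
    stop: list[str] = field(default_factory=list)  # strings that end generation
    frequency_penalty: float = 0.0      # -2 to 2; grows with each repetition
    presence_penalty: float = 0.0       # -2 to 2; flat once a token has appeared

    def __post_init__(self) -> None:
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be between 0 and 2")
        if not 0.0 <= self.top_p <= 1.0:
            raise ValueError("top_p must be between 0 and 1")
        for name in ("frequency_penalty", "presence_penalty"):
            if not -2.0 <= getattr(self, name) <= 2.0:
                raise ValueError(f"{name} must be between -2 and 2")
        if self.max_tokens is not None and self.max_tokens < 1:
            raise ValueError("max_tokens must be a positive integer")
        # General recommendation: alter temperature or top_p, but not both.
        if self.temperature != 1.0 and self.top_p != 1.0:
            warnings.warn("both temperature and top_p changed from their defaults")

    def to_request_params(self) -> dict:
        """Keyword arguments for a chat-completion-style request, skipping unset fields."""
        params = asdict(self)
        if params["max_tokens"] is None:
            del params["max_tokens"]
        if not params["stop"]:
            del params["stop"]
        return params


# Illustrative presets (assumed values; tune them by experiment).
FACT_QA = LLMSettings(temperature=0.2, max_tokens=256)    # factual: low randomness
CREATIVE = LLMSettings(temperature=1.2, frequency_penalty=0.5)  # creative writing
```

With OpenAI's Python SDK, for example, a dict like this could be unpacked into `client.chat.completions.create(model=..., messages=..., **params)`; check your provider's reference, since supported parameter names and ranges vary between providers and models.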
I'll start with temperature. Temperature is a value, and in the playground you can see it ranges from zero all the way to two, with a default of one. That's the default the OpenAI Playground sets for you, and often when we run examples in the playground we don't even look at it, but it's there, and you can see the definition: it controls randomness. What does that mean? The way I understand temperature is that as you increase or decrease it, you decrease or increase the confidence the model has in its most likely response. If you look at our definition in the guide, increasing the temperature essentially increases the weights of the other possible tokens. Why is this useful? It depends on the task. Say we are dealing with a fact-based question answering task or application. We want to encourage the model to be more factual and less random, or less diverse, in what it outputs. At the end of the day it's outputting a sequence of tokens, and we want those tokens to be ones the model is confident in generating. If we want that, we decrease the temperature: the closer it is to zero, the less random the outputs are going to be. So for fact-based question answering it's useful to use low temperature values, closer to zero. If you're doing something more creative, like email generation, poem generation, or lyrics generation, it is beneficial to increase the temperature value and experiment with higher values. However, be aware that as you increase the temperature (you can go all the way up to two), something we have seen in our experiments is that the outputs become so random that the model basically produces gibberish, a nonsensical sequence of tokens. So be very careful when setting the temperature really high. When it's low this is less of a problem, because the output is less random, but above one, around 1.5 or so, be careful and do a lot of experimentation to see what the model is outputting for your application.
I think temperature is one of the more important LLM settings, but there are other configurations as well, like top P, which I see with all the language model providers, so it's good to be familiar with these concepts. You can think of top P (also called nucleus sampling) as an alternative sampling technique, because it is a very similar concept to temperature. In fact, the OpenAI documentation recommends that if you're using top P you don't change temperature, and if you're using temperature you don't change top P. So don't adjust both at the same time; set one and that should be fine. With top P, the model samples only from the smallest set of most likely tokens whose combined probability reaches P. A high top P value therefore lets the model consider more possible words, including less likely ones, which leads to more diverse output. So it has a very similar effect to temperature, although you will get different results when you use temperature compared to when you use top P. If you're experimenting with temperature and not getting the results you want, you can leave temperature at its default value and experiment with top P instead. That's how I generally use it; I never use both at the same time. In fact, these days I focus a lot more on prompt engineering, optimizing the prompt, as opposed to adjusting temperature or top P values. You can read the full definitions in the guide. There's a lot of good content that goes into the technical details of these configurations, but I think the intuition I've explained, and when you may or may not want to use each setting, is good enough. The general recommendation is to alter temperature or top P, but not both, and I think this applies to most LLM providers, so whether you're using Fireworks, Cohere, Claude, Gemini, or something else, you might consider it. I've read in some forums that some developers combine both and get good-quality responses, but that's an exception; I rarely see it, and we rarely use them that way.
There are other settings, like max length, stop sequences, frequency penalty, and presence penalty. I'll go briefly through each one; we use these less, and it really depends on the circumstances and the use case. Max length controls the number of tokens the model generates. It helps prevent overly long or irrelevant responses, which I would say is less of a problem with current models, and it helps control cost. Models are getting cheaper to use, so you could argue this matters less, but when we started with these language models they were really expensive, and it was really useful to be able to control how many tokens the model can generate so that you can control cost. Otherwise the model can go on and on generating text without finishing, and the next thing you know you have a really high bill. So use this setting depending on your use case and needs.
Stop sequence is another interesting one. You define a string that, once the model generates it, stops the model from generating further tokens. In the OpenAI Playground there is a stop sequence field, with an explanation of what it is, where you can provide whatever sequence you expect the model to output at the point where it should stop. Again, we rarely use this one; I think it's quite niche and only applies to certain types of tasks. We have used it, for instance, when generating code: we want the model to output just the code without explaining it, and we know what the stop sequences are going to be.
Then we have frequency penalty and presence penalty. If you go back a few years, you'll know that language models used to generate a lot of repeated text; that was a very common issue with these models. Today it's less of a problem, I would say. But if you're still facing it with some language models, where the model repeats certain tokens or uses certain words in its response a lot, you can use the frequency penalty, which is available in the playground. The more you increase it, the more the model is penalized for repeating a token, in proportion to how many times that token has already appeared, which keeps the model from repeating certain words. That's the idea of the frequency penalty.
The presence penalty is very similar: it discourages the model from repeating phrases too often in its response. Unlike the frequency penalty, though, the penalty is the same for all repeated tokens, whether a token has appeared twice or ten times. This is a good way to keep the model from repeating certain sequences or phrases too often.
So that's it for the explanation. Hopefully it's a bit clearer and you have the intuition, because it's important to be aware of these settings when you are developing with language models. In my experience today, we use some of them less: we still use temperature, we sometimes experiment with top P, and we sometimes use max length to control cost. Stop sequences are more specific to use cases like code generation, and we use the penalties less because these models have fewer issues with generating repeated tokens or words. Hopefully this was useful. If you have any questions, please leave a comment on the YouTube page; I'll be looking at those and will try to provide more guidance if there's a need, or point you to a link with a more technical explanation if you're interested. See you in the next one.
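To make the intuition above concrete, here is a minimal Python sketch of how these settings could act on a model's next-token scores during sampling. It is not how any provider actually implements decoding: `toy_model` and its fixed scores are hypothetical stand-ins, treating temperature 0 as greedy decoding is a simplification, and the order of steps (penalties, then temperature, then top P) is one reasonable choice. The penalty rule follows the formula OpenAI has described for its API: subtract the token's count times the frequency penalty, plus the presence penalty once the token has appeared at all.

```python
import math
import random
from collections import Counter

# Hypothetical toy "model": fixed next-token scores (logits) that ignore context.
TOY_LOGITS = {" the": 2.0, " cat": 1.5, " sat": 1.0, ".": 0.5}


def toy_model(generated):
    return TOY_LOGITS


def apply_penalties(logits, generated, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the score of tokens that were already generated.

    Frequency penalty scales with the count; presence penalty is a flat
    amount applied once a token has appeared at all.
    """
    counts = Counter(generated)
    return {
        tok: score
        - counts[tok] * frequency_penalty
        - (1.0 if counts[tok] > 0 else 0.0) * presence_penalty
        for tok, score in logits.items()
    }


def softmax_with_temperature(logits, temperature):
    """Turn scores into probabilities; low temperature sharpens, high flattens."""
    if temperature <= 0:  # simplification: treat 0 as greedy decoding
        best = max(logits, key=logits.get)
        return {tok: float(tok == best) for tok in logits}
    scaled = {tok: s / temperature for tok, s in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p."""
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}  # renormalize survivors


def generate(model, max_tokens, temperature=1.0, top_p=1.0,
             frequency_penalty=0.0, presence_penalty=0.0, stop=(), seed=None):
    rng = random.Random(seed)
    generated = []
    for _ in range(max_tokens):  # max length: hard cap on generated tokens
        logits = apply_penalties(model(generated), generated,
                                 frequency_penalty, presence_penalty)
        probs = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
        tokens, weights = zip(*probs.items())
        generated.append(rng.choices(tokens, weights=weights)[0])
        text = "".join(generated)
        hits = [text.index(s) for s in stop if s in text]
        if hits:  # stop sequence found: return the text before it
            return text[: min(hits)]
    return "".join(generated)
```

With greedy decoding (`temperature=0`) the toy model repeats itself: `generate(toy_model, 4, temperature=0)` gives `" the the the the"`. Adding `frequency_penalty=0.8` gives `" the cat the sat"`, and adding `stop=[" sat"]` as well cuts that to `" the cat the"`, since the stop sequence itself is not returned.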
Original Description
To learn how to build with LLMs, check out my new courses here: https://dair-ai.thinkific.com/
Use code YOUTUBE20 to get an extra 20% off. The discount is limited to the first 500 students so make sure to enroll early.
---
An explainer for understanding various LLM settings such as temperature, top_p, frequency penalty, stop sequence, and more.
More in our guide: https://www.promptingguide.ai/introduction/settings
#llms #ai #chatgpt #machinelearning #programming
Uploads from Elvis Saravia