5 Common LLM Parameters Explained with Examples
Large language models (LLMs) offer several parameters that let you fine-tune their behavior and control how they generate responses. If a model isn’t producing the desired output, the issue often lies in how these parameters are configured. In this tutorial, we’ll explore some of the most commonly used ones — max_completion_tokens, temperature, top_p, presence_penalty, and frequency_penalty — and understand how each influences the model’s output.
Installing the Dependencies
To get started, you will need to install the required dependencies:
pip install openai pandas matplotlib
Loading the OpenAI API Key
Next, load your OpenAI API key:
import os
from getpass import getpass
os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')
Initializing the Model
Initialize the OpenAI client and specify the model to use:
from openai import OpenAI
model="gpt-4.1"
client = OpenAI()
Max Tokens
Max Tokens (max_completion_tokens) sets an upper bound on the number of tokens the model can generate for a single response. If the model reaches this limit, generation stops and the answer is cut off mid-sentence; the API signals this by setting the choice's finish_reason to "length".
A smaller value (like 16) limits the model to very short answers, while a higher value (like 80) allows it to generate more detailed and complete responses. Increasing this parameter gives the model more room to elaborate, explain, or format its output more naturally.
prompt = "What is the most popular French cheese?"
for tokens in [16, 30, 80]:
    print(f"\n--- max_completion_tokens = {tokens} ---")
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        max_completion_tokens=tokens
    )
    print(response.choices[0].message.content)
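To confirm that a response was actually truncated by the limit, you can inspect the choice's finish_reason, which the API sets to "length" when the cap was hit. Here is a minimal sketch reusing the client, model, and prompt from above:
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": prompt}],
    max_completion_tokens=16,
)
choice = response.choices[0]
if choice.finish_reason == "length":  # output was cut off by max_completion_tokens
    print("Response was truncated by the token limit.")
print(choice.message.content)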
Temperature
In LLMs, the temperature parameter controls the diversity and randomness of generated outputs. Lower temperature values make the model more deterministic and focused on the most probable responses — ideal for tasks that require accuracy and consistency. Higher values, on the other hand, introduce creativity and variety by allowing the model to explore less likely options.
prompt = "What is one intriguing place worth visiting? Give a single-word answer and think globally."
temperatures = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.5]
n_choices = 10
results = {}
for temp in temperatures:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        temperature=temp,
        n=n_choices
    )
    results[temp] = [response.choices[i].message.content.strip() for i in range(n_choices)]

for temp, responses in results.items():
    print(f"\n--- temperature = {temp} ---")
    print(responses)
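One quick, optional way to summarize these lists is to count how often each answer appears at every temperature; at low temperatures a single answer usually dominates, while higher temperatures spread the counts out. A small sketch using the standard library:
from collections import Counter

for temp, responses in results.items():
    counts = Counter(responses)
    most_common, freq = counts.most_common(1)[0]
    print(f"temperature = {temp}: {len(counts)} distinct answers, "
          f"most frequent = {most_common!r} ({freq}/{n_choices})")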
Top P
Top P (also known as nucleus sampling) is a parameter that controls how many tokens the model considers based on a cumulative probability threshold. It helps the model focus on the most likely tokens, often improving coherence and output quality.
The generation process works roughly as follows:
- Apply the temperature to adjust the token probabilities.
- Keep only the most probable tokens whose cumulative probability reaches the top_p threshold (for example, 50% of the probability mass when top_p = 0.5).
- Renormalize the remaining probabilities and sample from them.
These steps are illustrated in the sketch below.
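This is a minimal, self-contained sketch of that selection step on a made-up token distribution; the token names and probabilities are purely illustrative:
token_probs = {"Paris": 0.40, "Rome": 0.25, "Kyoto": 0.20, "Cairo": 0.10, "Lima": 0.05}  # made-up values
top_p = 0.5

nucleus = {}
cumulative = 0.0
for token, p in sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True):
    nucleus[token] = p
    cumulative += p
    if cumulative >= top_p:  # stop once the kept tokens cover the threshold
        break

total = sum(nucleus.values())
renormalized = {token: p / total for token, p in nucleus.items()}
print(renormalized)  # {'Paris': 0.615..., 'Rome': 0.384...}
In the run below, we repeat the temperature sweep from the previous section with top_p fixed at 0.5: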
prompt = "What is one intriguing place worth visiting? Give a single-word answer and think globally."
temperatures = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.5]
n_choices = 10
results_ = {}
for temp in temperatures:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        temperature=temp,
        n=n_choices,
        top_p=0.5
    )
    results_[temp] = [response.choices[i].message.content.strip() for i in range(n_choices)]

for temp, responses in results_.items():
    print(f"\n--- temperature = {temp} ---")
    print(responses)
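Since pandas and matplotlib were installed at the start, one optional way to compare the two runs (results from the temperature-only sweep and results_ with top_p = 0.5) is to count the distinct answers per temperature and plot them side by side. This is only a sketch, and the plot styling is arbitrary:
import pandas as pd
import matplotlib.pyplot as plt

# Count distinct answers (out of n_choices) for each temperature in both runs
comparison = pd.DataFrame({
    "temperature only": {t: len(set(r)) for t, r in results.items()},
    "temperature + top_p=0.5": {t: len(set(r)) for t, r in results_.items()},
})
comparison.plot(kind="bar", xlabel="temperature", ylabel=f"unique answers out of {n_choices}")
plt.tight_layout()
plt.show()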
Frequency Penalty
Frequency Penalty reduces the probability of a token in proportion to how many times it has already appeared in the generated text, so the more often a word has been used, the less likely the model is to use it again.
Range: -2 to 2
Default: 0
A higher frequency penalty pushes the model toward new and different words, making the text more varied and less repetitive; negative values do the opposite and encourage repetition.
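Under the hood, both repetition penalties are usually described as subtractions from a token's logit before sampling; the snippet below is a simplified illustration of that idea (loosely following how the OpenAI documentation sketches it), not the exact production implementation. The frequency penalty grows with the number of times a token has already appeared, while the presence penalty, covered in the next section, is a one-time deduction once a token has appeared at all:
def penalized_logit(logit, count, frequency_penalty=0.0, presence_penalty=0.0):
    """Illustrative sketch of how repetition penalties lower a token's logit.

    count = how many times this token has already appeared in the text so far.
    """
    return (
        logit
        - count * frequency_penalty                       # grows with every repetition
        - (1.0 if count > 0 else 0.0) * presence_penalty  # one-time hit if seen at all
    )

# A token that has already been used 3 times:
print(penalized_logit(2.0, count=3, frequency_penalty=0.5))  # 0.5
print(penalized_logit(2.0, count=3, presence_penalty=0.5))   # 1.5
The loop below sweeps frequency_penalty across its full range while keeping the temperature low, so differences in the output come mostly from the penalty: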
prompt = "List 10 possible titles for a fantasy book. Give the titles only and each title on a new line."
frequency_penalties = [-2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0]
results = {}
for fp in frequency_penalties:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        frequency_penalty=fp,
        temperature=0.2
    )
    text = response.choices[0].message.content
    items = [line.strip("- ").strip() for line in text.split("\n") if line.strip()]
    results[fp] = items

for fp, items in results.items():
    print(f"\n--- frequency_penalty = {fp} ---")
    print(items)
Presence Penalty
Presence Penalty applies a one-time reduction to the probability of any token that has already appeared in the text, regardless of how many times it occurred. Unlike the frequency penalty, it does not grow with repetition; it simply nudges the model toward introducing new words and topics.
Range: -2 to 2
Default: 0
A higher presence penalty encourages the model to use a wider variety of words, making the output more diverse and creative.
prompt = "List 10 possible titles for a fantasy book. Give the titles only and each title on a new line."
presence_penalties = [-2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0]
results = {}
for pp in presence_penalties:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        presence_penalty=pp,
        temperature=0.2
    )
    text = response.choices[0].message.content
    items = [line.strip("- ").strip() for line in text.split("\n") if line.strip()]
    results[pp] = items

for pp, items in results.items():
    print(f"\n--- presence_penalty = {pp} ---")
    print(items)
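As a rough, optional check on how the penalty affects vocabulary, you can compute the ratio of unique words to total words across the ten titles for each penalty value; a higher ratio means less repeated wording (the metric itself is just illustrative):
def unique_word_ratio(titles):
    # Share of distinct words among all words in the generated titles (case-insensitive)
    words = [w.lower() for title in titles for w in title.split()]
    return len(set(words)) / len(words) if words else 0.0

for pp, items in results.items():
    print(f"presence_penalty = {pp}: unique-word ratio = {unique_word_ratio(items):.2f}")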
For further exploration of these parameters, feel free to check out our GitHub page for tutorials, code, and notebooks.