Sampling
Here, we will discuss the sampling settings that influence the output of Large Language Models (LLMs). This guide covers parameters such as Temperature, N, Top P, Presence Penalty, and Frequency Penalty, and explains how to adjust them. Whether you aim to generate creative content or ensure accurate responses, understanding these settings is key.
Available Sampling Parameters
Let's explore each parameter and learn how to fine-tune LLM outputs effectively.
N: Number of Completions
N represents the number of completions to return for each request. This parameter is useful when you want to generate multiple responses for a single input. Each completion is a unique response generated by the model, giving you a variety of outputs to choose from.
Key Points
- Multiple Responses: By setting `N` to a value greater than 1, you can get multiple responses for the same input.
- Cost Efficiency: Input tokens are only billed once, regardless of the number of completions requested. This makes it cost-effective to explore different possibilities.
Example
Here's an example of how to use the `N` parameter in the API:
```python
import os

from mistralai import Mistral

api_key = os.environ["MISTRAL_API_KEY"]
model = "ministral-3b-latest"

client = Mistral(api_key=api_key)

chat_response = client.chat.complete(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "What is the best mythical creature? Answer with a single word.",
        },
    ],
    temperature=1,  # Increase randomness; must be greater than 0 for the completions to differ
    n=10,  # Number of completions to return
)

for choice in chat_response.choices:
    print(choice.message.content)
```
In this example, the model generates 10 responses for the same input prompt. This allows you to see a variety of possible answers and choose the one that best fits your needs.
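Because all completions come back in a single response, you can post-process them directly, for example with a simple majority vote over the single-word answers. The following is a minimal sketch, assuming the `chat_response` object from the example above and that its `usage` field exposes the request's token counts:

```python
from collections import Counter

# Tally the single-word answers across the 10 completions and pick the
# most frequent one (a simple majority vote).
answers = [choice.message.content.strip().lower() for choice in chat_response.choices]
best_answer, votes = Counter(answers).most_common(1)[0]
print(f"Most common answer: {best_answer} ({votes}/{len(answers)} votes)")

# Token accounting for the whole request: the prompt is counted once,
# while completion tokens accumulate across all 10 choices.
print("Prompt tokens:    ", chat_response.usage.prompt_tokens)
print("Completion tokens:", chat_response.usage.completion_tokens)
```

Voting across several samples like this is a common way to make the final answer more robust when the temperature is high.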