Sampling
Here, we will discuss the sampling settings that influence the output of Large Language Models (LLMs). This guide covers parameters such as Temperature, N, Top P, Presence Penalty, and Frequency Penalty, and explains how to adjust them. Whether you aim to generate creative content or ensure accurate responses, understanding these settings is key.
Available Sampling Parameters
Let's explore each parameter and learn how to fine-tune LLM outputs effectively.
N: Number of Completions
N represents the number of completions to return for each request. This parameter is useful when you want to generate multiple responses for a single input. Each completion is a unique response generated by the model, giving you a variety of outputs to choose from.
Key Points
- Multiple Responses: By setting `N` to a value greater than 1, you can get multiple responses for the same input.
- Cost Efficiency: Input tokens are only billed once, regardless of the number of completions requested. This makes it cost-effective to explore different possibilities.
Example
Here's an example of how to use the `N` parameter in the API:
```python
import os

from mistralai import Mistral

api_key = os.environ["MISTRAL_API_KEY"]
model = "ministral-3b-latest"

client = Mistral(api_key=api_key)

chat_response = client.chat.complete(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "What is the best mythical creature? Answer with a single word.",
        },
    ],
    temperature=1,  # Increase randomness; must be greater than 0 for the completions to differ
    n=10,  # Number of completions to return
)

for choice in chat_response.choices:
    print(choice.message.content)
```
In this example, the model generates 10 responses for the same input prompt. This allows you to see a variety of possible answers and choose the one that best fits your needs.
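Because all completions come back in a single response, you can post-process them directly, for example with a simple majority vote over the single-word answers. The following is a minimal sketch, assuming the `chat_response` object from the example above and that its `usage` field exposes the request's token counts:

```python
from collections import Counter

# Tally the single-word answers across the 10 completions and pick the
# most frequent one (a simple majority vote).
answers = [choice.message.content.strip().lower() for choice in chat_response.choices]
best_answer, votes = Counter(answers).most_common(1)[0]
print(f"Most common answer: {best_answer} ({votes}/{len(answers)} votes)")

# Token accounting for the whole request: the prompt is counted once,
# while completion tokens accumulate across all 10 choices.
print("Prompt tokens:    ", chat_response.usage.prompt_tokens)
print("Completion tokens:", chat_response.usage.completion_tokens)
```

Voting across several samples like this is a common way to make the final answer more robust when the temperature is high.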