Usage
Large Language Models (LLMs) are AI systems that generate text and engage in conversational interactions. They are fine-tuned to follow instructions and respond naturally to prompts: inputs such as questions, instructions, or task examples. The model processes the prompt and produces a relevant text output as its response. Below is an overview of how to use the Chat Completion API to generate text and engage in conversational interactions with Mistral AI models.
Chat Completion
Use Chat Completions
The Chat Completion API accepts a list of chat messages as input and generates a response. This response is a new chat message with the role "assistant"; its "content" can be either a string or a list of chunks, with different chunk types enabling different features. Visit our API spec for more details.
For non-streaming chat completion requests, you provide a list of messages and the model returns a single full completion response. This response contains the full completion, generated until the model decides to stop or the maximum number of tokens is reached. Keep in mind that the longer the output, the higher the latency.
Note that the model's response content can contain interleaved events, such as citations and tool calls, instead of a single string.
The content can be either a string, the most common usage of LLMs:
{'content': '...'}
...or a list of different types of contents:
{'content': [{'type': 'text', 'text': '...'}, {'type': '...', '...': [...]}, ...]}
import os
from mistralai import Mistral

api_key = os.environ["MISTRAL_API_KEY"]
model = "mistral-medium-latest"

client = Mistral(api_key=api_key)

chat_response = client.chat.complete(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "What is the best French cheese?",
        },
    ],
)
print(chat_response.choices[0].message.content)
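Since the response content can be either a string or a list of chunks, downstream code may need to normalize it before use. The helper below is a minimal sketch, assuming chunks can be read as dicts with "type" and "text" keys; the SDK may instead return typed chunk objects with equivalent fields:

def content_to_text(content):
    # The common case: content is already a plain string.
    if isinstance(content, str):
        return content
    # Otherwise it is a list of chunks; keep only the text chunks.
    # Assumption: each chunk is dict-like with "type" and "text" keys.
    return "".join(
        chunk["text"] for chunk in content if chunk.get("type") == "text"
    )

text = content_to_text(chat_response.choices[0].message.content)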
Chat Messages
Chat messages are a collection of prompts or messages, with each message having a specific role assigned to it, such as "system", "user", "assistant", or "tool".
- A **system message** is an optional message that sets the behavior and context for an AI assistant in a conversation, such as modifying its personality or providing specific instructions. A system message can include task instructions, personality traits, contextual information, creativity constraints, and other relevant guidelines to help the AI better understand and respond to the user's input. See the prompting guide for explanations of prompting capabilities in general.
- A **user message** is a message sent from the perspective of the human in a conversation with an AI assistant. It typically provides a request, question, or comment that the AI assistant should respond to. User prompts allow the human to initiate and guide the conversation, and they can be used to request information, ask for help, provide feedback, or engage in other types of interaction with the AI.
- An **assistant message** is a message sent by the AI assistant back to the user. It is usually meant to reply to a previous user message by following its instructions, but you can also find it at the beginning of a conversation, for example to greet the user.
- A **tool message** only appears in the context of function calling; it is used at the final response formulation step, when the model has to format the tool call's output for the user. To learn more about function calling, see the guide. A short example combining the other roles is shown after this list.
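For illustration, a messages list combining the first three roles might look like the sketch below (the message contents are invented for this example; tool messages additionally require the function calling setup described in the guide):

messages = [
    # Optional system message setting behavior and context.
    {"role": "system", "content": "You are a concise assistant."},
    # The user's request.
    {"role": "user", "content": "What is the best French cheese?"},
    # A previous assistant reply, present when continuing a conversation.
    {"role": "assistant", "content": "Many would say Comté, but it depends on taste."},
    # The next user turn.
    {"role": "user", "content": "And the best French wine?"},
]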
**When to use a user prompt vs. a system message followed by a user message?**
- You can either combine your system message and user message into a single user message or separate them into two distinct messages, as shown in the sketch below.
- We recommend you experiment with both ways to determine which one works better for your specific use case.
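As a sketch of the two approaches (the instruction text here is only an example):

# Approach 1: a system message followed by a user message.
messages_separate = [
    {"role": "system", "content": "Reply like a French chef."},
    {"role": "user", "content": "What is the best French cheese?"},
]

# Approach 2: everything combined into a single user message.
messages_combined = [
    {
        "role": "user",
        "content": "Reply like a French chef. What is the best French cheese?",
    },
]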
Multi-turn
Chat Completions can be used for multi-turn conversations. This means that you can send multiple messages back and forth between the user and the assistant. This is useful for applications like chatbots, where the user can have a conversation with the assistant.
Note that you may have different events interleaved between these interactions, such as tool calls for function calling, or even handoffs when working with agents.
If you are interested in a simplified way to handle multi-turn conversations, you may want to check out our Agents and Conversations APIs. Managing multi-turn conversations can be complex, and our APIs are designed to simplify this process while providing you with built-in tools and connectors.
import os
from mistralai import Mistral

api_key = os.environ["MISTRAL_API_KEY"]
model = "mistral-medium-latest"

client = Mistral(api_key=api_key)

chat_response = client.chat.complete(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        },
        {
            "role": "assistant",
            "content": "The Capital of France is **Paris**.",
        },
        {
            "role": "user",
            "content": "Translate that to French.",
        },
    ],
)
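In practice you would usually build such a history turn by turn, appending each reply before sending the next user message. A minimal sketch, reusing the client and model from the example above:

messages = [{"role": "user", "content": "What is the capital of France?"}]
first = client.chat.complete(model=model, messages=messages)

# Append the assistant's reply, then the next user turn.
messages.append(
    {"role": "assistant", "content": first.choices[0].message.content}
)
messages.append({"role": "user", "content": "Translate that to French."})

second = client.chat.complete(model=model, messages=messages)
print(second.choices[0].message.content)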
Other Useful Features
Our Chat Completions service also has other features that can be used to customize your requests.
- The **prefix** flag enables prepending content to the assistant's response. When used, it allows you to add an assistant message at the end of the list, whose content will be prepended to the assistant's response.
- The **safe_prompt** flag is used to force chat completion to be moderated against sensitive content (see Guardrailing).
- A **stop** sequence allows forcing the model to stop generating after one or more chosen tokens or strings. The output will not contain the stop sequence.
You can find short examples of how to use them below.
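For example, here are minimal sketches of each feature, reusing the client and model from above (the message contents are placeholders):

# prefix: an assistant message marked with "prefix": True is prepended
# to the model's reply.
response = client.chat.complete(
    model=model,
    messages=[
        {"role": "user", "content": "Write a one-line toast."},
        {"role": "assistant", "content": "À votre santé", "prefix": True},
    ],
)

# safe_prompt: moderate the completion against sensitive content.
response = client.chat.complete(
    model=model,
    messages=[{"role": "user", "content": "Tell me a short story."}],
    safe_prompt=True,
)

# stop: stop generating once one of the given strings would be produced;
# the stop sequence itself is not included in the output.
response = client.chat.complete(
    model=model,
    messages=[{"role": "user", "content": "Count from 1 to 10."}],
    stop=["5"],
)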

More
This was a simple introduction to our Chat Completions service, but we have a lot more to offer and recommend taking a look: from Vision capabilities to Function Calling, Predicted Outputs, Structured Outputs, and much more.