Native
Before continuing, we recommend reading the Chat Completions documentation to learn more about the chat completions API and how to use it.
Before you start
Models
Native reasoning models always generate thinking traces without any extra parameters:
- magistral-small-latest: Our smaller open version for open research and efficient reasoning.
- magistral-medium-latest: Our more powerful reasoning model, balancing performance and cost.
Currently, -latest points to -2509, the most recent version of our reasoning models. If you were previously using -2506, a migration is required because the format of the thinking chunks changed:
- -2509 & -2507 (new): Use tokenized thinking chunks via control tokens, providing the thinking traces in dedicated content chunk types.
- -2506 (old): Used <think>\n and \n</think> tags as strings to encapsulate the thinking traces for input and output within the same content type.
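To illustrate the difference, here is a minimal sketch (plain Python dicts with hypothetical example text) of how a thinking trace appears in an assistant message under each format:

```python
# -2506 (old): thinking traces embedded as <think> ... </think> strings
# inside a single string content.
old_style_message = {
    "role": "assistant",
    "content": "<think>\nLet me work this out step by step...\n</think>\nThe answer is 42.",
}

# -2509 / -2507 (new): thinking traces carried in a dedicated "thinking"
# content chunk, separate from the final "text" chunk.
new_style_message = {
    "role": "assistant",
    "content": [
        {
            "type": "thinking",
            "thinking": [
                {"type": "text", "text": "Let me work this out step by step..."}
            ],
        },
        {"type": "text", "text": "The answer is 42."},
    ],
}
```

Migrating means replacing any code that parses <think> tags out of a string with code that reads the dedicated thinking chunks.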
Usage
How to use Native Reasoning Models
Native reasoning models work similarly to standard models — reasoning is always active and no extra parameter is required. However, they generate more tokens for the reasoning step and perform best with a specific system prompt.
System Prompt
To get the best performance from native reasoning models, we recommend using the following system prompt (currently the default):
{
"role": "system",
"content": [
{
"type": "text",
"text": "# HOW YOU SHOULD THINK AND ANSWER\n\nFirst draft your thinking process (inner monologue) until you arrive at a response. Format your response using Markdown, and use LaTeX for any mathematical equations. Write both your thoughts and the response in the same language as the input.\n\nYour thinking process must follow the template below:"
},
{
"type": "thinking",
"thinking": [
{
"type": "text",
"text": "Your thoughts or/and draft, like working through an exercise on scratch paper. Be as casual and as long as you want until you are confident to generate the response to the user."
}
]
},
{
"type": "text",
"text": "Here, provide a self-contained response."
}
]
}

Providing your own system prompt will override the default one.
You can also control the system prompt behavior via the prompt_mode parameter:
"reasoning": Explicitly uses the default reasoning system prompt.null: Opts out of the default system prompt entirely.
Native reasoning with chat completions
You can use our native reasoning models in much the same way as our other text models. Here is an example via our chat completions endpoint:
import os
from mistralai import Mistral
api_key = os.environ["MISTRAL_API_KEY"]
model = "magistral-medium-latest"
client = Mistral(api_key=api_key)
chat_response = client.chat.complete(
model = model,
messages = [
{
"role": "user",
"content": "John is one of 4 children. The first sister is 4 years old. Next year, the second sister will be twice as old as the first sister. The third sister is two years older than the second sister. The third sister is half the age of her older brother. How old is John?",
},
],
# prompt_mode = "reasoning" if you want to explicitly use the default system prompt, or None if you want to opt out of the default system prompt.
)

Native reasoning output
Below we provide the full output of the model so you can see the reasoning traces and the final answer in detail.
The output of the model will include different content chunks, primarily a thinking chunk with the reasoning traces and a text chunk with the answer, like so:
"content": [
{
"type": "thinking",
"thinking": [
{
"type": "text",
"text": "*Thoughts and reasoning traces will go here.*"
}
]
},
{
"type": "text",
"text": "*Final answer will go here.*"
},
...
]
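Building on that structure, here is a minimal sketch of how you might separate the reasoning traces from the final answer in a response (the content list below is a hypothetical example; the field names follow the chunk format shown above):

```python
def split_reasoning(content):
    """Split a content list into (thinking_traces, answer_parts)."""
    thinking, answer = [], []
    for chunk in content:
        if chunk["type"] == "thinking":
            # A thinking chunk nests its own list of text parts.
            for part in chunk["thinking"]:
                if part["type"] == "text":
                    thinking.append(part["text"])
        elif chunk["type"] == "text":
            answer.append(chunk["text"])
    return thinking, answer

# Hypothetical response content following the format above.
content = [
    {"type": "thinking", "thinking": [{"type": "text", "text": "Reasoning..."}]},
    {"type": "text", "text": "The final answer."},
]
traces, answer = split_reasoning(content)
# traces -> ["Reasoning..."], answer -> ["The final answer."]
```

This lets you log or hide the reasoning traces while showing users only the final text.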