Reasoning

Reasoning is the next step of CoT (Chain of Thought), naturally used to describe the logical steps generated by the model before reaching a conclusion. Reasoning strengthens this characteristic by going through training steps that encourage the model to generate chains of thought freely before producing the final answer. This allows models to explore the problem more profoundly and ultimately reach a better solution to the best of their ability by using extra compute time to generate more tokens and improve the answer—also described as Test Time Computation.

Reasoning LLM AI

They excel at complex use cases like math and coding tasks, but can be used in a wide range of scenarios to solve diverse problems. The output of reasoning models includes a reasoning chunk with the model's thinking traces, followed by the final answer.

Note

Looking for native reasoning models (magistral-small-latest, magistral-medium-latest)? These have been deprecated. See Native reasoning (deprecated).

Tip

Before continuing, we recommend reading the Chat Completions documentation to learn more about the chat completions API and how to use it before proceeding.

Before you start

Before you start

Model

  • mistral-small-latest: Supports adjustable reasoning via the reasoning_effort parameter. No extra configuration required — just add the parameter to any chat completion request.
  • mistral-medium-3-5: Supports adjustable reasoning via the reasoning_effort parameter. For agentic and code use cases, reasoning_effort="high" is recommended.

The reasoning_effort parameter controls how reasoning is surfaced in the output:

  • reasoning_effort = "high": The response includes a full thinking chunk before the final answer, at the cost of increased token usage.
  • reasoning_effort = "none": The model thinks minimally and the thinking chunk is omitted from the response.
Note

reasoning_effort is also available on the Agents and Conversations endpoints via the API, inside the completion_args field.

Usage

Usage

Reasoning with chat completions

Reasoning with chat completions

Here is an example via our chat completions endpoint:

import os
from mistralai.client import Mistral

api_key = os.environ["MISTRAL_API_KEY"]
model = "mistral-small-latest"

client = Mistral(api_key=api_key)

chat_response = client.chat.complete(
    model = model,
    messages = [
        {
            "role": "user",
            "content": "John is one of 4 children. The first sister is 4 years old. Next year, the second sister will be twice as old as the first sister. The third sister is two years older than the second sister. The third sister is half the age of her older brother. How old is John?",
        },
    ],
    reasoning_effort="high"
)

Handling thinking chunks

Handling thinking chunks

When reasoning_effort is set to "high", the response message.content is a list of chunks instead of a plain string. Two chunk types appear:

  • ThinkChunk (type: "thinking"): contains the model's reasoning trace. The thinking field is itself a list of TextChunk objects.
  • TextChunk (type: "text"): contains the final answer.

When reasoning_effort is "none", message.content is a plain str with no thinking trace.

Parsing the response

Parsing the response

from mistralai.client.models import TextChunk, ThinkChunk

# Assuming chat_response from the example above
content = chat_response.choices[0].message.content

# reasoning_effort="none" returns a plain string
if isinstance(content, str):
    print(content)
else:
    for chunk in content:
        if isinstance(chunk, ThinkChunk):
            print("--- thinking ---")
            for inner in chunk.thinking:
                if isinstance(inner, TextChunk):
                    print(inner.text)
            print("--- /thinking ---")
        elif isinstance(chunk, TextChunk):
            print(chunk.text)

Streaming

Streaming

When streaming, delta.content changes shape during the response:

  1. Thinking phase: delta.content is a list containing a ThinkChunk
  2. Transition: a single list containing both a closing ThinkChunk and the first TextChunk
  3. Answer phase: delta.content is a plain string
import os
from mistralai.client import Mistral
from mistralai.client.models import TextChunk, ThinkChunk

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

in_thinking = False

for event in client.chat.stream(
    model="mistral-medium-3-5",
    messages=[{"role": "user", "content": "What is 17 * 23?"}],
    reasoning_effort="high",
):
    delta = event.data.choices[0].delta.content
    if not delta:
        continue

    # After thinking ends, delta is a plain string
    if isinstance(delta, str):
        if in_thinking:
            print("\n--- /thinking ---")
            in_thinking = False
        print(delta, end="", flush=True)
        continue

    # During thinking, delta is a list of chunks
    for chunk in delta:
        if isinstance(chunk, ThinkChunk):
            if not in_thinking:
                print("--- thinking ---")
                in_thinking = True
            for inner in chunk.thinking:
                if isinstance(inner, TextChunk):
                    print(inner.text, end="", flush=True)
        elif isinstance(chunk, TextChunk):
            if in_thinking:
                print("\n--- /thinking ---")
                in_thinking = False
            print(chunk.text, end="", flush=True)

Multi-turn conversations

Multi-turn conversations

When building multi-turn conversations with reasoning, always replay the full assistant message (including ThinkChunk) back into the message history. Dropping the reasoning trace across turns degrades model performance.

from mistralai.client import Mistral
from mistralai.client.models import TextChunk, ThinkChunk, UserMessage

client = Mistral(api_key="your-api-key", timeout_ms=300_000)

messages = []

for user_text in ["What is 17 * 23?", "Now multiply that by 3."]:
    messages.append(UserMessage(content=user_text))

    response = client.chat.complete(
        model="mistral-medium-3-5",
        messages=messages,
        reasoning_effort="high",
    )

    assistant_message = response.choices[0].message

    # Print only the final answer (for display purposes)
    content = assistant_message.content
    if isinstance(content, str):
        print(content)
    else:
        answer = "".join(
            c.text for c in (content or []) if isinstance(c, TextChunk)
        )
        print(answer)

    # IMPORTANT: append the full assistant message to history.
    # This preserves ThinkChunk so the model can see its own
    # reasoning trace in subsequent turns.
    # Do NOT rebuild the message with only the answer text.
    messages.append(assistant_message)
Warning

Do not strip ThinkChunk from assistant messages before replaying them. The model relies on the reasoning trace to maintain coherence across turns. Stripping them increases token efficiency but significantly degrades output quality.