Observability

Information

Observability is in Private Preview and is available to Enterprise-tier organizations only. Contact your Mistral representative to get access.

Note

Usage during Private Preview is included in your Studio plan. Pricing can change at general availability.

AI applications are harder to debug than traditional software. The same prompt can produce different responses, tool calls can fail silently, and quality degrades in ways that unit tests don't catch. Without visibility into production, you often find out something went wrong only when a user tells you.

Observability gives you the data to get ahead of that. The suite has two distinct capabilities:

Traces: collect and explore every request flowing through your AI application in production.
Offline evaluations: measure and track pipeline quality systematically, before issues reach users.

Traces

Every request in your AI application becomes a trace: a tree of spans representing every step in the execution chain. Each span carries its input, output, latency, token counts, and status.
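
As a rough illustration of the span tree described above, the sketch below models a trace in plain Python. The `Span` class, its field names, and the `"ok"`/`"error"` status values are assumptions made for this example; they are not the schema used by Studio or by OpenTelemetry.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Span:
    """One step in the execution chain (hypothetical field names)."""
    name: str
    input: str
    output: str
    latency_ms: float
    input_tokens: int = 0
    output_tokens: int = 0
    status: str = "ok"  # assumed values in this sketch: "ok" or "error"
    children: List["Span"] = field(default_factory=list)

    def walk(self) -> Iterator["Span"]:
        """Yield this span, then every descendant, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def total_tokens(root: Span) -> int:
    """Sum input and output token counts over the whole tree."""
    return sum(s.input_tokens + s.output_tokens for s in root.walk())


def failed_spans(root: Span) -> List[str]:
    """Return the names of all spans whose status is not "ok"."""
    return [s.name for s in root.walk() if s.status != "ok"]


# A made-up trace: one request, one model call, one failing tool call.
trace = Span(
    name="handle_request",
    input="What is the weather in Paris?",
    output="I could not fetch the weather.",
    latency_ms=840.0,
    children=[
        Span("chat_completion", "What is the weather in Paris?",
             "call get_weather(city='Paris')", 310.0,
             input_tokens=42, output_tokens=12),
        Span("tool:get_weather", "{'city': 'Paris'}", "", 500.0,
             status="error"),
    ],
)

print(total_tokens(trace))
print(failed_spans(trace))
```

Walking the tree this way mirrors what span-level debugging does conceptually: the root span describes the request as a whole, and the failing step is located by inspecting its descendants.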

Traces are collected using OpenTelemetry and are supported across Mistral products: the Mistral SDK (Python and TypeScript), Workflows, Vibe Code CLI, and Vibe Work. Any application instrumented with OpenTelemetry can also send traces directly.

After traces are flowing, the Trace Explorer in Studio lets you search and filter across all requests, inspect individual executions end to end, and debug failures at the span level.

Information

Data retention: traces are kept for 30 days.

Where to go next:

Send traces: instrument your application and start collecting data.
Explore traces: search, filter, and inspect your traces in Studio.

Access

Sending traces requires only a valid Mistral API key: no special role or feature flag is needed.

Reading traces in Studio requires an Enterprise organization and one of the following roles:

Role	Access
Org Admin	All traces across all workspaces
Workspace Admin	All traces in their workspace
Observability Viewer	All traces in their workspace (read-only)

Other workspace members don't have access to trace data. A Workspace Admin or Org Admin can assign the Observability Viewer role to grant read-only access without admin privileges.
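
The read-access rules above can be summarized as a small decision function. This is only a sketch of the logic in the table: the role names come from the table, while `can_read_traces`, its parameters, and the `"Workspace Member"` role used in the usage lines are hypothetical and do not correspond to any Mistral API.

```python
from typing import Optional

# Role names as listed in the access table.
ORG_ADMIN = "Org Admin"
WORKSPACE_ADMIN = "Workspace Admin"
OBSERVABILITY_VIEWER = "Observability Viewer"


def can_read_traces(
    role: str,
    member_workspace: Optional[str],
    trace_workspace: str,
    is_enterprise_org: bool,
) -> bool:
    """Return True if a member may read traces from trace_workspace.

    Hypothetical helper that encodes the table: Org Admins see every
    workspace; Workspace Admins and Observability Viewers see only
    their own workspace; everyone else sees nothing.
    """
    if not is_enterprise_org:
        return False  # reading traces requires an Enterprise organization
    if role == ORG_ADMIN:
        return True
    if role in (WORKSPACE_ADMIN, OBSERVABILITY_VIEWER):
        return member_workspace == trace_workspace
    return False


print(can_read_traces(ORG_ADMIN, None, "ws-a", True))
print(can_read_traces(OBSERVABILITY_VIEWER, "ws-a", "ws-b", True))
print(can_read_traces("Workspace Member", "ws-a", "ws-a", True))
```

The function only models read access. Sending traces is a separate path that, as stated above, needs nothing beyond a valid API key.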

Offline evaluations

Production traces tell you what your application did. Offline evaluations tell you how well it performed, and whether it's improving or regressing as you change your pipeline.

You define a set of test cases, run them through your pipeline, and score the outputs against criteria you control. Results upload to Studio, where you can track quality trends over time and compare pipeline configurations side by side.

The evaluation workflow has three building blocks:

Evaluation SDK (mistralai-observability): the Python package that runs evaluations, computes statistics, and uploads results.
Judges: LLM-based scorers for criteria that can't be captured with code, such as helpfulness, factual accuracy, or tone.
Datasets: your test cases as a list of input records, with fields like prompts, expected outputs, and grading guidance.
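
To make the workflow concrete, here is a minimal sketch of the define, run, and score loop using only the Python standard library. It deliberately does not use the `mistralai-observability` package, whose API is not shown here. The record fields (`prompt`, `expected_output`, `grading_guidance`), `toy_pipeline`, `exact_match`, and `run_evaluation` are all hypothetical names chosen for illustration, and the scorer is a simple code-based check rather than an LLM judge.

```python
from statistics import mean
from typing import Callable, Dict, List

Record = Dict[str, str]

# Dataset: test cases as a list of input records (assumed field names).
dataset: List[Record] = [
    {"prompt": "2 + 2", "expected_output": "4",
     "grading_guidance": "Exact number only."},
    {"prompt": "Capital of France?", "expected_output": "Paris",
     "grading_guidance": "City name only."},
    {"prompt": "Opposite of hot?", "expected_output": "cold",
     "grading_guidance": "Single word."},
]


def toy_pipeline(prompt: str) -> str:
    """Stand-in for a real pipeline; returns canned answers."""
    canned = {
        "2 + 2": "4",
        "Capital of France?": "paris",
        "Opposite of hot?": "warm",
    }
    return canned.get(prompt, "")


def exact_match(output: str, record: Record) -> float:
    """Code-based scorer: 1.0 on a case-insensitive exact match, else 0.0."""
    expected = record["expected_output"].strip().lower()
    return 1.0 if output.strip().lower() == expected else 0.0


def run_evaluation(
    records: List[Record],
    pipeline: Callable[[str], str],
    scorer: Callable[[str, Record], float],
) -> Dict[str, object]:
    """Run every record through the pipeline and aggregate the scores."""
    scores = [scorer(pipeline(r["prompt"]), r) for r in records]
    return {"scores": scores, "mean_score": mean(scores), "n": len(scores)}


report = run_evaluation(dataset, toy_pipeline, exact_match)
print(report)
```

Keeping the pipeline and the scorer as plain callables makes it straightforward to rerun the same dataset against a different pipeline configuration and compare the resulting mean scores.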

Where to go next:

Offline evaluations: get started with the Evaluation SDK.