Vision

Vision capabilities let models analyze images in addition to text and answer questions about their visual content. This multimodal approach enables applications that require both textual and visual understanding.

We provide a variety of models with vision capabilities, all available via the Chat Completions API.

tip

For more specific use cases such as Document Parsing, OCR and Data Extraction, we recommend our Document AI stack.

Before You Start

Models with Vision Capabilities

  • Pixtral 12B via pixtral-12b-latest
  • Pixtral Large via pixtral-large-latest
  • Mistral Medium 3.1 via mistral-medium-2508
  • Mistral Small 3.2 via mistral-small-2506

Sending an Image

Use Vision Models

There are two ways to send an image to the Chat Completions API: pass a URL, or pass a base64-encoded image.

tip

Before continuing, we recommend reading the Chat Completions documentation to learn more about the Chat Completions API and how to use it.

If the image is hosted online, you can simply provide its publicly accessible URL in the request. This method is straightforward and requires no encoding.

import os
from mistralai import Mistral

api_key = os.environ["MISTRAL_API_KEY"]
model = "mistral-small-2506"

client = Mistral(api_key=api_key)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": "https://docs.mistral.ai/img/eiffel-tower-paris.jpg"
            }
        ]
    }
]

chat_response = client.chat.complete(
    model=model,
    messages=messages
)

# Print the model's description of the image
print(chat_response.choices[0].message.content)
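The base64 option suits images stored locally rather than hosted online. The sketch below assumes the API accepts a data URL of the form `data:<mime-type>;base64,<payload>` in the `image_url` field; the helper names `encode_image_as_data_url`, `build_messages` and `describe_local_image` are hypothetical. The client is passed in rather than created inside the helper, so any object exposing `chat.complete` (such as the `Mistral` client from the URL example above) can be used.

```python
import base64
import mimetypes


def encode_image_as_data_url(image_path: str) -> str:
    """Read a local image file and return it as a base64 data URL (hypothetical helper)."""
    # Guess the MIME type from the file extension; fall back to JPEG if unknown.
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None:
        mime_type = "image/jpeg"
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def build_messages(prompt: str, image_url: str) -> list:
    """Build one user message pairing a text prompt with one image."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": image_url},
            ],
        }
    ]


def describe_local_image(client, image_path: str,
                         prompt: str = "What's in this image?",
                         model: str = "mistral-small-2506") -> str:
    """Send a local image through a client exposing chat.complete; return the reply text."""
    chat_response = client.chat.complete(
        model=model,
        messages=build_messages(prompt, encode_image_as_data_url(image_path)),
    )
    return chat_response.choices[0].message.content
```

For example, with `client` created as in the URL example above, `describe_local_image(client, "photo.jpg")` would return the model's description of a hypothetical local file `photo.jpg`. Keeping the encoding separate from the API call also makes it easy to check the payload before sending it.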

Use cases

Our models' vision capabilities support a diverse range of use cases, from understanding graphs to extracting data.

note

These are simple examples to use as inspiration for your own use cases. For OCR and Structured Outputs, we recommend Document AI and Document AI Annotations.
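For lightweight data extraction from a graph, one pattern is to ask the model to reply with JSON and parse the reply yourself. This is a minimal sketch, not the Structured Outputs or Document AI Annotations features recommended above; the prompt text, `CHART_PROMPT` and the `parse_json_reply` helper are hypothetical. Because a model may wrap its JSON in a Markdown code fence, the helper strips one if present before parsing.

```python
import json
import re

# Hypothetical prompt asking for machine-readable output from a chart image.
CHART_PROMPT = (
    "Extract the data points from this chart. "
    'Reply only with JSON of the form {"points": [{"label": str, "value": number}]}.'
)

# Three backticks, built programmatically so this snippet nests cleanly in Markdown.
FENCE = "`" * 3

# Matches a reply wrapped in a fence, optionally tagged "json", and captures the body.
_FENCED_REPLY = re.compile(
    r"^" + FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE + r"$", re.DOTALL
)


def parse_json_reply(reply: str) -> dict:
    """Parse a model reply that should contain JSON, tolerating a Markdown fence."""
    text = reply.strip()
    match = _FENCED_REPLY.match(text)
    if match:
        text = match.group(1)
    # Raises json.JSONDecodeError if the model did not return valid JSON.
    return json.loads(text)
```

To use it, put `CHART_PROMPT` in the `"text"` part of the message from the URL example above, then call `parse_json_reply(chat_response.choices[0].message.content)`. Invalid JSON raises an error rather than failing silently, which makes unreliable replies easy to detect.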

FAQ