Audio & Transcription

Audio input capabilities let models chat about and understand audio directly. They can be used both for audio-based chat and for optimized transcription.

Models with Audio Capabilities

Audio-capable models:

  • Voxtral Small (voxtral-small-latest), with audio input for chat use cases.
  • Voxtral Mini (voxtral-mini-latest), with audio input for chat use cases.
  • Voxtral Mini Transcribe (voxtral-mini-latest via audio/transcriptions), an efficient transcription-only service.

Chat with Audio

Our Voxtral models can be used for chat through the chat completions endpoint.

Tip: Before continuing, we recommend reading the Chat Completions documentation to learn more about the chat completions API and how to use it.

To pass a local audio file, you can encode it in base64 and pass it as a string.

import base64
import os
from mistralai import Mistral

api_key = os.environ["MISTRAL_API_KEY"]
model = "voxtral-mini-latest"

client = Mistral(api_key=api_key)

# Encode the audio file in base64
with open("examples/files/bcn_weather.mp3", "rb") as f:
    content = f.read()
audio_base64 = base64.b64encode(content).decode('utf-8')

chat_response = client.chat.complete(
    model=model,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "input_audio",
                "input_audio": audio_base64,
            },
            {
                "type": "text",
                "text": "What's in this file?"
            },
        ]
    }],
)
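
The encoding step and the message layout above can be wrapped in a small helper so that the same payload is built for any local file. This is a minimal sketch: `build_audio_message` is a hypothetical name, not part of the mistralai SDK, and the content layout simply mirrors the example above.

```python
import base64
from pathlib import Path


def build_audio_message(audio_path: str, prompt: str) -> dict:
    """Build a user message holding a base64 audio chunk and a text chunk.

    Hypothetical helper (not part of the mistralai SDK); the structure
    mirrors the chat example above.
    """
    # Read the raw bytes and encode them as a base64 string
    audio_base64 = base64.b64encode(Path(audio_path).read_bytes()).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "input_audio", "input_audio": audio_base64},
            {"type": "text", "text": prompt},
        ],
    }
```

The returned dictionary can then be passed as `messages=[build_audio_message(path, prompt)]` to `client.chat.complete`.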

Example Samples

Below you can find a few of the multiple use cases possible, by leveraging the audio capabilities of our models.

Transcription

The transcription endpoint is optimized for transcription and currently supports voxtral-mini-latest, which runs Voxtral Mini Transcribe.

Parameters
We provide different settings and parameters for transcription, such as:

  • timestamp_granularities: Requests timestamps so you can track not only "what" was said but also "when". See the Transcription with Timestamps section for details.
  • language: The transcription service also detects the language of the audio automatically. If the language is already known, you can set it manually for better accuracy.

Among the different methods to pass the audio, you can directly provide a path to a local file to upload and transcribe it as follows:

import os
from mistralai import Mistral

api_key = os.environ["MISTRAL_API_KEY"]
model = "voxtral-mini-latest"

client = Mistral(api_key=api_key)

with open("/path/to/file/audio.mp3", "rb") as f:
    transcription_response = client.audio.transcriptions.complete(
        model=model,
        file={
            "content": f,
            "file_name": "audio.mp3",
        },
        # language="en",  # optional: uncomment if the audio language is known
    )
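
Since the audio can come either from a local file (the `file` argument above) or from a URL (the `file_url` argument used in the timestamps example on this page), the choice can be made in one place. A minimal sketch under stated assumptions: `transcription_source_kwargs` is a hypothetical helper, and passing the file's raw bytes as `content` instead of an open file handle is an assumption about what the SDK accepts.

```python
from pathlib import Path
from urllib.parse import urlparse


def transcription_source_kwargs(source: str) -> dict:
    """Return the keyword argument describing where the audio comes from.

    Hypothetical helper: http(s) URLs map to `file_url`; anything else is
    treated as a local path and read into a `file` dictionary shaped like
    the one in the example above.
    """
    if urlparse(source).scheme in ("http", "https"):
        return {"file_url": source}
    path = Path(source)
    # Assumption: raw bytes are accepted as `content`
    return {"file": {"content": path.read_bytes(), "file_name": path.name}}
```

It would be used as `client.audio.transcriptions.complete(model=model, **transcription_source_kwargs(source))`.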

Example Samples

Below you can find a few examples leveraging the audio transcription endpoint.

Transcription with Timestamps

You can request timestamps for the transcription by passing the timestamp_granularities parameter, which currently supports the value segment.
The response then includes the start and end time of each segment in the audio file.

import os
from mistralai import Mistral

api_key = os.environ["MISTRAL_API_KEY"]
model = "voxtral-mini-latest"

client = Mistral(api_key=api_key)

transcription_response = client.audio.transcriptions.complete(
    model=model,
    file_url="https://docs.mistral.ai/audio/obama.mp3",
    timestamp_granularities=["segment"]
)
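
Once segments with start and end times are available, they can be rendered as a readable, time-stamped transcript. The sketch below is illustrative only: the field names `start`, `end`, and `text`, and the idea that segments are exposed on the response (for example as `transcription_response.segments`), are assumptions not confirmed by the text above. `format_timestamp` and `format_segments` are hypothetical helpers.

```python
def _field(segment, name):
    """Read a field from a dict or from an object with attributes."""
    return segment[name] if isinstance(segment, dict) else getattr(segment, name)


def format_timestamp(seconds: float) -> str:
    """Format a duration in seconds as MM:SS.s."""
    # Work in tenths of a second so the seconds part never rounds up to 60.0
    minutes, tenths = divmod(round(float(seconds) * 10), 600)
    return f"{minutes:02d}:{tenths / 10:04.1f}"


def format_segments(segments) -> str:
    """Render segments as '[start - end] text' lines.

    Assumes each segment carries `start`, `end` (seconds) and `text`.
    """
    lines = []
    for segment in segments:
        start = format_timestamp(_field(segment, "start"))
        end = format_timestamp(_field(segment, "end"))
        lines.append(f"[{start} - {end}] {_field(segment, 'text').strip()}")
    return "\n".join(lines)
```

If the assumptions hold, `print(format_segments(transcription_response.segments))` would print one line per segment.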

FAQ