Audio & Transcription

This page covers Mistral's Audio & Transcription capabilities, including Offline and Realtime transcription. You'll learn how to integrate these features into your applications and understand their use cases.

Overview

Mistral's Audio & Transcription services enable you to convert speech to text (STT) with high accuracy and low latency. We offer two main models tailored for different use cases:

Voxtral Mini Transcribe V2

Voxtral Mini Transcribe V2 is designed for batch transcription. It provides:

High accuracy: Industry-leading transcription quality with low word error rates.
Speaker diarization: Automatically identifies and labels different speakers in your audio.
Context biasing: Allows you to guide the model with custom vocabulary for accurate transcription of domain-specific terms.
Word-level timestamps: Provides precise timestamps for each word, useful for subtitle generation and audio search.
Multilingual support: Supports transcription in 13 languages (English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch).
Noise robustness: Maintains high accuracy in challenging acoustic environments.
Long audio support: Processes recordings up to 3 hours in a single request.
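
As an illustration of the subtitle use case mentioned for word-level timestamps, here is a minimal sketch that groups timestamped words into SRT cues. The input shape (a list of dicts with `text`, `start`, and `end` in seconds) is an assumption made for this example and may not match the actual response schema; the helper names `format_srt_time` and `words_to_srt` are hypothetical as well.

```python
from typing import Dict, List


def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Dict], max_words: int = 7) -> str:
    """Group word-level timestamps into numbered SRT cues.

    Each cue spans from the start of its first word to the end of its
    last word and holds at most `max_words` words.
    """
    cues = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        start = format_srt_time(group[0]["start"])
        end = format_srt_time(group[-1]["end"])
        text = " ".join(w["text"] for w in group)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Hypothetical word list; the real response schema may differ.
words = [
    {"text": "Hello", "start": 0.0, "end": 0.4},
    {"text": "world", "start": 0.5, "end": 0.9},
    {"text": "again", "start": 1.2, "end": 1.65},
]
print(words_to_srt(words, max_words=2))
```

Grouping by a fixed word count keeps the sketch short; a production subtitle pipeline would more likely split cues on pauses, punctuation, or speaker changes.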

Voxtral Realtime

Voxtral Realtime is built for live applications. It offers:

Ultra-low latency: Configurable latency below 200 ms, ideal for voice agents and real-time applications.
Streaming architecture: Transcribes audio as it arrives, enabling natural and responsive voice interactions.
Multilingual support: Strong performance in 13 languages.
Edge deployment: With a 4B-parameter footprint, it can be deployed on edge devices for privacy-first applications.
Open weights: Available under the Apache 2.0 license on the Hugging Face Hub, offering flexibility and transparency.
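
To make the streaming idea concrete, the sketch below splits raw PCM audio into fixed-duration chunks, the way a client might feed audio to a streaming transcriber as it arrives. The audio format (16 kHz, 16-bit, mono) and the helper names `chunk_size_bytes` and `iter_chunks` are assumptions for illustration only; the transport used to send chunks to Voxtral Realtime is not shown, and the format the service expects may differ.

```python
from typing import Iterator


def chunk_size_bytes(chunk_ms: int, sample_rate: int = 16_000,
                     sample_width: int = 2, channels: int = 1) -> int:
    """Return the number of bytes of raw PCM covering `chunk_ms` milliseconds."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return sample_rate * sample_width * channels * chunk_ms // 1000


def iter_chunks(pcm: bytes, chunk_ms: int, **fmt: int) -> Iterator[bytes]:
    """Yield consecutive chunks of `pcm`, each covering up to `chunk_ms` ms."""
    size = chunk_size_bytes(chunk_ms, **fmt)
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]


# One second of silence at the assumed format: 16 kHz, 16-bit, mono.
one_second = bytes(32_000)
chunks = list(iter_chunks(one_second, chunk_ms=200))
print(len(chunks), len(chunks[0]))
```

Smaller chunks let the transcriber start working sooner but mean more messages per second, so the chunk duration is one of the knobs that trades latency against overhead.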

Note

Security & Privacy: Both models support GDPR- and HIPAA-compliant deployments through secure on-premises or private cloud setups, so your data remains protected.

Getting Started

Explore the features of Mistral's Audio & Transcription services with our comprehensive guides:

Offline Transcription: Learn how to use Voxtral Mini Transcribe V2 for batch transcription, including speaker diarization, context biasing, and word-level timestamps.
Realtime Transcription: Learn how to integrate Voxtral Realtime for live transcription with ultra-low latency, perfect for voice agents and real-time applications.

Information

Looking to transcribe more than one file at a time? Check our Batch feature (via API).