Audio & Transcription
This page covers Mistral's Audio & Transcription capabilities, including Offline and Realtime transcription. You'll learn how to integrate these features into your applications and understand their use cases.
Overview
Mistral's Audio & Transcription services enable you to convert speech to text (STT) with high accuracy and low latency. We offer two main models tailored for different use cases:
Voxtral Mini Transcribe V2
Voxtral Mini Transcribe V2 is designed for batch transcription. It provides:
- High accuracy: Industry-leading transcription quality with low word error rates.
- Speaker diarization: Automatically identifies and labels different speakers in your audio.
- Context biasing: Allows you to guide the model with custom vocabulary for accurate transcription of domain-specific terms.
- Word-level timestamps: Provides precise timestamps for each word, useful for subtitle generation and audio search.
- Multilingual support: Supports transcription in 13 languages, including English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch.
- Noise robustness: Maintains high accuracy in challenging acoustic environments.
- Long audio support: Processes recordings up to 3 hours in a single request.
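As an illustration of the subtitle-generation use case mentioned above, here is a minimal sketch that turns word-level timestamps into SRT cues. It does not call the Mistral API. The `Word` shape (text, start and end in seconds) and the helper names are assumptions made for this example, and the actual response format may differ, so treat the field names as placeholders.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical shape: the real transcription response may use other field names.
    text: str
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio


def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[Word], max_words: int = 7) -> str:
    """Group consecutive words into numbered SRT cues of at most max_words each."""
    cues = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        start = format_srt_time(group[0].start)  # cue starts with its first word
        end = format_srt_time(group[-1].end)     # cue ends with its last word
        text = " ".join(w.text for w in group)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(cues)
```

Splitting on a fixed word count keeps the sketch short. A real subtitle pipeline would more likely break cues at pauses, punctuation, or speaker changes.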
Voxtral Realtime
Voxtral Realtime is built for live applications. It offers:
- Ultra-low latency: Configurable latency, down to under 200 ms, ideal for voice agents and real-time applications.
- Streaming architecture: Transcribes audio as it arrives, enabling natural and responsive voice interactions.
- Multilingual support: Strong performance in 13 languages, ensuring global reach.
- Edge deployment: With a 4B-parameter footprint, it can be deployed on edge devices for privacy-first applications.
- Open weights: Available under the Apache 2.0 license on the Hugging Face Hub, offering flexibility and transparency.
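To make the streaming idea above more concrete, here is a minimal sketch of slicing raw audio into small fixed-duration chunks, as a client might do before sending audio as it arrives. It does not call the Mistral API. The audio format (16 kHz, 16-bit, mono PCM), the 100 ms chunk duration, and the function names are assumptions for illustration only, so check the Realtime Transcription guide for the input format the service actually expects.

```python
from typing import Iterator


def chunk_bytes_for(duration_ms: int, sample_rate: int = 16_000,
                    sample_width: int = 2, channels: int = 1) -> int:
    """Return the number of raw PCM bytes that cover duration_ms of audio.

    Defaults (16 kHz, 16-bit, mono) are assumptions, not a documented requirement.
    """
    return sample_rate * sample_width * channels * duration_ms // 1000


def iter_chunks(pcm: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive chunks of pcm; the final chunk may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(pcm), chunk_size):
        yield pcm[offset:offset + chunk_size]


if __name__ == "__main__":
    size = chunk_bytes_for(100)   # bytes in 100 ms of audio under the assumed format
    audio = bytes(8000)           # 250 ms of silence as placeholder input
    for chunk in iter_chunks(audio, size):
        # A real client would send each chunk over its streaming connection here.
        print(len(chunk))
```

Smaller chunks reduce the delay before audio reaches the model but increase the number of messages sent, so the chunk duration is a trade-off to tune per application.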
Security & Privacy: Both models support GDPR- and HIPAA-compliant deployments through secure on-premises or private cloud setups, ensuring your data remains protected.
Getting Started
Explore the features of Mistral's Audio & Transcription services with our comprehensive guides:
- Offline Transcription: Learn how to use Voxtral Mini Transcribe V2 for batch transcription, including speaker diarization, context biasing, and word-level timestamps.
- Realtime Transcription: Learn how to integrate Voxtral Realtime for live transcription with ultra-low latency, perfect for voice agents and real-time applications.
Need to transcribe more than one file at a time? See our Batch feature (via API).