Speech Generation
Generate speech from text using a saved voice (voice_id) or a one-off reference audio clip (ref_audio).
Generate
Generate speech using the identifier of a previously saved voice (voice_id).
import base64
from pathlib import Path

from mistralai.client import Mistral

client = Mistral(api_key="your-api-key")

response = client.audio.speech.complete(
    model="voxtral-mini-tts-2603",
    input="Hello! This is Voxtral, Mistral's text-to-speech model.",
    voice_id="your-voice-id",
    response_format="mp3",
)

# The audio is returned base64-encoded; decode it before writing to disk.
Path("output.mp3").write_bytes(base64.b64decode(response.audio_data))
print("Saved to output.mp3")
Best Practices
Text Prompt Guidelines
- Language match: the voice prompt should be in the same language as the text prompt for best results.
- Cross-lingual prompts: the model also supports cross-lingual voice transfer. For example, a French voice prompt with English text will produce French-accented English.
- Verbalizable form: convert numbers and symbols to their spoken equivalent to avoid ambiguity. For example, use `one thousand two hundred thirty four` instead of `1234`, or `twelve thirty four` depending on context.
- No rich formatting: avoid markdown, emojis, or special characters in the text; they will not be rendered and may degrade output quality.
- Abbreviations: spell out abbreviations for better pronunciation. Use `F-B-I` or `F.B.I.` instead of `FBI`.
- Length: keep prompts under 300 words for best results.
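The guidelines above can be applied automatically before text is sent for synthesis. The following is a minimal sketch, not part of the Mistral SDK: the helper names (`number_to_words`, `normalize_for_tts`, `split_into_chunks`) and the `ABBREVIATIONS` map are hypothetical, it only handles plain integers below one million, and the emoji filter only covers the most common Unicode ranges.

```python
import re

# Hypothetical abbreviation map; extend it for your own domain.
ABBREVIATIONS = {"FBI": "F-B-I", "USA": "U-S-A"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out 0 <= n < 1_000_000 as a cardinal number."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + (" " + ONES[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    return number_to_words(thousands) + " thousand" + (" " + number_to_words(rest) if rest else "")


def normalize_for_tts(text: str) -> str:
    """Apply the text prompt guidelines to a raw string."""
    # Strip common markdown markers (emphasis, headings, code ticks).
    text = re.sub(r"[*`#~]+", "", text)
    # Strip emojis in the most common Unicode ranges (not exhaustive).
    text = re.sub(r"[\U0001F000-\U0001FAFF\u2600-\u27BF]", "", text)
    # Spell out known abbreviations; unknown ones are left untouched.
    text = re.sub(r"\b[A-Z]{2,}\b",
                  lambda m: ABBREVIATIONS.get(m.group(), m.group()), text)
    # Verbalize standalone integers of up to six digits; decimals are skipped.
    text = re.sub(r"(?<![\w.])\d{1,6}(?!\w|\.\d)",
                  lambda m: number_to_words(int(m.group())), text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


def split_into_chunks(text: str, max_words: int = 300) -> list[str]:
    """Group whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words is kept as its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk returned by `split_into_chunks` can then be passed as the `input` of a separate speech request. Whether `1234` should be read as a cardinal number or as `twelve thirty four` depends on context, so a real pipeline may need per-field rules rather than this single default.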
Response Audio Formats
| Format | Description |
|---|---|
| `mp3` | Compressed, suitable for most use cases |
| `wav` | Uncompressed PCM, highest quality |
| `pcm` | Raw float32 LE samples, recommended for streaming (lowest latency) |
| `flac` | Lossless compression |
| `opus` | Low bitrate, good for streaming |
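Because the `pcm` format is raw float32 little-endian samples with no header, most players cannot open it directly. The sketch below shows one way to wrap such bytes in a 16-bit WAV container using only the Python standard library. The helper name `float32_pcm_to_wav` is hypothetical, and the defaults of 24000 Hz and one channel are assumptions, since this page does not state the sample rate or channel count; check them against the API reference before relying on the output.

```python
import struct
import wave


def float32_pcm_to_wav(pcm_bytes: bytes, wav_path: str,
                       sample_rate: int = 24000, channels: int = 1) -> int:
    """Write raw float32 LE samples as a 16-bit PCM WAV file.

    sample_rate and channels are assumed defaults, not documented values.
    Returns the number of samples written.
    """
    # Each float32 sample is 4 bytes; ignore any trailing partial sample.
    n = len(pcm_bytes) // 4
    floats = struct.unpack(f"<{n}f", pcm_bytes[: n * 4])
    # Clamp to [-1.0, 1.0] and scale to signed 16-bit integers.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in floats]
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{n}h", *ints))
    return n
```

Converting to 16-bit integers loses some precision compared with the float32 source; it is used here because the standard `wave` module only writes integer PCM.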