Speech Generation

Generate speech from text using a saved voice (voice_id) or a one-off reference audio clip (ref_audio).

Generate

Generate speech using the identifier of a previously saved voice (voice_id).

import base64
from pathlib import Path
from mistralai.client import Mistral

client = Mistral(api_key="your-api-key")

response = client.audio.speech.complete(
    model="voxtral-mini-tts-2603",
    input="Hello! This is Voxtral, Mistral's text-to-speech model.",
    voice_id="your-voice-id",
    response_format="mp3",
)

Path("output.mp3").write_bytes(base64.b64decode(response.audio_data))
print("Saved to output.mp3")
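
The example above hard-codes the output file name. As a minimal sketch, the decode-and-save step could be wrapped in a small helper that derives the file extension from the requested format. The helper name save_audio and the KNOWN_FORMATS set are illustrative and not part of the SDK; the sketch only assumes, as in the example above, that the audio arrives as a base64 string.

```python
import base64
from pathlib import Path

# Formats listed in the "Response Audio Formats" table on this page.
KNOWN_FORMATS = {"mp3", "wav", "pcm", "flac", "opus"}


def save_audio(audio_b64: str, response_format: str, stem: str = "output") -> Path:
    """Decode base64 audio and write it to '<stem>.<response_format>'.

    Hypothetical helper: not part of the mistralai SDK.
    """
    if response_format not in KNOWN_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    data = base64.b64decode(audio_b64)
    if not data:
        raise ValueError("decoded audio is empty")
    path = Path(f"{stem}.{response_format}")
    path.write_bytes(data)
    return path
```

With the response from the example above, this would be called as save_audio(response.audio_data, "mp3").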

Best Practices

Text Prompt Guidelines

  • Language match: the voice prompt should be in the same language as the text prompt for best results.
  • Cross-lingual prompts: the model also supports cross-lingual voice transfer. For example, a French voice prompt with English text will produce French-accented English.
  • Verbalizable form: convert numbers and symbols to their spoken equivalents to avoid ambiguity. For example, instead of 1234, write "one thousand two hundred thirty-four" or "twelve thirty-four", depending on context.
  • No rich formatting: avoid markdown, emojis, or special characters in the text — they will not be rendered and may degrade output quality.
  • Abbreviations: spell out abbreviations for better pronunciation. Use F-B-I or F.B.I. instead of FBI.
  • Length: keep prompts under 300 words for best results.
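
Several of the guidelines above can be applied automatically before a request is sent. The following is a rough sketch, not an official preprocessing step: prepare_text, number_to_words, and MAX_WORDS are hypothetical names, the emoji and markdown patterns are deliberately simple, and only plain integers below one million are spelled out (decimals, dates, and context-dependent readings such as "twelve thirty-four" are not handled).

```python
import re

# Hypothetical limit taken from the "under 300 words" guideline.
MAX_WORDS = 300

_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]
_MARKDOWN = re.compile(r"[*_#`~>]")  # common markdown markers only
_EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")  # rough emoji ranges


def number_to_words(n: int) -> str:
    """Spell out an integer 0 <= n < 1_000_000 in English."""
    if not 0 <= n < 1_000_000:
        raise ValueError("only 0 <= n < 1_000_000 is supported")
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + ("-" + _ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return _ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    return number_to_words(thousands) + " thousand" + (" " + number_to_words(rest) if rest else "")


def _spell_number(match: re.Match) -> str:
    n = int(match.group())
    # Leave very large numbers untouched rather than guessing a reading.
    return number_to_words(n) if n < 1_000_000 else match.group()


def prepare_text(text: str, abbreviations: tuple = ()) -> str:
    """Strip formatting, spell out listed abbreviations and integers, check length."""
    text = _MARKDOWN.sub("", text)
    text = _EMOJI.sub("", text)
    for abbr in abbreviations:
        # "FBI" becomes "F-B-I"; only abbreviations passed in are rewritten.
        text = re.sub(rf"\b{re.escape(abbr)}\b", "-".join(abbr), text)
    text = re.sub(r"\d+", _spell_number, text)
    text = re.sub(r"\s+", " ", text).strip()
    if len(text.split()) > MAX_WORDS:
        raise ValueError(f"text exceeds {MAX_WORDS} words; split it into shorter prompts")
    return text
```

Abbreviations are passed in explicitly because acronyms that are read as words (for example NASA) should not be spelled letter by letter.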

Warning: The TTS API includes content moderation. Requests containing certain text are rejected with a 403 error. Keep your text inputs within the acceptable use guidelines.
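
A rejected request can be handled separately from other failures. This page does not state which exception type the SDK raises for a 403, so the sketch below makes a loud assumption: the raised exception exposes the HTTP status as a status_code attribute. The wrapper name call_with_moderation_check is hypothetical.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_moderation_check(request: Callable[[], T]) -> Optional[T]:
    """Run request(); return None on an HTTP 403, re-raise any other error.

    Assumption: the exception carries the HTTP status in `.status_code`.
    Check the SDK's error types before relying on this.
    """
    try:
        return request()
    except Exception as exc:
        if getattr(exc, "status_code", None) == 403:
            print("Request rejected by content moderation (403).")
            return None
        raise
```

The request is passed as a zero-argument callable, for example a lambda wrapping the client.audio.speech.complete(...) call, so the wrapper stays independent of the client object.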

Response Audio Formats

Format  Description
mp3     Compressed, suitable for most use cases
wav     Uncompressed PCM, highest quality
pcm     Raw float32 little-endian (LE) samples; recommended for streaming (lowest latency)
flac    Lossless compression
opus    Low bitrate, good for streaming
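
Raw pcm output has no header, so most players cannot open it directly. As an illustration, the sketch below wraps already-decoded float32 LE samples in a 16-bit WAV container using only the standard library. The sample rate and channel count are not given on this page: the defaults of 24000 Hz and mono are placeholders to replace with the real values, and samples are assumed to lie in the range -1.0 to 1.0.

```python
import struct
import wave


def pcm_f32le_to_wav(pcm: bytes, wav_path: str,
                     sample_rate: int = 24000, channels: int = 1) -> int:
    """Write raw float32 little-endian samples as a 16-bit PCM WAV file.

    sample_rate and channels are assumptions, not values documented here.
    Returns the number of samples written.
    """
    if len(pcm) % 4 != 0:
        raise ValueError("float32 data length must be a multiple of 4 bytes")
    count = len(pcm) // 4
    floats = struct.unpack(f"<{count}f", pcm)
    # Clamp to [-1.0, 1.0] and scale to the signed 16-bit range.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in floats]
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{count}h", *ints))
    return count
```

Converting to 16-bit integers loses some precision compared with the float32 source; this keeps the sketch within what the standard wave module can write.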