Deployment

Run Mistral models through managed cloud services, on Mistral Compute, or on your own infrastructure. Open-weight models (Apache 2.0) can be deployed on any compatible hardware; commercial models are available through cloud provider integrations or Mistral Compute.

In this section

  • Cloud Deployments: Access Mistral models through Azure AI, Amazon Bedrock, Google Cloud Vertex AI, Snowflake Cortex, IBM watsonx, and Outscale.

  • Local Deployment: Run open-weight models on your own infrastructure using vLLM, TensorRT-LLM, TGI, SkyPilot, Cerebrium, or Cloudflare Workers AI. Supports configurations from a single GPU (e.g., an RTX 4090) to multi-node clusters (4+ H100s for larger models).
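As a minimal sketch of the local-deployment path, the commands below serve an open-weight model with vLLM's OpenAI-compatible server and query it over HTTP. The model name, port, and prompt are illustrative assumptions, and a CUDA-capable GPU with sufficient memory is assumed:

```shell
# Install vLLM (assumes a CUDA-capable GPU and working drivers)
pip install vllm

# Launch vLLM's OpenAI-compatible API server for an open-weight model
# (model name and port are illustrative choices)
vllm serve mistralai/Mistral-7B-Instruct-v0.3 --port 8000

# In another terminal: query the local endpoint via the chat completions API
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-7B-Instruct-v0.3",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

Because the server exposes an OpenAI-compatible API, existing OpenAI client libraries can point at the local endpoint without code changes.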