Self-Deployment

Mistral AI models can be self-deployed on your own infrastructure through various inference engines. We recommend vLLM, a highly optimized open-source serving framework that can expose an OpenAI-compatible API.
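As a minimal sketch, assuming vLLM is installed and a model server has been started locally on the default port, the served model can then be queried with the standard OpenAI Python client. The model name below is illustrative and should match whichever model you serve:

```python
# Start the OpenAI-compatible server first (shell command, shown as a comment):
#   vllm serve mistralai/Mistral-7B-Instruct-v0.3
# By default the server listens on http://localhost:8000/v1.

from openai import OpenAI

# Point the standard OpenAI client at the local vLLM endpoint.
# vLLM does not require an API key by default, so any placeholder string works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",  # must match the served model
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

Because the endpoint is OpenAI-compatible, existing OpenAI-based tooling can typically be repointed at a self-deployed model by changing only the base URL.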

Alternative inference engines include NVIDIA's TensorRT-LLM and Hugging Face's Text Generation Inference (TGI).

You can also use tools such as SkyPilot or Cerebrium to simplify provisioning and managing the underlying infrastructure.
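For instance, here is a minimal sketch of launching a vLLM server on cloud GPUs with SkyPilot's Python API; the cluster name, accelerator type, and model are illustrative assumptions, not prescribed values:

```python
import sky

# Define a task: install vLLM, then serve a Mistral model on the cluster.
task = sky.Task(
    setup="pip install vllm",
    run="vllm serve mistralai/Mistral-7B-Instruct-v0.3 --port 8000",
)
# Request a single A100 GPU (adjust to your model's memory needs).
task.set_resources(sky.Resources(accelerators="A100:1"))

# Provision a matching instance on a configured cloud and run the task.
sky.launch(task, cluster_name="mistral-vllm")
```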

Tip: For full-stack enterprise self-deployment, from efficient model inference to team management, we recommend reaching out to us for a self-hosted AI Studio.