Self-deployment
Mistral AI provides ready-to-use Docker images on the GitHub registry. The model weights are distributed separately.
To run these images, you need a cloud virtual machine matching the requirements for a given model. These requirements can be found in the model description.
We recommend three different serving frameworks for our models:
- vLLM: A Python-only serving framework that deploys an API matching OpenAI's spec. vLLM provides a paged attention kernel to improve serving throughput (a sketch of querying such an OpenAI-compatible endpoint follows this list).
- NVIDIA's TensorRT-LLM served with NVIDIA's Triton Inference Server: TensorRT-LLM provides a DSL to build fast inference engines with dedicated kernels for large language models. Triton Inference Server allows efficient serving of these inference engines.
- TGI: A toolkit for deploying and serving LLMs, including an OpenAI-compatible API, grammar support, production monitoring, and tool-calling functionality.
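
Once one of these servers is running, it can be queried through its OpenAI-compatible endpoint. The sketch below is only an illustration: the base URL, API key, and model identifier are assumptions and should be replaced with the values of your own deployment.

```python
# Minimal sketch: query an OpenAI-compatible endpoint exposed by vLLM or TGI.
# The base_url, api_key, and model name are assumptions, not fixed values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",          # assumed local server address
    api_key="not-needed-for-local-deployments",   # placeholder token
)

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",   # hypothetical model identifier
    messages=[{"role": "user", "content": "Write a haiku about deployment."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```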
These images can be run locally, or on your favorite cloud provider, using SkyPilot.
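
As an illustration, SkyPilot can provision a cloud VM and start such a container through its Python API. This is a sketch under assumptions: the GPU type, port, and Docker image reference below are placeholders, not the actual requirements of any specific model; use the values from the model description instead.

```python
# Minimal sketch of launching a serving container on a cloud VM with SkyPilot.
# The accelerator, port, and image reference are assumptions for illustration.
import sky

task = sky.Task(
    name="mistral-serving",
    run=(
        "docker run --gpus all -p 8000:8000 "
        "ghcr.io/example/mistral-serving:latest"  # hypothetical image reference
    ),
)
task.set_resources(sky.Resources(accelerators="A100:1"))  # assumed GPU requirement

# Provision a VM on the configured cloud provider and run the task on it.
sky.launch(task, cluster_name="mistral-serving")
```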