Deploy with SkyPilot
SkyPilot is a framework for running LLMs, AI, and batch jobs on any cloud, offering maximum cost savings, highest GPU availability, and managed execution.
We provide an example SkyPilot config that deploys the Mistral-7B-v0.1 model on an AWS g5.xlarge instance (a node with a single NVIDIA A10G GPU).
After installing SkyPilot, you need to create a configuration file that tells SkyPilot how and where to deploy your inference server, using our pre-built Docker container:

```bash
docker run --gpus all -p 8000:8000 ghcr.io/mistralai/mistral-src/vllm:latest \
    --host 0.0.0.0 \
    --model $MODEL_NAME
```
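A minimal SkyPilot task YAML wrapping this command could look like the following (a sketch: the accelerator spec matches the g5.xlarge node described above, and the `MODEL_NAME` value is an assumption you should adapt to the model you want to serve):

```yaml
# mistral-7b-v0.1.yaml -- illustrative SkyPilot task definition
resources:
  cloud: aws
  accelerators: A10G:1        # one NVIDIA A10G, as on a g5.xlarge

envs:
  MODEL_NAME: mistralai/Mistral-7B-v0.1   # assumed Hugging Face model id

run: |
  docker run --gpus all -p 8000:8000 ghcr.io/mistralai/mistral-src/vllm:latest \
      --host 0.0.0.0 \
      --model $MODEL_NAME
```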
Once these environment variables are set (at minimum MODEL_NAME, which the run command references), you can use sky launch to launch the inference server, giving the cluster the name mistral-7b:
```bash
sky launch -c mistral-7b mistral-7b-v0.1.yaml --region us-east-1
```
When deployed this way, the model is accessible to the whole world. You must secure it, either by exposing it exclusively on your private network (change the --host Docker option for that), by adding a load balancer with an authentication mechanism in front of it, or by configuring your instance networking properly.
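For example, one way to keep the server off the public interface is to bind it to loopback and reach it over an SSH tunnel instead (a sketch; the tunnel command assumes the SSH alias mistral-7b that SkyPilot sets up for the cluster):

```bash
# In the task's run command: bind vLLM to loopback only (illustrative change)
docker run --gpus all -p 127.0.0.1:8000:8000 ghcr.io/mistralai/mistral-src/vllm:latest \
    --host 127.0.0.1 \
    --model $MODEL_NAME

# Then, from your machine, forward the port over SSH instead of exposing it:
ssh -L 8000:localhost:8000 mistral-7b
```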
Test it out!
To easily retrieve the IP address of the deployed mistral-7b cluster, you can use:

```bash
sky status --ip mistral-7b
```
You can then use curl to send a completion request:

```bash
IP=$(sky status --ip mistral-7b)

curl http://$IP:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
      "model": "mistralai/Mistral-7B-v0.1",
      "prompt": "My favourite condiment is",
      "max_tokens": 25
  }'
```

Note that the "model" field must match the model the server was launched with.
Many cloud providers require you to explicitly request access to powerful GPU instances. Read SkyPilot's guide on how to do this.