Vertex AI

Introduction

Mistral AI's open and commercial models can be deployed on the Google Cloud Vertex AI platform as fully managed endpoints. Mistral models on Vertex AI are serverless, so you don't have to manage any infrastructure.

As of today, the following models are available:

  • Mistral Large
  • Mistral NeMo
  • Codestral (chat and FIM completions)

For more details, visit the models page.

Getting started

The following sections outline the steps to deploy and query a Mistral model on the Vertex AI platform.

Requesting access to the model

The following items are required:

  • Access to a Google Cloud Project with the Vertex AI API enabled
  • Relevant IAM permissions to be able to enable the model and query endpoints through the following roles:

To enable the model of your choice, navigate to its card in the Vertex Model Garden catalog, then click on "Enable".

Querying the model (chat completion)

Available models expose a REST API that you can query using Mistral's SDKs or plain HTTP calls.

To run the examples below:

  • Install the gcloud CLI to authenticate against the Google Cloud APIs. Refer to this page for more details.
  • Set the following environment variables:
    • GOOGLE_CLOUD_REGION: The target cloud region.
    • GOOGLE_CLOUD_PROJECT_ID: The ID of your Google Cloud project.
    • VERTEX_MODEL_NAME: The name of the model to query (e.g. mistral-large).
    • VERTEX_MODEL_VERSION: The version of the model to query (e.g. 2407).
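
As a minimal sketch of the setup step above, the variables could be exported in the current shell and checked before any request is sent. The region and project values are placeholders to replace with your own, and `require_vertex_env` is a hypothetical helper name, not part of gcloud or Mistral's tooling.

```shell
# Placeholder values (assumptions): replace with your own region and project ID.
export GOOGLE_CLOUD_REGION="us-central1"
export GOOGLE_CLOUD_PROJECT_ID="my-project-id"
export VERTEX_MODEL_NAME="mistral-large"
export VERTEX_MODEL_VERSION="2407"

# Hypothetical helper: returns non-zero and names the first variable
# that is unset or empty.
require_vertex_env() {
  for name in GOOGLE_CLOUD_REGION GOOGLE_CLOUD_PROJECT_ID VERTEX_MODEL_NAME VERTEX_MODEL_VERSION; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing environment variable: $name" >&2
      return 1
    fi
  done
}
```

Calling `require_vertex_env` before sending a request makes a missing variable fail fast with a readable message rather than producing a malformed URL.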

# Build the endpoint URL from the environment variables set above.
# Braces delimit each variable name explicitly.
base_url="https://${GOOGLE_CLOUD_REGION}-aiplatform.googleapis.com/v1/projects/${GOOGLE_CLOUD_PROJECT_ID}/locations/${GOOGLE_CLOUD_REGION}/publishers/mistralai/models"
model_version="${VERTEX_MODEL_NAME}@${VERTEX_MODEL_VERSION}"
url="${base_url}/${model_version}:rawPredict"

# Send a chat completion request, authenticating with a gcloud access token.
# The URL is quoted and followed by a space so the next option stays separate.
curl --location "$url" \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer $(gcloud auth print-access-token)" \
  --data '{
    "model": "'"$VERTEX_MODEL_NAME"'",
    "temperature": 0,
    "messages": [
      {"role": "user", "content": "Who is the best French painter? Answer in one short sentence."}
    ],
    "stream": false
  }'
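
The URL and request body used above can also be wrapped in small shell functions, which avoids the nested quoting around the model name. This is a sketch only: `vertex_url` and `chat_payload` are hypothetical helper names, and the payload builder does not JSON-escape its arguments, so it assumes a prompt without double quotes or backslashes.

```shell
# Hypothetical helper: print the rawPredict URL.
# Arguments: region, project ID, model name, model version.
vertex_url() {
  printf 'https://%s-aiplatform.googleapis.com/v1/projects/%s/locations/%s/publishers/mistralai/models/%s@%s:rawPredict\n' \
    "$1" "$2" "$1" "$3" "$4"
}

# Hypothetical helper: print a chat completion body.
# Arguments: model name, user prompt.
# Assumption: the prompt contains no double quotes or backslashes (no escaping).
chat_payload() {
  printf '{"model": "%s", "temperature": 0, "messages": [{"role": "user", "content": "%s"}], "stream": false}\n' \
    "$1" "$2"
}

# Usage (needs gcloud credentials and network access, so it is left commented out):
# curl --location "$(vertex_url "$GOOGLE_CLOUD_REGION" "$GOOGLE_CLOUD_PROJECT_ID" "$VERTEX_MODEL_NAME" "$VERTEX_MODEL_VERSION")" \
#   --header "Content-Type: application/json" \
#   --header "Authorization: Bearer $(gcloud auth print-access-token)" \
#   --data "$(chat_payload "$VERTEX_MODEL_NAME" "Who is the best French painter? Answer in one short sentence.")"
```

Keeping the URL and the body in separate functions makes each piece easy to print and inspect before a real request is sent.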

Querying the model (FIM completion)

Codestral can be queried using an additional completion mode called fill-in-the-middle (FIM). For more information, see the code generation section.

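# Target Codestral, the model that supports FIM completions.
VERTEX_MODEL_NAME=codestral
VERTEX_MODEL_VERSION=2405

# Build the endpoint URL; braces delimit each variable name explicitly.
base_url="https://${GOOGLE_CLOUD_REGION}-aiplatform.googleapis.com/v1/projects/${GOOGLE_CLOUD_PROJECT_ID}/locations/${GOOGLE_CLOUD_REGION}/publishers/mistralai/models"
model_version="${VERTEX_MODEL_NAME}@${VERTEX_MODEL_VERSION}"
url="${base_url}/${model_version}:rawPredict"

# Send a FIM request: the model fills in the code between "prompt" and "suffix".
# The URL is quoted and followed by a space so the next option stays separate.
curl --location "$url" \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer $(gcloud auth print-access-token)" \
  --data '{
    "model": "'"$VERTEX_MODEL_NAME"'",
    "prompt": "def count_words_in_file(file_path: str) -> int:",
    "suffix": "return n_words",
    "stream": false
  }'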
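
To make the roles of `prompt` and `suffix` concrete: in fill-in-the-middle mode the model generates the code that sits between the two, so the finished snippet is the prompt, then the completion, then the suffix. The sketch below shows that assembly with a hand-written stand-in for the completion. `assemble_fim` is a hypothetical helper, the middle text is illustrative rather than actual Codestral output, and the indentation of the suffix is added by hand for readability.

```shell
# Hypothetical helper: join prompt, generated middle, and suffix, one per line.
assemble_fim() {
  printf '%s\n%s\n%s\n' "$1" "$2" "$3"
}

prompt='def count_words_in_file(file_path: str) -> int:'
suffix='    return n_words'
# Stand-in for the model's completion (written by hand for illustration).
middle='    with open(file_path) as f:
        n_words = len(f.read().split())'

# Print the reassembled function.
assemble_fim "$prompt" "$middle" "$suffix"
```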

Going further

For more information and examples, you can check: