Vertex AI
Introduction
Mistral AI's open and commercial models can be deployed on the Google Cloud Vertex AI platform as fully managed endpoints. Mistral models on Vertex AI are serverless services, so you don't need to manage any infrastructure.
As of today, the following models are available:
- Mistral Large
- Mistral NeMo
- Codestral (chat and FIM completions)
For more details, visit the models page.
Getting started
The following sections outline the steps to deploy and query a Mistral model on the Vertex AI platform.
Requesting access to the model
The following items are required:
- Access to a Google Cloud Project with the Vertex AI API enabled
- Relevant IAM permissions to enable the model and query its endpoints, granted through the following roles:
  - Vertex AI User
  - Consumer Procurement Entitlement Manager
To enable the model of your choice, navigate to its card in the Vertex Model Garden catalog, then click on "Enable".
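If the Vertex AI API is not yet enabled on your project, you can turn it on from the gcloud CLI before visiting the Model Garden; a minimal sketch, assuming your project ID is stored in the GOOGLE_CLOUD_PROJECT_ID environment variable described below:
# One-time setup: enable the Vertex AI API on the target project.
gcloud services enable aiplatform.googleapis.com \
  --project="$GOOGLE_CLOUD_PROJECT_ID"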
Querying the model (chat completion)
Available models expose a REST API that you can query using Mistral's SDKs or plain HTTP calls.
To run the examples below:
- Install the gcloud CLI to authenticate against the Google Cloud APIs; please refer to this page for more details.
- Set the following environment variables (example exports are shown after this list):
  - GOOGLE_CLOUD_REGION: The target cloud region.
  - GOOGLE_CLOUD_PROJECT_ID: The name of your project.
  - VERTEX_MODEL_NAME: The name of the model to query (e.g. mistral-large).
  - VERTEX_MODEL_VERSION: The version of the model to query (e.g. 2407).
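For example (the region and project ID below are illustrative placeholders; substitute your own values):
# Illustrative values only; replace the region and project ID with your own.
export GOOGLE_CLOUD_REGION="europe-west4"
export GOOGLE_CLOUD_PROJECT_ID="my-project-id"
export VERTEX_MODEL_NAME="mistral-large"
export VERTEX_MODEL_VERSION="2407"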
- cURL
- Python
- TypeScript
base_url="https://$GOOGLE_CLOUD_REGION-aiplatform.googleapis.com/v1/projects/$GOOGLE_CLOUD_PROJECT_ID/locations/$GOOGLE_CLOUD_REGION/publishers/mistralai/models"
model_version="$VERTEX_MODEL_NAME@$VERTEX_MODEL_VERSION"
url="$base_url/$model_version:rawPredict"
curl --location "$url" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--data '{
"model": "'"$VERTEX_MODEL_NAME"'",
"temperature": 0,
"messages": [
{"role": "user", "content": "Who is the best French painter? Answer in one short sentence."}
],
"stream": false
}'
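Streaming is also supported: the same payload can be sent with "stream": true to the model's streamRawPredict method instead of rawPredict (the method name follows the Vertex AI raw-prediction convention; check the Model Garden card for your model if in doubt). A minimal sketch, reusing the variables defined above; the SDKs shown below expose similar streaming helpers (e.g. client.chat.stream in Python):
# Same request as above, but sent to the streaming method with "stream": true.
stream_url="$base_url/$model_version:streamRawPredict"
curl --location "$stream_url" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--data '{
"model": "'"$VERTEX_MODEL_NAME"'",
"temperature": 0,
"messages": [
{"role": "user", "content": "Who is the best French painter? Answer in one short sentence."}
],
"stream": true
}'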
This code requires a virtual environment with the following package:
mistralai[gcp]>=1.0.0
import os
from mistralai_gcp import MistralGoogleCloud
region = os.environ.get("GOOGLE_CLOUD_REGION")
project_id = os.environ.get("GOOGLE_CLOUD_PROJECT_NAME")
model_name = os.environ.get("VERTEX_MODEL_NAME")
model_version = os.environ.get("VERTEX_MODEL_VERSION")
client = MistralGoogleCloud(region=region, project_id=project_id)
resp = client.chat.complete(
    # The model is addressed as "<name>-<version>", e.g. "mistral-large-2407".
    model=f"{model_name}-{model_version}",
messages=[
{
"role": "user",
"content": "Who is the best French painter? Answer in one short sentence.",
}
],
)
print(resp.choices[0].message.content)
This code requires the following package:
@mistralai/mistralai-gcp
(version >=1.0.0
)
import { MistralGoogleCloud } from "@mistralai/mistralai-gcp";
const client = new MistralGoogleCloud({
region: process.env.GOOGLE_CLOUD_REGION || "",
projectId: process.env.GOOGLE_CLOUD_PROJECT_ID || "",
});
const modelName = process.env.VERTEX_MODEL_NAME || "";
const modelVersion = process.env.VERTEX_MODEL_VERSION || "";
async function chatCompletion(user_msg: string) {
const resp = await client.chat.complete({
model: modelName + "-" + modelVersion,
messages: [
{
content: user_msg,
role: "user",
},
],
});
if (resp.choices && resp.choices.length > 0) {
console.log(resp.choices[0]);
}
}
chatCompletion("Who is the best French painter? Answer in one short sentence.");
Querying the model (FIM completion)
Codestral can be queried using an additional completion mode called fill-in-the-middle (FIM): you supply a prompt (the code before the insertion point) and an optional suffix (the code after it), and the model generates the code that fits in between. For more information, see the code generation section.
- cURL
- Python
- TypeScript
VERTEX_MODEL_NAME=codestral
VERTEX_MODEL_VERSION=2405
base_url="https://$GOOGLE_CLOUD_REGION-aiplatform.googleapis.com/v1/projects/$GOOGLE_CLOUD_PROJECT_ID/locations/$GOOGLE_CLOUD_REGION/publishers/mistralai/models"
model_version="$VERTEX_MODEL_NAME@$VERTEX_MODEL_VERSION"
url="$base_url/$model_version:rawPredict"
curl --location "$url" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--data '{
"model":"'"$VERTEX_MODEL_NAME"'",
"prompt": "def count_words_in_file(file_path: str) -> int:",
"suffix": "return n_words",
"stream": false
}'
import os
from mistralai_gcp import MistralGoogleCloud
region = os.environ.get("GOOGLE_CLOUD_REGION")
project_id = os.environ.get("GOOGLE_CLOUD_PROJECT_NAME")
model_name = "codestral"
model_version = "2405"
client = MistralGoogleCloud(region=region, project_id=project_id)
resp = client.fim.complete(
    model=f"{model_name}-{model_version}",
prompt="def count_words_in_file(file_path: str) -> int:",
suffix="return n_words"
)
print(resp.choices[0].message.content)
import { MistralGoogleCloud } from "@mistralai/mistralai-gcp";
const client = new MistralGoogleCloud({
region: process.env.GOOGLE_CLOUD_REGION || "",
projectId: process.env.GOOGLE_CLOUD_PROJECT_ID || "",
});
const modelName = "codestral";
const modelVersion = "2405";
async function fimCompletion(prompt: string, suffix: string) {
const resp = await client.fim.complete({
model: modelName + "-" + modelVersion,
prompt: prompt,
suffix: suffix
});
if (resp.choices && resp.choices.length > 0) {
console.log(resp.choices[0]);
}
}
fimCompletion("def count_words_in_file(file_path: str) -> int:", "return n_words");
Going further
For more information and examples, you can check:
- The Google Cloud Partner Models documentation page.
- The Vertex Model Cards for Mistral Large, Mistral NeMo, and Codestral.
- The Getting Started Colab Notebook for Mistral models on Vertex, along with the source file on GitHub.