[Deployment]

Vertex AI

Mistral AI's open and commercial models can be deployed on the Google Cloud Vertex AI platform as fully managed endpoints. Mistral models on Vertex AI are serverless services, so you don't have to manage any infrastructure.

As of today, the following models are available:

  • Mistral Large (24.11, 24.07)
  • Codestral (24.05)
  • Mistral Nemo

For more details, visit the models page.

Getting Started

The following sections outline the steps to deploy and query a Mistral model on the Vertex AI platform.

Requesting access to the model

The following items are required:

  • Access to a Google Cloud Project with the Vertex AI API enabled.
  • Relevant IAM permissions to enable the model and query endpoints (for example, the Vertex AI User role, roles/aiplatform.user).

To enable the model of your choice, navigate to its card in the Vertex Model Garden catalog, then click on "Enable".

Querying the model (chat completion)

Available models expose a REST API that you can query using Mistral's SDKs or plain HTTP calls.

To run the examples below:

  • Install the gcloud CLI to authenticate against the Google Cloud APIs. Refer to this page for more details.
  • Set the following environment variables:
    • GOOGLE_CLOUD_REGION: The target cloud region.
    • GOOGLE_CLOUD_PROJECT_ID: The name of your project.
    • VERTEX_MODEL_NAME: The name of the model to query (e.g., mistral-large).
    • VERTEX_MODEL_VERSION: The version of the model to query (e.g., 2407).
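
With those variables set, the Python example below sends a chat completion request through the mistralai SDK. The client picks up the Application Default Credentials configured via the gcloud CLI.
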
# This code requires the following packages: mistralai[gcp] (version >= 1.0.0)
import os

from mistralai_gcp import MistralGoogleCloud

# Deployment configuration, read from the environment variables listed above.
region = os.environ.get("GOOGLE_CLOUD_REGION")
project_id = os.environ.get("GOOGLE_CLOUD_PROJECT_ID")
model_name = os.environ.get("VERTEX_MODEL_NAME")
model_version = os.environ.get("VERTEX_MODEL_VERSION")

client = MistralGoogleCloud(region=region, project_id=project_id)

# Vertex AI model IDs combine the model name and version, e.g. mistral-large-2407.
resp = client.chat.complete(
    model=f"{model_name}-{model_version}",
    messages=[
        {
            "role": "user",
            "content": "Who is the best French painter? Answer in one short sentence.",
        }
    ],
)
print(resp.choices[0].message.content)
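
The SDK is a thin wrapper over the Vertex AI REST API. If you would rather issue plain HTTP calls, the sketch below shows one way to do it with the requests and google-auth packages. Note that the rawPredict endpoint path and the payload shape are assumptions based on the standard Vertex AI publisher-model URL scheme; double-check them against the Google Cloud documentation before relying on them.

import os

import google.auth
import google.auth.transport.requests
import requests

region = os.environ.get("GOOGLE_CLOUD_REGION")
project_id = os.environ.get("GOOGLE_CLOUD_PROJECT_ID")
model = f'{os.environ.get("VERTEX_MODEL_NAME")}-{os.environ.get("VERTEX_MODEL_VERSION")}'

# Obtain an OAuth2 access token from the Application Default Credentials.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

# Assumed endpoint layout for partner models published under "mistralai".
url = (
    f"https://{region}-aiplatform.googleapis.com/v1/projects/{project_id}"
    f"/locations/{region}/publishers/mistralai/models/{model}:rawPredict"
)
payload = {
    "model": model,  # assumed: the model ID is repeated in the request body
    "messages": [
        {
            "role": "user",
            "content": "Who is the best French painter? Answer in one short sentence.",
        }
    ],
}
resp = requests.post(
    url, json=payload, headers={"Authorization": f"Bearer {credentials.token}"}
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])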

Querying the model (FIM completion)

Codestral can be queried using an additional completion mode called fill-in-the-middle (FIM).

For more information, see the code generation section.
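
The example below asks Codestral to fill in the body of a word-counting function between the given prompt and suffix: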

import os

from mistralai_gcp import MistralGoogleCloud

region = os.environ.get("GOOGLE_CLOUD_REGION")
project_id = os.environ.get("GOOGLE_CLOUD_PROJECT_ID")

# FIM completion is specific to Codestral.
model_name = "codestral"
model_version = "2405"

client = MistralGoogleCloud(region=region, project_id=project_id)

# The model generates the code that belongs between `prompt` and `suffix`.
resp = client.fim.complete(
    model=f"{model_name}-{model_version}",
    prompt="def count_words_in_file(file_path: str) -> int:",
    suffix="return n_words",
)
print(resp.choices[0].message.content)

Going Further

For more information and examples, check: