Vertex AI
Mistral AI's open and commercial models can be deployed on the Google Cloud Vertex AI platform as fully managed, serverless endpoints, so you don't have to manage any infrastructure.
As of today, the following models are available:
- Mistral Large (24.11, 24.07)
- Codestral (24.05)
- Mistral NeMo
For more details, visit the models page.
Getting Started
The following sections outline the steps to deploy and query a Mistral model on the Vertex AI platform.
Requesting access to the model
The following items are required:
- Access to a Google Cloud Project with the Vertex AI API enabled.
- Relevant IAM permissions to enable the model and query endpoints, through the following roles:
  - Vertex AI User IAM role.
  - Consumer Procurement Entitlement Manager role.
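Note that enabling the Vertex AI API can be done from the Google Cloud console, or with the gcloud CLI, for example:

gcloud services enable aiplatform.googleapis.com --project=<your-project-id>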
To enable the model of your choice, navigate to its card in the Vertex Model Garden catalog, then click on "Enable".
Querying the model (chat completion)
Available models expose a REST API that you can query using Mistral's SDKs or plain HTTP calls (a raw-HTTP sketch appears at the end of this section).
To run the examples below:
- Install the gcloud CLI to authenticate against the Google Cloud APIs. Refer to this page for more details.
- Set the following environment variables:
  - GOOGLE_CLOUD_REGION: The target cloud region.
  - GOOGLE_CLOUD_PROJECT_ID: The name of your project.
  - VERTEX_MODEL_NAME: The name of the model to query (e.g., mistral-large).
  - VERTEX_MODEL_VERSION: The version of the model to query (e.g., 2407).
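For reference, a typical shell setup might look like the following (the values below are placeholders; gcloud auth application-default login sets up the Application Default Credentials that the client libraries rely on):

gcloud auth application-default login

export GOOGLE_CLOUD_REGION="us-central1"
export GOOGLE_CLOUD_PROJECT_ID="my-project"
export VERTEX_MODEL_NAME="mistral-large"
export VERTEX_MODEL_VERSION="2411"

With these in place, you can query the model through Mistral's Python SDK: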
# This code requires the mistralai[gcp] package (version >= 1.0.0).
import os

from mistralai_gcp import MistralGoogleCloud

region = os.environ.get("GOOGLE_CLOUD_REGION")
project_id = os.environ.get("GOOGLE_CLOUD_PROJECT_ID")
model_name = os.environ.get("VERTEX_MODEL_NAME")
model_version = os.environ.get("VERTEX_MODEL_VERSION")

client = MistralGoogleCloud(region=region, project_id=project_id)
resp = client.chat.complete(
    model=f"{model_name}-{model_version}",
    messages=[
        {
            "role": "user",
            "content": "Who is the best French painter? Answer in one short sentence.",
        }
    ],
)
print(resp.choices[0].message.content)
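The SDK can also stream the response token by token instead of waiting for the full completion. A minimal sketch, assuming mistralai[gcp] exposes the same chat.stream interface as Mistral's main Python SDK:

import os

from mistralai_gcp import MistralGoogleCloud

client = MistralGoogleCloud(
    region=os.environ.get("GOOGLE_CLOUD_REGION"),
    project_id=os.environ.get("GOOGLE_CLOUD_PROJECT_ID"),
)

model = f"{os.environ.get('VERTEX_MODEL_NAME')}-{os.environ.get('VERTEX_MODEL_VERSION')}"

# chat.stream yields events as chunks arrive; each event wraps a partial completion.
stream = client.chat.stream(
    model=model,
    messages=[
        {"role": "user", "content": "Name three French painters, one per line."}
    ],
)
for event in stream:
    delta = event.data.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()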
Querying the model (FIM completion)
Codestral can be queried using an additional completion mode called fill-in-the-middle (FIM).
For more information, see the code generation section.
import os

from mistralai_gcp import MistralGoogleCloud

region = os.environ.get("GOOGLE_CLOUD_REGION")
project_id = os.environ.get("GOOGLE_CLOUD_PROJECT_ID")
model_name = "codestral"
model_version = "2405"

client = MistralGoogleCloud(region=region, project_id=project_id)
# The model generates the code that fits between `prompt` and `suffix`.
resp = client.fim.complete(
    model=f"{model_name}-{model_version}",
    prompt="def count_words_in_file(file_path: str) -> int:",
    suffix="return n_words",
)
print(resp.choices[0].message.content)
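As mentioned above, the endpoints can also be queried with plain HTTP calls rather than the SDK. Below is a hedged sketch using the google-auth and requests packages; the rawPredict URL follows Vertex AI's publisher-model convention, but you should verify the exact path and payload format against the current Vertex documentation:

import os

import google.auth
import google.auth.transport.requests
import requests

region = os.environ.get("GOOGLE_CLOUD_REGION")
project_id = os.environ.get("GOOGLE_CLOUD_PROJECT_ID")
model = f"{os.environ.get('VERTEX_MODEL_NAME')}-{os.environ.get('VERTEX_MODEL_VERSION')}"

# Fetch an access token from Application Default Credentials.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

# Assumed URL shape for Vertex AI publisher models; double-check before use.
url = (
    f"https://{region}-aiplatform.googleapis.com/v1/projects/{project_id}"
    f"/locations/{region}/publishers/mistralai/models/{model}:rawPredict"
)
payload = {
    "model": model,
    "messages": [
        {"role": "user", "content": "Who is the best French painter? Answer in one short sentence."}
    ],
}

resp = requests.post(url, headers={"Authorization": f"Bearer {credentials.token}"}, json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])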
Going Further
For more information and examples, check:
- The Google Cloud Partner Models documentation page.
- The Vertex Model Cards for Mistral Large, Mistral NeMo, and Codestral.
- The Getting Started Colab notebook for Mistral models on Vertex AI, along with its source file on GitHub.