Batch Processing
Batching lets you run asynchronous inference on large inputs in parallel, processing large workloads at a 50% discount compared to synchronous API calls.
Prepare Batch
Prepare and Upload your Batch
A batch is composed of a list of API requests. The structure of an individual request includes:
- A unique `custom_id` for identifying each request and referencing results after completion
- A `body` object with the raw request you would send when calling the original endpoint without batching
Here's an example of how to structure a batch request:
{"custom_id": "0", "body": {"model": "mistral-small-latest", "max_tokens": 100, "messages": [{"role": "user", "content": "What is the best French cheese?"}]}}
{"custom_id": "1", "body": {"model": "mistral-small-latest", "max_tokens": 100, "messages": [{"role": "user", "content": "What is the best French wine?"}]}}Each line must be a valid JSON object. Do not add line breaks within a JSON object. The model field inside body is optional when you specify the model at job creation time.
A batch `body` object can be any valid request body for the endpoint you are using. Below are examples of batches for different endpoints; each `body` matches the corresponding endpoint's request body.
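For instance, a batch targeting the `/v1/embeddings` endpoint would pair each `custom_id` with an embeddings request body. A minimal sketch of building such lines in Python, assuming the standard embeddings request shape with an `input` field (the prompts here are illustrative):

```python
import json

# Build one JSONL line per embeddings request.
# Each body follows the embeddings request format: {"input": "<text>"}.
texts = ["What is the best French cheese?", "What is the best French wine?"]

lines = [
    json.dumps({"custom_id": str(i), "body": {"input": text}})
    for i, text in enumerate(texts)
]

for line in lines:
    print(line)
```

The same pattern applies to any supported endpoint: only the contents of `body` change.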
A batch file for chat completions, with `v1/chat/completions` as the endpoint and `ministral-3b-latest` as the model, would look like the following:

```jsonl
{"custom_id": "0", "body": {"max_tokens": 128, "messages": [{"role": "user", "content": "What is the best French cheese?"}]}}
{"custom_id": "1", "body": {"max_tokens": 512, "temperature": 0.2, "messages": [{"role": "user", "content": "What is the best French wine?"}]}}
{"custom_id": "2", "body": {"max_tokens": 256, "temperature": 0.8, "messages": [{"role": "user", "content": "What is the best French pastry?"}]}}
```

As a JSONL file, each line represents a request to the chosen API endpoint and model.
Explanation
The `body` of each request follows the same format as the endpoint you are batching against, except for the model ID, which is provided only at job creation when the batch run starts. Below is an example of a row for chat completions:
```
{
    "custom_id": "2", # An ID as metadata that will be returned in the output file to identify the request
    "body": { # The body of the request
        "max_tokens": 256, # Max tokens corresponding to the max generated tokens for chat completions
        "temperature": 0.8, # The temperature to use for sampling
        "messages": [ # The messages to generate chat completions for
            {
                "role": "user", # The role of the message author
                "content": "What is the best French pastry?" # The content of the message
            }
        ]
    }
}
```

For more information regarding completions, visit the Chat Completions docs and the corresponding API Spec.
For large batches of up to 1M requests, save this data as a .jsonl file. Once saved, upload your batch input file so it can be referenced when creating batch jobs.
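Producing the file takes only a few lines of Python; a minimal sketch, with an illustrative file name and prompts:

```python
import json

# Illustrative requests; in practice you might generate up to 1M of these.
requests = [
    {"custom_id": "0", "body": {"max_tokens": 100,
        "messages": [{"role": "user", "content": "What is the best French cheese?"}]}},
    {"custom_id": "1", "body": {"max_tokens": 100,
        "messages": [{"role": "user", "content": "What is the best French wine?"}]}},
]

# Write one JSON object per line -- no line breaks inside an object.
with open("batch_input.jsonl", "w") as f:
    for request in requests:
        f.write(json.dumps(request) + "\n")
```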
For batches with fewer than 10,000 requests, we support inline batching.
There are two main methods of uploading a batch file:

A. Via Studio:
- Upload your file in Studio›Files ↗, in the format described previously.
- Set `purpose` to Batch Processing.
- Start and manage your batches in Studio›Batches ↗.
- Create and start a job by providing the `files`, `endpoint`, and `model`. You won't need to use the API to upload your files or create batching jobs.
B. Via the API, explained below:
To upload your batch file, you need to use the files endpoint.
```python
import os

from mistralai import Mistral

api_key = os.environ["MISTRAL_API_KEY"]
client = Mistral(api_key=api_key)

batch_data = client.files.upload(
    file={
        "file_name": "test.jsonl",
        "content": open("test.jsonl", "rb"),
    },
    purpose="batch"
)
```

Batch Creation
Create a new Batch Job
Create a new batch job; it will be queued for processing.
- Requests data: the data for the requests to be batched. There are two options:
  - `input_files`: a list of the batch input file IDs; see how to use file batching.
  - `requests`: a list of the requests to be batched; see how to use inline batching.
- `model`: you can only use one model (e.g., `codestral-latest`) per batch. However, you can run multiple batches on the same files with different models if you want to compare outputs.
- `endpoint`: we currently support `/v1/embeddings`, `/v1/chat/completions`, `/v1/fim/completions`, `/v1/moderations`, `/v1/chat/moderations`, `/v1/ocr`, `/v1/classifications`, `/v1/conversations`, `/v1/audio/transcriptions`.
- `metadata`: optional custom metadata for the batch.
File Batching
The standard batching approach relies on batch files containing all the requests to be processed. We support up to 1 million requests in a single batch, enabling efficient handling of large volumes of requests at a reduced cost. This is ideal for tasks with high throughput requirements but low latency sensitivity or priority.
```python
created_job = client.batch.jobs.create(
    input_files=[batch_data.id],
    model="mistral-small-latest",
    endpoint="/v1/chat/completions",
    metadata={"job_type": "testing"}
)
```

Inline Batching
For batches of fewer than 10,000 requests, we support inline batching. Instead of creating and uploading a .jsonl file with all the request data, you can include the request body directly in the job creation request. This is convenient for smaller-scale or less bulk-intensive tasks.
```python
import os

from mistralai import Mistral

api_key = os.environ["MISTRAL_API_KEY"]
client = Mistral(api_key=api_key)

inline_batch_data = [
    {
        "custom_id": "0",
        "body": {
            "max_tokens": 128,
            "messages": [{"role": "user", "content": "What is the best French cheese?"}]
        }
    },
    {
        "custom_id": "1",
        "body": {
            "max_tokens": 512,
            "temperature": 0.2,
            "messages": [{"role": "user", "content": "What is the best French wine?"}]
        }
    },
    {
        "custom_id": "2",
        "body": {
            "max_tokens": 256,
            "temperature": 0.8,
            "messages": [{"role": "user", "content": "What is the best French pastry?"}]
        }
    },
]

created_job = client.batch.jobs.create(
    requests=inline_batch_data,
    model="mistral-small-latest",
    endpoint="/v1/chat/completions",
    metadata={"job_type": "testing"}
)
```

Get/Retrieve
Retrieve your Batch Job
Once a batch has been sent, you will want to retrieve information such as:
- The status of the batch job
- The results of the batch job
- The list of batch jobs
Get batch job details

You can retrieve the details of a batch job by its ID.

```python
retrieved_job = client.batch.jobs.get(job_id=created_job.id)
```

Get batch job results
Once the batch job is completed, download the results.
```python
output_file_stream = client.files.download(file_id=retrieved_job.output_file)

# Write and save the file
with open('batch_results.jsonl', 'wb') as f:
    f.write(output_file_stream.read())
```
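The downloaded results are also JSONL, one result per request, carrying the `custom_id` you supplied. Assuming each output line holds the result under a `response` key (the exact output schema may differ; check the API spec), you could index results like this:

```python
import json

# Sketch: index batch results by custom_id. Assumes each output line is a JSON
# object with "custom_id" and a "response" field (verify against the API spec).
def load_results(path):
    results = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            results[record["custom_id"]] = record.get("response")
    return results
```

With this, `load_results("batch_results.jsonl")["0"]` returns the response for the request you tagged `"0"`, regardless of the order in which results were written.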
List batch jobs

You can view a list of your batch jobs and filter them by various criteria, including:

- Status: `QUEUED`, `RUNNING`, `SUCCESS`, `FAILED`, `TIMEOUT_EXCEEDED`, `CANCELLATION_REQUESTED`, and `CANCELLED`
- Metadata: custom metadata key and value for the batch
```python
list_job = client.batch.jobs.list(
    status="RUNNING",
    metadata={"job_type": "testing"}
)
```

Request Cancellation
Cancel any Job
If you want to cancel a batch job, you can do so by sending a cancellation request.
```python
canceled_job = client.batch.jobs.cancel(job_id=created_job.id)
```

An end-to-end example
Below is a dense, end-to-end example of using the batch API from start to finish with randomly generated data, covering:

- Creating a client
- Generating random input data
- Creating an input file
- Running a batch job
- Downloading the results
```python
import argparse
import json
import os
import random
import time
from io import BytesIO

from mistralai import Mistral
from mistralai.models import File


def create_client():
    """
    Create a Mistral client using the API key from environment variables.

    Returns:
        Mistral: An instance of the Mistral client.
    """
    return Mistral(api_key=os.environ["MISTRAL_API_KEY"])


def generate_random_string(start, end):
    """
    Generate a random string of variable length.

    Args:
        start (int): Minimum length of the string.
        end (int): Maximum length of the string.

    Returns:
        str: A randomly generated string.
    """
    length = random.randrange(start, end)
    return ' '.join(random.choices('abcdefghijklmnopqrstuvwxyz', k=length))


def print_stats(batch_job):
    """
    Print the statistics of the batch job.

    Args:
        batch_job: The batch job object containing job statistics.
    """
    print(f"Total requests: {batch_job.total_requests}")
    print(f"Failed requests: {batch_job.failed_requests}")
    print(f"Successful requests: {batch_job.succeeded_requests}")
    percent_done = round(
        (batch_job.succeeded_requests + batch_job.failed_requests) / batch_job.total_requests * 100, 2
    )
    print(f"Percent done: {percent_done}")


def create_input_file(client, num_samples):
    """
    Create an input file for the batch job.

    Args:
        client (Mistral): The Mistral client instance.
        num_samples (int): Number of samples to generate.

    Returns:
        File: The uploaded input file object.
    """
    buffer = BytesIO()
    for idx in range(num_samples):
        request = {
            "custom_id": str(idx),
            "body": {
                "max_tokens": random.randint(10, 1000),
                "messages": [{"role": "user", "content": generate_random_string(100, 5000)}]
            }
        }
        buffer.write(json.dumps(request).encode("utf-8"))
        buffer.write("\n".encode("utf-8"))
    return client.files.upload(file=File(file_name="file.jsonl", content=buffer.getvalue()), purpose="batch")


def run_batch_job(client, input_file, model):
    """
    Run a batch job using the provided input file and model.

    Args:
        client (Mistral): The Mistral client instance.
        input_file (File): The input file object.
        model (str): The model to use for the batch job.

    Returns:
        BatchJob: The completed batch job object.
    """
    batch_job = client.batch.jobs.create(
        input_files=[input_file.id],
        model=model,
        endpoint="/v1/chat/completions",
        metadata={"job_type": "testing"}
    )
    # Poll the job until it leaves the QUEUED/RUNNING states.
    while batch_job.status in ["QUEUED", "RUNNING"]:
        batch_job = client.batch.jobs.get(job_id=batch_job.id)
        print_stats(batch_job)
        time.sleep(1)
    print(f"Batch job {batch_job.id} completed with status: {batch_job.status}")
    return batch_job


def download_file(client, file_id, output_path):
    """
    Download a file from the Mistral server.

    Args:
        client (Mistral): The Mistral client instance.
        file_id (str): The ID of the file to download.
        output_path (str): The path where the file will be saved.
    """
    if file_id is not None:
        print(f"Downloading file to {output_path}")
        output_file = client.files.download(file_id=file_id)
        with open(output_path, "w") as f:
            for chunk in output_file.stream:
                f.write(chunk.decode("utf-8"))
        print(f"Downloaded file to {output_path}")


def main(num_samples, success_path, error_path, model):
    """
    Main function to run the batch job.

    Args:
        num_samples (int): Number of samples to process.
        success_path (str): Path to save successful outputs.
        error_path (str): Path to save error outputs.
        model (str): Model name to use.
    """
    client = create_client()
    input_file = create_input_file(client, num_samples)
    print(f"Created input file {input_file}")

    batch_job = run_batch_job(client, input_file, model)
    print(f"Job duration: {batch_job.completed_at - batch_job.created_at} seconds")

    download_file(client, batch_job.error_file, error_path)
    download_file(client, batch_job.output_file, success_path)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run Mistral AI batch job")
    parser.add_argument("--num_samples", type=int, default=100, help="Number of samples to process")
    parser.add_argument("--success_path", type=str, default="output.jsonl", help="Path to save successful outputs")
    parser.add_argument("--error_path", type=str, default="error.jsonl", help="Path to save error outputs")
    parser.add_argument("--model", type=str, default="codestral-latest", help="Model name to use")

    args = parser.parse_args()
    main(args.num_samples, args.success_path, args.error_path, args.model)
```