Batch Inference

Batching allows you to run inference on large volumes of requests in parallel, reducing costs for large workloads.

Prepare Batch

Prepare and Upload your Batch

A batch is composed of a list of API requests. The structure of an individual request includes:

  • A unique custom_id for identifying each request and referencing results after completion
  • A body object containing the raw request you would send when calling the endpoint directly, without batching

Here's an example of how to structure a batch request:

{"custom_id": "0", "body": {"max_tokens": 100, "messages": [{"role": "user", "content": "What is the best French cheese?"}]}}
{"custom_id": "1", "body": {"max_tokens": 100, "messages": [{"role": "user", "content": "What is the best French wine?"}]}}

A batch body object can be any valid request body for the endpoint you are using; each request's body must match the request schema of that endpoint.

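For example, a batch targeting the /v1/embeddings endpoint would put an embeddings-style request in each body (an illustrative sketch; the body fields should follow the embeddings request schema, and the input strings here are placeholders):

{"custom_id": "0", "body": {"input": "What is the best French cheese?"}}
{"custom_id": "1", "body": {"input": "What is the best French wine?"}}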

For large batches of up to 1 million requests, save the request lines above to a .jsonl file. Once saved, upload your batch input file so it can be referenced when creating a batch job.

Tip: there are two main ways to upload a batch file:

A. Via AI Studio.

B. Via the API, as explained below.

To upload your batch file, you need to use the files endpoint.

from mistralai import Mistral
import os

api_key = os.environ["MISTRAL_API_KEY"]

client = Mistral(api_key=api_key)

# Upload the batch input file with purpose="batch" so it can be used as a batch input
batch_data = client.files.upload(
    file={
        "file_name": "test.jsonl",
        "content": open("test.jsonl", "rb"),
    },
    purpose="batch",
)

Batch Creation

Create a new Batch Job

Create a new batch job; it will be queued for processing.

  • Requests Data: The data for the requests to be batched. There are two options:
    • input_files: a list of the batch input file IDs, see how to use file-batching.
    • requests: a list of the requests to be batched, see how to use inline batching.
  • model: you can only use one model (e.g., codestral-latest) per batch. However, you can run multiple batches on the same files with different models if you want to compare outputs.
  • endpoint: we currently support /v1/embeddings, /v1/chat/completions, /v1/fim/completions, /v1/moderations, /v1/chat/moderations, /v1/ocr, /v1/classifications, /v1/conversations, /v1/audio/transcriptions.
  • metadata: optional custom metadata for the batch.

File Batching

The standard batching approach relies on batch files containing all the requests to be processed. We support up to 1 million requests in a single batch, enabling efficient handling of large volumes of requests at a reduced cost. This is ideal for tasks with high throughput requirements but low latency sensitivity or priority.

created_job = client.batch.jobs.create(
    input_files=[batch_data.id],
    model="mistral-small-latest",
    endpoint="/v1/chat/completions",
    metadata={"job_type": "testing"}
)
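
The returned job object is queued for processing; its id (used below when retrieving the job) and status can be inspected right away, as in this small illustrative check (assuming the job object exposes id and status attributes):

# The job starts in the QUEUED state; keep its id to retrieve status and results later
print(created_job.id, created_job.status)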

Inline Batching

For batches of fewer than 10,000 requests, we support inline batching. Instead of creating and uploading a .jsonl file with all the request data, you can include the request bodies directly in the job creation request. This is convenient for smaller-scale tasks.

from mistralai import Mistral
import os

api_key = os.environ["MISTRAL_API_KEY"]

client = Mistral(api_key=api_key)

inline_batch_data = [
    {
        "custom_id": "0", 
        "body": {
            "max_tokens": 128, 
            "messages": [{"role": "user", "content": "What is the best French cheese?"}]
        }
    },
    {
        "custom_id": "1", 
        "body": {
            "max_tokens": 512, 
            "temperature": 0.2, 
            "messages": [{"role": "user", "content": "What is the best French wine?"}]
        }
    },
    {
        "custom_id": "2", 
        "body": {
            "max_tokens": 256, 
            "temperature": 0.8, 
            "messages": [{"role": "user", "content": "What is the best French pastry?"}]
        }
    },
]

created_job = client.batch.jobs.create(
    requests=inline_batch_data,
    model="mistral-small-latest",
    endpoint="/v1/chat/completions",
    metadata={"job_type": "testing"}
)

Get/Retrieve

Retrieve your Batch Job

Once a batch has been sent, you will want to retrieve information such as:

  • The status of the batch job
  • The results of the batch job
  • The list of batch jobs

Get batch job details

You can retrieve the details of a batch job by its ID.

retrieved_job = client.batch.jobs.get(job_id=created_job.id)
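
If you need to wait for the batch to finish, you can poll the job until it reaches a terminal state. A minimal sketch, assuming the status values listed in the "List batch jobs" section below and an illustrative 10-second polling interval:

import time

# Poll the job status until it leaves the QUEUED/RUNNING states
while retrieved_job.status in ["QUEUED", "RUNNING"]:
    time.sleep(10)
    retrieved_job = client.batch.jobs.get(job_id=created_job.id)

print(f"Batch job ended with status: {retrieved_job.status}")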

Get batch job results

Once the batch job is completed, you can easily download the results.

output_file_stream = client.files.download(file_id=retrieved_job.output_file)

# Write and save the file
with open('batch_results.jsonl', 'wb') as f:
    f.write(output_file_stream.read())
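
Each line of the results file corresponds to one request from the batch input. Below is a minimal sketch of reading the results back and indexing them by custom_id (the exact fields in each line depend on the endpoint, so treat the structure as something to verify against your own output):

import json

# Map each result line back to its original request via custom_id
results = {}
with open("batch_results.jsonl") as f:
    for line in f:
        entry = json.loads(line)
        results[entry["custom_id"]] = entry

print(results.get("0"))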

List batch jobs

You can view a list of your batch jobs and filter them by various criteria, including:

  • Status: QUEUED, RUNNING, SUCCESS, FAILED, TIMEOUT_EXCEEDED, CANCELLATION_REQUESTED and CANCELLED
  • Metadata: custom metadata key and value for the batch
list_job = client.batch.jobs.list(
    status="RUNNING",
    metadata={"job_type": "testing"}
)

Request Cancellation

Cancel any Job

If you want to cancel a batch job, you can do so by sending a cancellation request.

canceled_job = client.batch.jobs.cancel(job_id=created_job.id)

An end-to-end example

Below is an end-to-end example of how to use the batch API from start to finish.

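The following is a minimal sketch of the full workflow using the calls shown above; the file names, questions, model, and polling interval are illustrative:

import json
import os
import time

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# 1. Write the batch requests to a .jsonl file
questions = ["What is the best French cheese?", "What is the best French wine?"]
with open("batch_input.jsonl", "w") as f:
    for i, question in enumerate(questions):
        request = {
            "custom_id": str(i),
            "body": {"max_tokens": 100, "messages": [{"role": "user", "content": question}]},
        }
        f.write(json.dumps(request) + "\n")

# 2. Upload the file with purpose="batch"
batch_data = client.files.upload(
    file={"file_name": "batch_input.jsonl", "content": open("batch_input.jsonl", "rb")},
    purpose="batch",
)

# 3. Create the batch job
created_job = client.batch.jobs.create(
    input_files=[batch_data.id],
    model="mistral-small-latest",
    endpoint="/v1/chat/completions",
    metadata={"job_type": "testing"},
)

# 4. Poll until the job reaches a terminal state
retrieved_job = client.batch.jobs.get(job_id=created_job.id)
while retrieved_job.status in ["QUEUED", "RUNNING"]:
    time.sleep(10)
    retrieved_job = client.batch.jobs.get(job_id=created_job.id)

# 5. Download the results once the job has succeeded
if retrieved_job.status == "SUCCESS":
    output_file_stream = client.files.download(file_id=retrieved_job.output_file)
    with open("batch_results.jsonl", "wb") as f:
        f.write(output_file_stream.read())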

FAQ