Batch Inference
Batching allows you to run inference on large numbers of requests in parallel, reducing costs when running large workloads.
Prepare and Upload your Batch File
A batch is composed of a list of API requests. The structure of an individual request includes:
- A unique `custom_id` for identifying each request and referencing results after completion
- A `body` object with message information
Here's an example of how to structure a batch request:
{"custom_id": "0", "body": {"max_tokens": 100, "messages": [{"role": "user", "content": "What is the best French cheese?"}]}}
{"custom_id": "1", "body": {"max_tokens": 100, "messages": [{"role": "user", "content": "What is the best French wine?"}]}}
A batch `body` object can be any valid request body for the endpoint you are using.
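For larger workloads, you can generate the file programmatically. A minimal sketch using the standard `json` module (the prompts here are illustrative):

```python
import json

# Illustrative prompts; each one becomes a single request line in the batch file.
prompts = [
    "What is the best French cheese?",
    "What is the best French wine?",
]

with open("test.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": str(i),  # unique ID used to match results after completion
            "body": {
                "max_tokens": 100,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        f.write(json.dumps(request) + "\n")
```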
Once your batch is saved as a .jsonl file, upload it so it can be referenced when you create a batch job:
from mistralai import Mistral
import os

api_key = os.environ["MISTRAL_API_KEY"]
client = Mistral(api_key=api_key)

# Upload the batch input file; the returned object's `id` is used as an
# input file ID when creating the batch job below.
batch_data = client.files.upload(
    file={
        "file_name": "test.jsonl",
        "content": open("test.jsonl", "rb")
    },
    purpose="batch"
)
Create a new Batch Job
Create a new batch job; it will be queued for processing. The request accepts the following parameters:
- `input_files`: a list of the batch input file IDs.
- `model`: you can only use one model (e.g., `codestral-latest`) per batch. However, you can run multiple batches on the same files with different models if you want to compare outputs.
- `endpoint`: we currently support `/v1/embeddings`, `/v1/chat/completions`, `/v1/fim/completions`, `/v1/moderations`, `/v1/chat/moderations`, `/v1/ocr`, `/v1/classifications`, `/v1/conversations`, `/v1/audio/transcriptions`.
- `metadata`: optional custom metadata for the batch.
created_job = client.batch.jobs.create(
    input_files=[batch_data.id],
    model="mistral-small-latest",
    endpoint="/v1/chat/completions",
    metadata={"job_type": "testing"}
)
Retrieve your Batch Job
Once the batch has been sent, you will want to retrieve information such as:
- The status of the batch job
- The results of the batch job
- The list of batch jobs
Get batch job details
You can retrieve the details of a batch job by its ID.
retrieved_job = client.batch.jobs.get(job_id=created_job.id)
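Because batch jobs run asynchronously, a common pattern is to poll until the job reaches a terminal status. A minimal sketch, assuming the job object's `status` field takes the values listed under "List batch jobs" below:

```python
import time

# Terminal statuses, taken from the status list in "List batch jobs" below.
TERMINAL_STATUSES = {"SUCCESS", "FAILED", "TIMEOUT_EXCEEDED", "CANCELLED"}

retrieved_job = client.batch.jobs.get(job_id=created_job.id)
while retrieved_job.status not in TERMINAL_STATUSES:
    time.sleep(10)  # wait between polls to avoid hammering the API
    retrieved_job = client.batch.jobs.get(job_id=created_job.id)

print(f"Job ended with status: {retrieved_job.status}")
```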
Get batch job results
Once the batch job is completed, you can easily download the results.
output_file_stream = client.files.download(file_id=retrieved_job.output_file)
# Write and save the file
with open('batch_results.jsonl', 'wb') as f:
    f.write(output_file_stream.read())
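Each line of the output file is a JSON object. A minimal sketch for matching results back to their requests via `custom_id`, assuming each line also carries a `response` object (the exact response shape depends on the endpoint used):

```python
import json

# Map each result back to the request that produced it via its custom_id.
results = {}
with open("batch_results.jsonl") as f:
    for line in f:
        record = json.loads(line)
        results[record["custom_id"]] = record.get("response")

print(results.get("0"))  # result for the first request in the input file
```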
List batch jobs
You can view a list of your batch jobs and filter them by various criteria, including:
- Status: `QUEUED`, `RUNNING`, `SUCCESS`, `FAILED`, `TIMEOUT_EXCEEDED`, `CANCELLATION_REQUESTED` and `CANCELLED`
- Metadata: custom metadata key and value for the batch
list_job = client.batch.jobs.list(
    status="RUNNING",
    metadata={"job_type": "testing"}
)
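To inspect the matches, you can iterate over the returned jobs. A minimal sketch, assuming the list response exposes the jobs under a `data` attribute:

```python
# Print a quick summary of each matching job.
for job in list_job.data:
    print(job.id, job.status)
```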
Cancel any Job
If you want to cancel a batch job, you can do so by sending a cancellation request.
canceled_job = client.batch.jobs.cancel(job_id=created_job.id)
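Given the `CANCELLATION_REQUESTED` status listed above, cancellation appears to be asynchronous; you can poll the job as shown earlier until it reaches `CANCELLED`.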
An end-to-end example
Below is an end-to-end example of how to use the batch API from start to finish.

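A minimal sketch combining the steps above into one script, from writing the input file to downloading the results (the prompts, file names, and polling interval are illustrative):

```python
import json
import os
import time

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# 1. Write the batch input file, one request per line.
requests = [
    {"custom_id": "0", "body": {"max_tokens": 100, "messages": [{"role": "user", "content": "What is the best French cheese?"}]}},
    {"custom_id": "1", "body": {"max_tokens": 100, "messages": [{"role": "user", "content": "What is the best French wine?"}]}},
]
with open("test.jsonl", "w") as f:
    for request in requests:
        f.write(json.dumps(request) + "\n")

# 2. Upload the file with the "batch" purpose.
batch_data = client.files.upload(
    file={"file_name": "test.jsonl", "content": open("test.jsonl", "rb")},
    purpose="batch",
)

# 3. Create the batch job.
created_job = client.batch.jobs.create(
    input_files=[batch_data.id],
    model="mistral-small-latest",
    endpoint="/v1/chat/completions",
    metadata={"job_type": "testing"},
)

# 4. Poll until the job leaves the queued/running states.
job = client.batch.jobs.get(job_id=created_job.id)
while job.status in ("QUEUED", "RUNNING"):
    time.sleep(10)
    job = client.batch.jobs.get(job_id=created_job.id)

# 5. Download the results if the job succeeded.
if job.status == "SUCCESS":
    output_file_stream = client.files.download(file_id=job.output_file)
    with open("batch_results.jsonl", "wb") as f:
        f.write(output_file_stream.read())
```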