Batch Inference
Batching allows you to run inference on a large number of requests in parallel, reducing the cost of large workloads.
Prepare Batch
Prepare and Upload your Batch
A batch is composed of a list of API requests. The structure of an individual request includes:
- A unique custom_id for identifying each request and referencing results after completion
- A body object with the raw request you would send when calling the original endpoint without batching
Here's an example of how to structure a batch request:
{"custom_id": "0", "body": {"max_tokens": 100, "messages": [{"role": "user", "content": "What is the best French cheese?"}]}}
{"custom_id": "1", "body": {"max_tokens": 100, "messages": [{"role": "user", "content": "What is the best French wine?"}]}}A batch body object can be any valid request body for the endpoint you are using. Below are examples of batches for different endpoints, they have their body match the endpoint's request body.
For large batches of up to 1 million requests, create a .jsonl file containing the request data shown above. Once saved, upload your batch input file so it can be correctly referenced when initiating batch processes.
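As an illustration, here is a minimal sketch of building such a file from the example requests above, using the same test.jsonl name as the upload snippet further down:

import json

# The two example requests from above; a real batch may contain up to 1M of these.
requests = [
    {"custom_id": "0", "body": {"max_tokens": 100, "messages": [{"role": "user", "content": "What is the best French cheese?"}]}},
    {"custom_id": "1", "body": {"max_tokens": 100, "messages": [{"role": "user", "content": "What is the best French wine?"}]}},
]

# Write one JSON object per line to produce a .jsonl batch input file.
with open("test.jsonl", "w") as f:
    for request in requests:
        f.write(json.dumps(request) + "\n")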
For batches with fewer than 10,000 requests, we also support inline batching.
There are two main ways to upload a batch file:
A. Via AI Studio:
- Upload your files here: https://console.mistral.ai/build/files
- Upload the file in the format described previously.
- Set purpose to Batch Processing.
- Start and manage your batches here: https://console.mistral.ai/build/batches
- Create and start a job by providing the files, endpoint and model. You won't need to use the API to upload your files or create batching jobs.
B. Via the API, explained below:
To upload your batch file, you need to use the files endpoint.
from mistralai import Mistral
import os
api_key = os.environ["MISTRAL_API_KEY"]
client = Mistral(api_key=api_key)
# Upload the batch input file; the purpose must be set to "batch".
batch_data = client.files.upload(
    file={
        "file_name": "test.jsonl",
        "content": open("test.jsonl", "rb")
    },
    purpose="batch"
)

Batch Creation
Create a new Batch Job
Create a new batch job; it will be queued for processing.
- Requests data: the data for the requests to be batched. There are two options:
  - input_files: a list of the batch input file IDs; see how to use file batching.
  - requests: a list of the requests to be batched; see how to use inline batching.
- model: you can only use one model (e.g., codestral-latest) per batch. However, you can run multiple batches on the same files with different models if you want to compare outputs.
- endpoint: we currently support /v1/embeddings, /v1/chat/completions, /v1/fim/completions, /v1/moderations, /v1/chat/moderations, /v1/ocr, /v1/classifications, /v1/conversations, /v1/audio/transcriptions.
- metadata: optional custom metadata for the batch.
File Batching
The standard batching approach relies on batch files containing all the requests to be processed. We support up to 1 million requests in a single batch, enabling efficient handling of large volumes of requests at a reduced cost. This is ideal for tasks with high throughput requirements but low latency sensitivity or priority.
# Create a batch job from the uploaded input file.
created_job = client.batch.jobs.create(
    input_files=[batch_data.id],
    model="mistral-small-latest",
    endpoint="/v1/chat/completions",
    metadata={"job_type": "testing"}
)

Inline-Batching
For batches of fewer than 10,000 requests, we support inline batching. Instead of creating and uploading a .jsonl file with all the request data, you can include the request body directly in the job creation request. This is convenient for smaller-scale or less bulk-intensive tasks.
from mistralai import Mistral
import os
api_key = os.environ["MISTRAL_API_KEY"]
client = Mistral(api_key=api_key)
# Request bodies are passed directly in the job creation request instead of a file.
inline_batch_data = [
    {
        "custom_id": "0",
        "body": {
            "max_tokens": 128,
            "messages": [{"role": "user", "content": "What is the best French cheese?"}]
        }
    },
    {
        "custom_id": "1",
        "body": {
            "max_tokens": 512,
            "temperature": 0.2,
            "messages": [{"role": "user", "content": "What is the best French wine?"}]
        }
    },
    {
        "custom_id": "2",
        "body": {
            "max_tokens": 256,
            "temperature": 0.8,
            "messages": [{"role": "user", "content": "What is the best French pastry?"}]
        }
    },
]

created_job = client.batch.jobs.create(
    requests=inline_batch_data,
    model="mistral-small-latest",
    endpoint="/v1/chat/completions",
    metadata={"job_type": "testing"}
)

Get/Retrieve
Retrieve your Batch Job
Once a batch has been sent, you will want to retrieve information such as:
- The status of the batch job
- The results of the batch job
- The list of batch jobs
Get batch job details
You can retrieve the details of a batch job by its ID.
retrieved_job = client.batch.jobs.get(job_id=created_job.id)
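If you want to block until the job finishes, one option is to poll its status. A minimal sketch, assuming the job object exposes a status field with the values listed under "List batch jobs" below:

import time

# Re-check the job every few seconds until it leaves the QUEUED / RUNNING states.
while retrieved_job.status in ("QUEUED", "RUNNING"):
    time.sleep(10)
    retrieved_job = client.batch.jobs.get(job_id=created_job.id)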
Get batch job results
Once the batch job is completed, you can easily download the results.
output_file_stream = client.files.download(file_id=retrieved_job.output_file)

# Write and save the file
with open('batch_results.jsonl', 'wb') as f:
    f.write(output_file_stream.read())
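Each line of the output file corresponds to one request, so you can match results back to your inputs via custom_id. A minimal sketch (the exact fields of each result object may differ; custom_id is assumed to be echoed back, since that is its purpose):

import json

# Build a lookup from custom_id to the full result object.
results = {}
with open('batch_results.jsonl') as f:
    for line in f:
        result = json.loads(line)
        results[result["custom_id"]] = result

print(results.get("0"))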
List batch jobs
You can view a list of your batch jobs and filter them by various criteria, including:
- Status: QUEUED, RUNNING, SUCCESS, FAILED, TIMEOUT_EXCEEDED, CANCELLATION_REQUESTED and CANCELLED
- Metadata: custom metadata key and value for the batch
list_job = client.batch.jobs.list(
    status="RUNNING",
    metadata={"job_type": "testing"}
)

Request Cancellation
Cancel any Job
If you want to cancel a batch job, you can do so by sending a cancellation request.
canceled_job = client.batch.jobs.cancel(job_id=created_job.id)

An end-to-end example
Below is an end-to-end example of how to use the batch API from start to finish.
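A minimal Python sketch stitching together the steps above: prepare and upload the file, create the job, poll until it finishes, then download and read the results. The polling condition and the presence of a custom_id field on each result line are assumptions based on the descriptions earlier on this page:

import json
import os
import time

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# 1. Prepare the batch input file: one request object per line.
requests = [
    {"custom_id": "0", "body": {"max_tokens": 100, "messages": [{"role": "user", "content": "What is the best French cheese?"}]}},
    {"custom_id": "1", "body": {"max_tokens": 100, "messages": [{"role": "user", "content": "What is the best French wine?"}]}},
]
with open("test.jsonl", "w") as f:
    for request in requests:
        f.write(json.dumps(request) + "\n")

# 2. Upload the file with purpose "batch".
batch_data = client.files.upload(
    file={"file_name": "test.jsonl", "content": open("test.jsonl", "rb")},
    purpose="batch"
)

# 3. Create and start the batch job.
created_job = client.batch.jobs.create(
    input_files=[batch_data.id],
    model="mistral-small-latest",
    endpoint="/v1/chat/completions",
    metadata={"job_type": "testing"}
)

# 4. Poll until the job leaves the QUEUED / RUNNING states
#    (status values assumed from the list in the previous section).
retrieved_job = client.batch.jobs.get(job_id=created_job.id)
while retrieved_job.status in ("QUEUED", "RUNNING"):
    time.sleep(10)
    retrieved_job = client.batch.jobs.get(job_id=created_job.id)

# 5. Download and read the results if the job succeeded.
if retrieved_job.status == "SUCCESS":
    output_file_stream = client.files.download(file_id=retrieved_job.output_file)
    with open("batch_results.jsonl", "wb") as f:
        f.write(output_file_stream.read())
    with open("batch_results.jsonl") as f:
        for line in f:
            result = json.loads(line)
            # custom_id is assumed to be echoed back on each result line.
            print(result["custom_id"])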