
Fine-tuning

Tip: For detailed end-to-end fine-tuning examples and an FAQ, check out our fine-tuning guide.

Warning: Every fine-tuning job comes with a minimum fee of $4, and there is a monthly storage fee of $2 for each model. For more detailed pricing information, please visit our pricing page.

Fine-tuning basics

Fine-tuning vs. prompting

Choosing between prompt engineering and fine-tuning for an AI model can be difficult. We generally recommend starting with prompt engineering, since it is faster and less resource-intensive. To help you choose the right approach, here are the key benefits of each:

  • Benefits of Prompting

    • A generic model can work out of the box (the task can be described in a zero-shot fashion)
    • Does not require any fine-tuning data or training to work
    • Can easily be updated for new workflows and prototyping

    Check out our prompting guide to explore various capabilities of Mistral models.

  • Benefits of Fine-tuning

    • Typically works significantly better than prompting
    • Typically works better than a larger model, while being faster and cheaper because it does not require a very long prompt
    • Provides better alignment with the task of interest, because the model has been trained specifically on that task
    • Can be used to teach new facts and information to the model (such as advanced tools or complicated workflows)

Common use cases

Fine-tuning has a wide range of use cases, some of which include:

  • Customizing the model to generate responses in a specific format and tone
  • Specializing the model for a specific topic or domain to improve its performance on domain-specific tasks
  • Improving the model through distillation, by training it to mimic the behavior of a stronger, larger model
  • Enhancing the model’s performance by training it to mimic the behavior of a model given a complex prompt, so that the prompt is no longer needed, saving tokens and reducing associated costs
  • Reducing cost and latency by using a small yet efficient fine-tuned model

Dataset Format

Data must be stored in JSON Lines (.jsonl) files, which allow storing multiple JSON objects, each on a new line.
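As a minimal sketch using only Python's standard library, the helpers below write a list of samples to a .jsonl file (one JSON object per line) and read it back. The names `write_jsonl` and `read_jsonl` are illustrative and not part of any Mistral SDK.

```python
import json


def write_jsonl(samples, path):
    """Write each sample as one JSON object on its own line."""
    with open(path, "w", encoding="utf-8") as f:
        for sample in samples:
            # json.dumps escapes newlines inside strings, so each sample stays on one line
            f.write(json.dumps(sample, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Parse a .jsonl file back into a list of dicts, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


samples = [
    {"messages": [{"role": "user", "content": "Hi"},
                  {"role": "assistant", "content": "Hello!"}]},
    {"messages": [{"role": "user", "content": "What is 2 + 2?"},
                  {"role": "assistant", "content": "4"}]},
]
write_jsonl(samples, "example.jsonl")
print(read_jsonl("example.jsonl") == samples)
```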

Datasets should follow an instruction-following format representing a user-assistant conversation. Each JSON data sample should either consist of only user and assistant messages ("Default Instruct") or include function-calling logic ("Function-calling Instruct").

1. Default Instruct

Conversational data between user and assistant, which can be one-turn or multi-turn. Example (pretty-printed for readability; in the .jsonl file, each sample must occupy a single line):

{
  "messages": [
    {
      "role": "user",
      "content": "User interaction n°1 contained in document n°2"
    },
    {
      "role": "assistant",
      "content": "Bot interaction n°1 contained in document n°2"
    },
    {
      "role": "user",
      "content": "User interaction n°2 contained in document n°1"
    },
    {
      "role": "assistant",
      "content": "Bot interaction n°2 contained in document n°1"
    }
  ]
}
  • Conversational data must be stored under the "messages" key as a list.
  • Each list item is a dictionary containing the "content" and "role" keys. "role" is a string: "user", "assistant", or "system".
  • Loss computation is performed only on tokens corresponding to assistant messages ("role" == "assistant").
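The rules above can be checked locally before uploading. The sketch below is a hypothetical helper, not the validation the Mistral API performs on upload: it checks only the listed rules, plus one assumption of ours, namely that a sample without any assistant message is worth flagging because, per the loss rule, none of its tokens would contribute to training.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def check_default_instruct(sample):
    """Return a list of problems found in one Default Instruct sample (empty if none)."""
    messages = sample.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not a dict")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has invalid role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i} is missing a string 'content'")
    # Assumption: loss is computed only on assistant tokens, so flag samples without one.
    if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
        problems.append("no assistant message, so no tokens contribute to the loss")
    return problems


good = {"messages": [
    {"role": "system", "content": "Be concise."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]}
bad = {"messages": [{"role": "bot", "content": "hi"}]}
print(check_default_instruct(good))
print(check_default_instruct(bad))
```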

2. Function-calling Instruct

Conversational data with tool usage. Example (pretty-printed for readability; in the .jsonl file, each sample must occupy a single line):

{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant with access to the following functions to help the user. You can use the functions if needed."
    },
    {
      "role": "user",
      "content": "Can you help me generate an anagram of the word 'listen'?"
    },
    {
      "role": "assistant",
      "tool_calls": [
        {
          "id": "TX92Jm8Zi",
          "type": "function",
          "function": {
            "name": "generate_anagram",
            "arguments": "{\"word\": \"listen\"}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "{\"anagram\": \"silent\"}",
      "tool_call_id": "TX92Jm8Zi"
    },
    {
      "role": "assistant",
      "content": "The anagram of the word 'listen' is 'silent'."
    },
    {
      "role": "user",
      "content": "That's amazing! Can you generate an anagram for the word 'race'?"
    },
    {
      "role": "assistant",
      "tool_calls": [
        {
          "id": "3XhQnxLsT",
          "type": "function",
          "function": {
            "name": "generate_anagram",
            "arguments": "{\"word\": \"race\"}"
          }
        }
      ]
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "generate_anagram",
        "description": "Generate an anagram of a given word",
        "parameters": {
          "type": "object",
          "properties": {
            "word": {
              "type": "string",
              "description": "The word to generate an anagram of"
            }
          },
          "required": ["word"]
        }
      }
    }
  ]
}
  • Conversational data must be stored under the "messages" key as a list.
  • Each message is a dictionary containing the "role" and "content" or "tool_calls" keys. "role" should be one of "user", "assistant", "system", or "tool".
  • Only messages of type "assistant" can have a "tool_calls" key, representing the assistant performing a call to an available tool.
  • An assistant message with a "tool_calls" key cannot have a "content" key and must be followed by a "tool" message, which in turn must be followed by another assistant message.
  • The "tool_call_id" of a tool message must match the "id" of a tool call in one of the preceding assistant messages.
  • Both "id" and "tool_call_id" are randomly generated strings of exactly 9 characters. We recommend generating these automatically in a data preparation script as done here.
  • The "tools" key must include definitions of all tools used in the conversation.
  • Loss computation is performed only on tokens corresponding to assistant messages ("role" == "assistant").
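The sketch below shows one way a data preparation script might generate the 9-character IDs and check the function-calling rules above. Both helpers are hypothetical. Two assumptions are built in: IDs use only ASCII letters and digits (as in "TX92Jm8Zi"), and an assistant tool call as the very last message is not flagged, because the example above ends that way.

```python
import secrets
import string

# Assumption: IDs are alphanumeric, like "TX92Jm8Zi" in the example above.
ID_ALPHABET = string.ascii_letters + string.digits


def make_call_id(length=9):
    """Return a random alphanumeric ID with exactly `length` characters."""
    return "".join(secrets.choice(ID_ALPHABET) for _ in range(length))


def check_function_calling(sample):
    """Check one Function-calling Instruct sample; return a list of problems."""
    problems = []
    messages = sample.get("messages", [])
    defined_tools = {t["function"]["name"] for t in sample.get("tools", [])}
    seen_ids = set()
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in {"system", "user", "assistant", "tool"}:
            problems.append(f"message {i}: invalid role {role!r}")
        if "tool_calls" in msg:
            if role != "assistant":
                problems.append(f"message {i}: only assistant messages may have tool_calls")
            if "content" in msg:
                problems.append(f"message {i}: a tool_calls message cannot have content")
            # A trailing tool call (as in the example above) is not flagged.
            if i + 1 < len(messages) and messages[i + 1].get("role") != "tool":
                problems.append(f"message {i}: tool_calls must be followed by a tool message")
            for call in msg["tool_calls"]:
                call_id = call.get("id", "")
                if len(call_id) != 9:
                    problems.append(f"message {i}: id {call_id!r} is not 9 characters")
                name = call["function"]["name"]
                if name not in defined_tools:
                    problems.append(f"message {i}: tool {name!r} is not defined in 'tools'")
                seen_ids.add(call_id)
        if role == "tool" and msg.get("tool_call_id") not in seen_ids:
            problems.append(f"message {i}: tool_call_id does not match an earlier call id")
    return problems


call_id = make_call_id()
sample = {
    "messages": [
        {"role": "user", "content": "Anagram of 'listen'?"},
        {"role": "assistant", "tool_calls": [{
            "id": call_id, "type": "function",
            "function": {"name": "generate_anagram", "arguments": "{\"word\": \"listen\"}"}}]},
        {"role": "tool", "content": "{\"anagram\": \"silent\"}", "tool_call_id": call_id},
        {"role": "assistant", "content": "It is 'silent'."},
    ],
    "tools": [{"type": "function", "function": {
        "name": "generate_anagram",
        "description": "Generate an anagram of a given word",
        "parameters": {"type": "object",
                       "properties": {"word": {"type": "string"}},
                       "required": ["word"]}}}],
}
print(check_function_calling(sample))
```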

Upload a file

Once your data file is in the right format, upload it with the Mistral client so that it is available for use in fine-tuning jobs.

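from mistralai import Mistral
import os

api_key = os.environ["MISTRAL_API_KEY"]

client = Mistral(api_key=api_key)

# upload the training file; its .id is referenced when creating the job
with open("ultrachat_chunk_train.jsonl", "rb") as f:
    ultrachat_chunk_train = client.files.upload(
        file={
            "file_name": "ultrachat_chunk_train.jsonl",
            "content": f,
        }
    )

# upload the validation file the same way (file name assumed)
with open("ultrachat_chunk_eval.jsonl", "rb") as f:
    ultrachat_chunk_eval = client.files.upload(
        file={
            "file_name": "ultrachat_chunk_eval.jsonl",
            "content": f,
        }
    )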

Create a fine-tuning job

The next step is to create a fine-tuning job.

  • model: the specific model you would like to fine-tune. The choices are open-mistral-7b (v0.3), mistral-small-latest (mistral-small-2402), codestral-latest (codestral-2405), open-mistral-nemo, and mistral-large-latest (mistral-large-2407).
  • training_files: a collection of training file IDs, which can consist of a single file or multiple files
  • validation_files: a collection of validation file IDs, which can consist of a single file or multiple files
  • hyperparameters: two adjustable hyperparameters, "training_steps" and "learning_rate".
  • auto_start:
    • auto_start=True: Your job will be launched immediately after validation.
    • auto_start=False (default): You can manually start the training after validation by sending a POST request to /fine_tuning/jobs/<uuid>/start.
# create a fine-tuning job
created_jobs = client.fine_tuning.jobs.create(
    model="open-mistral-7b",
    training_files=[{"file_id": ultrachat_chunk_train.id, "weight": 1}],
    validation_files=[ultrachat_chunk_eval.id],
    hyperparameters={
        "training_steps": 10,
        "learning_rate": 0.0001,
    },
    auto_start=False,
)

# start a fine-tuning job
client.fine_tuning.jobs.start(job_id=created_jobs.id)

created_jobs

List/retrieve/cancel jobs

You can also list jobs, retrieve a job, or cancel a job.

You can filter and view a list of jobs using various parameters such as page, page_size, model, created_after, created_by_me, status, wandb_project, wandb_name, and suffix. Check out our API specs for details.

# List jobs
jobs = client.fine_tuning.jobs.list()
print(jobs)

# Retrieve a job
retrieved_jobs = client.fine_tuning.jobs.get(job_id=created_jobs.id)
print(retrieved_jobs)

# Cancel a job
canceled_jobs = client.fine_tuning.jobs.cancel(job_id=created_jobs.id)
print(canceled_jobs)
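Rather than retrieving a job by hand until it finishes, you can poll it. The sketch below is a hypothetical helper, `wait_for_job`, that takes any zero-argument callable returning a job object with a `status` attribute; against the API you might pass `lambda: client.fine_tuning.jobs.get(job_id=created_jobs.id)`. The set of terminal status strings is an assumption to confirm against the API reference. A job created with auto_start=False will not progress past validation until it is started, so polling it before calling start would only end at the timeout. The offline demo uses a fake fetcher so the logic can run without an API key.

```python
import time
from types import SimpleNamespace

# Assumed terminal statuses; confirm them against the API reference.
TERMINAL_STATUSES = {"SUCCESS", "FAILED", "FAILED_VALIDATION", "CANCELLED"}


def wait_for_job(fetch_job, poll_seconds=30.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Call fetch_job() until the job reaches a terminal status, then return it."""
    waited = 0.0
    while True:
        job = fetch_job()
        if job.status in TERMINAL_STATUSES:
            return job
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still {job.status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Offline demo: a fake fetcher that walks through a fixed sequence of statuses.
statuses = iter(["QUEUED", "RUNNING", "RUNNING", "SUCCESS"])


def fake_fetch():
    return SimpleNamespace(status=next(statuses))


job = wait_for_job(fake_fetch, poll_seconds=0, sleep=lambda seconds: None)
print(job.status)
```

Injecting `sleep` keeps the demo instant; in real use the default `time.sleep` applies.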

Use a fine-tuned model

When a fine-tuning job is finished, the name of the fine-tuned model is available as retrieved_jobs.fine_tuned_model. You can then use it with our chat endpoint:

chat_response = client.chat.complete(
    model=retrieved_jobs.fine_tuned_model,
    messages=[{"role": "user", "content": "What is the best French cheese?"}],
)
print(chat_response.choices[0].message.content)

Delete a fine-tuned model

client.models.delete(model_id=retrieved_jobs.fine_tuned_model)

FAQ

How can I validate the data format?

  • Mistral API: We currently validate each file when you upload the dataset.

  • mistral-finetune: You can run the reformat script to convert the data to the right format, then run the validation script to check it:

    # download the reformat script
    wget https://raw.githubusercontent.com/mistralai/mistral-finetune/main/utils/reformat_data.py
    # download the validation script
    wget https://raw.githubusercontent.com/mistralai/mistral-finetune/main/utils/validate_data.py
    # reformat data
    python reformat_data.py data.jsonl
    # validate data
    python validate_data.py data.jsonl

    However, it's important to note that these scripts might not detect all problematic cases. Therefore, you may need to manually validate and correct any unique edge cases in your data.

What's the size limit of the training data?

While the size limit for an individual training data file is 512MB, there's no limitation on the number of files you can upload. You can upload multiple files and reference them when creating the job.

What's the size limit of the validation data?

The size limit for the validation data is 1MB. As a rule of thumb:

validation_set_max_size = min(1MB, 5% of training data)
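The rule of thumb can be computed directly. This sketch assumes that 1MB means 10**6 bytes; the text does not say whether decimal or binary megabytes are meant. The function name is illustrative.

```python
ONE_MB = 1_000_000  # assumption: decimal megabytes


def max_validation_bytes(training_bytes):
    """Rule of thumb: min(1MB, 5% of the training data size), in bytes."""
    return min(ONE_MB, training_bytes * 5 // 100)


print(max_validation_bytes(10 * ONE_MB))   # 500000: 5% of 10MB is below the 1MB cap
print(max_validation_bytes(100 * ONE_MB))  # 1000000: 5% would be 5MB, so the cap applies
```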

How many epochs are in the training process?

A general rule of thumb is: num_epochs ≈ max_steps / total size of the training .jsonl files in MB. For instance, if your training data is 100MB and you set max_steps=1000, training will perform roughly 10 epochs.
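The same rule of thumb can be applied in both directions: estimating epochs from a step count, or choosing a step count for a target number of epochs. This is only as accurate as the rule itself. We assume here that max_steps plays the role of the "training_steps" hyperparameter in the job-creation example, and that 1MB means 10**6 bytes. All function names are illustrative.

```python
import os


def training_size_mb(paths):
    """Total size of the training .jsonl files in MB (assumption: 1MB = 10**6 bytes)."""
    return sum(os.path.getsize(p) for p in paths) / 1_000_000


def estimate_epochs(max_steps, training_size_in_mb):
    """Rule of thumb: epochs ≈ max_steps / training data size in MB."""
    return max_steps / training_size_in_mb


def steps_for_epochs(target_epochs, training_size_in_mb):
    """Invert the rule of thumb to pick a step count for a target number of epochs."""
    return max(1, round(target_epochs * training_size_in_mb))


print(estimate_epochs(1000, 100))  # 10.0, matching the example above
print(steps_for_epochs(3, 100))    # 300
# With real files: estimate_epochs(1000, training_size_mb(["ultrachat_chunk_train.jsonl"]))
```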

Where can I find information on cost, ETA, number of tokens, and number of passes over each file?

Mistral API: When you create a fine-tuning job with the default auto_start=False argument, you should automatically see this information.

Note that the dry_run=True argument will be removed in September.

mistral-finetune: You can use the following script to find out: https://github.com/mistralai/mistral-finetune/blob/main/utils/validate_data.py. This script accepts a .yaml training file as input and returns the number of tokens the model is being trained on.

How can I estimate the cost of a fine-tuning job?

For the Mistral API, you can use the auto_start=False argument as mentioned in the previous question.

What is the recommended learning rate?

For LoRA fine-tuning, we recommend 1e-4 (default) or 1e-5.

Note that the learning rate you set is the peak learning rate, not a constant learning rate. The learning rate follows a linear warmup and cosine decay schedule: during the warmup phase, it increases linearly from a small initial value up to the peak value over a certain number of training steps; after the warmup phase, it decays following a cosine function.
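To make the schedule concrete, here is a minimal sketch of a generic linear-warmup, cosine-decay schedule. It illustrates the shape described above and is not the exact implementation used by the fine-tuning service: the warmup length (`warmup_steps=10`) and the final value (`min_lr=0.0`) are illustrative assumptions, since the text does not state them.

```python
import math


def lr_at_step(step, total_steps, peak_lr=1e-4, warmup_steps=10, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr.

    warmup_steps and min_lr are illustrative assumptions.
    """
    if step < warmup_steps:
        # Linear ramp from peak_lr / warmup_steps up to peak_lr at the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    # Cosine goes from 1 to -1, so the rate falls from peak_lr to min_lr.
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))


for step in (0, 5, 9, 10, 55, 100):
    print(step, lr_at_step(step, total_steps=100))
```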

Is the fine-tuning API compatible with OpenAI data format?

Yes, we support the OpenAI data format.

What if my file size is larger than 500MB and I get the error message 413 Request Entity Too Large?

You can split your data file into chunks. Here is an example:

import json
from datasets import load_dataset

# get data from Hugging Face
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_gen")

# save data into .jsonl; this file is about 1.3GB
with open("train.jsonl", "w") as f:
    for line in ds:
        json.dump(line, f)
        f.write("\n")

# reformat data (Jupyter syntax; drop the leading "!" to run these in a shell)
!wget https://raw.githubusercontent.com/mistralai/mistral-finetune/main/utils/reformat_data.py
!python reformat_data.py train.jsonl

# split the file into three chunks, distributing lines round-robin
input_file = "train.jsonl"
output_files = ["train_1.jsonl", "train_2.jsonl", "train_3.jsonl"]
# open the output files
output_file_objects = [open(file, "w") for file in output_files]
# index of the output file that receives the next line
counter = 0
with open(input_file, "r") as f_in:
    # read the input file line by line
    for line in f_in:
        # parse the line as JSON
        data = json.loads(line)
        # write the data to the current output file
        output_file_objects[counter].write(json.dumps(data) + "\n")
        # move on to the next output file
        counter = (counter + 1) % len(output_files)
# close the output files
for file in output_file_objects:
    file.close()

# now you should see three .jsonl files under 500MB
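The round-robin split above always produces three files, whatever the input size. As an alternative sketch, the hypothetical function `split_jsonl_by_size` below starts a new chunk whenever the next line would push the current one past a byte budget, so the number of chunks follows from the file size. It works on whole lines only, and a single line larger than the budget ends up in a chunk of its own.

```python
import json


def split_jsonl_by_size(input_path, prefix, max_bytes):
    """Split a .jsonl file into prefix_1.jsonl, prefix_2.jsonl, ... of at most max_bytes each.

    A single line larger than max_bytes is written to a chunk of its own.
    """
    output_paths = []
    out = None
    written = 0
    with open(input_path, "rb") as f_in:  # binary mode so len() counts bytes
        for line in f_in:
            if not line.strip():
                continue  # skip blank lines
            json.loads(line)  # fail fast on a malformed sample
            if not line.endswith(b"\n"):
                line += b"\n"
            # Open a new chunk for the first line, or when this line would overflow the current one.
            if out is None or written + len(line) > max_bytes:
                if out is not None:
                    out.close()
                path = f"{prefix}_{len(output_paths) + 1}.jsonl"
                out = open(path, "wb")
                output_paths.append(path)
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return output_paths


# Demo on a tiny file: each line '{"id": N}\n' is 10 bytes, so a 30-byte budget fits 3 lines.
with open("demo.jsonl", "w", encoding="utf-8", newline="\n") as f:
    for i in range(10):
        f.write(json.dumps({"id": i}) + "\n")
paths = split_jsonl_by_size("demo.jsonl", "demo_part", max_bytes=30)
print(paths)
```

For the UltraChat file above, a call such as `split_jsonl_by_size("train.jsonl", "train", max_bytes=400 * 1024 * 1024)` would keep each chunk comfortably under 500MB; the 400MB budget is an arbitrary safety margin, not a documented value.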