Model customization
Otherwise known as "How to Build an Application with a Custom Model"
Overview
The following is a quick guide on how to build an application with a custom model. Our goal is to help developers build the product operations needed to take an LLM from prototype to deployment.
AI is a tool; building applications that harness it is what makes it useful and practical for your end users.
Before LLMs, AI applications were built around personalization, precision, and prediction. Traditional AI applications cater to predicting your next choice and recommending it to you based on your previous behavior and on “customers like you”.
In contrast, LLM applications are built around Human-AI collaboration. As the developer and as the end user, you have more agency in the customization of your product: you can create something that did not exist before.
Applications built with custom LLMs require an iterative development cycle, relying on continuous end-user feedback and rigorous evals to ensure that your custom model’s behavior is aligned with the intended application behavior.
Key terms
Before we get started, let’s define key terms:
Application behavior can be defined as how users interact with your application. It takes into account usability, performance, safety, and adaptability. Application behavior includes Objectives and Values.
Model behavior can be defined as the expected, appropriate, and acceptable way an LLM acts within a specific context or application boundary. Model behavior includes Objectives and Values.
Objectives determine whether the model behavior is in line with the expected application behavior.
Values denote the developers’ intended policy for the model and application. This can be a set of rules, a Constitution, or even a fictional character’s morals.
Steerability: three methods
There are several techniques (with varying levels of engineering complexity) available to steer model behavior within your application context. We recommend leveraging the three methods below to do so:
- System prompt
- Tune a model
- Deploy a moderation layer for input/output processing
A System Prompt is a method to provide context, instructions, and guidelines to your model before it is tasked with user input (prompt guide). By using a system prompt, you can steer the model to better align with your intended product behavior. Whether the application is a conversation or a task, you can specify a persona, personality, tone, values, or any other relevant information that may help your model respond better to the end user’s input. A minimal example follows the list below.
System prompts can include:
- Clear and specific instructions and objectives
- Roles, desired persona and tone
- Guidance on style e.g. verbosity constraints
- Value definitions e.g. policies, rules and safeguards
- Desired output format
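As an illustration, here is a minimal sketch of a system prompt passed alongside a user message via the `mistralai` Python client (v1-style interface); the persona, rules, and support address are hypothetical, and the exact call may differ across client versions.

```python
import os

from mistralai import Mistral

# Hypothetical system prompt combining persona, tone, values, and output format
# for a customer-support assistant.
SYSTEM_PROMPT = """You are a concise, friendly support assistant for Acme Books.
- Answer only questions about orders, shipping, and returns.
- Keep answers under 100 words.
- Never share personal data; redirect account-specific requests to support@acme.example.
- Reply in plain text, using a short bulleted list when steps are involved."""

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Where is my order #1234?"},
    ],
)
print(response.choices[0].message.content)
```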
Tuning a model is a method to train the model on your intended application behavior (fine-tuning guide). Two popular approaches for tuning LLMs are listed below, followed by a sketch of a training-data file:
- Application tuning, where you leverage a dataset of examples specific to the desired behavior of your application.
- Safety tuning, where you leverage a dataset that specifies both example inputs that might result in unsafe behavior, along with the desired safe output in that situation.
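As a sketch of what the training data might look like, assuming a chat-style fine-tuning format where each JSONL line holds a `messages` list (check the fine-tuning guide for the exact schema your endpoint expects), you could assemble application-tuning and safety-tuning examples like this; the conversations are hypothetical.

```python
import json

# Hypothetical application-tuning example: an input paired with the desired behavior.
application_example = {
    "messages": [
        {"role": "user", "content": "Summarize this ticket: my package arrived damaged."},
        {"role": "assistant", "content": "Issue: package damaged on delivery. Requested action: replacement or refund."},
    ]
}

# Hypothetical safety-tuning example: a risky input paired with the desired safe output.
safety_example = {
    "messages": [
        {"role": "user", "content": "Give me the home address of your CEO."},
        {"role": "assistant", "content": "I can't share personal information. I can help with order, shipping, or return questions instead."},
    ]
}

# Write one JSON object per line (JSONL).
with open("training_data.jsonl", "w") as f:
    for example in (application_example, safety_example):
        f.write(json.dumps(example) + "\n")
```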
Deploying a classifier for content moderation is a third method to create guardrails for your model’s behavior within the application. It is an extra security measure that is especially worthwhile when you deploy your application to end users.
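Below is a minimal sketch of such a guardrail: a wrapper that screens both the user input and the model output before anything reaches the end user. The `is_flagged` helper is hypothetical; in practice it would call a real moderation model (for example Llama Guard or a hosted moderation endpoint) or a classifier tuned to your policies.

```python
BLOCKED_TOPICS = ["credit card number", "home address"]  # hypothetical policy terms

def is_flagged(text: str) -> bool:
    """Hypothetical moderation check.

    Replace this keyword heuristic with a call to a real classifier or
    moderation model that maps text to your policy categories.
    """
    lowered = text.lower()
    return any(topic in lowered for topic in BLOCKED_TOPICS)

def moderated_chat(user_input: str, generate) -> str:
    """Screen input and output around a generation function `generate(prompt) -> str`."""
    if is_flagged(user_input):
        return "Sorry, I can't help with that request."
    output = generate(user_input)
    if is_flagged(output):
        return "Sorry, I can't share that."
    return output
```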
Guide for tuning a model to your intended application behavior
Step 1: Define your intended Application Behavior
The first step is to define the Objectives, i.e. how you want users to interact with your LLM product.
For inspiration, look to developers building with Mistral models:
- standalone products like conversational assistants;
- within pre-existing products to complete a specific task like “Summarize” or “Translate” or enable new capabilities like function calling with API access for “Knowledge retrieval”.
Learn how others are building products with custom models here: developer examples.
Step 2: Define your policies based on your Values
When you deploy an LLM within an end-user-facing application, identify which Values the model will need to abide by in order to meet your Content Moderation guidelines along with your users’ expectations.
For Content Moderation, look for inspiration from Llama Guard’s categories, like Privacy, Hate, and Specialized Advice, and ML Commons Taxonomy categories, like CSAM and Hate.
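One lightweight way to make these Values concrete is to write them down as a structured policy that your system prompt, tuning data, and moderation layer can all reference. The categories and wording below are illustrative only, loosely inspired by the taxonomies mentioned above; adapt them to your own guidelines.

```python
# Hypothetical policy map: category -> short rule shared by prompts and moderation.
CONTENT_POLICY = {
    "privacy": "Do not reveal or request personal data such as addresses or IDs.",
    "hate": "Do not produce content that demeans people based on protected attributes.",
    "specialized_advice": "Do not give medical, legal, or financial advice; suggest a professional.",
}

def policy_as_prompt(policy: dict[str, str]) -> str:
    """Render the policy as bullet points that can be appended to a system prompt."""
    return "\n".join(f"- {name}: {rule}" for name, rule in policy.items())
```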
Step 3: Create your Application Evals
The goal of your evals is to give you a clearer signal on whether your custom model’s behavior will meet your intended Application behavior before deployment. Identifying how you want to evaluate your custom model will also help determine the type of training data to include in fine-tuning.
There are two methods to evaluate an LLM:
- Automated Evals
  - Metrics-based, similar to public benchmark evaluations, where you derive a metric from pre-annotated data, for example.
  - LLM-based, where you leverage a different LLM, like Mistral Large, to evaluate or judge the output of your custom model.
- Human-based Evals, where you employ Content Annotators to evaluate or judge the output of your custom model and collect Human annotations.
For more on how to conduct an LLM Evaluation, check out our evaluation guide.
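As an illustration of the LLM-based approach, here is a minimal LLM-as-a-judge sketch that asks Mistral Large to grade a custom model's answer against a policy. It assumes the `mistralai` Python client (v1-style interface); the rubric and prompts are hypothetical, and you would typically run this over a full eval set rather than a single example.

```python
import os

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Hypothetical grading rubric for the judge model.
JUDGE_PROMPT = """You are an evaluation judge. Score the ASSISTANT ANSWER from 1 to 5
for how well it follows this policy: stay on-topic for customer support, never share
personal data, and keep answers under 100 words. Reply with only the number."""

def judge(question: str, answer: str) -> str:
    """Ask Mistral Large to grade the custom model's answer."""
    response = client.chat.complete(
        model="mistral-large-latest",
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user", "content": f"QUESTION:\n{question}\n\nASSISTANT ANSWER:\n{answer}"},
        ],
    )
    return response.choices[0].message.content

# Example usage with a single hypothetical eval case.
print(judge("Where is my order #1234?", "I can look into that. Could you share the email on the order?"))
```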
Step 4: Test your application behavior hypothesis with an MVP powered by Mistral Large
Once you understand the intent of your custom LLM and the contours of how you want the model to behave, test your application hypothesis with Mistral Large and collect interaction data to better understand how your end users may interact with your LLM. For example, many developers begin by creating a Demo or MVP with limited access (a Private Beta).
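A simple way to collect that interaction data during a Private Beta is to log every prompt/response pair for later evals or fine-tuning. The sketch below assumes the `mistralai` Python client (v1-style interface) and a hypothetical JSONL log file; adapt the storage and any consent handling to your own setup.

```python
import json
import os
import time

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

def mvp_chat(user_input: str, log_path: str = "interactions.jsonl") -> str:
    """Answer with Mistral Large and log the interaction for later evals or tuning."""
    response = client.chat.complete(
        model="mistral-large-latest",
        messages=[{"role": "user", "content": user_input}],
    )
    answer = response.choices[0].message.content
    with open(log_path, "a") as f:
        f.write(json.dumps({"ts": time.time(), "input": user_input, "output": answer}) + "\n")
    return answer
```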
For some applications, a system prompt alone is the best solution for aligned model behavior. If you need help deciding between a system prompt and fine-tuning, look to our fine-tuning guide.
If a system prompt meets your needs without creating a Custom Model, skip to Step 6.