Code Embeddings
Embeddings are at the core of multiple enterprise use cases, such as retrieval systems, clustering, code analytics, classification, and a variety of search applications. With code embeddings, you can embed code databases and repositories and power coding assistants with state-of-the-art retrieval capabilities.
Codestral Embed API
To generate code embeddings using Mistral AI's embeddings API, make a request to the API endpoint and specify the embedding model codestral-embed, along with a list of input texts. The API then returns the corresponding embeddings as numerical vectors, which can be used for further analysis or processing in NLP applications.
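For reference, the SDK call shown later maps onto a plain HTTP request. The sketch below assumes the https://api.mistral.ai/v1/embeddings endpoint and the field names from the public API reference; consult that reference for the authoritative schema.

import os
import requests

# Raw-HTTP sketch; endpoint and request fields assumed from the API reference.
response = requests.post(
    "https://api.mistral.ai/v1/embeddings",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={"model": "codestral-embed", "input": ["def add(a, b): return a + b"]},
)
print(response.json()["data"][0]["embedding"][:5])  # first few vector values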
We also provide output_dtype and output_dimension parameters that allow you to control the type and dimensional size of your embeddings.
Output DType
output_dtype allows you to select the precision and format of the embeddings, enabling you to obtain embeddings with your desired level of numerical accuracy and representation.
The accepted dtypes are:
- float (default): A list of 32-bit (4-byte) single-precision floating-point numbers. Provides the highest precision and retrieval accuracy.
- int8: A list of 8-bit (1-byte) integers ranging from -128 to 127.
- uint8: A list of 8-bit (1-byte) integers ranging from 0 to 255.
- binary: A list of 8-bit integers that represent bit-packed, quantized single-bit embedding values using the int8 type. The length of the returned list of integers is 1/8 of output_dimension. This type uses the offset binary method (see the unpacking sketch after this list).
- ubinary: Similar to binary, but uses the uint8 type for bit-packed, quantized single-bit embedding values.
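To make the packed formats concrete, here is a minimal sketch of unpacking binary and ubinary embeddings back into individual bits with NumPy. The values lists and the output_dimension of 512 are hypothetical; the +128 shift implements the offset binary method described above.

import numpy as np

# With output_dimension=512, packed responses contain 512 / 8 = 64 integers.

def unpack_ubinary(values):
    # Each uint8 packs 8 single-bit embedding values.
    return np.unpackbits(np.array(values, dtype=np.uint8))  # -> 512 bits

def unpack_binary(values):
    # Offset binary: shift the int8 range [-128, 127] up to [0, 255],
    # then unpack exactly like ubinary.
    shifted = (np.array(values, dtype=np.int16) + 128).astype(np.uint8)
    return np.unpackbits(shifted)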
Output Dimension
output_dimension allows you to select a specific size for the embedding, enabling you to obtain an embedding of your chosen dimension. It defaults to 1536 and has a maximum value of 3072.
For any integer target dimension n, you can choose to retain the first n dimensions. These dimensions are ordered by relevance, and the first n are selected for a smooth trade-off between quality and cost.
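For example, you can request the full 3072 dimensions once and truncate locally instead of re-embedding at each target size. Below is a minimal sketch; the re-normalization step is our own choice for cosine-similarity use, not something the API specifies.

import numpy as np

def truncate_embedding(embedding, n):
    # Keep the first n dimensions (they are ordered by relevance)
    # and re-normalize so cosine similarities stay well-scaled.
    v = np.asarray(embedding, dtype=np.float32)[:n]
    return v / np.linalg.norm(v)

# Compare several sizes from a single full-size embedding, e.g.:
# for n in (256, 512, 1536, 3072):
#     small = truncate_embedding(full_embedding, n)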
Usage
Below is an example of how to use the embeddings API to generate embeddings for a list of code-related input texts.
import os
from mistralai import Mistral
api_key = os.environ["MISTRAL_API_KEY"]
model = "codestral-embed"
client = Mistral(api_key=api_key)
embeddings_batch_response = client.embeddings.create(
    model=model,
    # output_dtype="binary",
    # output_dimension=512,
    inputs=[
        "Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target. You may assume that each input would have exactly one solution, and you may not use the same element twice. You can return the answer in any order. Example 1: Input: nums = [2,7,11,15], target = 9 Output: [0,1] Explanation: Because nums[0] + nums[1] == 9, we return [0, 1]. Example 2: Input: nums = [3,2,4], target = 6 Output: [1,2] Example 3: Input: nums = [3,3], target = 6 Output: [0,1] Constraints: 2 <= nums.length <= 104 -109 <= nums[i] <= 109 -109 <= target <= 109 Only one valid answer exists.",
        "class Solution: def twoSum(self, nums: List[int], target: int) -> List[int]: d = {} for i, x in enumerate(nums): if (y := target - x) in d: return [d[y], i] d[x] = i",
    ],
)
Let's take a look at the length of the first embedding:
len(embeddings_batch_response.data[0].embedding)
It returns 1536, which means that our embedding dimension is 1536. The codestral-embed model generates embedding vectors of up to 3072 dimensions for each text string, regardless of the text length; you can reduce the dimension using output_dimension if needed. It's worth noting that while higher-dimensional embeddings can better capture text information and improve the performance of NLP tasks, they may require more resources and can increase latency and memory usage when storing and processing these embeddings. This trade-off between performance and computational resources should be considered when designing NLP systems that rely on text embeddings.
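Since the two inputs in the example above are a problem statement and a matching solution, a quick sanity check is to measure how close their embeddings are. Below is a minimal sketch; the cosine_similarity helper is our own, not part of the SDK.

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: dot product of the two L2-normalized vectors.
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

problem = embeddings_batch_response.data[0].embedding
solution = embeddings_batch_response.data[1].embedding
print(cosine_similarity(problem, solution))  # values closer to 1.0 mean more similar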
Usage Examples
Below are some examples of how to use the Codestral Embed API across different use cases.
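As one illustration, here is a small retrieval sketch: embed a corpus of code snippets together with a natural-language query, then rank the snippets by cosine similarity. The corpus and query are made up for this example, and client is the Mistral client from the Usage section above.

import numpy as np

corpus = [
    "def add(a, b): return a + b",
    "def reverse_string(s): return s[::-1]",
    "def fib(n): return n if n < 2 else fib(n - 1) + fib(n - 2)",
]
query = "function that reverses a string"

# Embed the corpus and the query in a single call.
resp = client.embeddings.create(model="codestral-embed", inputs=corpus + [query])
vectors = np.array([d.embedding for d in resp.data])
docs, q = vectors[:-1], vectors[-1]

# Normalize and rank by cosine similarity.
docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
q = q / np.linalg.norm(q)
print(corpus[int(np.argmax(docs @ q))])  # expected: the reverse_string snippet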

Cookbooks
For more information and guides on how to make use of our embeddings SDK, we have the following cookbooks: