Embedders

Embedders convert text into vector embeddings for semantic search. Embeddings capture the semantic meaning of text, allowing similar concepts to be found even with different wording.
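
What "similar concepts" means in practice: a search system compares embedding vectors with a similarity measure, and cosine similarity is a common choice for text embeddings. The sketch below ranks a few made-up 3-dimensional vectors against a query vector. The vectors are illustrative only (real embeddings have far more dimensions), and the metric your index actually uses is configured there, not in this snippet.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: list[float], docs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors, made up for illustration only.
docs = {
    "car-repair": [0.9, 0.1, 0.0],
    "auto-maintenance": [0.8, 0.2, 0.1],
    "baking-bread": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # imagine this is the embedding of "fixing my vehicle"
for doc_id, score in rank_by_similarity(query, docs):
    print(f"{doc_id}: {score:.3f}")
```

Cosine similarity compares only direction and ignores vector length, so two texts with different wording but similar meaning can score close to 1.0 as long as their embeddings point the same way.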

Embedder API

All embedders implement the Embedder abstract base class, which provides three methods:

Method	Signature	Description
`embed`	`(texts: list[str]) -> EmbeddingResult`	Core method — embed a batch of strings.
`embed_chunks`	`(chunks: list[DocumentChunk]) -> list[DocumentChunk]`	Embed chunks and return copies with embeddings set.
`embed_query`	`(text: str) -> list[float]`	Embed a single string and return the vector.

Only `embed` is abstract; `embed_chunks` and `embed_query` are concrete methods built on top of it. `embed` and `embed_chunks` are coroutines, so call them with `await`.
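
The split between one abstract method and two derived ones can be sketched as follows. This is an illustration under assumptions, not the toolkit's source: `DocumentChunk` is reduced to hypothetical `text` and `embedding` fields, `EmbeddingResult` is a plain dataclass, `embed_query` is written as a coroutine, and `LengthEmbedder` is a toy subclass that exists only so the example runs.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DocumentChunk:
    # Hypothetical stand-in: the real DocumentChunk has its own fields.
    text: str
    embedding: list[float] | None = None


@dataclass
class EmbeddingResult:
    embeddings: list[list[float]]
    total_tokens: int


class Embedder(ABC):
    @abstractmethod
    async def embed(self, texts: list[str]) -> EmbeddingResult:
        """The only method a subclass has to implement."""

    async def embed_chunks(self, chunks: list[DocumentChunk]) -> list[DocumentChunk]:
        # One embed() call for every chunk text, then a copy of each chunk with its vector.
        result = await self.embed([chunk.text for chunk in chunks])
        if len(result.embeddings) != len(chunks):
            raise ValueError("embed() must return one embedding per input")
        return [replace(chunk, embedding=vector) for chunk, vector in zip(chunks, result.embeddings)]

    async def embed_query(self, text: str) -> list[float]:
        # A query is embedded as a batch of one.
        result = await self.embed([text])
        return result.embeddings[0]


class LengthEmbedder(Embedder):
    """Toy subclass: maps each text to [character count, word count]."""

    async def embed(self, texts: list[str]) -> EmbeddingResult:
        vectors = [[float(len(t)), float(len(t.split()))] for t in texts]
        return EmbeddingResult(embeddings=vectors, total_tokens=sum(len(t.split()) for t in texts))


async def main() -> None:
    embedder = LengthEmbedder()
    chunks = [DocumentChunk("hello world"), DocumentChunk("semantic search")]
    embedded = await embedder.embed_chunks(chunks)
    print(embedded)
    print(await embedder.embed_query("hi"))


asyncio.run(main())
```

Because `dataclasses.replace` builds new instances, the input chunks are left untouched, which mirrors the "return copies" behavior described in the table.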

`EmbeddingResult`

Returned by embed:

from pydantic import BaseModel

class EmbeddingResult(BaseModel):
    embeddings: list[list[float]]  # One embedding per input
    total_tokens: int              # Total tokens consumed
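
Since there is one embedding per input, a caller can pair each input string with its vector, assuming (as `embed_chunks` must) that the i-th vector belongs to the i-th input. A small sketch: `pair_texts_with_vectors` is a hypothetical helper, `EmbeddingResult` is redeclared as a plain dataclass so the snippet runs without pydantic, and the vectors are made up.

```python
from dataclasses import dataclass


@dataclass
class EmbeddingResult:
    # Plain-dataclass stand-in for the pydantic model above, so the sketch needs no pydantic.
    embeddings: list[list[float]]
    total_tokens: int


def pair_texts_with_vectors(texts: list[str], result: EmbeddingResult) -> list[tuple[str, list[float]]]:
    """Hypothetical helper: zip inputs with their vectors, rejecting a count mismatch."""
    if len(result.embeddings) != len(texts):
        raise ValueError(f"expected {len(texts)} embeddings, got {len(result.embeddings)}")
    return list(zip(texts, result.embeddings))


texts = ["first passage", "second passage"]
# Made-up 3-dimensional vectors for illustration.
result = EmbeddingResult(embeddings=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], total_tokens=4)
for text, vector in pair_texts_with_vectors(texts, result):
    print(f"{text!r} -> {vector}")
```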

Mistral Embedder

Use Mistral's embedding API for vectorizing text.

Installation: Core library (no extra required)

Example:

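from mistralai.client import Mistral
from mistralai.search.toolkit.embedders import MistralEmbedder, MODEL_1024_EMBEDDING

client = Mistral(api_key="your-api-key")
embedder = MistralEmbedder(
    client=client,
    model_name=MODEL_1024_EMBEDDING,
)
# `chunks` is a list[DocumentChunk] produced by an earlier pipeline step.
# embed_chunks is a coroutine: run this inside an async function or a notebook.
embedded_chunks = await embedder.embed_chunks(chunks)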

Configuration:

Option	Type	Default	Purpose
`model_name`	str	`MODEL_128_EMBEDDING`	Embedding model to use (see model constants below)
`client`	Mistral	Required	Mistral API client (must be configured)

Embedding model constants:

Use these pre-defined constants when configuring MistralEmbedder. Always match the embedding_dimensions parameter in your Vespa schema to the model's output dimensions.

Constant	Model Name	Dimensions	Best for
`MODEL_1024_EMBEDDING`	`"mistral-embed"`	1024	Recommended for Search Toolkit pipelines (matches Vespa schema `embedding_dimensions=1024`)
`MODEL_256_EMBEDDING`	`"mistral-embed-dim256-2510"`	256	Low-latency, memory-constrained environments
`MODEL_128_EMBEDDING`	`"mistral-embed-dim128-2510"`	128	Smallest vectors; constructor default when `model_name` is omitted
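
A quick pre-flight check can catch a model/schema mismatch before any documents are embedded. A minimal sketch: the model names and dimensions come from the table above, while `MODEL_DIMENSIONS` and `check_schema_compatibility` are hypothetical names, not part of `mistralai.search.toolkit`.

```python
# Output dimensions copied from the table above. The dict and the helper are
# hypothetical; they are not part of mistralai.search.toolkit.
MODEL_DIMENSIONS = {
    "mistral-embed": 1024,
    "mistral-embed-dim256-2510": 256,
    "mistral-embed-dim128-2510": 128,
}


def check_schema_compatibility(model_name: str, embedding_dimensions: int) -> None:
    """Raise ValueError if the model's vectors do not match the schema's dimensions."""
    try:
        model_dim = MODEL_DIMENSIONS[model_name]
    except KeyError:
        raise ValueError(f"unknown embedding model: {model_name!r}") from None
    if model_dim != embedding_dimensions:
        raise ValueError(
            f"{model_name!r} produces {model_dim}-dimensional vectors, "
            f"but the schema expects embedding_dimensions={embedding_dimensions}"
        )


check_schema_compatibility("mistral-embed", 1024)  # matching sizes: no exception
try:
    check_schema_compatibility("mistral-embed-dim128-2510", 1024)
except ValueError as err:
    print(err)
```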

Import and usage:

from mistralai.search.toolkit.embedders import MistralEmbedder, MODEL_1024_EMBEDDING, MODEL_256_EMBEDDING, MODEL_128_EMBEDDING

# Use MODEL_1024_EMBEDDING for Search Toolkit pipelines
embedder = MistralEmbedder(client=client, model_name=MODEL_1024_EMBEDDING)

# Or use a different model for specific use cases
embedder_compact = MistralEmbedder(client=client, model_name=MODEL_256_EMBEDDING)

Batch processing (automatic):

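# embed_chunks handles batching internally
chunks_with_embeddings = await embedder.embed_chunks(chunks)

# For custom batching, call embed directly on a list of strings you control.
# It returns an EmbeddingResult, not a bare list of vectors.
result = await embedder.embed([
    "text 1",
    "text 2",
    "text 3",
])
vectors = result.embeddings  # one vector per input string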
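
If you batch yourself with `embed`, the per-batch results have to be merged in order. A sketch under assumptions: `FakeEmbedder` stands in for `MistralEmbedder` so the code runs offline, and `embed_in_batches` with its `batch_size` argument is hypothetical, not a toolkit parameter.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class EmbeddingResult:
    embeddings: list[list[float]]
    total_tokens: int


class FakeEmbedder:
    """Stand-in for MistralEmbedder so the sketch runs offline; counts embed() calls."""

    def __init__(self) -> None:
        self.calls = 0

    async def embed(self, texts: list[str]) -> EmbeddingResult:
        self.calls += 1
        return EmbeddingResult(
            embeddings=[[float(len(t))] for t in texts],
            total_tokens=sum(len(t.split()) for t in texts),
        )


async def embed_in_batches(embedder, texts: list[str], batch_size: int) -> EmbeddingResult:
    """Call embedder.embed() on fixed-size slices and merge the results in input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    embeddings: list[list[float]] = []
    total_tokens = 0
    for start in range(0, len(texts), batch_size):
        result = await embedder.embed(texts[start:start + batch_size])
        embeddings.extend(result.embeddings)
        total_tokens += result.total_tokens
    return EmbeddingResult(embeddings=embeddings, total_tokens=total_tokens)


async def main() -> None:
    embedder = FakeEmbedder()
    texts = [f"text {i}" for i in range(5)]
    merged = await embed_in_batches(embedder, texts, batch_size=2)
    print(len(merged.embeddings), merged.total_tokens, embedder.calls)


asyncio.run(main())
```

Batches are sent one after another. Running them concurrently with `asyncio.gather` would be faster but makes rate limits easier to hit; `gather` returns results in argument order, so the merge step would stay the same.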

Custom embedders

Subclass `Embedder` and implement its single abstract method, `embed`:

from mistralai.search.toolkit.embedders import Embedder, EmbeddingResult


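class MyEmbedder(Embedder):
    async def embed(self, texts: list[str]) -> EmbeddingResult:
        # `my_provider` is a placeholder for your own embedding client;
        # it must return one vector (list[float]) per input text, in order.
        embeddings = my_provider.embed(texts)
        return EmbeddingResult(
            embeddings=embeddings,
            # A whitespace word count only approximates tokens;
            # prefer the provider's reported usage if it exposes one.
            total_tokens=sum(len(t.split()) for t in texts),
        )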

embed_chunks and embed_query are inherited automatically.
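
For a complete, runnable custom embedder, the sketch below uses feature hashing: each lowercase word is hashed into one of `dimensions` buckets and the counts are L2-normalized. It is a lexical toy, not a semantic model. The `Embedder` class here is a minimal stand-in that derives only `embed_query`, so the code runs without the toolkit installed; with the real package you would import `Embedder` and `EmbeddingResult` instead.

```python
import asyncio
import hashlib
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class EmbeddingResult:
    embeddings: list[list[float]]
    total_tokens: int


class Embedder(ABC):
    # Minimal stand-in for the toolkit's base class, only so this sketch runs on its own.
    @abstractmethod
    async def embed(self, texts: list[str]) -> EmbeddingResult: ...

    async def embed_query(self, text: str) -> list[float]:
        return (await self.embed([text])).embeddings[0]


class HashingEmbedder(Embedder):
    """Bag-of-words feature hashing: a dependency-free toy, not a semantic model."""

    def __init__(self, dimensions: int = 64) -> None:
        self.dimensions = dimensions

    def _vector(self, text: str) -> list[float]:
        vector = [0.0] * self.dimensions
        for token in text.lower().split():
            # hashlib gives the same bucket on every run; built-in hash() is salted per process.
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            vector[int.from_bytes(digest[:4], "big") % self.dimensions] += 1.0
        norm = math.sqrt(sum(x * x for x in vector))
        # Empty input yields an all-zero vector instead of dividing by zero.
        return [x / norm for x in vector] if norm else vector

    async def embed(self, texts: list[str]) -> EmbeddingResult:
        return EmbeddingResult(
            embeddings=[self._vector(t) for t in texts],
            total_tokens=sum(len(t.split()) for t in texts),
        )


async def main() -> None:
    embedder = HashingEmbedder(dimensions=16)
    result = await embedder.embed(["red apple", "green apple"])
    query = await embedder.embed_query("apple")
    print(len(result.embeddings), len(query), result.total_tokens)


asyncio.run(main())
```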