Embedders

Embedders convert text into vector embeddings for semantic search. Embeddings capture the semantic meaning of text, allowing similar concepts to be found even with different wording.
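
To make "similar concepts" concrete, the sketch below ranks a few texts against a query by cosine similarity, one common way to compare embedding vectors. The 3-dimensional vectors are invented for illustration (real embeddings have hundreds of dimensions), and the metric your search backend actually uses may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, made up for this example; a real embedder would produce them.
documents = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "Steps to recover account access": [0.8, 0.3, 0.1],
    "Quarterly revenue report": [0.0, 0.2, 0.9],
}
query_vector = [0.85, 0.2, 0.05]  # stands in for an embedded user query

# Rank documents from most to least similar to the query.
ranked = sorted(
    documents,
    key=lambda text: cosine_similarity(query_vector, documents[text]),
    reverse=True,
)
```

The two account-related texts share no keywords with each other, yet both score far above the unrelated revenue report because their vectors point in a similar direction.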

Embedder API

All embedders implement the Embedder abstract base class, which provides three methods:

| Method | Signature | Description |
| --- | --- | --- |
| embed | (texts: list[str]) -> EmbeddingResult | Core method: embeds a batch of strings. |
| embed_chunks | (chunks: list[DocumentChunk]) -> list[DocumentChunk] | Embeds chunks and returns copies with embeddings set. |
| embed_query | (text: str) -> list[float] | Embeds a single string and returns the vector. |

Only embed is abstract. embed_chunks and embed_query are concrete methods built on top of it.
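
One way such a base class could be laid out is sketched below. This is not the library's source: the Embedder, EmbeddingResult, and DocumentChunk classes here are simplified stand-ins (the chunk's text and embedding fields in particular are assumptions), and LengthEmbedder is a toy subclass that exists only to make the example runnable.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class EmbeddingResult:  # stand-in for the library's pydantic model
    embeddings: list[list[float]]
    total_tokens: int


@dataclass
class DocumentChunk:  # hypothetical minimal chunk; the real fields may differ
    text: str
    embedding: Optional[list[float]] = None


class Embedder(ABC):
    @abstractmethod
    async def embed(self, texts: list[str]) -> EmbeddingResult:
        """Embed a batch of strings. Subclasses must implement this."""

    async def embed_chunks(self, chunks: list[DocumentChunk]) -> list[DocumentChunk]:
        # Embed all chunk texts in one call, then return copies so the
        # input chunks are left unmodified.
        result = await self.embed([chunk.text for chunk in chunks])
        return [
            replace(chunk, embedding=vector)
            for chunk, vector in zip(chunks, result.embeddings)
        ]

    async def embed_query(self, text: str) -> list[float]:
        # A query is a batch of one; return its only vector.
        result = await self.embed([text])
        return result.embeddings[0]


class LengthEmbedder(Embedder):
    """Toy embedder: a 1-dimensional vector holding the text length."""

    async def embed(self, texts: list[str]) -> EmbeddingResult:
        return EmbeddingResult(
            embeddings=[[float(len(t))] for t in texts],
            total_tokens=sum(len(t.split()) for t in texts),
        )


async def demo() -> tuple[list[float], list[DocumentChunk], DocumentChunk]:
    embedder = LengthEmbedder()
    original = DocumentChunk(text="abc")
    query_vector = await embedder.embed_query("hello")
    embedded = await embedder.embed_chunks([original])
    return query_vector, embedded, original


query_vector, embedded, original = asyncio.run(demo())
```

Because the two concrete methods only call embed, a subclass that implements embed gets both of them without extra code.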

EmbeddingResult

Returned by embed:

from pydantic import BaseModel

class EmbeddingResult(BaseModel):
    embeddings: list[list[float]]  # One embedding per input
    total_tokens: int              # Total tokens consumed

Mistral Embedder

Use Mistral's embedding API for vectorizing text.

Installation: Core library (no extra required)

Example:

from mistralai.client import Mistral
from mistralai.search.toolkit.embedders import MistralEmbedder, MODEL_1024_EMBEDDING

client = Mistral(api_key="your-api-key")
embedder = MistralEmbedder(
    client=client,
    model_name=MODEL_1024_EMBEDDING,
)
# `chunks` is a list[DocumentChunk] prepared earlier; run this inside an async function.
embedded_chunks = await embedder.embed_chunks(chunks)

Configuration:

| Option | Type | Default | Purpose |
| --- | --- | --- | --- |
| model_name | str | MODEL_128_EMBEDDING | Embedding model to use (see model constants below) |
| client | Mistral | Required | Mistral API client (must be configured) |

Embedding model constants:

Use these pre-defined constants when configuring MistralEmbedder. Always match the embedding_dimensions parameter in your Vespa schema to the model's output dimensions.

| Constant | Model Name | Dimensions | Best for |
| --- | --- | --- | --- |
| MODEL_1024_EMBEDDING | "mistral-embed" | 1024 | Default for Search Toolkit pipelines (matches Vespa schema embedding_dimensions=1024) |
| MODEL_256_EMBEDDING | "mistral-embed-dim256-2510" | 256 | Low-latency, memory-constrained environments |
| MODEL_128_EMBEDDING | "mistral-embed-dim128-2510" | 128 | Minimum dimensions (default fallback) |
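
A dimension mismatch is easier to catch at startup than at indexing time. The sketch below shows one way to guard against it. The model names and dimensions are copied from the table above; MODEL_DIMENSIONS and check_dimensions are hypothetical helpers, not part of the library.

```python
# Output dimensions per model name, copied from the constants table.
MODEL_DIMENSIONS = {
    "mistral-embed": 1024,
    "mistral-embed-dim256-2510": 256,
    "mistral-embed-dim128-2510": 128,
}


def check_dimensions(model_name: str, schema_dimensions: int) -> None:
    """Raise ValueError if the schema size differs from the model's output size."""
    expected = MODEL_DIMENSIONS[model_name]
    if expected != schema_dimensions:
        raise ValueError(
            f"{model_name} produces {expected}-dimensional vectors, "
            f"but the schema expects {schema_dimensions}"
        )


# Passes silently: the model and the schema agree.
check_dimensions("mistral-embed", 1024)
```

Calling check_dimensions("mistral-embed-dim128-2510", 1024) would raise, which is the situation to avoid when the constructor default is left in place but the schema is sized for 1024 dimensions.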

Import and usage:

from mistralai.search.toolkit.embedders import MistralEmbedder, MODEL_1024_EMBEDDING, MODEL_256_EMBEDDING, MODEL_128_EMBEDDING

# Use MODEL_1024_EMBEDDING for Search Toolkit pipelines
embedder = MistralEmbedder(client=client, model_name=MODEL_1024_EMBEDDING)

# Or use a different model for specific use cases
embedder_compact = MistralEmbedder(client=client, model_name=MODEL_256_EMBEDDING)

Batch processing (automatic):

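# embed_chunks handles batching internally
chunks_with_embeddings = await embedder.embed_chunks(chunks)

# For custom batching, call embed directly; it returns an EmbeddingResult
result = await embedder.embed([
    "text 1",
    "text 2",
    "text 3",
])
embeddings = result.embeddings  # one vector per input string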
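
If you do need to control batch sizes yourself, a helper along these lines could slice the inputs and merge the results. It is a sketch under assumptions: embed_in_batches is a hypothetical helper, and FakeEmbedder and the EmbeddingResult dataclass are offline stand-ins so the example runs without an API key.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class EmbeddingResult:  # stand-in for the library's pydantic model
    embeddings: list[list[float]]
    total_tokens: int


class FakeEmbedder:
    """Hypothetical offline embedder, used only to make the sketch runnable."""

    async def embed(self, texts: list[str]) -> EmbeddingResult:
        return EmbeddingResult(
            embeddings=[[float(len(t))] for t in texts],
            total_tokens=len(texts),  # pretend each text costs one token
        )


async def embed_in_batches(embedder, texts: list[str], batch_size: int = 2) -> EmbeddingResult:
    """Call embed() on fixed-size slices and merge the results in input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    embeddings: list[list[float]] = []
    total_tokens = 0
    for start in range(0, len(texts), batch_size):
        result = await embedder.embed(texts[start:start + batch_size])
        embeddings.extend(result.embeddings)
        total_tokens += result.total_tokens
    return EmbeddingResult(embeddings=embeddings, total_tokens=total_tokens)


merged = asyncio.run(
    embed_in_batches(FakeEmbedder(), ["text 1", "text 2", "text 3"])
)
```

The batches run sequentially here to keep ordering obvious; running them concurrently is possible but would need to respect the provider's rate limits.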

Custom embedders

Subclass Embedder and implement its one abstract method, embed:

from mistralai.search.toolkit.embedders import Embedder, EmbeddingResult


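class MyEmbedder(Embedder):
    async def embed(self, texts: list[str]) -> EmbeddingResult:
        # my_provider is a placeholder for your own embedding backend;
        # it must return one vector (list[float]) per input text.
        embeddings = my_provider.embed(texts)
        return EmbeddingResult(
            embeddings=embeddings,
            # Rough estimate: counts whitespace-separated words, not real tokens.
            total_tokens=sum(len(t.split()) for t in texts),
        )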

embed_chunks and embed_query are inherited automatically.
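
For tests that should not call a real provider, the my_provider placeholder could be backed by a deterministic stand-in such as the one sketched below. Both hash_embedding and embed_batch are hypothetical helpers; the vectors they produce are stable across runs but carry no semantic meaning, so they are suitable only for exercising pipeline plumbing.

```python
import hashlib
import math


def hash_embedding(text: str, dimensions: int = 8) -> list[float]:
    """Map text to a deterministic unit-length vector (not semantically meaningful)."""
    vector = [0.0] * dimensions
    for token in text.lower().split():
        # Hash each word to a bucket index and count it there.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dimensions
        vector[index] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    # Leave the all-zero vector as-is for empty input to avoid dividing by zero.
    return [v / norm for v in vector] if norm else vector


def embed_batch(texts: list[str], dimensions: int = 8) -> tuple[list[list[float]], int]:
    """Return (embeddings, total_tokens), the two values a custom embed() needs."""
    embeddings = [hash_embedding(t, dimensions) for t in texts]
    total_tokens = sum(len(t.split()) for t in texts)  # word count as a rough proxy
    return embeddings, total_tokens


embeddings, total_tokens = embed_batch(["hello world", "hello world", ""])
```

Inside a custom embed method, the two returned values map directly onto the embeddings and total_tokens fields of EmbeddingResult.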