Embedders
Embedders convert text into vector embeddings for semantic search. Embeddings capture the semantic meaning of text, allowing similar concepts to be found even with different wording.
Embedder API
All embedders implement the Embedder abstract base class, which provides three methods:
| Method | Signature | Description |
|---|---|---|
| embed | (texts: list[str]) -> EmbeddingResult | Core method: embed a batch of strings. |
| embed_chunks | (chunks: list[DocumentChunk]) -> list[DocumentChunk] | Embed chunks and return copies with embeddings set. |
| embed_query | (text: str) -> list[float] | Embed a single string and return the vector. |
Only embed is abstract. embed_chunks and embed_query are concrete methods built on top of it.
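To illustrate how two concrete methods can be layered on a single abstract one, here is a minimal sketch. It is not the toolkit's actual implementation: SketchEmbedder, LengthEmbedder, and the simplified EmbeddingResult and DocumentChunk stand-ins (including the text and embedding field names) are assumptions made so the example runs on its own.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class EmbeddingResult:
    # Stand-in for the toolkit's EmbeddingResult (simplified, hypothetical)
    embeddings: list[list[float]]
    total_tokens: int


@dataclass
class DocumentChunk:
    # Hypothetical minimal chunk; the real DocumentChunk likely has more fields
    text: str
    embedding: Optional[list[float]] = None


class SketchEmbedder(ABC):
    @abstractmethod
    async def embed(self, texts: list[str]) -> EmbeddingResult:
        """Embed a batch of strings (the only method a subclass must write)."""

    async def embed_query(self, text: str) -> list[float]:
        # A single query is treated as a batch of one
        result = await self.embed([text])
        return result.embeddings[0]

    async def embed_chunks(self, chunks: list[DocumentChunk]) -> list[DocumentChunk]:
        result = await self.embed([c.text for c in chunks])
        # Return copies so the input chunks are left unmodified
        return [replace(c, embedding=v) for c, v in zip(chunks, result.embeddings)]


class LengthEmbedder(SketchEmbedder):
    # Toy subclass: a one-dimensional "embedding" equal to the text length
    async def embed(self, texts: list[str]) -> EmbeddingResult:
        return EmbeddingResult(
            embeddings=[[float(len(t))] for t in texts],
            total_tokens=sum(len(t.split()) for t in texts),
        )


query_vector = asyncio.run(LengthEmbedder().embed_query("hello"))
```

Because the subclass only defines embed, any change to how texts are vectorized automatically applies to queries and chunks as well.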
EmbeddingResult
Returned by embed:
from pydantic import BaseModel

class EmbeddingResult(BaseModel):
    embeddings: list[list[float]]  # One embedding per input
    total_tokens: int  # Total tokens consumed
Mistral Embedder
Use Mistral's embedding API for vectorizing text.
Installation: Core library (no extra required)
Example:
from mistralai.client import Mistral
from mistralai.search.toolkit.embedders import MistralEmbedder, MODEL_1024_EMBEDDING
client = Mistral(api_key="your-api-key")
embedder = MistralEmbedder(
    client=client,
    model_name=MODEL_1024_EMBEDDING,
)
embedded_chunks = await embedder.embed_chunks(chunks)
Configuration:
| Option | Type | Default | Purpose |
|---|---|---|---|
| model_name | str | MODEL_128_EMBEDDING | Embedding model to use (see model constants below) |
| client | Mistral | Required | Mistral API client (must be configured) |
Embedding model constants:
Use these pre-defined constants when configuring MistralEmbedder. Always match the embedding_dimensions parameter in your Vespa schema to the model's output dimensions.
| Constant | Model Name | Dimensions | Best for |
|---|---|---|---|
| MODEL_1024_EMBEDDING | "mistral-embed" | 1024 | Default for Search Toolkit pipelines (matches Vespa schema embedding_dimensions=1024) |
| MODEL_256_EMBEDDING | "mistral-embed-dim256-2510" | 256 | Low-latency, memory-constrained environments |
| MODEL_128_EMBEDDING | "mistral-embed-dim128-2510" | 128 | Minimum dimensions (default fallback) |
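A dimension mismatch between the embedder and the schema is easy to catch before indexing. The following is a small sketch of such a guard; check_dimensions and MODEL_DIMENSIONS are hypothetical helpers, not part of the toolkit, and the mapping simply restates the table above.

```python
# Output dimensions per model name, copied from the table above
MODEL_DIMENSIONS = {
    "mistral-embed": 1024,
    "mistral-embed-dim256-2510": 256,
    "mistral-embed-dim128-2510": 128,
}


def check_dimensions(model_name: str, schema_dimensions: int) -> None:
    """Raise ValueError if the schema size differs from the model's output size."""
    expected = MODEL_DIMENSIONS[model_name]
    if expected != schema_dimensions:
        raise ValueError(
            f"{model_name} produces {expected}-dimensional vectors, "
            f"but the schema expects {schema_dimensions}"
        )


check_dimensions("mistral-embed", 1024)  # passes silently
```

Running a check like this at startup surfaces a misconfiguration immediately instead of at the first failed feed or query.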
Import and usage:
from mistralai.search.toolkit.embedders import MistralEmbedder, MODEL_1024_EMBEDDING, MODEL_256_EMBEDDING, MODEL_128_EMBEDDING
# Use MODEL_1024_EMBEDDING for Search Toolkit pipelines
embedder = MistralEmbedder(client=client, model_name=MODEL_1024_EMBEDDING)
# Or use a different model for specific use cases
embedder_compact = MistralEmbedder(client=client, model_name=MODEL_256_EMBEDDING)
Batch processing (automatic):
# embed_chunks handles batching internally
chunks_with_embeddings = await embedder.embed_chunks(chunks)
# For custom batching, call embed directly; it returns an EmbeddingResult
result = await embedder.embed([
    "text 1",
    "text 2",
    "text 3",
])
embeddings = result.embeddings  # One vector per input text
Custom embedders
Implement the Embedder base class with a single embed method:
from mistralai.search.toolkit.embedders import Embedder, EmbeddingResult
class MyEmbedder(Embedder):
    async def embed(self, texts: list[str]) -> EmbeddingResult:
        # my_provider is a placeholder for your own embedding client
        embeddings = my_provider.embed(texts)
        return EmbeddingResult(
            embeddings=embeddings,
            # Whitespace word count is only a rough token estimate
            total_tokens=sum(len(t.split()) for t in texts),
        )
embed_chunks and embed_query are inherited automatically.
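The example above calls my_provider.embed synchronously inside an async method. If your provider's client blocks on network I/O, one option is to move the call to a worker thread with asyncio.to_thread so the event loop stays responsive. The sketch below assumes a blocking client; FakeProvider and ThreadedEmbedder are hypothetical names, and EmbeddingResult is a local stand-in so the snippet runs without the toolkit installed.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class EmbeddingResult:
    # Stand-in for the toolkit's EmbeddingResult so the sketch runs on its own
    embeddings: list[list[float]]
    total_tokens: int


class FakeProvider:
    # Hypothetical blocking client; replace with your provider's SDK
    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t)), float(t.count(" "))] for t in texts]


class ThreadedEmbedder:
    # In real code this would subclass the toolkit's Embedder
    def __init__(self, provider: FakeProvider) -> None:
        self.provider = provider

    async def embed(self, texts: list[str]) -> EmbeddingResult:
        # Run the blocking call in a worker thread so the event loop stays free
        embeddings = await asyncio.to_thread(self.provider.embed, texts)
        return EmbeddingResult(
            embeddings=embeddings,
            # Whitespace word count: a rough token estimate only
            total_tokens=sum(len(t.split()) for t in texts),
        )


result = asyncio.run(ThreadedEmbedder(FakeProvider()).embed(["a b", "c"]))
```

If the provider already exposes an async client, awaiting it directly is simpler and avoids the thread hop.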