Retrievers

Retrievers execute the search against your index. Search Toolkit provides a vector retriever for semantic search over your documents, and a Retriever protocol for implementing custom search logic.

Available retrievers

Retriever          Purpose
Vector Retriever   Semantic search using embeddings
Custom Retrievers  Custom search logic

Vector Retriever

Performs semantic search using embeddings. Finds chunks with similar meaning to the query, even if they use different words.
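
To make the idea concrete, here is a minimal sketch of embedding-based ranking. It is not the VectorRetriever implementation: the three-dimensional vectors are made up for illustration (a real embedder produces far more dimensions), and `cosine_similarity` and `rank_chunks` are hypothetical helpers that are not part of Search Toolkit.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_chunks(
    query_vec: list[float],
    chunk_vecs: dict[str, list[float]],
    top_k: int = 10,
) -> list[tuple[str, float]]:
    """Score every chunk against the query and keep the top_k best matches."""
    scored = [
        (text, cosine_similarity(query_vec, vec))
        for text, vec in chunk_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up embeddings: the two car-related chunks point in a similar direction
# even though they share almost no words.
chunk_vecs = {
    "How to fix a car engine": [0.9, 0.1, 0.0],
    "Automobile motor repair guide": [0.8, 0.2, 0.1],
    "Baking sourdough bread": [0.0, 0.1, 0.9],
}
query_vec = [0.85, 0.15, 0.05]  # pretend embedding of "repairing a vehicle"

ranked = rank_chunks(query_vec, chunk_vecs, top_k=2)
for text, score in ranked:
    print(f"{score:.3f}  {text}")
```

Both car-related chunks rank above the bread chunk because their vectors lie close to the query vector, which is how a semantic retriever can match paraphrases and synonyms.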

Installation: Core library (no extra required)

Example:

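from mistralai.search.toolkit.retrieval.retrievers import VectorRetriever

# vector_store and embedder are assumed to have been created beforehand.
retriever = VectorRetriever(
    client=vector_store,  # Vector store client (Vespa or custom implementation)
    embedder=embedder,    # MistralEmbedder or custom Embedder
)

# retrieve() is awaited, so call it from within an async function
# (or from a notebook/REPL that supports top-level await).
results = await retriever.retrieve(
    query="What is semantic search?",
    top_k=10,  # Maximum number of results to return
)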

Configuration options:

Option    Type               Default   Purpose
client    VectorStoreClient  Required  Vector store to search
embedder  Embedder           Required  Embedder for query vectorization

Filtering:

Filter by metadata to narrow results:

results = await retriever.retrieve(
    query="machine learning",
    top_k=10,
    filter={"category": "tutorials", "year": 2024},  # Depends on store support
)

Note: Filtering support depends on your vector store backend. Vespa supports metadata filtering natively.
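
If a backend lacks native filtering, one possible fallback is to over-fetch and filter on the client side. The sketch below illustrates that idea only. It assumes exact-match semantics on every filter key and models results as plain dicts with a `metadata` key, which may not match the actual SearchResult shape. `matches_filter` and `post_filter` are hypothetical helpers, not Search Toolkit functions.

```python
from typing import Optional


def matches_filter(metadata: dict, filter: Optional[dict]) -> bool:
    """Return True when every filter key equals the corresponding metadata value."""
    if not filter:
        return True  # No filter means everything matches
    return all(metadata.get(key) == value for key, value in filter.items())


def post_filter(results: list[dict], filter: Optional[dict], top_k: int) -> list[dict]:
    """Drop non-matching results, then truncate to top_k (input order is kept)."""
    kept = [r for r in results if matches_filter(r["metadata"], filter)]
    return kept[:top_k]


# Hypothetical over-fetched candidates, already ordered by relevance.
candidates = [
    {"text": "Intro to ML", "metadata": {"category": "tutorials", "year": 2024}},
    {"text": "ML news roundup", "metadata": {"category": "news", "year": 2024}},
    {"text": "Older ML tutorial", "metadata": {"category": "tutorials", "year": 2021}},
]

filtered = post_filter(candidates, {"category": "tutorials", "year": 2024}, top_k=10)
print([r["text"] for r in filtered])
```

Client-side filtering can return fewer than `top_k` results unless enough candidates are fetched first, which is one reason native filtering in the store is usually preferable.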

When to use:

  • General semantic search across your documents
  • Finding conceptually similar content
  • Handling synonyms and paraphrasing naturally

Custom retrievers

Implement the Retriever protocol for custom search logic:

from mistralai.search.toolkit.retrieval.retrievers import Retriever
from mistralai.search.toolkit.indices import SearchResult

class CustomRetriever(Retriever):
    """Custom retriever with domain-specific logic."""

    async def retrieve(
        self,
        query: str,
        top_k: int = 10,
        filter: dict | None = None,
    ) -> list[SearchResult]:
        """Retrieve relevant chunks for the query."""
        # 1. Preprocess query
        processed_query = self._preprocess(query)

        # 2. Search with custom logic
        results = await self._search(processed_query, top_k, filter)

        # 3. Post-process or augment results
        enhanced_results = self._enhance(results)

        return enhanced_results

    def _preprocess(self, query: str) -> str:
        # Custom preprocessing (stemming, lemmatization, etc.)
        return query.lower().strip()

    async def _search(self, query: str, top_k: int, filter: dict | None) -> list[SearchResult]:
        # Custom search implementation
        ...

    def _enhance(self, results: list[SearchResult]) -> list[SearchResult]:
        # Augment results with additional context
        ...

# Use in QueryEngine (QueryEngine must be imported first; its import is not shown here)
query_engine = QueryEngine(retriever=CustomRetriever())
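
For a version that runs without the library installed, the sketch below fills in the same `retrieve` signature with a toy keyword-overlap search over an in-memory list. Everything here is an assumption made for illustration: `ToyResult` is a stand-in for SearchResult (whose real fields may differ), `KeywordOverlapRetriever` is hypothetical, and the `filter` argument is accepted but ignored.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyResult:
    """Stand-in for SearchResult; the real class's fields may differ."""
    text: str
    score: float
    metadata: dict = field(default_factory=dict)


class KeywordOverlapRetriever:
    """Hypothetical retriever that scores documents by shared query words."""

    def __init__(self, documents: list[str]) -> None:
        self._documents = documents

    async def retrieve(
        self,
        query: str,
        top_k: int = 10,
        filter: Optional[dict] = None,  # Accepted for signature parity, ignored here
    ) -> list[ToyResult]:
        query_words = set(query.lower().split())
        results = []
        for doc in self._documents:
            overlap = len(query_words & set(doc.lower().split()))
            if overlap:
                # Score is the fraction of query words found in the document
                results.append(ToyResult(text=doc, score=overlap / len(query_words)))
        results.sort(key=lambda r: r.score, reverse=True)
        return results[:top_k]


docs = [
    "semantic search with embeddings",
    "keyword search basics",
    "cooking pasta at home",
]
retriever = KeywordOverlapRetriever(docs)
results = asyncio.run(retriever.retrieve("semantic search", top_k=2))
for r in results:
    print(f"{r.score:.2f}  {r.text}")
```

The class does not inherit from anything: if Retriever is a structural protocol, matching the `retrieve` signature may be enough, but check the library's own definition before relying on that.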

See also