Retrievers

Retrievers execute the search against your index. Search Toolkit provides a vector retriever for semantic search over your documents, and a Retriever protocol for implementing custom search logic.

Available retrievers

Retriever	Purpose
Vector Retriever	Semantic search using embeddings
Custom Retrievers	Custom search logic

Vector Retriever

Performs semantic search using embeddings. Finds chunks with similar meaning to the query, even if they use different words.

Installation: Included in the core library (no extras required)

Example:

from mistralai.search.toolkit.retrieval.retrievers import VectorRetriever

retriever = VectorRetriever(
    client=vector_store,  # Vector store client (Vespa or custom implementation)
    embedder=embedder,    # MistralEmbedder or custom Embedder
)

results = await retriever.retrieve(
    query="What is semantic search?",
    top_k=10,
)

Configuration options:

Option	Type	Default	Purpose
`client`	VectorStoreClient	Required	Vector store to search
`embedder`	Embedder	Required	Embedder for query vectorization

Filtering:

Filter by metadata to narrow results:

results = await retriever.retrieve(
    query="machine learning",
    top_k=10,
    filter={"category": "tutorials", "year": 2024},  # Depends on store support
)

Information

Filtering support depends on your vector store backend. Vespa supports metadata filtering natively.

When to use:

General semantic search across your documents
Finding conceptually similar content
Handling synonyms and paraphrasing naturally
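
To make "similar meaning" concrete, here is a toy illustration of similarity ranking over embeddings. It uses hand-made three-dimensional vectors and cosine similarity. Both are assumptions for illustration: real embeddings come from an embedder, and the similarity metric actually used depends on your vector store.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: list[float],
    chunk_vecs: dict[str, list[float]],
    top_k: int,
) -> list[tuple[str, float]]:
    """Score every chunk against the query and keep the top_k highest scores."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors: the two vehicle-related chunks point in a similar direction
chunk_vecs = {
    "car-maintenance": [0.9, 0.1, 0.0],
    "automobile-repair": [0.8, 0.2, 0.1],
    "baking-bread": [0.0, 0.1, 0.9],
}
query_vec = [0.85, 0.15, 0.05]  # Stand-in for an embedded query about fixing a vehicle

top = rank_by_similarity(query_vec, chunk_vecs, top_k=2)
print([chunk_id for chunk_id, _ in top])
```

Both vehicle chunks rank above the baking chunk even though "car" and "automobile" share no characters, which is the behavior keyword matching alone cannot provide.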

Custom retrievers

Implement the Retriever protocol for custom search logic:

from mistralai.search.toolkit.retrieval.retrievers import Retriever
from mistralai.search.toolkit.indices import SearchResult

class CustomRetriever(Retriever):
    """Custom retriever with domain-specific logic."""

    async def retrieve(
        self,
        query: str,
        top_k: int = 10,
        filter: dict | None = None,
    ) -> list[SearchResult]:
        """Retrieve relevant chunks for the query."""
        # 1. Preprocess query
        processed_query = self._preprocess(query)

        # 2. Search with custom logic
        results = await self._search(processed_query, top_k, filter)

        # 3. Post-process or augment results
        enhanced_results = self._enhance(results)

        return enhanced_results

    def _preprocess(self, query: str) -> str:
        # Custom preprocessing (stemming, lemmatization, etc.)
        return query.lower().strip()

    async def _search(self, query: str, top_k: int, filter: dict | None) -> list[SearchResult]:
        # Custom search implementation
        ...

    def _enhance(self, results: list[SearchResult]) -> list[SearchResult]:
        # Augment results with additional context
        ...

# Use in QueryEngine (QueryEngine must be imported separately; its import is not shown here)
query_engine = QueryEngine(retriever=CustomRetriever())
