Rerankers
Rerankers apply more sophisticated scoring to improve ranking quality. Initial retrieval is fast but approximate; rerankers use deeper models or logic to refine results.
Search Toolkit provides rerankers in two categories:
- ReRanker: operates on a single result list (LLMReRanker, CrossEncoderReRanker)
- GroupedRanker: operates on multiple result groups (RRFRanker for fusion)
Multiple rerankers can be chained in the QueryEngine rerankers list. They are applied sequentially, and each reranker receives the previous one's output.
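The following self-contained sketch makes the two categories concrete. It models the two call shapes with typing.Protocol, using plain strings instead of SearchResult. The shapes are inferred from the rerank and rerank_groups signatures in the custom-reranker examples in this section. ReRankerShape, GroupedRankerShape, ShortestFirst, and Interleave are hypothetical names, not toolkit classes.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class ReRankerShape(Protocol):
    """Single list in, single list out (like LLMReRanker)."""

    async def rerank(self, query: str, search_results: list[str]) -> list[str]: ...


@runtime_checkable
class GroupedRankerShape(Protocol):
    """Several lists in (one per retriever), one fused list out (like RRFRanker)."""

    async def rerank_groups(
        self, query: str, result_groups: list[list[str]]
    ) -> list[str]: ...


class ShortestFirst:
    """Toy single-list reranker: prefers shorter passages."""

    async def rerank(self, query: str, search_results: list[str]) -> list[str]:
        return sorted(search_results, key=len)


class Interleave:
    """Toy grouped ranker: round-robin merge of the groups, dropping duplicates."""

    async def rerank_groups(
        self, query: str, result_groups: list[list[str]]
    ) -> list[str]:
        fused: list[str] = []
        for i in range(max((len(g) for g in result_groups), default=0)):
            for group in result_groups:
                if i < len(group) and group[i] not in fused:
                    fused.append(group[i])
        return fused


async def demo() -> list[str]:
    # Fuse two groups first, then refine the single fused list
    fused = await Interleave().rerank_groups("q", [["bbb", "a"], ["cc", "a"]])
    return await ShortestFirst().rerank("q", fused)


print(asyncio.run(demo()))  # prints ['a', 'cc', 'bbb']
```

In this sketch the protocols are structural, so any class with the right async method qualifies. This page does not say whether the toolkit's own ReRanker and GroupedRanker behave the same way.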
Available rerankers
| Reranker | Type | Purpose |
|---|---|---|
| LLM ReRanker | ReRanker | Deep relevance scoring using an LLM |
| Cross-Encoder ReRanker | ReRanker | Fast reranking using dedicated model |
| RRF Ranker | GroupedRanker | Fuse results from multiple retrievers |
| Custom Rerankers | ReRanker / GroupedRanker | Custom scoring and fusion logic |
LLM ReRanker
Uses an LLM to re-score results with deeper understanding of relevance to the query.
Installation: Core library (no extra required)
Example:
```python
from mistralai.search.toolkit.retrieval.rerankers import LLMReRanker
from mistralai.search.toolkit.retrieval import QueryEngine
from mistralai.search.toolkit.llm import MistralChat, LLMConfig
from mistralai.client import Mistral

llm = MistralChat(
    client=Mistral(api_key="your-api-key"),
    config=LLMConfig(
        model="mistral-small-latest",
        temperature=0.1,  # Low temperature for more consistent scores
        response_format={"type": "json_object"},  # Request JSON output
    ),
)

reranker = LLMReRanker(
    llm_provider=llm,
    top_k=10,  # Return top 10 after reranking
)

# Use in QueryEngine (vector_retriever is an existing retriever, see Retrievers)
query_engine = QueryEngine(
    retriever=vector_retriever,
    rerankers=[reranker],
)

# Run inside an async function or event loop
result = await query_engine.search(query="What is RAG?", top_k=10)
```
Configuration options:
| Option | Type | Default | Purpose |
|---|---|---|---|
| llm_provider | LLMProvider | Required | LLM to use for scoring (MistralChat, etc.) |
| top_k | int | 10 | Number of results to return after reranking |
| batch_size | int | 10 | Batch size for LLM scoring |
How it works:
- LLM receives the query and retrieved chunks
- LLM scores relevance for each chunk
- Chunks are sorted by LLM score
- Top-k results are returned
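This page does not document the toolkit's prompt or response format. The following is therefore only a minimal, standalone sketch of the four steps above.

It assumes a hypothetical async llm callable that returns JSON of the form {"scores": [...]}. The response_format={"type": "json_object"} setting in the example suggests JSON output, but this schema is an assumption. Candidate, build_prompt, llm_rerank, and fake_llm are illustrative names, not toolkit APIs. The batch_size parameter mirrors the option in the table.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Candidate:
    text: str
    score: float = 0.0


# Hypothetical LLM interface: prompt in, JSON string out.
LLMCall = Callable[[str], Awaitable[str]]


def build_prompt(query: str, texts: list[str]) -> str:
    """Number each passage so the model can return one score per passage."""
    passages = "\n".join(f"[{i}] {t}" for i, t in enumerate(texts))
    return (
        "Rate the relevance of each passage to the query from 0 to 10.\n"
        f"Query: {query}\n{passages}\n"
        'Reply with JSON: {"scores": [one number per passage]}'
    )


async def llm_rerank(
    query: str,
    candidates: list[Candidate],
    llm: LLMCall,
    top_k: int = 10,
    batch_size: int = 10,
) -> list[Candidate]:
    # Steps 1-2: send the query with a batch of chunks, read back one score each
    # (scores are written onto the Candidate objects in place)
    for start in range(0, len(candidates), batch_size):
        batch = candidates[start : start + batch_size]
        reply = json.loads(await llm(build_prompt(query, [c.text for c in batch])))
        for cand, score in zip(batch, reply["scores"]):
            cand.score = float(score)
    # Steps 3-4: sort by LLM score and keep the top k
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:top_k]


async def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: scores passages that mention RAG highly."""
    passages = [line for line in prompt.splitlines() if line.startswith("[")]
    return json.dumps({"scores": [9 if "RAG" in p else 1 for p in passages]})


docs = [
    Candidate("Vespa ranking profiles"),
    Candidate("RAG combines retrieval with generation"),
    Candidate("Cooking pasta at home"),
]
top = asyncio.run(llm_rerank("What is RAG?", docs, fake_llm, top_k=2, batch_size=2))
print([c.text for c in top])
```

With three candidates and batch_size=2, the sketch makes two LLM calls. Every candidate is scored even though only two are returned.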
When to use:
- Semantic relevance judgments beyond vector similarity
- Complex domain knowledge required for ranking
- After initial retrieval to improve quality
- You can tolerate lower throughput and higher latency (LLM calls are slower)
Cost optimization:
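LLM reranking is expensive. The LLM must score every candidate chunk it receives (in batches of batch_size), so cost grows with the number of candidates. Reduce cost by keeping the candidate set small. To shrink it further, place a cheaper reranker with a smaller top_k ahead of the LLMReRanker (see Chaining rerankers).
```python
# 1. The retriever narrows the index to a candidate list quickly and cheaply
query_engine = QueryEngine(
    retriever=vector_retriever,
    rerankers=[LLMReRanker(llm_provider=llm, top_k=10)],
)

# 2. Only those candidates (e.g. the top 100) reach the LLM, which is far cheaper
#    than scoring the whole index; the reranker returns the best 10
result = await query_engine.search(query="...", top_k=10)
```
Cross-Encoder ReRanker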
Uses a hosted cross-encoder model optimized for reranking. Faster than LLM reranking but requires a dedicated model server.
Installation: Core library (no extra required)
Example:
```python
from mistralai.search.toolkit.retrieval.rerankers import CrossEncoderReRanker
from mistralai.search.toolkit.retrieval import QueryEngine

reranker = CrossEncoderReRanker(
    base_url="http://localhost:8080",  # Cross-encoder service endpoint
    model="cross-encoder/ms-marco-MiniLM-L-6-v2",
    top_k=10,
)

query_engine = QueryEngine(
    retriever=vector_retriever,
    rerankers=[reranker],
)

result = await query_engine.search(query="What is RAG?", top_k=10)
```
Configuration options:
| Option | Type | Default | Purpose |
|---|---|---|---|
| base_url | str | Required | Cross-encoder service URL |
| model | str | Required | Model identifier on the service |
| top_k | int | 10 | Number of results to return |
| timeout | int | 30 | Request timeout in seconds |
Popular models:
- cross-encoder/ms-marco-MiniLM-L-6-v2: fast, general-purpose
- cross-encoder/ms-marco-TinyBERT-L-2-v2: lightweight, low latency
- cross-encoder/qnli-distilroberta-base: question-answer matching
When to use:
- Need reranking but want to avoid LLM costs
- Have a dedicated cross-encoder service running
- Require fast reranking (faster than LLM)
- Your data resembles standard relevance benchmarks (MS MARCO, Natural Questions)
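CrossEncoderReRanker delegates scoring to the service at base_url, and this page does not describe that service's request format. The sketch below illustrates only the underlying idea. A cross-encoder reads each (query, passage) pair together and emits one relevance score, and the results are then sorted by that score.

The sketch assumes a hypothetical score_pairs callable. cross_encoder_rerank and overlap_scorer are illustrative names. The toy scorer simply counts shared words and is not a model.

```python
from typing import Callable, Sequence

# A pair scorer reads (query, passage) pairs jointly and returns one score per pair.
PairScorer = Callable[[Sequence[tuple[str, str]]], Sequence[float]]


def cross_encoder_rerank(
    query: str, passages: list[str], score_pairs: PairScorer, top_k: int = 10
) -> list[tuple[str, float]]:
    # Unlike a bi-encoder, query and passage are scored together,
    # so every candidate needs its own pass through the model.
    pairs = [(query, p) for p in passages]
    scores = [float(s) for s in score_pairs(pairs)]
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def overlap_scorer(pairs: Sequence[tuple[str, str]]) -> list[float]:
    """Toy stand-in for a real model: counts shared lowercase tokens."""
    return [
        float(len(set(q.lower().split()) & set(p.lower().split()))) for q, p in pairs
    ]


ranked = cross_encoder_rerank(
    "what is rag",
    ["vespa is a search engine", "rag is retrieval augmented generation", "pasta recipes"],
    overlap_scorer,
    top_k=2,
)
print(ranked)
```

You can try a real model in-process instead of the toy scorer. The sentence-transformers method CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict accepts a list of (query, passage) pairs and returns one score per pair, so it can be passed as score_pairs. This runs the model locally rather than through the service that CrossEncoderReRanker calls.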
Custom rerankers
Implement the ReRanker protocol for single-list reranking:
```python
from mistralai.search.toolkit.retrieval.rerankers import ReRanker
from mistralai.search.toolkit.retrieval import QueryEngine
from mistralai.search.toolkit.indices import SearchResult


class MetadataReRanker(ReRanker):
    """Boost results based on metadata."""

    async def rerank(
        self, query: str, search_results: list[SearchResult]
    ) -> list[SearchResult]:
        """Boost verified content, penalize old content, and re-sort."""
        for result in search_results:
            metadata = result.chunk.metadata
            # Boost chunks with verified metadata
            if metadata.get("verified"):
                result.score *= 1.5
            # Penalize outdated content (year before 2020)
            year = metadata.get("year")
            if year is not None and year < 2020:
                result.score *= 0.8
        # Return sorted by new score
        return sorted(search_results, key=lambda x: x.score, reverse=True)


# Use in QueryEngine
query_engine = QueryEngine(
    retriever=vector_retriever,
    rerankers=[MetadataReRanker()],
)
```
Implement the GroupedRanker protocol for multi-group fusion:
```python
from mistralai.search.toolkit.retrieval.rerankers import GroupedRanker
from mistralai.search.toolkit.indices import SearchResult


class CustomFusionRanker(GroupedRanker):
    """Custom fusion of multiple retriever results."""

    async def rerank_groups(
        self, query: str, result_groups: list[list[SearchResult]]
    ) -> list[SearchResult]:
        """Fuse multiple retriever results with custom logic."""
        # result_groups[0] = vector retriever results
        # result_groups[1] = keyword retriever results
        # etc.
        return self._custom_fusion(query, result_groups)

    def _custom_fusion(
        self, query: str, groups: list[list[SearchResult]]
    ) -> list[SearchResult]:
        # Implement your fusion algorithm here
        raise NotImplementedError
```
RRF Ranker
Reciprocal Rank Fusion (RRF) combines results from multiple retrievers using rank positions. Useful for fusing different retrieval strategies on the same index.
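The standard RRF formula gives each document the sum of 1 / (rrf_k + rank) over every list it appears in, using 1-based ranks. The sketch below implements that textbook formula on plain lists of document IDs.

rrf_fuse is a hypothetical helper. The toolkit's RRFRanker may differ in details such as tie-breaking or how it detects the same chunk across retrievers.

```python
from collections import defaultdict


def rrf_fuse(
    result_groups: list[list[str]], rrf_k: int = 60, top_k: int = 10
) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion over ranked lists of document IDs."""
    fused: dict[str, float] = defaultdict(float)
    for group in result_groups:
        # Ranks are 1-based: the first hit in each list adds 1 / (rrf_k + 1)
        for rank, doc_id in enumerate(group, start=1):
            fused[doc_id] += 1.0 / (rrf_k + rank)
    # Only rank positions are used, so raw retriever scores never need normalizing
    ranked = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


dense = ["doc_a", "doc_b", "doc_c"]
hybrid = ["doc_c", "doc_a", "doc_d"]
for doc_id, score in rrf_fuse([dense, hybrid], rrf_k=60, top_k=3):
    print(doc_id, round(score, 5))
```

doc_a and doc_c appear in both lists, so they outrank doc_b and doc_d, which each appear only once.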
Installation: Core library (no extra required)
Example — multiple vector retrievers with different query profiles:
```python
from mistralai.client import Mistral
from mistralai.search.toolkit.embedders import MistralEmbedder, MODEL_1024_EMBEDDING
from mistralai.search.toolkit.retrieval.retrievers import VectorRetriever
from mistralai.search.toolkit.retrieval.rerankers import RRFRanker
from mistralai.search.toolkit.retrieval import QueryEngine
from mistralai.search.toolkit.plugins.vespa import VespaClientConfig
from vespa_app import app

# Initialize embedder for query vectorization
embedder = MistralEmbedder(
    client=Mistral(api_key="your-api-key"),
    model_name=MODEL_1024_EMBEDDING,
)

# Create Vespa search backends with different query profiles
config = VespaClientConfig(
    endpoint="http://localhost:8080",
)

# Use different query profiles for dense and hybrid search
dense_store = app.get_search_index(config, collection_name="my_collection", query_profile="dense")
hybrid_store = app.get_search_index(config, collection_name="my_collection", query_profile="hybrid")

# Create retrievers with different strategies
dense_retriever = VectorRetriever(client=dense_store, embedder=embedder)
hybrid_retriever = VectorRetriever(client=hybrid_store, embedder=embedder)

# Fuse results with RRF
query_engine = QueryEngine(
    retriever=[dense_retriever, hybrid_retriever],
    rerankers=[RRFRanker(rrf_k=60, top_k=10)],
)

result = await query_engine.search(query="How does RAG work?", top_k=10)
```
Configuration options:
| Option | Type | Default | Purpose |
|---|---|---|---|
| rrf_k | int | 60 | Smoothing factor (try 30-100) |
| top_k | int | 10 | Number of results to return |
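The following worked example shows why a lower rrf_k differentiates ranks more. It compares how much a rank-1 hit outweighs a rank-10 hit within a single list. It assumes RRFRanker uses the standard RRF formula with k = rrf_k and 1-based ranks r_i(d):

```latex
\mathrm{RRF}(d) = \sum_{i} \frac{1}{k + r_i(d)},
\qquad
\frac{1/(k+1)}{1/(k+10)} = \frac{k+10}{k+1} =
\begin{cases}
40/31 \approx 1.29 & k = 30 \\
70/61 \approx 1.15 & k = 60 \\
110/101 \approx 1.09 & k = 100
\end{cases}
```

At k = 100, the top hit counts only about 9% more than the tenth, so rank differences are flattened.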
Tuning RRF:
```python
from mistralai.search.toolkit.retrieval.rerankers import RRFRanker

# Lower rrf_k = more emphasis on exact rank positions
RRFRanker(rrf_k=30, top_k=10)  # Lower k, more differentiation

# Higher rrf_k = less emphasis on rank positions
RRFRanker(rrf_k=100, top_k=10)  # Higher k, flattens differences
```
When to use:
- Combining multiple retrieval strategies on the same index (dense vectors, hybrid search, etc.)
- Results from multiple sources with incomparable scores
- You want fusion without an extra model or additional inference cost
- Robust fusion that doesn't require score normalization
Chaining rerankers
Multiple rerankers can be applied in sequence for progressive refinement:
```python
from mistralai.search.toolkit.retrieval import QueryEngine
from mistralai.search.toolkit.retrieval.rerankers import RRFRanker, LLMReRanker

# dense_retriever, hybrid_retriever, llm, and MetadataReRanker come from
# the earlier examples on this page
query_engine = QueryEngine(
    retriever=[dense_retriever, hybrid_retriever],
    rerankers=[
        RRFRanker(rrf_k=60, top_k=50),  # Step 1: Fuse results → top 50
        MetadataReRanker(),  # Step 2: Boost verified content
        LLMReRanker(llm_provider=llm, top_k=10),  # Step 3: Final LLM reranking
    ],
)

# Process flow:
# 1. Multiple retrievers return results
# 2. RRFRanker fuses them → top 50
# 3. MetadataReRanker re-scores and re-sorts → still 50
# 4. LLMReRanker scores all 50 → top 10
# Cost: one RRF pass + cheap metadata checks + LLM scoring of the 50 fused
# candidates, instead of every result returned by both retrievers
result = await query_engine.search(query="...", top_k=10)
```
Best practices:
- Start with cheap operations (RRF fusion, metadata boosting)
- End with expensive operations (LLM reranking)
- Use each stage's top_k to shrink the candidate set before the next, more expensive stage
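A small standalone model of a chain lets you check the cost argument from the chaining example. As a simplification, it treats each stage as an async function from a result list to a result list, applied in order. It also assumes the expensive stage makes one call per batch_size candidates. run_chain, keep_top, and CountingExpensiveStage are hypothetical names, not toolkit APIs.

```python
import asyncio
import math
from typing import Awaitable, Callable

Results = list[tuple[str, float]]  # (doc_id, score), best first
Stage = Callable[[str, Results], Awaitable[Results]]


async def run_chain(query: str, results: Results, stages: list[Stage]) -> Results:
    """Apply stages in order; each one sees the previous stage's output."""
    for stage in stages:
        results = await stage(query, results)
    return results


def keep_top(n: int) -> Stage:
    """Cheap stage: keep the n highest-scoring results."""

    async def stage(query: str, results: Results) -> Results:
        return sorted(results, key=lambda r: r[1], reverse=True)[:n]

    return stage


class CountingExpensiveStage:
    """Stand-in for an LLM reranker that counts the batched calls it would make."""

    def __init__(self, top_k: int, batch_size: int = 10) -> None:
        self.top_k, self.batch_size = top_k, batch_size
        self.calls = 0

    async def __call__(self, query: str, results: Results) -> Results:
        # Every incoming candidate must be scored, batch_size per call
        self.calls += math.ceil(len(results) / self.batch_size)
        # A real reranker would re-score before truncating; this stub keeps order
        return results[: self.top_k]


candidates = [(f"doc_{i}", 1.0 / (i + 1)) for i in range(200)]
llm_stage = CountingExpensiveStage(top_k=10)
final = asyncio.run(run_chain("q", candidates, [keep_top(50), llm_stage]))
print(len(final), llm_stage.calls)  # prints: 10 5
```

Running the same expensive stage directly on all 200 candidates would take 20 batched calls instead of 5. This is why cheap, narrowing stages go first.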
See also
- Retrieval overview — Retrieval pipeline architecture
- Retrievers — Vector search and custom retrievers
- Query preprocessing — Improve queries before retrieval