Query preprocessing

Query preprocessing improves user queries before retrieval. Techniques like rewriting and expansion can significantly improve retrieval quality, though they increase latency and cost.

LLM Query Rewriter

LLM Query Rewriter

Uses an LLM to reformulate queries into forms more likely to match indexed content. Converts informal language into structured queries, expands abbreviations, and clarifies intent.

Installation: Core library (no extra required)

Example:

from mistralai.search.toolkit.retrieval.pre_processors import LLMQueryRewriter
from mistralai.search.toolkit.llm import MistralChat, LLMConfig
from mistralai.client import Mistral

llm = MistralChat(
    client=Mistral(api_key="your-api-key"),
    config=LLMConfig(model="mistral-small-latest"),
)

rewriter = LLMQueryRewriter(llm_provider=llm)

# Original: "rag mistral"
rewritten = await rewriter.rewrite("rag mistral")
# Result: "What is Retrieval-Augmented Generation with Mistral AI?"

# Use in QueryEngine
query_engine = QueryEngine(
    retriever=vector_retriever,
    query_rewriter=rewriter,
)

result = await query_engine.search(query="rag mistral")
# Query is rewritten before retrieval

Configuration options:

OptionTypeDefaultPurpose
llm_providerLLMProviderRequiredLLM to use for rewriting (MistralChat, etc.)

Custom rewriting prompt:

from mistralai.search.toolkit.retrieval.pre_processors import LLMQueryRewriter
from mistralai.search.toolkit.llm import MistralChat, LLMConfig

llm = MistralChat(
    client=Mistral(api_key="your-api-key"),
    config=LLMConfig(model="mistral-small-latest"),
)

# With custom prompt
rewriter = LLMQueryRewriter(
    llm_provider=llm,
    prompt="Reformulate this query for a technical documentation search engine: ",
)

rewritten = await rewriter.rewrite("how to set up django")

When to use:

  • Short or informal queries that need expansion
  • Queries with acronyms or domain-specific abbreviations
  • Improving recall for vague user input
  • Converting questions into keyword combinations

Cost considerations:

  • 1 LLM call per query
  • Adds 100-500ms latency
  • Use strategically, not for every query
LLM Query Extension

LLM Query Extension

Breaks a query into multiple sub-queries exploring different facets. Each sub-query runs an independent retrieval pass; results are combined for broader recall.

Installation: Core library (no extra required)

Example:

from mistralai.search.toolkit.retrieval.pre_processors import LLMQueryExtension
from mistralai.search.toolkit.llm import MistralChat, LLMConfig
from mistralai.client import Mistral

llm = MistralChat(
    client=Mistral(api_key="your-api-key"),
    config=LLMConfig(model="mistral-small-latest"),
)

extender = LLMQueryExtension(
    llm_provider=llm,
    num_queries=3,  # Generate 3 sub-queries
)

# Original: "How does RAG work?"
extended = await extender.extend("How does RAG work?")
# Results in:
# - "What is RAG?"
# - "How does retrieval work in RAG?"
# - "How does generation work in RAG?"

# Use in QueryEngine
query_engine = QueryEngine(
    retriever=vector_retriever,
    query_rewriter=extender,
)

result = await query_engine.search(query="How does RAG work?")
# All 3 sub-queries are retrieved and results combined

Configuration options:

OptionTypeDefaultPurpose
llm_providerLLMProviderRequiredLLM to use for extension (MistralChat, etc.)
num_queriesint3Number of sub-queries to generate

Cost and performance:

# num_queries=3 means:
# - 1 LLM call to generate sub-queries
# - 3 retrieval passes (3x slower)
# - 3x embedding calls (for each sub-query)
# - Results combined/deduplicated

# Choose based on tolerance:
extender = LLMQueryExtension(llm_provider=llm, num_queries=2)  # Faster, lower recall
extender = LLMQueryExtension(llm_provider=llm, num_queries=5)  # Slower, higher recall

When to use:

  • Complex queries with multiple aspects
  • Improving recall across different facets
  • Multi-faceted questions that need broad coverage
  • When latency tolerance allows for multiple retrievals

Best practice - combine with reranking:

query_engine = QueryEngine(
    retriever=vector_retriever,
    query_rewriter=LLMQueryExtension(llm_provider=llm, num_queries=3),
    rerankers=[LLMReRanker(llm_provider=llm, top_k=10)],
)

# Flow:
# 1. 1 query → 3 sub-queries (LLM call)
# 2. 3 retrieval passes (3x embedding + vector search)
# 3. ~30 results combined
# 4. LLM reranking narrows to top 10 (1 more LLM call)
# Total: 2 LLM calls + 3 vector searches + deduplication + reranking
result = await query_engine.search(query="...", top_k=10)
Custom query preprocessors

Custom query preprocessors

Implement the QueryPreprocessor protocol:

from mistralai.search.toolkit.retrieval.pre_processors import QueryPreprocessor

class DomainSpecificRewriter(QueryPreprocessor):
    """Rewrite queries for a specific domain."""

    async def preprocess(self, query: str) -> str:
        """Preprocess the query."""
        # 1. Normalize
        normalized = query.lower().strip()

        # 2. Expand domain-specific abbreviations
        expansions = {
            "ml": "machine learning",
            "nlp": "natural language processing",
            "ai": "artificial intelligence",
        }
        for abbr, expansion in expansions.items():
            normalized = normalized.replace(f" {abbr} ", f" {expansion} ")

        # 3. Add domain context if needed
        if "algorithm" in normalized:
            normalized += " implementation approach"

        return normalized


rewriter = DomainSpecificRewriter()
query_engine = QueryEngine(
    retriever=vector_retriever,
    query_rewriter=rewriter,
)
When to use query preprocessing

When to use query preprocessing

Use preprocessing if:

  • User queries are short/vague (e.g., "rag" instead of full questions)
  • Your index has highly specific terminology
  • You need better recall for multi-faceted questions
  • Latency tolerance allows for extra LLM calls

Skip preprocessing if:

  • User queries are already well-formed
  • Latency is critical (real-time chat)
  • Cost is a concern (each LLM call adds up)
  • Vector retrieval alone gives good results

Hybrid approach:

# Optional preprocessing based on query length
class OptionalRewriter(QueryPreprocessor):
    async def preprocess(self, query: str) -> str:
        # Only rewrite short queries
        if len(query.split()) < 3:
            # Expand short query
            return await llm.rewrite(query)
        return query

query_engine = QueryEngine(
    retriever=vector_retriever,
    query_rewriter=OptionalRewriter(),
)
See also

See also