Adaptive RAG with LlamaIndex

User queries range from simple to complex, and a simple query does not always need a full RAG pipeline. Adaptive RAG proposes handling each query with a strategy matched to its complexity.

In this notebook, we will implement an approach similar to Adaptive RAG, which differentiates between handling complex and simple queries. We'll focus on Lyft's 10k SEC filings for the years 2020, 2021, and 2022.

Our approach uses RouterQueryEngine together with the function-calling capabilities of MistralAI to call different tools or indices based on the query's complexity.

Complex Queries: These will leverage multiple tools that require context from several documents.
Simple Queries: These will utilize a single tool that requires context from a single document or directly use an LLM to provide an answer.
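
To make the distinction above concrete, here is a minimal sketch of a complexity check. It is a keyword heuristic written only for illustration, not what the notebook does: the notebook lets an LLM selector pick the route. The function name `classify_query` and the three labels are hypothetical.

```python
import re

# Years covered by the three filings used in this notebook.
KNOWN_YEARS = {"2020", "2021", "2022"}


def classify_query(query: str) -> str:
    """Toy heuristic: decide the route from the filing years a query mentions."""
    years = set(re.findall(r"\b(20\d{2})\b", query)) & KNOWN_YEARS
    mentions_lyft = "lyft" in query.lower()
    if len(years) > 1:
        # Needs context from several documents.
        return "complex"
    if len(years) == 1 and mentions_lyft:
        # Needs context from exactly one filing.
        return "simple_single_doc"
    # No filing is needed; the LLM can answer directly.
    return "simple_llm_only"
```

A heuristic like this breaks down quickly (for example, "compare the last two years" mentions no year at all), which is one reason to delegate the decision to an LLM.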

We follow these steps:

Download Data.
Load Data.
Create indices for 3 documents.
Create query engines with documents and LLM.
Initialize a FunctionCallingAgentWorker for complex queries.
Create tools.
Create RouterQueryEngine - To route queries based on their complexity.
Querying.

Installation

!pip install llama-index
!pip install llama-index-llms-mistralai
!pip install llama-index-embeddings-mistralai

Setup API Key

import os
os.environ['MISTRAL_API_KEY'] = '<YOUR MISTRAL API KEY>'
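
Hardcoding a key in a notebook makes it easy to leak when the notebook is shared. As an alternative, the sketch below reads the key from the environment and falls back to an interactive prompt. The helper name `resolve_api_key` and its parameters are hypothetical, introduced only for this example.

```python
import os
from getpass import getpass


def resolve_api_key(env=os.environ, prompt=getpass) -> str:
    """Return the Mistral API key from `env`, prompting only if it is missing."""
    key = env.get("MISTRAL_API_KEY", "").strip()
    if not key:
        # getpass hides the typed value, so the key does not end up in cell output.
        key = prompt("Mistral API key: ").strip()
    if not key:
        raise ValueError("MISTRAL_API_KEY is not set")
    return key


# Usage in the notebook (left commented out so the sketch does not prompt on import):
# os.environ["MISTRAL_API_KEY"] = resolve_api_key()
```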

Setup LLM and Embedding Model

import nest_asyncio

nest_asyncio.apply()

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.mistralai import MistralAI
from llama_index.embeddings.mistralai import MistralAIEmbedding
from llama_index.core import Settings

from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors.llm_selectors import LLMSingleSelector

# Note: the agent below needs a model that supports function calling, such as `mistral-large-latest`
llm = MistralAI(model='mistral-large-latest')
embed_model = MistralAIEmbedding()

Settings.llm = llm
Settings.embed_model = embed_model

Logging

# NOTE: This is ONLY necessary in jupyter notebook.
# Details: Jupyter runs an event-loop behind the scenes.
#          This results in nested event-loops when we start an event-loop to make async queries.
#          This is normally not allowed, we use nest_asyncio to allow it for convenience.
import nest_asyncio

nest_asyncio.apply()

import logging
import sys

# Set up the root logger
logger = logging.getLogger()
logger.setLevel(logging.INFO)  # Set logger level to INFO

# Clear out any existing handlers
logger.handlers = []

# Set up the StreamHandler to output to sys.stdout (Colab's output)
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.INFO)  # Set handler level to INFO

# Add the handler to the logger
logger.addHandler(handler)

from IPython.display import display, HTML

Download Data

We will download Lyft's 10k SEC filings for the years 2020, 2021, and 2022.

!wget "https://www.dropbox.com/scl/fi/ywc29qvt66s8i97h1taci/lyft-10k-2020.pdf?rlkey=d7bru2jno7398imeirn09fey5&dl=0" -q -O ./lyft_10k_2020.pdf
!wget "https://www.dropbox.com/scl/fi/lpmmki7a9a14s1l5ef7ep/lyft-10k-2021.pdf?rlkey=ud5cwlfotrii6r5jjag1o3hvm&dl=0" -q -O ./lyft_10k_2021.pdf
!wget "https://www.dropbox.com/scl/fi/iffbbnbw9h7shqnnot5es/lyft-10k-2022.pdf?rlkey=grkdgxcrib60oegtp4jn8hpl8&dl=0" -q -O ./lyft_10k_2022.pdf
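
Because `wget -q` suppresses output, a failed download can go unnoticed until the loader chokes on it; share links in particular may return an HTML page instead of the file. The sketch below checks the PDF magic bytes before loading. The helper names `looks_like_pdf` and `find_bad_downloads` are hypothetical.

```python
from pathlib import Path


def looks_like_pdf(path) -> bool:
    """Return True if the file exists and starts with the PDF signature."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(5) == b"%PDF-"


def find_bad_downloads(paths):
    """Return the subset of `paths` that are missing or are not PDFs."""
    return [str(p) for p in paths if not looks_like_pdf(p)]


# Usage with the file names from the download step:
# find_bad_downloads(["./lyft_10k_2020.pdf", "./lyft_10k_2021.pdf", "./lyft_10k_2022.pdf"])
```

An empty list means all three files at least carry a PDF header; it does not guarantee that the content parses.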

Load Data

# Lyft 2020 docs
lyft_2020_docs = SimpleDirectoryReader(input_files=["./lyft_10k_2020.pdf"]).load_data()

# Lyft 2021 docs
lyft_2021_docs = SimpleDirectoryReader(input_files=["./lyft_10k_2021.pdf"]).load_data()

# Lyft 2022 docs
lyft_2022_docs = SimpleDirectoryReader(input_files=["./lyft_10k_2022.pdf"]).load_data()

Create Indices

# Index on Lyft 2020 Document
lyft_2020_index = VectorStoreIndex.from_documents(lyft_2020_docs)

# Index on Lyft 2021 Document
lyft_2021_index = VectorStoreIndex.from_documents(lyft_2021_docs)

# Index on Lyft 2022 Document
lyft_2022_index = VectorStoreIndex.from_documents(lyft_2022_docs)

Create Query Engines

# Query Engine on Lyft 2020 Docs Index
lyft_2020_query_engine = lyft_2020_index.as_query_engine(similarity_top_k=5)

# Query Engine on Lyft 2021 Docs Index
lyft_2021_query_engine = lyft_2021_index.as_query_engine(similarity_top_k=5)

# Query Engine on Lyft 2022 Docs Index
lyft_2022_query_engine = lyft_2022_index.as_query_engine(similarity_top_k=5)
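
The `similarity_top_k=5` argument asks each engine to pass the five most similar chunks to the LLM. As a rough illustration of that idea, the sketch below ranks toy vectors by cosine similarity. It assumes cosine scoring and is not LlamaIndex's implementation; `cosine` and `top_k` are hypothetical helpers.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, chunks, k=5):
    """Return the texts of the k chunks most similar to the query.

    `chunks` is a list of (text, embedding) pairs.
    """
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

A larger k gives the LLM more context at the cost of longer prompts; a smaller k risks missing the relevant passage.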

Query engine for the LLM. This engine skips retrieval and answers the query directly with the LLM.

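from llama_index.core.query_engine import CustomQueryEngine


class LLMQueryEngine(CustomQueryEngine):
    """Query engine that sends the query straight to the LLM, with no retrieval."""

    # Field annotation must be the LLM class, not the `llm` instance.
    llm: MistralAI

    def custom_query(self, query_str: str) -> str:
        # Plain completion call; no documents are retrieved.
        response = self.llm.complete(query_str)
        return str(response)


llm_query_engine = LLMQueryEngine(llm=llm)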

Initialize a `FunctionCallingAgentWorker`

# These tools are used to answer complex queries involving multiple documents.
query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_2020_query_engine,
        metadata=ToolMetadata(
            name="lyft_2020_10k_form",
            description="Annual report of Lyft's financial activities in 2020",
        ),
    ),
    QueryEngineTool(
        query_engine=lyft_2021_query_engine,
        metadata=ToolMetadata(
            name="lyft_2021_10k_form",
            description="Annual report of Lyft's financial activities in 2021",
        ),
    ),
    QueryEngineTool(
        query_engine=lyft_2022_query_engine,
        metadata=ToolMetadata(
            name="lyft_2022_10k_form",
            description="Annual report of Lyft's financial activities in 2022",
        ),
    )
]
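
The three tools above differ only in the year. One way to avoid the repetition is to generate the names and descriptions in a loop, as in the sketch below. `YEARS` and `tool_spec` are hypothetical; each resulting dict holds the same `name` and `description` values that are passed to `ToolMetadata` above.

```python
# Years of the three filings used in this notebook.
YEARS = (2020, 2021, 2022)


def tool_spec(year: int) -> dict:
    """Build the name and description for one per-year tool."""
    return {
        "name": f"lyft_{year}_10k_form",
        "description": f"Annual report of Lyft's financial activities in {year}",
    }


tool_specs = [tool_spec(year) for year in YEARS]
```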

from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    query_engine_tools,
    llm=llm,
    verbose=True,
    allow_parallel_tool_calls=True,
)
agent = AgentRunner(agent_worker)
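
Conceptually, a function-calling agent asks the LLM which tools to call and with what input, runs them, and then lets the LLM combine the results. The sketch below illustrates only the dispatch step, with the list of calls supplied by hand where the LLM would normally produce it. It is a toy stand-in, not the `FunctionCallingAgentWorker` implementation, and every name in it is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Fake tools standing in for the per-year query engines.
fake_tools: Dict[str, Callable[[str], str]] = {
    "lyft_2020_10k_form": lambda q: f"[2020 filing] answer to: {q}",
    "lyft_2022_10k_form": lambda q: f"[2022 filing] answer to: {q}",
}


def run_tool_calls(
    tools: Dict[str, Callable[[str], str]],
    calls: List[Tuple[str, str]],
) -> Dict[str, str]:
    """Execute each (tool_name, question) pair and collect the outputs by tool name."""
    results = {}
    for name, question in calls:
        if name not in tools:
            raise KeyError(f"unknown tool: {name}")
        results[name] = tools[name](question)
    return results
```

For a query such as "R&D in 2022 vs 2020", the model would be expected to request one call per year, and the two outputs would then be passed back to it for the final comparison.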

Create Tools

We will now create the tools for the router: one for each per-year query engine, one wrapping the agent created earlier, and one wrapping the LLM-only query engine.

query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_2020_query_engine,
        metadata=ToolMetadata(
            name="lyft_2020_10k_form",
            description="Queries related to only 2020 Lyft's financial activities.",
        ),
    ),
    QueryEngineTool(
        query_engine=lyft_2021_query_engine,
        metadata=ToolMetadata(
            name="lyft_2021_10k_form",
            description="Queries related to only 2021 Lyft's financial activities.",
        ),
    ),
    QueryEngineTool(
        query_engine=lyft_2022_query_engine,
        metadata=ToolMetadata(
            name="lyft_2022_10k_form",
            description="Queries related to only 2022 Lyft's financial activities.",
        ),
    ),
    QueryEngineTool(
        query_engine=agent,
        metadata=ToolMetadata(
            name="lyft_2020_2021_2022_10k_form",
            description=(
                "Useful for queries that span multiple years from 2020 to 2022 for Lyft's financial activities."
            )
        )
    ),
    QueryEngineTool(
        query_engine=llm_query_engine,
        metadata=ToolMetadata(
            name="general_queries",
            description=(
                "Answers general queries that are unrelated to Lyft."
            )
        )
    )
]

Create RouterQueryEngine

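RouterQueryEngine routes each user query to exactly one of the tools. The selector asks the LLM to pick the tool whose description best matches the query, so the tool descriptions decide how simple and complex queries are separated.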

from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=query_engine_tools,
    verbose=True,
)
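
The routing step can be pictured as "score every tool description against the query, then run the winner". The sketch below shows that shape with a word-overlap score standing in for the LLM selector. It is a toy illustration under that assumption, not how `LLMSingleSelector` works internally; `Tool`, `keyword_selector`, and `route` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]


def _words(text: str) -> set:
    """Lowercased word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_selector(query: str, tools: List[Tool]) -> int:
    """Stand-in selector: index of the tool whose description shares most words with the query."""
    scores = [len(_words(query) & _words(t.description)) for t in tools]
    return scores.index(max(scores))


def route(query: str, tools: List[Tool], selector=keyword_selector) -> Tuple[str, str]:
    """Pick exactly one tool and return (tool name, tool output)."""
    chosen = tools[selector(query, tools)]
    return chosen.name, chosen.run(query)


tools = [
    Tool("lyft_2020_10k_form", "Queries related to only 2020 Lyft's financial activities.",
         lambda q: "answer from the 2020 filing"),
    Tool("general_queries", "Provides information about general queries other than lyft.",
         lambda q: "answer from the LLM alone"),
]
```

Swapping `keyword_selector` for a function that asks an LLM to choose an index gives the same control flow as the router above, with only the selection step changed.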

Querying

Simple Queries

Query: What is the capital of France?

You can see that the router used the general_queries (LLM-only) tool, since this is a general query.

response = query_engine.query("What is the capital of France?")
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
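
The display call above interpolates the model's text directly into HTML, so an answer that contains characters such as `<` or `&` could render incorrectly. A minimal sketch of a safer variant follows; the helper `render_answer` is hypothetical and uses only the standard library's `html.escape`.

```python
import html


def render_answer(text: str, font_size_px: int = 20) -> str:
    """Wrap answer text in a styled paragraph, escaping HTML special characters."""
    return f'<p style="font-size:{font_size_px}px">{html.escape(text)}</p>'


# Usage in the notebook:
# display(HTML(render_answer(response.response)))
```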

Query: What did Lyft do in R&D in 2022?

You can see that the router used the lyft_2022_10k_form tool to answer the query.

response = query_engine.query("What did Lyft do in R&D in 2022?")
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))

Query: What did Lyft do in R&D in 2021?

You can see that the router used the lyft_2021_10k_form tool to answer the query.

response = query_engine.query("What did Lyft do in R&D in 2021?")
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))

Query: What did Lyft do in R&D in 2020?

You can see that the router used the lyft_2020_10k_form tool to answer the query.

response = query_engine.query("What did Lyft do in R&D in 2020?")
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))

Complex Queries

Let's test queries that require multiple tools.

Query: What did Lyft do in R&D in 2022 vs 2020?

You can see that the router selected the multi-year tool, and the function-calling agent behind it called the lyft_2020_10k_form and lyft_2022_10k_form tools to answer the query.

response = query_engine.query("What did Lyft do in R&D in 2022 vs 2020?")
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))

Query: What did Lyft do in R&D in 2020 vs 2021?

You can see that the router selected the multi-year tool, and the function-calling agent behind it called the lyft_2020_10k_form and lyft_2021_10k_form tools to answer the query.

response = query_engine.query("What did Lyft do in R&D in 2020 vs 2021?")
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))

Query: What did Lyft do in R&D in 2022 vs 2021 vs 2020?

You can see that the router selected the multi-year tool, and the function-calling agent behind it called the lyft_2020_10k_form, lyft_2021_10k_form, and lyft_2022_10k_form tools to answer the query.

response = query_engine.query("What did Lyft do in R&D in 2022 vs 2021 vs 2020?")
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
