RAG Pipeline with Ollama, Mistral and LlamaIndex

In this notebook, we will demonstrate how to build a RAG pipeline using Ollama, Mistral models, and LlamaIndex. The following topics will be covered:

  1. Integrating Mistral with Ollama and LlamaIndex.
  2. Implementing RAG with Ollama and LlamaIndex using the Mistral model.
  3. Routing queries with RouterQueryEngine.
  4. Handling complex queries with SubQuestionQueryEngine.

Before running this notebook, you need to set up Ollama. Installation instructions are available on the Ollama website (https://ollama.com).
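With Ollama installed and running, you can pull the Mistral model this notebook uses directly from a notebook cell (the mistral:instruct tag matches the model referenced below):

!ollama pull mistral:instruct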

import nest_asyncio

# Allow nested event loops so the async LlamaIndex query engines work inside Jupyter.
nest_asyncio.apply()

from IPython.display import display, HTML

Setup LLM

from llama_index.llms.ollama import Ollama

# Use the locally served Mistral instruct model; a generous timeout helps on slower hardware.
llm = Ollama(model="mistral:instruct", request_timeout=60.0)

Querying

from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="What is the capital city of France?"),
]
response = llm.chat(messages)
display(HTML(f'<p style="font-size:20px">{response}</p>'))
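
The same llm object also supports streaming through the standard LlamaIndex stream_chat interface, which is convenient for long generations (the exact output depends on your local model):

# Stream the reply token by token instead of waiting for the full response.
for chunk in llm.stream_chat(messages):
    print(chunk.delta or "", end="", flush=True)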

Setup Embedding Model

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
from llama_index.core import Settings

# Register the Ollama LLM and HuggingFace embedding model as the global defaults.
Settings.llm = llm
Settings.embed_model = embed_model
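
As an optional sanity check, you can embed a short string and inspect the vector length; BAAI/bge-small-en-v1.5 produces 384-dimensional embeddings:

# Embed a sample string to verify the embedding model loaded correctly.
sample_embedding = embed_model.get_text_embedding("ride-hailing revenue")
print(len(sample_embedding))  # 384 for bge-small-en-v1.5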

Download Data

We will use the Uber and Lyft 2021 10-K SEC filings for the demonstration.

!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O './uber_2021.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O './lyft_2021.pdf'

Load Data

from llama_index.core import SimpleDirectoryReader

uber_docs = SimpleDirectoryReader(input_files=["./uber_2021.pdf"]).load_data()
lyft_docs = SimpleDirectoryReader(input_files=["./lyft_2021.pdf"]).load_data()
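
By default, SimpleDirectoryReader produces one Document per PDF page, so a quick length check confirms the filings loaded (exact counts depend on the PDFs):

# Each filing is split into one Document per page by the default PDF reader.
print(f"Uber: {len(uber_docs)} pages, Lyft: {len(lyft_docs)} pages")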

Create Index and Query Engines

from llama_index.core import VectorStoreIndex

uber_vector_index = VectorStoreIndex.from_documents(uber_docs)
uber_vector_query_engine = uber_vector_index.as_query_engine(similarity_top_k=2)

lyft_vector_index = VectorStoreIndex.from_documents(lyft_docs)
lyft_vector_query_engine = lyft_vector_index.as_query_engine(similarity_top_k=2)
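
Building these indices re-embeds every page on each run. As a minimal sketch (the ./storage paths here are arbitrary choices for this example), you can persist an index to disk and reload it later with load_index_from_storage:

from llama_index.core import StorageContext, load_index_from_storage

# Persist the index once...
uber_vector_index.storage_context.persist(persist_dir="./storage/uber")

# ...and reload it on later runs without re-embedding.
uber_vector_index = load_index_from_storage(
    StorageContext.from_defaults(persist_dir="./storage/uber")
)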

Querying

response = uber_vector_query_engine.query("What is the revenue of uber in 2021 in millions?")
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
response = lyft_vector_query_engine.query("What is the revenue of lyft in 2021 in millions?")
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
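
Because similarity_top_k=2, each answer is grounded in the two most similar chunks. You can inspect them through response.source_nodes to verify the retrieved context:

# Show which chunks were retrieved and their similarity scores.
for node in response.source_nodes:
    print(node.score, node.get_content()[:200])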

RouterQueryEngine

We will use the RouterQueryEngine to route each user query to the appropriate query engine, depending on whether the query concerns Uber or Lyft.

Create QueryEngine tools

from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors.llm_selectors import LLMSingleSelector

query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_vector_query_engine,
        metadata=ToolMetadata(
            name="vector_lyft_10k",
            description="Provides information about Lyft financials for year 2021",
        ),
    ),
    QueryEngineTool(
        query_engine=uber_vector_query_engine,
        metadata=ToolMetadata(
            name="vector_uber_10k",
            description="Provides information about Uber financials for year 2021",
        ),
    ),
]

Create RouterQueryEngine

query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=query_engine_tools,
    verbose=True,
)

Querying

response = query_engine.query("What are the investments made by Uber?")
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
response = query_engine.query("What are the investments made by the Lyft in 2021?")
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
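
With verbose=True the selector's choice is printed, and it is also recorded on the response object; as a sketch (assuming the selector_result metadata key used by current LlamaIndex versions), you can read it programmatically:

# The router records which tool the LLM selector picked and why.
print(response.metadata["selector_result"])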

SubQuestionQueryEngine

We will explore how the SubQuestionQueryEngine can be leveraged to tackle complex queries by generating and addressing sub-queries.

Create SubQuestionQueryEngine

from llama_index.core.query_engine import SubQuestionQueryEngine

sub_question_query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    verbose=True,
)

Querying

response = sub_question_query_engine.query("Compare the revenues of Uber and Lyft in 2021?")
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
response = sub_question_query_engine.query("What are the investments made by Uber and Lyft in 2021?")
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))