In this notebook we're going to show how you can use LlamaIndex with the Mistral API to perform complex queries over multiple documents, including answering questions that require information from several documents at once. We'll do this using a ReAct agent, an autonomous LLM-powered agent capable of using tools.
First we install our dependencies. We need LlamaIndex, Mistral, and a PDF parser for later.
!pip install llama-index-core
!pip install llama-index-embeddings-mistralai
!pip install llama-index-llms-mistralai
!pip install llama-index-readers-file
!pip install mistralai pypdf
Now we set up our connection to Mistral. We need two things:
- An LLM, to answer questions
- An embedding model, to convert our data into vectors for retrieval by our index. Luckily, Mistral provides both!
Once we have them, we register them in LlamaIndex's global Settings object, which LlamaIndex uses to pass configuration around.
from llama_index.llms.mistralai import MistralAI
from llama_index.embeddings.mistralai import MistralAIEmbedding
from llama_index.core.settings import Settings
api_key = ""
llm = MistralAI(api_key=api_key,model="mistral-large-latest")
embed_model = MistralAIEmbedding(model_name='mistral-embed', api_key=api_key)
Settings.llm = llm
Settings.embed_model = embed_model
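If you'd like, you can quickly sanity-check the connection before going further. This step is optional and just illustrative; it uses the standard complete() and get_text_embedding() methods and requires the API key set above.
# Optional sanity check: confirm both models respond.
print(llm.complete("Say hello in one word."))
print(len(embed_model.get_text_embedding("hello")))  # embedding dimensionality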
Now let's download our dataset: three very large PDFs containing Lyft's annual reports (Form 10-K filings) for 2020 through 2022.
!wget "https://www.dropbox.com/scl/fi/ywc29qvt66s8i97h1taci/lyft-10k-2020.pdf?rlkey=d7bru2jno7398imeirn09fey5&dl=0" -q -O ./lyft_10k_2020.pdf
!wget "https://www.dropbox.com/scl/fi/lpmmki7a9a14s1l5ef7ep/lyft-10k-2021.pdf?rlkey=ud5cwlfotrii6r5jjag1o3hvm&dl=0" -q -O ./lyft_10k_2021.pdf
!wget "https://www.dropbox.com/scl/fi/iffbbnbw9h7shqnnot5es/lyft-10k-2022.pdf?rlkey=grkdgxcrib60oegtp4jn8hpl8&dl=0" -q -O ./lyft_10k_2022.pdf
Now that we have our data, we're going to do three things:
- Load the PDF data into memory. It will be parsed into text as we do this. That's the `load_data()` line.
- Index the data. This creates a vector representation of each document and stores the vectors in memory. That's the `from_documents()` line.
- Set up a query engine to retrieve information from the vector store and pass it to the LLM. That's the `as_query_engine()` line.
We're going to do this once for each of the three documents. If we had many more documents we would do this programmatically with a loop (see the sketch just below), but writing it out keeps the code very simple, if a little repetitive. We've included a query to one of the indexes at the end as a test.
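For reference, a loop version might look something like the following. This is an illustrative sketch only: it assumes the file-naming pattern used above and collects the engines in a dictionary rather than the individual variables used in the next cell.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Sketch: build one query engine per annual report in a loop.
engines = {}
for year in (2020, 2021, 2022):
    docs = SimpleDirectoryReader(input_files=[f"./lyft_10k_{year}.pdf"]).load_data()
    index = VectorStoreIndex.from_documents(docs)
    engines[year] = index.as_query_engine()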
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
lyft_2020_docs = SimpleDirectoryReader(input_files=["./lyft_10k_2020.pdf"]).load_data()
lyft_2020_index = VectorStoreIndex.from_documents(lyft_2020_docs)
lyft_2020_engine = lyft_2020_index.as_query_engine()
lyft_2021_docs = SimpleDirectoryReader(input_files=["./lyft_10k_2021.pdf"]).load_data()
lyft_2021_index = VectorStoreIndex.from_documents(lyft_2021_docs)
lyft_2021_engine = lyft_2021_index.as_query_engine()
lyft_2022_docs = SimpleDirectoryReader(input_files=["./lyft_10k_2022.pdf"]).load_data()
lyft_2022_index = VectorStoreIndex.from_documents(lyft_2022_docs)
lyft_2022_engine = lyft_2022_index.as_query_engine()
response = lyft_2022_engine.query("What was Lyft's profit in 2022?")
print(response)
Success! The 2022 index knows facts about 2022. We're almost ready to create our agent. Before we do, let's set up a list of tools for our agent to use. This turns each of the query engines we set up above into a tool and indicates what kind of questions each engine is best at answering. The LLM will read these descriptions when deciding which tool to use.
from llama_index.core.tools import QueryEngineTool, ToolMetadata
query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_2020_engine,
        metadata=ToolMetadata(
            name="lyft_2020_10k_form",
            description="Annual report of Lyft's financial activities in 2020",
        ),
    ),
    QueryEngineTool(
        query_engine=lyft_2021_engine,
        metadata=ToolMetadata(
            name="lyft_2021_10k_form",
            description="Annual report of Lyft's financial activities in 2021",
        ),
    ),
    QueryEngineTool(
        query_engine=lyft_2022_engine,
        metadata=ToolMetadata(
            name="lyft_2022_10k_form",
            description="Annual report of Lyft's financial activities in 2022",
        ),
    ),
]
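As an aside, QueryEngineTool also has a from_defaults constructor that builds the ToolMetadata for you. Here's what the 2022 tool would look like written that way (equivalent to the version above):
# Equivalent shortcut: from_defaults creates the ToolMetadata under the hood.
lyft_2022_tool = QueryEngineTool.from_defaults(
    query_engine=lyft_2022_engine,
    name="lyft_2022_10k_form",
    description="Annual report of Lyft's financial activities in 2022",
)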
Now we create our agent from the tools we've set up, and we can ask it complicated questions. It will reason through the problem step by step, breaking it into simpler questions and using different tools to answer them. Then it takes the information it gathers from each tool and combines it into a single answer to the more complex question.
from llama_index.core.agent import ReActAgent
lyft_agent = ReActAgent.from_tools(query_engine_tools, llm=llm, verbose=True)
response = lyft_agent.chat("What are the risk factors in 2022?")
print(response)
from llama_index.core.agent import ReActAgent
lyft_agent = ReActAgent.from_tools(query_engine_tools, llm=llm, verbose=True)
response = lyft_agent.chat("What is Lyft's profit in 2022 vs 2020? Generate only one step at a time. Use existing tools.")
print(response)
Cool! As you can see, it got the 2022 profit from the 2022 10-K form and the 2020 figure from the 2020 report, then combined the two into the comparison we asked for. Let's try another question, this time one that calls for a textual answer rather than numbers:
from llama_index.core.agent import ReActAgent
lyft_agent = ReActAgent.from_tools(query_engine_tools, llm=llm, verbose=True)
response = lyft_agent.chat("What did Lyft do in R&D in 2022 versus 2021? Generate only one step at a time. Use existing tools.")
print(response)
Great! It correctly itemized the R&D activities for each year, noticed the differences, and summarized them.
You can try this on any number of documents with any number of query engines to answer really complex questions. You can even have the query engines themselves be agents, as sketched below.
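For example, one possible pattern (shown here only as a rough, untested sketch) is to wrap an agent itself in a QueryEngineTool and hand it to a higher-level agent. This assumes the agent exposes the query-engine interface, which ReActAgent does in recent llama-index-core releases.
# Rough sketch: use the Lyft agent as a tool for a higher-level agent.
# Assumes ReActAgent implements the query-engine interface.
lyft_tool = QueryEngineTool(
    query_engine=lyft_agent,  # the agent itself acts as a query engine
    metadata=ToolMetadata(
        name="lyft_reports_agent",
        description="Agent that can answer questions about Lyft's 2020-2022 annual reports",
    ),
)
top_agent = ReActAgent.from_tools([lyft_tool], llm=llm, verbose=True)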