This notebook was created by Andrei Chernov (Github, Linkedin). In this tutorial, we will create an LLM agent based on the MistralAI language model. The agent's primary purpose is to find and summarize research papers from arXiv that are relevant to the user's query. To build the agent, we will use the LlamaIndex framework.
Tools Used by the Agent
The agent will utilize the following three tools:
- RAG Query Engine: stores and retrieves recent papers from arXiv, serving as a knowledge base for efficient and quick access to relevant information.
- Paper Fetch Tool: if the user asks about a topic that is not covered by the RAG Query Engine, this tool fetches recent papers on that topic directly from arXiv.
- PDF Download Tool: allows the agent to download a research paper's PDF file locally using a link provided by arXiv.
First, let's install the necessary libraries.
!pip install arxiv==2.1.3 llama_index==0.12.3 llama-index-llms-mistralai==0.3.0 llama-index-embeddings-mistralai==0.3.0
from getpass import getpass
import requests
import sys
import arxiv
from llama_index.llms.mistralai import MistralAI
from llama_index.embeddings.mistralai import MistralAIEmbedding
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Document, StorageContext, load_index_from_storage, PromptTemplate, Settings
from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.core.agent import ReActAgent
Additionally, You Need to Provide Your API Key to Access Mistral Models
You can obtain an API key here.
api_key = getpass("Type your API Key")
llm = MistralAI(api_key=api_key, model='mistral-large-latest')
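Optionally, you can verify that the key works with a one-off completion before going further (the prompt here is an arbitrary example):

# Optional: sanity-check the API key and model
print(llm.complete("Say hello in one short sentence."))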
To Build a RAG Query Engine, We Will Need an Embedding Model
For this tutorial, we will use the MistralAI embedding model.
model_name = "mistral-embed"
embed_model = MistralAIEmbedding(model_name=model_name, api_key=api_key)
Now, We Will Download Recent Papers About Large Language Models from arXiv
To keep this tutorial accessible with the free Mistral API tier, we will download only the 10 most recent papers. Downloading more would exceed the rate limit later, when we build the RAG query engine. However, if you have a Mistral subscription, you can download additional papers.
def fetch_arxiv_papers(title: str, papers_count: int):
    search_query = f'all:"{title}"'
    search = arxiv.Search(
        query=search_query,
        max_results=papers_count,
        sort_by=arxiv.SortCriterion.SubmittedDate,
        sort_order=arxiv.SortOrder.Descending
    )
    # Use the Client to execute the search
    client = arxiv.Client()
    results = client.results(search)
    papers = []
    for result in results:
        paper_info = {
            'title': result.title,
            'authors': [author.name for author in result.authors],
            'summary': result.summary,
            'published': result.published,
            'journal_ref': result.journal_ref,
            'doi': result.doi,
            'primary_category': result.primary_category,
            'categories': result.categories,
            'pdf_url': result.pdf_url,
            'arxiv_url': result.entry_id
        }
        papers.append(paper_info)
    return papers
papers = fetch_arxiv_papers("Language Models", 10)
[p['title'] for p in papers]
To Build a RAG Agent, We First Need to Index All Documents
This process creates a vector representation for each chunk of a document using the embedding model.
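If you are curious what such a vector representation looks like, you can embed a short string directly (an optional sketch; the sample text is arbitrary):

# Optional: embed a single string and inspect the resulting vector
vector = embed_model.get_text_embedding("Large Language Models")
print(len(vector), vector[:5])  # mistral-embed produces 1024-dimensional vectors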
def create_documents_from_papers(papers):
    documents = []
    for paper in papers:
        content = (
            f"Title: {paper['title']}\n"
            f"Authors: {', '.join(paper['authors'])}\n"
            f"Summary: {paper['summary']}\n"
            f"Published: {paper['published']}\n"
            f"Journal Reference: {paper['journal_ref']}\n"
            f"DOI: {paper['doi']}\n"
            f"Primary Category: {paper['primary_category']}\n"
            f"Categories: {', '.join(paper['categories'])}\n"
            f"PDF URL: {paper['pdf_url']}\n"
            f"arXiv URL: {paper['arxiv_url']}\n"
        )
        documents.append(Document(text=content))
    return documents
# Create documents for LlamaIndex
documents = create_documents_from_papers(papers)
Settings.chunk_size = 1024
Settings.chunk_overlap = 50
index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)
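To see the effect of these settings, you can optionally run the default splitter yourself and count the resulting chunks (a sketch; since each document here is roughly abstract-sized, most will fit into a single chunk):

# Optional: preview how the documents are split into chunks before indexing
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=50)
nodes = splitter.get_nodes_from_documents(documents)
print(f"{len(documents)} documents -> {len(nodes)} chunks")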
Now, We Will Store the Index
Indexing a large number of texts can be time-consuming and costly since it requires making API calls to the embedding model. In real-world applications, it is better to store the index in a vector database to avoid reindexing. However, for simplicity, we will store the index locally in a directory in this tutorial, without using a vector database.
index.storage_context.persist('index/')
# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir='index/')
# load the index
index = load_index_from_storage(storage_context, embed_model=embed_model)
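In a script that runs repeatedly, you may want to rebuild the index only when no persisted copy exists yet; a minimal sketch of that pattern:

import os

# Load the persisted index if it exists; otherwise build it and persist it
if os.path.exists('index/'):
    storage_context = StorageContext.from_defaults(persist_dir='index/')
    index = load_index_from_storage(storage_context, embed_model=embed_model)
else:
    index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)
    index.storage_context.persist('index/')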
We Are Ready to Build a RAG Query Engine for Our Agent
It is a good practice to provide a meaningful name and a clear description for each tool. This helps the agent select the most appropriate tool when needed.
query_engine = index.as_query_engine(llm=llm, similarity_top_k=5)
rag_tool = QueryEngineTool.from_defaults(
    query_engine,
    name="research_paper_query_engine_tool",
    description="A RAG engine with recent research papers about Language Models.",
)
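Before handing the tool to an agent, it is worth querying the engine directly to confirm that retrieval works (the question below is just an arbitrary example):

# Optional: sanity-check the query engine on its own
response = query_engine.query("What are the most recent papers about Language Models?")
print(response)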
Let's Take a Look at the Prompts the RAG Tool Uses to Answer a Query Based on Context
Note that there are two prompts. By default, LlamaIndex uses a refine prompt before returning an answer. You can find more information about the response modes here.
from llama_index.core import PromptTemplate
from IPython.display import Markdown, display
# define prompt viewing function
def display_prompt_dict(prompts_dict):
    for k, p in prompts_dict.items():
        text_md = f"**Prompt Key**: {k}<br>" f"**Text:** <br>"
        display(Markdown(text_md))
        print(p.get_template())
        display(Markdown("<br><br>"))
prompts_dict = query_engine.get_prompts()
display_prompt_dict(prompts_dict)
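If you want to replace one of these prompts, LlamaIndex exposes update_prompts; below is a sketch, assuming the default prompt key response_synthesizer:text_qa_template shown by the cell above and a custom template written for illustration:

# Optional: override the QA prompt with a custom template
custom_qa_prompt = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Using only the context, answer the query concisely.\n"
    "Query: {query_str}\n"
    "Answer: "
)
query_engine.update_prompts({"response_synthesizer:text_qa_template": custom_qa_prompt})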
Building the other two tools is straightforward because they are simply Python functions.
def download_pdf(pdf_url, output_file):
    """
    Downloads a PDF file from the given URL and saves it to the specified file.

    Args:
        pdf_url (str): The URL of the PDF file to download.
        output_file (str): The path and name of the file to save the PDF to.

    Returns:
        str: A message indicating success or the nature of an error.
    """
    try:
        # Send a GET request to the PDF URL
        response = requests.get(pdf_url)
        response.raise_for_status()  # Raise an error for HTTP issues

        # Write the content of the PDF to the output file
        with open(output_file, "wb") as file:
            file.write(response.content)
        return f"PDF downloaded successfully and saved as '{output_file}'."
    except requests.exceptions.RequestException as e:
        return f"An error occurred: {e}"
download_pdf_tool = FunctionTool.from_defaults(
    download_pdf,
    name='download_pdf_file_tool',
    description='Python function that downloads a PDF file from a given URL and saves it locally.'
)
fetch_arxiv_tool = FunctionTool.from_defaults(
    fetch_arxiv_papers,
    name='fetch_from_arxiv',
    description='Fetches the {papers_count} most recent papers on the topic {title} from arXiv.'
)
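Before building the agent, you can call a tool by hand to check that its output looks reasonable (a quick sketch; the agent will invoke it the same way):

# Optional: invoke the tool directly and inspect the resulting ToolOutput
tool_output = fetch_arxiv_tool.call(title="Language Models", papers_count=1)
print(tool_output.content[:500])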
# build a ReAct agent with the three tools
agent = ReActAgent.from_tools([download_pdf_tool, rag_tool, fetch_arxiv_tool], llm=llm, verbose=True)
Let's Chat with Our Agent
We built a ReAct agent, which operates in two main stages:
- Reasoning: Upon receiving a query, the agent evaluates whether it has enough information to answer directly or if it needs to use a tool.
- Acting: If the agent decides to use a tool, it executes the tool and then returns to the Reasoning stage to determine whether it can now answer the query or if further tool usage is necessary.
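You can see how this loop is encoded by inspecting the agent's system prompt (a sketch, assuming the default prompt key agent_worker:system_prompt):

# Optional: inspect the ReAct system prompt that drives the Reasoning/Acting loop
react_system_prompt = agent.get_prompts()["agent_worker:system_prompt"]
print(react_system_prompt.get_template())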
# create a prompt template to chat with the agent
q_template = (
    "I am interested in {topic}. \n"
    "Find papers in your knowledge database related to this topic; "
    "use the following template to query the research_paper_query_engine_tool tool: "
    "'Provide title, summary, authors and link to download for papers related to {topic}'. "
    "If there are none, could you fetch the most recent ones from arXiv? \n"
)
answer = agent.chat(q_template.format(topic="Audio-Language Models"))
Markdown(answer.response)
The agent chose to use the RAG tool, found the relevant papers, and summarized them for us.
Since the agent retains the chat history, we can ask it to download the papers without naming them explicitly.
answer = agent.chat("Download the papers, which you mentioned above")
Markdown(answer.response)
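To confirm that the files actually landed on disk, you can list the PDFs in the working directory (a small sketch; it assumes the agent saved them to the current directory):

import os

# List the PDF files saved to the current working directory
print([f for f in os.listdir('.') if f.endswith('.pdf')])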
Let's see what happens if we ask about a topic that is not available in the RAG.
answer = agent.chat(q_template.format(topic="Gaussian process"))
Markdown(answer.response)