PropertyGraph Index with Mistral AI and LlamaIndex


In this notebook, we demonstrate the basic usage of the PropertyGraphIndex in LlamaIndex.

The property graph index will process unstructured documents, extract a property graph from them, and offer various methods for querying this graph.

Setup

%pip install llama-index-core
%pip install llama-index-llms-mistralai
%pip install llama-index-embeddings-mistralai
import nest_asyncio

# Required so LlamaIndex's async extraction calls can run inside a notebook's event loop
nest_asyncio.apply()

from IPython.display import Markdown, display
import os
os.environ['MISTRAL_API_KEY'] = 'YOUR MISTRAL API KEY'
from llama_index.embeddings.mistralai import MistralAIEmbedding
from llama_index.llms.mistralai import MistralAI

llm = MistralAI(model='mistral-large-latest')
embed_model = MistralAIEmbedding()  # defaults to the mistral-embed model

Download Data

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Load Data

from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

Create PropertyGraphIndex

The following steps occur during the creation of a PropertyGraphIndex:

  1. PropertyGraphIndex.from_documents(): We load documents into an index.

  2. Parsing Nodes: The index parses the documents into nodes.

  3. Extracting Paths from Text: The nodes are passed to an LLM, which is prompted to generate knowledge graph triples (i.e., paths).

  4. Extracting Implicit Paths: The node.relationships property of each node is used to infer implicit paths. (Both extraction steps can be customized; see the sketch after the construction code below.)

  5. Generating Embeddings: Embeddings are generated twice, once for each text chunk and once for each graph node.

from llama_index.core import PropertyGraphIndex


index = PropertyGraphIndex.from_documents(
    documents,
    llm=llm,
    embed_model=embed_model,
    show_progress=True,
)
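
For steps 3 and 4, the index uses a SimpleLLMPathExtractor and an ImplicitPathExtractor by default. As a minimal sketch (the max_paths_per_chunk value below is illustrative, not a requirement), you can make those defaults explicit and customize them via the kg_extractors argument:

from llama_index.core.indices.property_graph import (
    SimpleLLMPathExtractor,
    ImplicitPathExtractor,
)

index = PropertyGraphIndex.from_documents(
    documents,
    llm=llm,
    embed_model=embed_model,
    kg_extractors=[
        # Step 3: prompt the LLM for (subject, relation, object) triples
        SimpleLLMPathExtractor(llm=llm, max_paths_per_chunk=10),
        # Step 4: derive paths from existing node.relationships metadata
        ImplicitPathExtractor(),
    ],
    show_progress=True,
)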

For debugging purposes, the default SimplePropertyGraphStore includes a helper to save a networkx representation of the graph to an HTML file.

index.property_graph_store.save_networkx_graph(name="./kg.html")

We also set the LLM and embedding model as global defaults in Settings, so that later steps such as querying pick them up automatically.

from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = embed_model

Querying

Querying a property graph index typically involves using one or more sub-retrievers and combining their results. The process of graph retrieval includes:

  1. Selecting Nodes: Identifying the initial nodes of interest within the graph.
  2. Traversing: Moving from the selected nodes to explore connected elements.

By default, two primary types of retrieval are employed simultaneously (both are sketched below):

• Synonym/Keyword Expansion: Utilizing an LLM to generate synonyms and keywords derived from the query.

• Vector Retrieval: Employing embeddings to locate nodes within your graph.

Once nodes are identified, you can choose to:

• Return Paths: Provide the paths adjacent to the selected nodes, typically in the form of triples.

• Return Paths and Source Text: Provide both the paths and the original source text of the chunk, if available.
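
As a minimal sketch of what those defaults look like when wired up by hand (LLMSynonymRetriever and VectorContextRetriever are the classes behind the two retrieval types above; the exact arguments shown are illustrative), you can pass sub-retrievers explicitly:

from llama_index.core.indices.property_graph import (
    LLMSynonymRetriever,
    VectorContextRetriever,
)

sub_retrievers = [
    # Synonym/keyword expansion: match LLM-generated keywords against graph nodes
    LLMSynonymRetriever(index.property_graph_store, llm=llm, include_text=False),
    # Vector retrieval: embed the query and find similar graph nodes
    VectorContextRetriever(
        index.property_graph_store,
        embed_model=embed_model,
        include_text=False,
    ),
]

retriever = index.as_retriever(sub_retrievers=sub_retrievers)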

Retrieval

retriever = index.as_retriever(
    include_text=False,  # return only the paths, not the source chunk text (default: True)
)

nodes = retriever.retrieve("What happened at Interleaf and Viaweb?")

for node in nodes:
    print(node.text)

QueryEngine

query_engine = index.as_query_engine(
    include_text=True
)

response = query_engine.query("What happened at Interleaf and Viaweb?")

display(Markdown(f"{response.response}"))

Storage

By default, storage is handled by our simple in-memory abstractions: SimpleVectorStore for embeddings and SimplePropertyGraphStore for the property graph.

We can save and load these structures to and from disk.

index.storage_context.persist(persist_dir="./storage")

from llama_index.core import StorageContext, load_index_from_storage

index = load_index_from_storage(
    StorageContext.from_defaults(persist_dir="./storage")
)

query_engine = index.as_query_engine(
    include_text=True
)

response = query_engine.query("What happened at Interleaf and Viaweb?")

display(Markdown(f"{response.response}"))

Vector Stores

Some graph databases, such as Neo4j, support vectors directly. For graph stores that do not, or when you want to override the default, you can specify a separate vector store to use alongside your graph.

Below, we will demonstrate how to combine ChromaVectorStore with the default SimplePropertyGraphStore.

%pip install llama-index-vector-stores-chroma

Build and Save Index

from llama_index.core.graph_stores import SimplePropertyGraphStore
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

client = chromadb.PersistentClient("./chroma_db")
collection = client.get_or_create_collection("my_graph_vector_db")

index = PropertyGraphIndex.from_documents(
    documents,
    llm=llm,
    embed_model=embed_model,
    property_graph_store=SimplePropertyGraphStore(),
    vector_store=ChromaVectorStore(chroma_collection=collection),
    show_progress=True,
)

index.storage_context.persist(persist_dir="./storage")

Load Index

index = PropertyGraphIndex.from_existing(
    SimplePropertyGraphStore.from_persist_dir("./storage"),
    vector_store=ChromaVectorStore(chroma_collection=collection),
    llm=llm,  # embed_model falls back to the Settings default set earlier
)

query_engine = index.as_query_engine(
    include_text=True
)

response = query_engine.query("What did the author do at YC?")

display(Markdown(f"{response.response}"))