In this cookbook, we will use Mistral embeddings and generative models in two Haystack pipelines:
- We will build an indexing pipeline that creates embeddings for the contents of URLs and indexes them into a vector database
- We will build a retrieval-augmented chat pipeline to chat with the contents of those URLs
First, we install our dependencies
!pip install mistral-haystack
!pip install trafilatura
# Check which Haystack version is installed
from haystack import version
version.__version__
Next, we need to set the MISTRAL_API_KEY environment variable 👇
import os
from getpass import getpass
os.environ["MISTRAL_API_KEY"] = getpass("Mistral API Key:")
Index URLs with Mistral Embeddings
Below, we use mistral-embed in a full Haystack indexing pipeline. We create embeddings for the contents of the chosen URLs with the MistralDocumentEmbedder (which uses mistral-embed by default) and write them to an InMemoryDocumentStore.
💡 This document store is the simplest to get started with, as it requires no setup. Feel free to swap it for any of the vector databases available for Haystack 2.0, such as Weaviate, Chroma, or Astra DB.
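For example, here is a minimal sketch of swapping in Chroma, assuming the chroma-haystack integration is installed (pip install chroma-haystack); it is a drop-in replacement for the InMemoryDocumentStore created below:
from haystack_integrations.document_stores.chroma import ChromaDocumentStore
# Use this in place of InMemoryDocumentStore in the indexing pipeline below
document_store = ChromaDocumentStore()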
from haystack import Pipeline
from haystack.components.converters import HTMLToDocument
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.mistral.document_embedder import MistralDocumentEmbedder
document_store = InMemoryDocumentStore()
fetcher = LinkContentFetcher()  # downloads the raw HTML from each URL
converter = HTMLToDocument()  # converts the HTML into Haystack Documents
embedder = MistralDocumentEmbedder()  # creates embeddings with mistral-embed
writer = DocumentWriter(document_store=document_store)  # writes the embedded Documents to the store
indexing = Pipeline()
indexing.add_component(name="fetcher", instance=fetcher)
indexing.add_component(name="converter", instance=converter)
indexing.add_component(name="embedder", instance=embedder)
indexing.add_component(name="writer", instance=writer)
indexing.connect("fetcher", "converter")
indexing.connect("converter", "embedder")
indexing.connect("embedder", "writer")
urls = ["https://mistral.ai/news/la-plateforme/", "https://mistral.ai/news/mixtral-of-experts"]
indexing.run({"fetcher": {"urls": urls}})
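To sanity-check the indexing run, we can look at how many documents ended up in the store (a quick optional check, not part of the pipeline). Since there is no splitter in this pipeline, each URL becomes one Document with an embedding attached:
# Should report one Document per indexed URL
document_store.count_documents()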
Chat With the URLs with Mistral Generative Models
Now that we have indexed the contents and embeddings of various URLs, we can create a RAG pipeline that uses the MistralChatGenerator component with mistral-small.
A few more things to know about this pipeline:
- We are using the MistralTextEmbedder to embed our question and retrieve the single most relevant document
- We are enabling streaming responses by providing a streaming_callback (a minimal custom callback is sketched after this list)
- documents is provided to the chat template by the retriever, while we provide query to the pipeline when we run it
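In this cookbook we pass Haystack's built-in print_streaming_chunk, but any callable that accepts a StreamingChunk works. A minimal sketch of a custom callback (illustrative only, not required for the pipeline below):
# Called once per streamed chunk; chunk.content holds the newly generated text fragment
def my_streaming_callback(chunk):
    print(chunk.content, end="", flush=True)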
from haystack.dataclasses import ChatMessage
chat_template = """Answer the following question based on the contents of the documents.\n
Question: {{query}}\n
Documents: {{documents[0].content}}
"""
user_message = ChatMessage.from_user(chat_template)
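As an optional sanity check (not part of the pipeline), we can render the template standalone with a dummy Document to see what the chat model will receive:
from haystack import Document
from haystack.components.builders import ChatPromptBuilder
preview = ChatPromptBuilder(template=[user_message], variables=["query", "documents"]).run(
    query="What is Mixtral?",
    documents=[Document(content="Mixtral is a sparse mixture-of-experts model.")],  # dummy document for illustration
)
# On older Haystack versions, use .content instead of .text
print(preview["prompt"][0].text)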
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.utils import print_streaming_chunk
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack_integrations.components.embedders.mistral.text_embedder import MistralTextEmbedder
from haystack_integrations.components.generators.mistral import MistralChatGenerator
text_embedder = MistralTextEmbedder()  # embeds the user's question with mistral-embed
retriever = InMemoryEmbeddingRetriever(document_store=document_store, top_k=1)  # fetches the single most relevant document
prompt_builder = ChatPromptBuilder(template=[user_message], variables=["query", "documents"], required_variables=["query", "documents"])
llm = MistralChatGenerator(model="mistral-small", streaming_callback=print_streaming_chunk)
rag_pipeline = Pipeline()
rag_pipeline.add_component("text_embedder", text_embedder)
rag_pipeline.add_component("retriever", retriever)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", llm)
rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "llm.messages")
question = "What generative endpoints does the Mistral platform have?"
messages = [ChatMessage.from_user(chat_template)]
result = rag_pipeline.run(
{
"text_embedder": {"text": question},
"prompt_builder": {"template": messages, "query": question},
"llm": {"generation_kwargs": {"max_tokens": 165}},
},
include_outputs_from=["text_embedder", "retriever", "llm"],
)
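The answer is streamed to the output as it is generated. Because we passed include_outputs_from, the full reply and the intermediate outputs are also available in result; for example (the ChatMessage attribute is .text on recent Haystack versions, .content on older ones):
# The generated answer as a ChatMessage
print(result["llm"]["replies"][0].text)
# The document the retriever selected for the prompt
print(result["retriever"]["documents"][0].meta.get("url"))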