Large Language Models do not “know” your data. If you want a model to answer questions based on external knowledge—documents, PDFs, websites, or internal data—you need a mechanism to retrieve relevant information and feed it to the model at inference time.
This is where vector stores and retrievers come into play.
1. What Is a Document in LangChain?
LangChain introduces a Document abstraction to represent text and its associated metadata.
A document has two core attributes:
- page_content: the actual text
- metadata: a dictionary describing the source, origin, or relationships
This abstraction allows LangChain to treat chunks of text uniformly, regardless of whether they come from PDFs, web pages, or databases.
```python
from langchain_core.documents import Document
```
Let’s define a small set of documents:
```python
documents = [
    Document(
        page_content="Cats are independent animals and enjoy solitude.",
        metadata={"source": "doc1"}
    ),
    Document(
        page_content="Dogs are loyal companions and are very friendly.",
        metadata={"source": "doc2"}
    ),
    Document(
        page_content="Sharks are powerful predators living in the ocean.",
        metadata={"source": "doc3"}
    ),
    Document(
        page_content="Ice cream is a popular dessert enjoyed worldwide.",
        metadata={"source": "doc4"}
    ),
]
```
Each document could represent:
- A page from a PDF
- A paragraph from a website
- A chunk from a large file
2. Why Vector Stores Are Needed
Language models work with numbers, not raw text.
To search text semantically, we must:
- Convert text into vectors (embeddings)
- Store those vectors
- Search them efficiently
A vector store is a database optimized for storing and searching vector embeddings.
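Semantic search ultimately reduces to comparing vectors. Here is a minimal pure-Python sketch of cosine similarity, the comparison most vector stores rely on (the toy 3-dimensional vectors are invented for illustration; real embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for illustration only
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
dessert = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))   # high: semantically close
print(cosine_similarity(cat, dessert))  # low: unrelated
```

A vector store does exactly this comparison, but over millions of vectors with indexing structures that avoid scanning every one.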
3. Creating Embeddings with Hugging Face
We use Hugging Face embeddings because they are open-source and work locally.
```python
from langchain_huggingface import HuggingFaceEmbeddings
```
We will use the popular model:
sentence-transformers/all-MiniLM-L6-v2
This model maps text into a 384-dimensional vector space, ideal for semantic search.
```python
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```
The first time you run this, the model weights are downloaded and cached locally.
4. Creating a Vector Store with Chroma
Now that we have documents and embeddings, we can create a vector store.
```python
from langchain_chroma import Chroma

vectorstore = Chroma.from_documents(
    documents=documents,
    embedding=embeddings
)
```
What happens internally:
- Each document is converted into a vector
- Vectors are stored inside Chroma
- Metadata is preserved alongside vectors
At this point, we have a working semantic database.
5. Similarity Search on the Vector Store
We can now perform semantic searches.
```python
results = vectorstore.similarity_search("cat", k=2)

for doc in results:
    print(doc.page_content)
```
This retrieves documents that are semantically similar to the query, not just keyword matches.
6. Similarity Search with Scores
Sometimes we want to know how similar the match is.
```python
results = vectorstore.similarity_search_with_score("dog", k=2)

for doc, score in results:
    print(score, doc.page_content)
```
Chroma returns a distance score here, so lower scores indicate closer semantic similarity.
7. Why Vector Stores Cannot Be Used Directly in Chains
Vector stores are not runnables.
They cannot be plugged directly into LangChain Expression Language (LCEL) chains.
To integrate retrieval into chains, we need retrievers.
8. Creating a Retriever Using RunnableLambda
A retriever is simply a runnable that returns documents.
```python
from langchain_core.runnables import RunnableLambda

retriever = RunnableLambda(
    lambda query: vectorstore.similarity_search(query, k=1)
)
```
We can batch queries:
```python
results = retriever.batch(["cat", "dog"])

for docs in results:
    print(docs[0].page_content)
```
This works, but it is not the recommended approach.
9. Creating a Retriever Using as_retriever
LangChain provides a cleaner and more powerful way:
```python
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 1}
)
```
Now the retriever:
- Is a runnable
- Supports invoke, batch, async
- Integrates cleanly with LCEL
```python
results = retriever.batch(["cat", "dog"])

for docs in results:
    print(docs[0].page_content)
```
10. Building a RAG Prompt
Retrieval-Augmented Generation (RAG) combines:
- Retrieved context
- User question
- Language model reasoning
```python
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Answer the question using only the provided context."
        ),
        (
            "human",
            "Context:\n{context}\n\nQuestion:\n{question}"
        )
    ]
)
```
11. Creating the RAG Chain
Now we combine:
- Retriever
- Prompt
- Language model
```python
from langchain_core.runnables import RunnablePassthrough
from langchain_groq import ChatGroq

# The chat model at the end of the chain (reads GROQ_API_KEY from the
# environment; initialized explicitly in the complete program below)
model = ChatGroq(model="llama-3.1-8b-instant")

rag_chain = (
    {
        "context": retriever,
        "question": RunnablePassthrough()
    }
    | prompt
    | model
)
```
This chain:
- Retrieves documents
- Injects them into the prompt
- Generates a grounded answer
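One detail worth noting: the retriever returns a list of Document objects, and the prompt template stringifies that list wholesale, metadata included. A common refinement is to join only the page contents before they reach the prompt. A sketch, using a minimal stand-in class so it runs without LangChain installed:

```python
from dataclasses import dataclass

@dataclass
class Doc:
    # Minimal stand-in for langchain_core.documents.Document
    page_content: str

def format_docs(docs):
    # Keep only the text of each retrieved document
    return "\n\n".join(d.page_content for d in docs)

print(format_docs([Doc("Dogs are loyal companions."),
                   Doc("Cats are independent animals.")]))
```

In the actual chain, this could slot in as `"context": retriever | format_docs`; LCEL automatically wraps a plain function in a RunnableLambda when it is piped.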
12. Invoking the RAG Chain
```python
response = rag_chain.invoke("Tell me about dogs")
print(response.content)
```
The model answers only using retrieved documents.
Complete Runnable Program
This is the entire program, clean, correct, and runnable.
```python
import os

from dotenv import load_dotenv
from langchain_groq import ChatGroq
from langchain_core.documents import Document
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# --------------------------------------------------
# Load environment variables
# --------------------------------------------------
load_dotenv()
GROQ_API_KEY = os.getenv("GROQ_API_KEY")

# --------------------------------------------------
# Initialize LLM
# --------------------------------------------------
model = ChatGroq(
    api_key=GROQ_API_KEY,
    model="llama-3.1-8b-instant"
)

# --------------------------------------------------
# Create documents
# --------------------------------------------------
documents = [
    Document(
        page_content="Cats are independent animals and enjoy solitude.",
        metadata={"source": "doc1"}
    ),
    Document(
        page_content="Dogs are loyal companions and are very friendly.",
        metadata={"source": "doc2"}
    ),
    Document(
        page_content="Sharks are powerful predators living in the ocean.",
        metadata={"source": "doc3"}
    ),
    Document(
        page_content="Ice cream is a popular dessert enjoyed worldwide.",
        metadata={"source": "doc4"}
    ),
]

# --------------------------------------------------
# Create embeddings
# --------------------------------------------------
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# --------------------------------------------------
# Create vector store
# --------------------------------------------------
vectorstore = Chroma.from_documents(
    documents=documents,
    embedding=embeddings
)

# --------------------------------------------------
# Create retriever
# --------------------------------------------------
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 1}
)

# --------------------------------------------------
# Create RAG prompt
# --------------------------------------------------
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Answer the question using only the provided context."
        ),
        (
            "human",
            "Context:\n{context}\n\nQuestion:\n{question}"
        )
    ]
)

# --------------------------------------------------
# Build RAG chain
# --------------------------------------------------
rag_chain = (
    {
        "context": retriever,
        "question": RunnablePassthrough()
    }
    | prompt
    | model
)

# --------------------------------------------------
# Invoke RAG chain
# --------------------------------------------------
response = rag_chain.invoke("Tell me about dogs")
print("\n--- RAG Response ---\n")
print(response.content)
```
Output
```
--- RAG Response ---

Dogs are loyal companions and are very friendly.
```
