Large Language Models do not “know” your data. If you want a model to answer questions based on external knowledge—documents, PDFs, websites, or internal data—you need a mechanism to retrieve relevant information and feed it to the model at inference time.
This is where vector stores and retrievers come into play.
1. What Is a Document in LangChain?
LangChain introduces a Document abstraction to represent text and its associated metadata.
A document has two core attributes:
- page_content: the actual text
- metadata: a dictionary describing the source, origin, or relationships
This abstraction allows LangChain to treat chunks of text uniformly, regardless of whether they come from PDFs, web pages, or databases.
```python
from langchain_core.documents import Document
```
Let’s define a small set of documents:
```python
documents = [
    Document(
        page_content="Cats are independent animals and enjoy solitude.",
        metadata={"source": "doc1"}
    ),
    Document(
        page_content="Dogs are loyal companions and are very friendly.",
        metadata={"source": "doc2"}
    ),
    Document(
        page_content="Sharks are powerful predators living in the ocean.",
        metadata={"source": "doc3"}
    ),
    Document(
        page_content="Ice cream is a popular dessert enjoyed worldwide.",
        metadata={"source": "doc4"}
    ),
]
```
Each document could represent:
- A page from a PDF
- A paragraph from a website
- A chunk from a large file
2. Why Vector Stores Are Needed
Language models work with numbers, not raw text.
To search text semantically, we must:
- Convert text into vectors (embeddings)
- Store those vectors
- Search them efficiently
A vector store is a database optimized for storing and searching vector embeddings.
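Semantic search ultimately reduces to comparing vectors. Here is a minimal pure-Python sketch of cosine similarity, the comparison most vector stores rely on (the toy 3-dimensional vectors are invented for illustration; real embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for illustration only
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
dessert = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))   # high: semantically close
print(cosine_similarity(cat, dessert))  # low: unrelated
```

A vector store does exactly this comparison, but over millions of vectors with indexing structures that avoid scanning every one.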
3. Creating Embeddings with Hugging Face
We use Hugging Face embeddings because they are open-source and work locally.
```python
from langchain_huggingface import HuggingFaceEmbeddings
```
We will use the popular model:
sentence-transformers/all-MiniLM-L6-v2
This model maps text into a 384-dimensional vector space, ideal for semantic search.
```python
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```
The first time you run this, the model weights are downloaded and cached locally.
4. Creating a Vector Store with Chroma
Now that we have documents and embeddings, we can create a vector store.
```python
from langchain_chroma import Chroma

vectorstore = Chroma.from_documents(
    documents=documents,
    embedding=embeddings
)
```
What happens internally:
- Each document is converted into a vector
- Vectors are stored inside Chroma
- Metadata is preserved alongside vectors
At this point, we have a working semantic database.
5. Similarity Search on the Vector Store
We can now perform semantic searches.
```python
results = vectorstore.similarity_search("cat", k=2)

for doc in results:
    print(doc.page_content)
```
This retrieves documents that are semantically similar to the query, not just keyword matches.
6. Similarity Search with Scores
Sometimes we want to know how similar the match is.
```python
results = vectorstore.similarity_search_with_score("dog", k=2)

for doc, score in results:
    print(score, doc.page_content)
```
Chroma returns a distance score here, so lower scores indicate closer semantic similarity.
7. Why Vector Stores Cannot Be Used Directly in Chains
Vector stores are not runnables.
They cannot be plugged directly into LangChain Expression Language (LCEL) chains.
To integrate retrieval into chains, we need retrievers.
8. Creating a Retriever Using RunnableLambda
A retriever is simply a runnable that returns documents.
```python
from langchain_core.runnables import RunnableLambda

retriever = RunnableLambda(
    lambda query: vectorstore.similarity_search(query, k=1)
)
```
We can batch queries:
```python
results = retriever.batch(["cat", "dog"])

for docs in results:
    print(docs[0].page_content)
```
This works, but it is not the recommended approach.
9. Creating a Retriever Using as_retriever
LangChain provides a cleaner and more powerful way:
```python
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 1}
)
```
Now the retriever:
- Is a runnable
- Supports invoke, batch, async
- Integrates cleanly with LCEL
```python
results = retriever.batch(["cat", "dog"])

for docs in results:
    print(docs[0].page_content)
```
10. Building a RAG Prompt
Retrieval-Augmented Generation (RAG) combines:
- Retrieved context
- User question
- Language model reasoning
```python
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Answer the question using only the provided context."
        ),
        (
            "human",
            "Context:\n{context}\n\nQuestion:\n{question}"
        )
    ]
)
```
11. Creating the RAG Chain
Now we combine:
- Retriever
- Prompt
- Language model
```python
from langchain_core.runnables import RunnablePassthrough
from langchain_groq import ChatGroq

# The chat model at the end of the chain (reads GROQ_API_KEY from the
# environment; initialized explicitly in the complete program below)
model = ChatGroq(model="llama-3.1-8b-instant")

rag_chain = (
    {
        "context": retriever,
        "question": RunnablePassthrough()
    }
    | prompt
    | model
)
```
This chain:
- Retrieves documents
- Injects them into the prompt
- Generates a grounded answer
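One detail worth noting: the retriever returns a list of Document objects, and the prompt template stringifies that list wholesale, metadata included. A common refinement is to join only the page contents before they reach the prompt. A sketch, using a minimal stand-in class so it runs without LangChain installed:

```python
from dataclasses import dataclass

@dataclass
class Doc:
    # Minimal stand-in for langchain_core.documents.Document
    page_content: str

def format_docs(docs):
    # Keep only the text of each retrieved document
    return "\n\n".join(d.page_content for d in docs)

print(format_docs([Doc("Dogs are loyal companions."),
                   Doc("Cats are independent animals.")]))
```

In the actual chain, this could slot in as `"context": retriever | format_docs`; LCEL automatically wraps a plain function in a RunnableLambda when it is piped.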
12. Invoking the RAG Chain
```python
response = rag_chain.invoke("Tell me about dogs")
print(response.content)
```
The model answers only using retrieved documents.
Complete Runnable Program
This is the entire program, clean, correct, and runnable.
```python
import os

from dotenv import load_dotenv
from langchain_groq import ChatGroq
from langchain_core.documents import Document
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# --------------------------------------------------
# Load environment variables
# --------------------------------------------------
load_dotenv()
GROQ_API_KEY = os.getenv("GROQ_API_KEY")

# --------------------------------------------------
# Initialize LLM
# --------------------------------------------------
model = ChatGroq(
    api_key=GROQ_API_KEY,
    model="llama-3.1-8b-instant"
)

# --------------------------------------------------
# Create documents
# --------------------------------------------------
documents = [
    Document(
        page_content="Cats are independent animals and enjoy solitude.",
        metadata={"source": "doc1"}
    ),
    Document(
        page_content="Dogs are loyal companions and are very friendly.",
        metadata={"source": "doc2"}
    ),
    Document(
        page_content="Sharks are powerful predators living in the ocean.",
        metadata={"source": "doc3"}
    ),
    Document(
        page_content="Ice cream is a popular dessert enjoyed worldwide.",
        metadata={"source": "doc4"}
    ),
]

# --------------------------------------------------
# Create embeddings
# --------------------------------------------------
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# --------------------------------------------------
# Create vector store
# --------------------------------------------------
vectorstore = Chroma.from_documents(
    documents=documents,
    embedding=embeddings
)

# --------------------------------------------------
# Create retriever
# --------------------------------------------------
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 1}
)

# --------------------------------------------------
# Create RAG prompt
# --------------------------------------------------
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Answer the question using only the provided context."
        ),
        (
            "human",
            "Context:\n{context}\n\nQuestion:\n{question}"
        )
    ]
)

# --------------------------------------------------
# Build RAG chain
# --------------------------------------------------
rag_chain = (
    {
        "context": retriever,
        "question": RunnablePassthrough()
    }
    | prompt
    | model
)

# --------------------------------------------------
# Invoke RAG chain
# --------------------------------------------------
response = rag_chain.invoke("Tell me about dogs")
print("\n--- RAG Response ---\n")
print(response.content)
```
Output
```
--- RAG Response ---

Dogs are loyal companions and are very friendly.
```
