Vector databases play a crucial role in modern LLM-powered applications. Whenever we want to store, search, or retrieve information semantically (using meaning instead of keywords), we rely on vector stores. In this tutorial, we will focus on ChromaDB, one of the most popular and developer-friendly open-source vector databases.
This guide is written in simple language and includes a complete, working program that uses free Hugging Face embeddings end to end.
1. What Is ChromaDB?
ChromaDB is:
- An AI-native, open-source vector database
- Designed for LLM applications, semantic search, and RAG pipelines
- Built for developer productivity and happiness
- Licensed under Apache 2.0
- Able to persist data to disk using SQLite
- Fast, lightweight, and local-first
ChromaDB is widely used in:
- ChatGPT-style chatbots
- RAG (Retrieval-Augmented Generation) pipelines
- Search engines
- Document understanding applications
2. Installing ChromaDB and Required Libraries
To use ChromaDB with LangChain, we need to install several components:
- chromadb → the core vector DB
- langchain-community → the Chroma integration
- langchain-text-splitters → for chunking
- sentence-transformers → for Hugging Face embeddings
- torch → required by the transformer models
Update your requirements.txt:
langchain
langchain-community
langchain-text-splitters
chromadb
sentence-transformers
torch
Install everything:
pip install -r requirements.txt
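As an optional sanity check, you can confirm that chromadb imports cleanly and see which version was installed (useful when debugging persistence behavior, which changed in chromadb 0.4):

```python
# Optional sanity check: confirm chromadb is installed and show its version
import chromadb
print(chromadb.__version__)
```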
3. Understanding the Workflow
The ChromaDB pipeline looks like this:
- Load the document (speech.txt)
- Split the document into smaller chunks
- Generate embeddings using a model (we use sentence-transformers/all-MiniLM-L6-v2)
- Store the embeddings inside ChromaDB
- Persist the vector DB to disk (SQLite)
- Reload it later
- Perform similarity search
- Use a retriever for RAG-style tasks
This is exactly what we implement in the complete code below.
4. Document Loading and Splitting
We begin by loading speech.txt using LangChain’s TextLoader.
Then we split it using RecursiveCharacterTextSplitter.
- chunk_size=500
- chunk_overlap=50
(Overlap helps preserve context between chunks)
This prepares the text for embedding.
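In condensed form, this step looks like the snippet below (the same calls appear in the complete program in section 9):

```python
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the raw text file as a list of LangChain Document objects
docs = TextLoader("speech.txt").load()

# Split into ~500-character chunks with 50 characters of overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
```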
5. Embeddings with Hugging Face
We use:
sentence-transformers/all-MiniLM-L6-v2
Advantages:
- Free
- Works offline
- Fast on CPU
- High-quality embeddings
LangChain’s HuggingFaceEmbeddings wrapper makes it easy to use.
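Condensed from the complete program in section 9, the setup is just:

```python
from langchain_community.embeddings import HuggingFaceEmbeddings

# The model is downloaded once, then loaded from the local cache
embedding = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},               # run on CPU
    encode_kwargs={"normalize_embeddings": True}  # unit-length vectors
)
```

Normalizing the embeddings makes cosine similarity equivalent to a dot product, which is a common default for retrieval.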
6. Creating the Vector Store
We feed our chunks + embeddings into ChromaDB using:
Chroma.from_documents()
Important parameters:
- persist_directory → folder where Chroma stores the SQLite DB
- collection_name → name of the dataset
- embedding → the embedding model instance
Chroma automatically:
- Stores vectors
- Builds an ANN (approximate nearest neighbor) index
- Saves metadata
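Condensed from section 9, the call looks like this (using the chunks and embedding objects from the previous steps):

```python
from langchain_community.vectorstores import Chroma

# Build the store; passing persist_directory makes Chroma write it to disk
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=embedding,
    persist_directory="chroma_hf_db",
    collection_name="speech_collection",
)
```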
7. Persisting Chroma to Disk
Chroma stores data using:
- A folder structure
- A chroma.sqlite3 file
This allows:
- Reloading the DB anytime
- Reusing it without recomputing embeddings
- Hosting the DB on a server or in the cloud
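Reloading needs only the same directory, collection name, and embedding model; nothing is re-embedded. This is the same pattern as reload_vectorstore() in section 9:

```python
# Reopen the persisted store from disk; embeddings are NOT recomputed
vector_db = Chroma(
    collection_name="speech_collection",
    embedding_function=embedding,
    persist_directory="chroma_hf_db",
)
```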
8. Similarity Search & Retriever
Two powerful features:
similarity_search(query)
Returns the top-k chunks most relevant to the query.
as_retriever()
Exposes the store through the standard retriever interface used in RAG applications.
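Both are shown in context in section 9; in isolation they look like this:

```python
# Top-k similarity search: returns the k chunks closest to the query
docs = vector_db.similarity_search("What does the speaker say about entering the war?", k=2)
print(docs[0].page_content)

# Retriever interface: the standard entry point for RAG chains
retriever = vector_db.as_retriever(search_kwargs={"k": 1})
result = retriever.invoke("What is the main message of the speech?")
print(result[0].page_content)
```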
9. Complete Program
"""
ChromaDB + HuggingFace Embeddings Example
-----------------------------------------
This program:
- Loads speech.txt
- Splits text into chunks
- Creates embeddings using Hugging Face free model
- Builds Chroma vector store
- Saves it to disk
- Reloads it again
- Performs similarity search
- Uses retriever interface
Works on Windows (no Ollama, no OpenAI).
"""
import os
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
# ----------------------------
# CONFIGURATION
# ----------------------------
SPEECH_FILE = "speech.txt"
PERSIST_DIR = "chroma_hf_db"
COLLECTION_NAME = "speech_collection"
# Hugging Face embedding model (FREE + FAST)
HF_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
def load_document(file_path):
print("[1] Loading document:", file_path)
loader = TextLoader(file_path)
docs = loader.load()
print(" Loaded", len(docs), "document(s).")
return docs
def split_documents(docs):
print("[2] Splitting into chunks...")
splitter = RecursiveCharacterTextSplitter(
chunk_size=500,
chunk_overlap=50
)
chunks = splitter.split_documents(docs)
print(" Total chunks:", len(chunks))
return chunks
def create_embeddings():
print("[3] Loading HuggingFace embedding model:", HF_MODEL)
# This downloads the model only ONCE
embedding = HuggingFaceEmbeddings(
model_name=HF_MODEL,
model_kwargs={'device': 'cpu'}, # use CPU
encode_kwargs={'normalize_embeddings': True}
)
print(" Embedding model ready.")
return embedding
def build_vectorstore(chunks, embeddings):
print("[4] Creating Chroma vector DB and saving to disk...")
vector_db = Chroma.from_documents(
documents=chunks,
embedding=embeddings,
persist_directory=PERSIST_DIR,
collection_name=COLLECTION_NAME
)
vector_db.persist()
print(" Saved to:", PERSIST_DIR)
return vector_db
def similarity_search(vector_db):
print("[5] Running similarity search...")
query = "What does the speaker say about entering the war?"
docs = vector_db.similarity_search(query, k=2)
print("\nQUERY:", query)
print("\nRESULT:\n", docs[0].page_content)
def reload_vectorstore(embeddings):
print("[6] Reloading stored Chroma DB...")
vector_db = Chroma(
collection_name=COLLECTION_NAME,
embedding_function=embeddings,
persist_directory=PERSIST_DIR
)
print(" Reloaded successfully!")
return vector_db
def retriever_example(vector_db):
print("[7] Retriever example...")
retriever = vector_db.as_retriever(search_kwargs={"k": 1})
query = "What is the main message of the speech?"
result = retriever.invoke(query)
print("\nQUERY:", query)
print("\nRETRIEVED:\n", result[0].page_content)
def main():
docs = load_document(SPEECH_FILE)
chunks = split_documents(docs)
embeddings = create_embeddings()
db = build_vectorstore(chunks, embeddings)
similarity_search(db)
db2 = reload_vectorstore(embeddings)
retriever_example(db2)
print("\n[✔] Completed successfully")
if __name__ == "__main__":
main()
