Vector databases play a crucial role in modern LLM-powered applications. Whenever we want to store, search, or retrieve information semantically (using meaning instead of keywords), we rely on vector stores. In this tutorial, we will focus on ChromaDB, one of the most popular and developer-friendly open-source vector databases.
This guide is written in simple language and includes a complete, working program that uses free Hugging Face embeddings end to end.
1. What Is ChromaDB?
ChromaDB is:
- An AI-native, open-source vector database
- Designed for LLM applications, semantic search, and RAG pipelines
- Built for developer productivity and happiness
- Licensed under Apache 2.0
- Able to persist data to disk using SQLite
- Fast, lightweight, and local-first
ChromaDB is widely used in:
- ChatGPT-style chatbots
- RAG (Retrieval-Augmented Generation) pipelines
- Search engines
- Document understanding applications
2. Installing ChromaDB and Required Libraries
To use ChromaDB with LangChain, we need to install several components:
- chromadb → the core vector DB
- langchain-community → the Chroma integration
- langchain-text-splitters → for chunking
- sentence-transformers → for Hugging Face embeddings
- torch → required by the transformer models
Update your requirements.txt:
langchain
langchain-community
langchain-text-splitters
chromadb
sentence-transformers
torch
Install everything:
pip install -r requirements.txt
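As an optional sanity check, you can confirm that chromadb imports cleanly and see which version was installed (useful when debugging persistence behavior, which changed in chromadb 0.4):

```python
# Optional sanity check: confirm chromadb is installed and show its version
import chromadb
print(chromadb.__version__)
```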
3. Understanding the Workflow
The ChromaDB pipeline looks like this:
- Load the document (speech.txt)
- Split the document into smaller chunks
- Generate embeddings using a model (we use sentence-transformers/all-MiniLM-L6-v2)
- Store the embeddings inside ChromaDB
- Persist the vector DB to disk (SQLite)
- Reload it later
- Perform similarity search
- Use a retriever for RAG-style tasks
This is exactly what we implement in the complete code below.
4. Document Loading and Splitting
We begin by loading speech.txt using LangChain’s TextLoader.
Then we split it using RecursiveCharacterTextSplitter.
- chunk_size=500
- chunk_overlap=50
(Overlap helps preserve context between chunks)
This prepares the text for embedding.
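In condensed form, this step looks like the snippet below (the same calls appear in the complete program in section 9):

```python
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the raw text file as a list of LangChain Document objects
docs = TextLoader("speech.txt").load()

# Split into ~500-character chunks with 50 characters of overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
```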
5. Embeddings with Hugging Face
We use:
sentence-transformers/all-MiniLM-L6-v2
Advantages:
- Free
- Works offline
- Fast on CPU
- High-quality embeddings
LangChain’s HuggingFaceEmbeddings wrapper makes it easy to use.
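Condensed from the complete program in section 9, the setup is just:

```python
from langchain_community.embeddings import HuggingFaceEmbeddings

# The model is downloaded once, then loaded from the local cache
embedding = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},               # run on CPU
    encode_kwargs={"normalize_embeddings": True}  # unit-length vectors
)
```

Normalizing the embeddings makes cosine similarity equivalent to a dot product, which is a common default for retrieval.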
6. Creating the Vector Store
We feed our chunks + embeddings into ChromaDB using:
Chroma.from_documents()
Important parameters:
- persist_directory → folder where Chroma stores the SQLite DB
- collection_name → name of the dataset
- embedding → the embedding model instance
Chroma automatically:
- Stores vectors
- Builds an ANN (approximate nearest neighbor) index
- Saves metadata
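Condensed from section 9, the call looks like this (using the chunks and embedding objects from the previous steps):

```python
from langchain_community.vectorstores import Chroma

# Build the store; passing persist_directory makes Chroma write it to disk
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=embedding,
    persist_directory="chroma_hf_db",
    collection_name="speech_collection",
)
```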
7. Persisting Chroma to Disk
Chroma stores data using:
- A folder structure
- A chroma.sqlite3 file
This allows:
- Reloading the DB anytime
- Reusing it without recomputing embeddings
- Hosting the DB on a server or in the cloud
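Reloading needs only the same directory, collection name, and embedding model; nothing is re-embedded. This is the same pattern as reload_vectorstore() in section 9:

```python
# Reopen the persisted store from disk; embeddings are NOT recomputed
vector_db = Chroma(
    collection_name="speech_collection",
    embedding_function=embedding,
    persist_directory="chroma_hf_db",
)
```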
8. Similarity Search & Retriever
Two powerful features:
similarity_search(query)
Returns the top-k chunks most relevant to the query.
as_retriever()
Exposes the store through the standard retriever interface used in RAG applications.
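Both are shown in context in section 9; in isolation they look like this:

```python
# Top-k similarity search: returns the k chunks closest to the query
docs = vector_db.similarity_search("What does the speaker say about entering the war?", k=2)
print(docs[0].page_content)

# Retriever interface: the standard entry point for RAG chains
retriever = vector_db.as_retriever(search_kwargs={"k": 1})
result = retriever.invoke("What is the main message of the speech?")
print(result[0].page_content)
```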
9. Complete Program
"""
ChromaDB + HuggingFace Embeddings Example
-----------------------------------------
This program:
- Loads speech.txt
- Splits text into chunks
- Creates embeddings using Hugging Face free model
- Builds Chroma vector store
- Saves it to disk
- Reloads it again
- Performs similarity search
- Uses retriever interface
Works on Windows (no Ollama, no OpenAI).
"""
import os
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
# ----------------------------
# CONFIGURATION
# ----------------------------
SPEECH_FILE = "speech.txt"
PERSIST_DIR = "chroma_hf_db"
COLLECTION_NAME = "speech_collection"
# Hugging Face embedding model (FREE + FAST)
HF_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
def load_document(file_path):
print("[1] Loading document:", file_path)
loader = TextLoader(file_path)
docs = loader.load()
print(" Loaded", len(docs), "document(s).")
return docs
def split_documents(docs):
print("[2] Splitting into chunks...")
splitter = RecursiveCharacterTextSplitter(
chunk_size=500,
chunk_overlap=50
)
chunks = splitter.split_documents(docs)
print(" Total chunks:", len(chunks))
return chunks
def create_embeddings():
print("[3] Loading HuggingFace embedding model:", HF_MODEL)
# This downloads the model only ONCE
embedding = HuggingFaceEmbeddings(
model_name=HF_MODEL,
model_kwargs={'device': 'cpu'}, # use CPU
encode_kwargs={'normalize_embeddings': True}
)
print(" Embedding model ready.")
return embedding
def build_vectorstore(chunks, embeddings):
print("[4] Creating Chroma vector DB and saving to disk...")
vector_db = Chroma.from_documents(
documents=chunks,
embedding=embeddings,
persist_directory=PERSIST_DIR,
collection_name=COLLECTION_NAME
)
vector_db.persist()
print(" Saved to:", PERSIST_DIR)
return vector_db
def similarity_search(vector_db):
print("[5] Running similarity search...")
query = "What does the speaker say about entering the war?"
docs = vector_db.similarity_search(query, k=2)
print("\nQUERY:", query)
print("\nRESULT:\n", docs[0].page_content)
def reload_vectorstore(embeddings):
print("[6] Reloading stored Chroma DB...")
vector_db = Chroma(
collection_name=COLLECTION_NAME,
embedding_function=embeddings,
persist_directory=PERSIST_DIR
)
print(" Reloaded successfully!")
return vector_db
def retriever_example(vector_db):
print("[7] Retriever example...")
retriever = vector_db.as_retriever(search_kwargs={"k": 1})
query = "What is the main message of the speech?"
result = retriever.invoke(query)
print("\nQUERY:", query)
print("\nRETRIEVED:\n", result[0].page_content)
def main():
docs = load_document(SPEECH_FILE)
chunks = split_documents(docs)
embeddings = create_embeddings()
db = build_vectorstore(chunks, embeddings)
similarity_search(db)
db2 = reload_vectorstore(embeddings)
retriever_example(db2)
print("\n[✔] Completed successfully")
if __name__ == "__main__":
main()
