Building a Vector Store Using FAISS and HuggingFace Embeddings

In modern RAG (Retrieval-Augmented Generation) systems, embeddings and vector stores are core components.
This tutorial walks you step-by-step through:

  • Loading text data
  • Splitting into chunks
  • Generating embeddings
  • Storing in FAISS (Facebook AI Similarity Search)
  • Querying
  • Using retrievers
  • Saving and loading the vector store

We will use open-source embeddings and run everything locally with no external API requirements.


1. Understanding Vector Stores

A vector store is a special database designed to store high-dimensional vectors (embeddings).
It enables fast similarity search and is used in:

  • Chat-with-your-documents systems
  • RAG pipelines
  • Semantic search
  • Recommendation systems

Examples of vector stores:

  • FAISS
  • ChromaDB
  • AstraDB (Cassandra + Vector Search)
  • Pinecone
  • Weaviate
  • Milvus

Here we focus on FAISS, a fast and powerful library created by Facebook AI Research.

What is FAISS? (Facebook AI Similarity Search)

FAISS is an open-source library developed by Meta AI Research (formerly Facebook AI Research) for performing fast similarity search and clustering on high-dimensional vectors.
It is one of the most widely used vector indexing and retrieval systems in AI/ML applications.

If you are building a semantic search engine, RAG pipeline, recommendation system, or document similarity tool, FAISS becomes an extremely powerful component.


Why Do We Need FAISS?

When you generate embeddings (vectors) for text, images, or documents, each item becomes a large list of floating-point numbers (for example: 384, 768, or 1024 dimensions).

If you have:

  • 100 documents → a handful of embeddings → brute force is fine
  • 10,000 documents → search starts to slow down
  • 1,000,000+ documents → brute-force search becomes impractical

A simple linear scan quickly becomes too slow because every query must be compared against every stored vector: O(n) work per query, with a large constant for high-dimensional data.

FAISS solves this problem by offering optimized algorithms and indexes that make vector search extremely fast—even with millions of vectors.
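
To make this concrete, here is a minimal sketch (toy data, assumed sizes) comparing a plain NumPy linear scan with a FAISS flat index. Note that a flat index is still exhaustive; the larger speedups come from the approximate index types described below.

import numpy as np
import faiss

d = 384                                    # embedding dimension (toy value)
vectors = np.random.random((10_000, d)).astype("float32")
query = np.random.random((1, d)).astype("float32")

# Brute force in plain NumPy: L2 distance to every stored vector
distances = np.linalg.norm(vectors - query, axis=1)
print("Closest (NumPy):", distances.argmin())

# The same exhaustive search through FAISS's optimized flat index
index = faiss.IndexFlatL2(d)
index.add(vectors)
D, I = index.search(query, 1)              # distances and indices of the top match
print("Closest (FAISS):", I[0][0])         # same answer, much faster at scale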


Key Features of FAISS

1. Extremely Fast Similarity Search

FAISS is optimized using:

  • BLAS libraries
  • AVX / SSE CPU vector instructions
  • GPU acceleration (optional)

This allows similarity search (cosine similarity or Euclidean distance) to be performed orders of magnitude faster than standard Python loops.


2. Supports Multiple Index Types

FAISS provides different indexing strategies suited for different dataset sizes:

  • Flat Index
    • Exact search
    • Slower for huge data
    • Best quality
  • IVF (Inverted File Index)
    • Clustering-based
    • Fast search for large datasets
  • HNSW (Graph-based)
    • Good accuracy + speed
    • Used in many production systems
  • PQ (Product Quantization)
    • Reduces memory
    • Allows billion-scale vector search

In LangChain, the commonly used index is IndexFlatL2 (the default built by FAISS.from_documents), which performs exact similarity search.
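
As a sketch of how the alternatives differ in raw FAISS (toy data, assumed parameter values), an IVF index clusters the vectors at training time and then searches only a few clusters per query:

import numpy as np
import faiss

d = 384
nlist = 100                                # number of clusters (assumed value)
vectors = np.random.random((50_000, d)).astype("float32")

quantizer = faiss.IndexFlatL2(d)           # assigns vectors to clusters
index = faiss.IndexIVFFlat(quantizer, d, nlist)
index.train(vectors)                       # IVF indexes must be trained first
index.add(vectors)

index.nprobe = 10                          # clusters to visit per query
query = np.random.random((1, d)).astype("float32")
D, I = index.search(query, 3)              # approximate top-3 neighbors
print(I)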


3. Works on CPU or GPU

You can install:

  • faiss-cpu for local/CPU-only systems
  • faiss-gpu for high-performance computing

GPU FAISS is extremely fast and supports large-scale vector search for enterprise-grade systems.


4. Supports High-Dimensional Vectors

FAISS can index vectors of any dimension—whether 128, 384, 768, or even 4096-dimensional embeddings.
This makes it perfect for:

  • Text embeddings (OpenAI, HF models)
  • Image embeddings
  • Audio embeddings
  • Multimodal search

5. Save & Load Indexes Easily

FAISS indexes can be saved to disk and loaded back:

  • a .faiss file stores the vector index
  • a .pkl file stores the metadata (the document store and ID mapping)

This makes FAISS good for production use where indexing is expensive and must be persisted for later reuse.
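
At the raw FAISS level, persistence is a one-liner in each direction; here is a minimal sketch with toy vectors (LangChain's save_local, shown later, writes the .faiss/.pkl pair for you):

import numpy as np
import faiss

index = faiss.IndexFlatL2(384)
index.add(np.random.random((100, 384)).astype("float32"))

faiss.write_index(index, "vectors.faiss")  # persist the index to disk
restored = faiss.read_index("vectors.faiss")
print(restored.ntotal)                     # 100 vectors survive the round trip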


How FAISS Works (High-Level)

Here’s the conceptual workflow:

  1. You generate embeddings for your documents
  2. FAISS stores them in a compact index
  3. When you issue a query, your embedding model converts it to a vector
  4. FAISS compares that vector against the stored vectors (all of them for a flat index, only a subset for approximate indexes)
  5. It returns the most similar vectors in milliseconds

FAISS supports multiple distance metrics:

  • L2 distance (Euclidean) → default
  • Inner Product (equivalent to cosine similarity when the vectors are L2-normalized; see the sketch below)
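
Cosine similarity is not a separate metric in FAISS: you get it by L2-normalizing the vectors and using an inner-product index. A minimal sketch with toy data:

import numpy as np
import faiss

d = 384
vectors = np.random.random((1_000, d)).astype("float32")
query = np.random.random((1, d)).astype("float32")

# Cosine similarity = inner product of L2-normalized vectors
faiss.normalize_L2(vectors)                # normalizes in place
faiss.normalize_L2(query)

index = faiss.IndexFlatIP(d)               # IP = inner product
index.add(vectors)
D, I = index.search(query, 3)
print(D)                                   # higher score = more similar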

Example: Using FAISS in LangChain

FAISS is commonly used with LangChain for:

  • Retrieval-Augmented Generation (RAG)
  • Semantic question answering
  • PDF → embeddings → semantic search
  • Chat-with-your-documents systems

LangChain wraps FAISS with a simple interface:

db = FAISS.from_documents(docs, embeddings)

results = db.similarity_search("your question here")

This gives you the top matching text chunks from your document set.
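
Here is the same idea as a small self-contained sketch, using a made-up three-sentence corpus and FAISS.from_texts (a sibling of from_documents that takes plain strings):

from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings

texts = [                                  # toy corpus for illustration
    "FAISS performs fast similarity search on vectors.",
    "The Eiffel Tower is in Paris.",
    "Embeddings map text to high-dimensional vectors.",
]

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
db = FAISS.from_texts(texts, embeddings)

for doc in db.similarity_search("How does vector search work?", k=2):
    print(doc.page_content)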


When Should You Use FAISS?

Use FAISS if:

✔ You are running vector search locally
✔ You want a lightweight, fast, free option
✔ You don’t need a distributed vector DB
✔ You want to embed documents and query them quickly
✔ You want to save/load vector indexes on disk

Don’t use FAISS if:

✘ You need a managed cloud service or vectors spread across many machines
✘ You need multi-node distribution
✘ You want serverless or automatic scaling

In those cases, use:

  • Astra DB
  • Chroma Cloud
  • Pinecone
  • Weaviate
  • Milvus

2. Workflow Diagram

This diagram shows the complete flow from raw text → vector store → retrieval:

                 ┌───────────────────┐
                 │   Raw Text File   │
                 │   (speech.txt)    │
                 └─────────┬─────────┘
                           │
                           ▼
              ┌───────────────────────────┐
              │  Document Loader (TXT)    │
              │  TextLoader("speech.txt") │
              └─────────┬─────────────────┘
                        │  Loaded document(s)
                        ▼
         ┌────────────────────────────────────┐
         │      Text Splitter (Chunks)        │
         │ CharacterTextSplitter(chunk=1000)  │
         └─────────────┬──────────────────────┘
                       │  Split into chunks
                       ▼
            ┌─────────────────────────┐
            │     Embedding Model     │
            │  HuggingFaceEmbeddings  │
            └──────────┬──────────────┘
                       │  Embeddings (Vectors)
                       ▼
        ┌──────────────────────────────────┐
        │        FAISS Vector Store        │
        │ FAISS.from_documents(docs, emb)  │
         └──────┬───────────┬───────────────┘
               │           │
               │           │ Save / Load
               │           ▼
               │   ┌──────────────────┐
               │   │ Local Disk Index │
               │   │ (faiss_index/)   │
               │   └──────────────────┘
               │
               │ Query (Similarity Search)
               ▼
     ┌───────────────────────────────┐
     │        Retriever API          │
     │     db.as_retriever()         │
     └──────────┬────────────────────┘
                │  Query
                ▼
      ┌────────────────────────┐
      │   Relevant Chunks      │
      │  (vector similarity)   │
      └────────────────────────┘

3. Install Required Packages

Run:

pip install langchain-community
pip install sentence-transformers
pip install faiss-cpu
pip install langchain-text-splitters

All dependencies are CPU-friendly and work on any system.
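
Optionally, run a quick sanity check that the key packages import correctly (assuming the installs above completed cleanly):

import faiss
import langchain_community
import sentence_transformers

print("faiss loaded, version:", faiss.__version__)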


4. Complete Working FAISS Program (Copy–Paste Ready)

Save this as faiss_pipeline.py
Ensure speech.txt is in the same folder.


faiss_pipeline.py

from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_text_splitters import CharacterTextSplitter


# ---------------------------------------------------------
# 1. Load Text File
# ---------------------------------------------------------
print("\nStep 1: Loading file...")

# Make sure speech.txt exists in the same folder
loader = TextLoader("speech.txt")
documents = loader.load()

print("Loaded documents:", len(documents))


# ---------------------------------------------------------
# 2. Split Text into Chunks
# ---------------------------------------------------------
print("\nStep 2: Splitting text into chunks...")

# CharacterTextSplitter splits on a separator ("\n\n" by default),
# so real chunk sizes are approximate, not exact.
text_splitter = CharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=30
)

docs = text_splitter.split_documents(documents)

print("Total chunks created:", len(docs))


# ---------------------------------------------------------
# 3. Create Embeddings (HuggingFace)
# ---------------------------------------------------------
print("\nStep 3: Initializing HuggingFace embeddings...")

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

print("Embeddings initialized.")


# ---------------------------------------------------------
# 4. Create FAISS Vector Store
# ---------------------------------------------------------
print("\nStep 4: Creating FAISS vector store...")

db = FAISS.from_documents(docs, embeddings)

print("FAISS index created successfully.")


# ---------------------------------------------------------
# 5. Query the Vector Store
# ---------------------------------------------------------
query = "How does the speaker describe the desired outcome of the war?"

print("\nStep 5: Running similarity search...")
results = db.similarity_search(query, k=1)

print("\nTop matching document:\n")
print(results[0].page_content)


# ---------------------------------------------------------
# 6. Similarity Search WITH Score
# ---------------------------------------------------------
print("\nStep 6: Similarity search with score...")

docs_with_score = db.similarity_search_with_score(query, k=3)

for idx, (doc, score) in enumerate(docs_with_score):
    print(f"\nResult #{idx + 1} | Score: {score}")
    print(doc.page_content[:200], "...")
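
# Note: with the default IndexFlatL2 index, the "score" returned here is an
# L2 distance, so lower scores mean more similar chunks.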


# ---------------------------------------------------------
# 7. Query Using Embedding Vector
# ---------------------------------------------------------
print("\nStep 7: Querying using embedding vector directly...")

query_vector = embeddings.embed_query(query)

vector_results = db.similarity_search_by_vector(query_vector, k=1)

print("\nResult from vector-based query:\n")
print(vector_results[0].page_content)


# ---------------------------------------------------------
# 8. Convert VectorStore → Retriever
# ---------------------------------------------------------
print("\nStep 8: Using Retriever API...")

retriever = db.as_retriever()
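# The retriever returns the top 4 chunks by default; this is tunable,
# e.g. retriever = db.as_retriever(search_kwargs={"k": 2})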
retrieved_docs = retriever.invoke(query)

print("\nRetriever Output:\n")
print(retrieved_docs[0].page_content)


# ---------------------------------------------------------
# 9. Save FAISS Index Locally
# ---------------------------------------------------------
print("\nStep 9: Saving FAISS index...")

db.save_local("faiss_index")
print("Index saved at: ./faiss_index/")


# ---------------------------------------------------------
# 10. Load FAISS Index From Disk
# ---------------------------------------------------------
print("\nStep 10: Loading FAISS index from disk...")

new_db = FAISS.load_local(
    "faiss_index",
    embeddings,
    allow_dangerous_deserialization=True
)
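# allow_dangerous_deserialization is required because the .pkl metadata file
# is restored with pickle; only enable it for index files you created yourself.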

print("FAISS index loaded successfully.")


# ---------------------------------------------------------
# 11. Query Loaded DB
# ---------------------------------------------------------
print("\nStep 11: Querying loaded FAISS DB...")

loaded_results = new_db.similarity_search(query, k=1)

print("\nResponse from loaded FAISS index:\n")
print(loaded_results[0].page_content)

print("\n---- PROGRAM COMPLETED SUCCESSFULLY ----")