Understanding How Documents Are Stored in Elasticsearch

In this short but important section, we will look at how Elasticsearch stores documents internally, and how this structure differs from what we usually see in a relational database.

This concept matters because once you understand how Elasticsearch represents data internally, everything else (searching, updating, deleting, and versioning) starts making much more sense.

How Data Looks in a Relational Database

In a traditional relational database, data is stored in tables.

Each table has rows and columns:

Each row represents a record
Each column represents a field

For example, in a books table:

id	title	author	price
1	Book A	Author X	399
2	Book B	Author Y	499

Here:

id is usually an auto-generated primary key
Every time we insert a new record, a new row is created
The structure is fixed and schema-driven

This is the traditional relational model most developers are familiar with.
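
To make the relational side concrete, here is a minimal sketch that builds the books table above using Python's built-in sqlite3 module. SQLite and the exact column types are assumptions chosen only for illustration; any relational database behaves similarly for this purpose.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")

# The schema is fixed up front: every row must fit these columns.
conn.execute(
    """
    CREATE TABLE books (
        id     INTEGER PRIMARY KEY AUTOINCREMENT,
        title  TEXT NOT NULL,
        author TEXT NOT NULL,
        price  INTEGER NOT NULL
    )
    """
)

# We do not supply id; the database generates it for each new row.
conn.executemany(
    "INSERT INTO books (title, author, price) VALUES (?, ?, ?)",
    [("Book A", "Author X", 399), ("Book B", "Author Y", 499)],
)

rows = conn.execute(
    "SELECT id, title, author, price FROM books ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

Each insert creates one new row, and the id column is filled in automatically, which matches the table shown above.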

How Elasticsearch Stores Data

Elasticsearch works differently. Instead of tables, rows, and columns, it uses:

Indexes (similar to tables)
Documents (similar to rows)
Fields (similar to columns)

However, when you retrieve a document from Elasticsearch, you will notice that the structure looks quite different.

A Typical Elasticsearch Document Response

When you fetch a document from Elasticsearch, you will see something like this:

{
  "_index": "books",
  "_id": "abc123",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "_source": {
    "title": "The Alchemist",
    "author": "Paulo Coelho",
    "price": 399
  }
}

This response contains two major parts:

Metadata fields (fields starting with _)
Actual document data (_source)
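
The two parts can be separated with a few lines of Python. This is only a sketch: it parses the sample response shown above as a plain JSON string, and the variable names are chosen for illustration. No Elasticsearch client is involved.

```python
import json

# The sample response from above, as a raw JSON string.
raw = """
{
  "_index": "books",
  "_id": "abc123",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "_source": {
    "title": "The Alchemist",
    "author": "Paulo Coelho",
    "price": 399
  }
}
"""

response = json.loads(raw)

# Part 1: metadata, i.e. the underscore-prefixed keys other than _source.
metadata = {
    key: value
    for key, value in response.items()
    if key.startswith("_") and key != "_source"
}

# Part 2: the document data that was stored.
source = response["_source"]

print("metadata:", metadata)
print("source:  ", source)
```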

Let’s understand both clearly.

1. Metadata Fields (Fields Starting with `_`)

All fields that start with an underscore (_) are called metadata fields.

These fields are automatically managed by Elasticsearch, and every document will have them.

`_index`

This tells you which index the document belongs to.

Example:

"_index": "books"

This means the document is stored inside the books index.

`_id`

This is the unique identifier for the document.

It can be auto-generated by Elasticsearch
Or you can provide your own ID while inserting the document

Every document must have an _id that is unique within its index.
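
The two ways of assigning an ID correspond to two different REST requests. The sketch below only builds the HTTP method and path for each case; it does not send anything. The helper name index_request is hypothetical, while POST /<index>/_doc and PUT /<index>/_doc/<id> are the standard document indexing endpoints.

```python
from typing import Optional, Tuple
from urllib.parse import quote


def index_request(index: str, doc_id: Optional[str] = None) -> Tuple[str, str]:
    """Return the (method, path) pair for indexing one document.

    Hypothetical helper for illustration only; it builds the request
    line and does not talk to a cluster.
    """
    if doc_id is None:
        # No ID supplied: Elasticsearch generates one for us.
        return "POST", f"/{index}/_doc"
    # ID supplied by the caller: it becomes the document's _id.
    return "PUT", f"/{index}/_doc/{quote(doc_id, safe='')}"


print(index_request("books"))            # auto-generated _id
print(index_request("books", "abc123"))  # caller-provided _id
```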

`_version`

This represents the version number of the document.

It starts at 1 when the document is first indexed and increases each time the document is updated.
It helps Elasticsearch tell an older version of a document from a newer one.
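
The counting behaviour can be pictured with a toy model. The ToyIndex class below is purely an assumption for illustration: it is an in-memory dictionary, not how Elasticsearch is implemented, and it models only the case where a document is indexed again under the same ID.

```python
class ToyIndex:
    """In-memory stand-in for an index, for illustration only."""

    def __init__(self) -> None:
        self._docs = {}

    def index(self, doc_id: str, source: dict) -> int:
        """Store a document and return its new version number."""
        previous = self._docs.get(doc_id)
        # First write gets version 1; every later write adds one.
        version = 1 if previous is None else previous["_version"] + 1
        self._docs[doc_id] = {
            "_id": doc_id,
            "_version": version,
            "_source": dict(source),
        }
        return version

    def get(self, doc_id: str) -> dict:
        return self._docs[doc_id]


books = ToyIndex()
first = books.index("abc123", {"title": "The Alchemist", "price": 399})
second = books.index("abc123", {"title": "The Alchemist", "price": 449})
print(first, second)  # 1 2
```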

`_seq_no` and `_primary_term`

These two fields are related to optimistic concurrency control.

Their main purpose is:

To avoid conflicts when multiple requests try to update the same document at the same time
To ensure data consistency in distributed systems

For now, you don't need to understand these fields in depth.
We will discuss them later when we talk about updates and concurrency control.
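
As a small preview, here is a sketch of how these two values are typically sent back on a write. The helper name conditional_index_path is hypothetical and nothing is sent to a cluster; if_seq_no and if_primary_term are the query parameters Elasticsearch accepts for this purpose. If the stored document has changed since you read it, the values no longer match and the write is rejected with a version conflict instead of silently overwriting the newer data.

```python
from urllib.parse import urlencode


def conditional_index_path(
    index: str, doc_id: str, seq_no: int, primary_term: int
) -> str:
    """Build the request path for a write that should succeed only if
    the document still has the given _seq_no and _primary_term.

    Hypothetical helper for illustration; it only formats the path.
    """
    query = urlencode({"if_seq_no": seq_no, "if_primary_term": primary_term})
    return f"/{index}/_doc/{doc_id}?{query}"


# Values taken from the sample response: "_seq_no": 0, "_primary_term": 1.
path = conditional_index_path("books", "abc123", seq_no=0, primary_term=1)
print("PUT", path)
```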

2. The `_source` Field (Most Important Part)

The _source field holds the original JSON document that you indexed.

Example:

"_source": {
  "title": "The Alchemist",
  "author": "Paulo Coelho",
  "price": 399
}

This is the real content of your document.

Everything outside _source exists mainly to support:

Metadata management
Versioning
Replication
Conflict resolution

When people say “document data”, they usually mean what is inside _source.
