Replication is the mechanism that protects your data from node failures and also improves read performance under heavy search load. While sharding helps with scaling, replication helps with availability and throughput. Both are equally important in real-world systems.
Why Replication Is Needed
- In a distributed cluster, node failures are inevitable.
Nodes can go down due to scheduled maintenance, hardware crashes, network failures, or cloud infrastructure issues. When a node goes down, any data stored only on that node becomes temporarily unavailable.
- Some applications can tolerate downtime, but many cannot.
For small internal tools or analytics jobs, temporary unavailability may be acceptable. However, for mission-critical systems such as large e-commerce platforms, payment systems, or real-time search applications, even a few seconds of downtime can be extremely costly.
- Replication exists to guarantee high availability.
The goal is simple: even if a node fails, the system should continue serving reads and writes without losing data.
Primary Shards vs Replica Shards
- Primary shards are responsible for distributing data and handling indexing.
When an index is created with four primary shards, the data is spread across those shards so that indexing and storage load is shared across multiple nodes.
- Replica shards are exact copies of primary shards.
Every replica shard contains the same indexed data as its corresponding primary shard. This duplication ensures that the data remains available even if the primary shard becomes unreachable.
- Replication is about availability, not distribution.
Unlike primary shards, replicas do not split data further. Instead, they duplicate it for safety and performance.
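As a rough sketch of this relationship (the shard counts and the shape of the payload mirror Elasticsearch's `number_of_shards` / `number_of_replicas` index settings, but the specific numbers are just examples), primaries split the data while replicas multiply it:

```python
# Illustrative settings payload for an index with 4 primaries, 1 replica each.
index_settings = {
    "settings": {
        "number_of_shards": 4,      # primary shards: split the data
        "number_of_replicas": 1,    # full copies of each primary
    }
}

def total_shard_copies(primaries: int, replicas_per_primary: int) -> int:
    """Each primary has `replicas_per_primary` full copies, so the cluster
    stores primaries * (1 + replicas) shard copies in total."""
    return primaries * (1 + replicas_per_primary)

print(total_shard_copies(4, 1))  # → 8
```

The arithmetic makes the availability/cost trade-off concrete: four primaries with one replica each means eight shard copies of storage, not four.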
High-Level Flow of Document Indexing with Replication
- The application sends an indexing request to the cluster as usual.
The client does not need to know anything about replicas or shard placement. It simply sends a request to the index.
- The document is routed to the correct primary shard.
Using routing logic (typically based on the document ID), Elasticsearch determines which primary shard is responsible for the document.
- Indexing happens only on the primary shard.
This includes analysis steps such as tokenization, building the inverted index, and preparing the document for search.
- The indexed data is then copied to replica shards.
The primary shard sends the already indexed data to its replica shard(s), ensuring that replicas stay in sync.
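The routing step above follows the well-known formula `shard = hash(routing) % number_of_primary_shards`. A minimal sketch (Elasticsearch uses a murmur3 hash internally; `md5` stands in here purely so the example is self-contained and deterministic):

```python
import hashlib

def route_to_primary(doc_id: str, num_primaries: int) -> int:
    """shard = hash(routing) % number_of_primary_shards.
    md5 is a stand-in for Elasticsearch's murmur3 (illustration only)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_primaries

# The same document ID always routes to the same primary shard.
shard = route_to_primary("order-1001", 4)
assert shard == route_to_primary("order-1001", 4)
assert 0 <= shard < 4
```

Because the modulus is the primary shard count, this formula is also why the number of primary shards is fixed at index creation, while the replica count (which plays no part in routing) can change freely.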
Important Design Rules of Replica Shards
- Primary and replica shards are never stored on the same node.
Storing both on the same machine would defeat the purpose of replication, because a single node failure would still take every copy of that data offline.
- Replica shards are always placed on different nodes.
This guarantees that a node failure does not take down both the primary and its replica at the same time.
- Replica shards do not perform indexing under normal conditions.
They simply receive indexed data from the primary shard, which avoids duplicate work and keeps indexing efficient.
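The first two rules can be sketched as a tiny placement check (the node names are hypothetical; real Elasticsearch allocation weighs many more factors):

```python
from typing import List, Optional

def place_replica(primary_node: str, nodes: List[str]) -> Optional[str]:
    """Pick a node for the replica that is NOT the primary's node.
    Returns None when no other node exists: the replica stays unassigned,
    which is also what happens on a single-node cluster."""
    for node in nodes:
        if node != primary_node:
            return node
    return None

assert place_replica("node-1", ["node-1", "node-2"]) == "node-2"
assert place_replica("node-1", ["node-1"]) is None  # replica unassigned
```

The single-node case illustrates the rule's consequence: with only one node available, the replica simply cannot be allocated, rather than being co-located with its primary.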
What Happens When a Node Fails
- If a node containing a primary shard goes down, Elasticsearch reacts automatically.
The cluster detects the failure and immediately looks for an available replica shard.
- One of the replica shards is promoted to primary.
This promotion happens automatically, without application involvement, ensuring minimal disruption.
- Once promoted, the replica shard acts as the primary shard.
From this point onward, it handles indexing, updates, and deletes exactly like the original primary did.
- The system continues operating without downtime.
Reads and writes continue as long as at least one copy of each shard remains available.
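The failover steps above can be sketched as a small state transition (node names and the dict shape are made up for illustration):

```python
def handle_node_failure(copies: dict, failed_node: str) -> dict:
    """If the failed node held the primary, promote the first surviving
    replica; otherwise just drop the lost replica from the copy set."""
    replicas = [n for n in copies["replicas"] if n != failed_node]
    primary = copies["primary"]
    if primary == failed_node:
        if not replicas:
            raise RuntimeError("no surviving copy: shard is lost")
        primary = replicas.pop(0)  # replica promoted to primary
    return {"primary": primary, "replicas": replicas}

shard_copies = {"primary": "node-1", "replicas": ["node-2"]}
print(handle_node_failure(shard_copies, "node-1"))
# → {'primary': 'node-2', 'replicas': []}
```

The error branch captures the limit of replication: if every copy of a shard is lost at once, no promotion is possible, which is why replicas are spread across nodes in the first place.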
Replica Shards and Search Performance
- Replica shards can serve search requests.
Because replica shards contain fully indexed data, they can execute search queries just like primary shards.
- Search load can be distributed across primaries and replicas.
Instead of sending all search traffic to primary shards, Elasticsearch spreads queries across replica copies as well.
- This significantly improves read throughput.
As the number of replicas increases, the cluster can handle more concurrent search requests without becoming a bottleneck.
- This concept is similar to read replicas in cloud databases.
Just like AWS or GCP read replicas, Elasticsearch replica shards improve scalability for read-heavy workloads.
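A simple way to picture the load spreading is round-robin selection over all copies of a shard (a sketch only; Elasticsearch's real selection is adaptive, not strictly round-robin):

```python
import itertools

def search_target_picker(copies):
    """Cycle search requests across all copies (primary + replicas)
    so reads are spread instead of always hitting the primary."""
    return itertools.cycle(copies)

picker = search_target_picker(["primary", "replica-1", "replica-2"])
targets = [next(picker) for _ in range(6)]
print(targets)
# → ['primary', 'replica-1', 'replica-2', 'primary', 'replica-1', 'replica-2']
```

With three copies, each one handles roughly a third of the search traffic, which is exactly why adding replicas raises read throughput.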
Multiple Replica Shards
- Elasticsearch allows more than one replica per primary shard.
You can configure multiple replicas if your system requires very high availability or very high read throughput.
- The number of replicas is a configuration choice.
More replicas mean better fault tolerance and read scalability, but also higher storage and network costs.
- These configurations can be changed dynamically.
The replica count can be adjusted later without reindexing data, which gives great operational flexibility.
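The dynamic change is a live settings update (`PUT /<index>/_settings` with `index.number_of_replicas`). A sketch of the request body, plus the linear cost it implies (the index name is a made-up example):

```python
# Body for a live replica-count update, no reindexing required:
# PUT /products/_settings        (index name is hypothetical)
update_body = {"index": {"number_of_replicas": 2}}

def storage_factor(replicas: int) -> int:
    """Every replica is a full copy, so total stored data grows linearly:
    (1 + replicas) times the primary data size."""
    return 1 + replicas

print(storage_factor(2))  # → 3
```

Raising the count to 2 triples the storage footprint relative to the raw data, which is the trade-off to weigh before turning the knob up for extra availability or read throughput.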
