
Optimistic Concurrency Control in Elasticsearch

In this tutorial, we are going to study a concept that becomes extremely important the moment multiple applications or multiple users start updating the same data concurrently. This concept is called optimistic concurrency control.

We briefly saw something related to versioning earlier, but that old mechanism is now considered deprecated: the _version field still exists on every document, but relying on it for concurrency control is no longer recommended. In modern Elasticsearch, concurrency control is instead based on two internal metadata values that Elasticsearch maintains for every document: the sequence number and the primary term. These two values together are what we must use today if we want to protect our data from race conditions and lost updates.

In this tutorial, we will first understand what these two values are, then we will see why they are needed, and finally we will verify everything with hands-on experiments in Kibana.

Step 1: Start a Simple Cluster

For this experiment, we do not need any special cluster configuration. A simple three-node cluster is more than enough.

  • If you already have some other cluster running, bring it down first.
  • Then start the simple cluster that you have been using in previous lectures.
  • Once the cluster is up, go to Kibana Dev Console and first check the nodes.
GET /_cat/nodes?v

You should see three nodes, and one of them will be marked as the elected master. In our example, es02 is the master, which is perfectly fine.
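The output should look roughly like this (a few columns are trimmed here, and the IP addresses, load values, and role letters will differ in your environment); the asterisk in the master column marks the elected master:

ip         heap.percent ram.percent cpu load_1m node.role   master name
172.18.0.2           45          72   2    0.61 cdfhilmrstw -      es01
172.18.0.3           39          72   3    0.61 cdfhilmrstw *      es02
172.18.0.4           41          72   2    0.61 cdfhilmrstw -      es03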

Step 2: Create the Index and Insert Some Documents

Now let us create a simple products index using the default settings.

PUT /products

By default, this index will have:

  • One primary shard
  • One replica shard
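
If you want to confirm those defaults, check the index settings. The response below is a trimmed sketch; your index UUID, creation date, and other settings are omitted and will differ:

GET /products/_settings

{
  "products": {
    "settings": {
      "index": {
        "number_of_shards": "1",
        "number_of_replicas": "1",
        "provided_name": "products"
      }
    }
  }
}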

Now let us insert a few documents.

PUT /products/_doc/1
{
  "name": "Product One"
}

When you send this request, look carefully at the response. You will see something like:

"_seq_no": 0,
"_primary_term": 1

This is extremely important.

At this point, Elasticsearch is telling you:

  • This is the first write operation on this primary shard (and, since the index has a single primary shard, on the index as a whole), so the sequence number starts from 0.
  • The primary term is 1, which means this primary shard is in its first “lifetime”, or generation.
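
For reference, the complete response for this first insert looks roughly like this (on a 7.x cluster you will additionally see a _type field, and the _shards counts depend on how many shard copies were available at that moment):

{
  "_index": "products",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}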

Now insert a second document:

PUT /products/_doc/2
{
  "name": "Product Two"
}

You will now see:

"_seq_no": 1,
"_primary_term": 1

Insert a third document:

PUT /products/_doc/3
{
  "name": "Product Three"
}

Now you will see:

"_seq_no": 2,
"_primary_term": 1

So we can already observe a very important rule:

Every write operation on a shard, whether it is an insert, an update, or a delete, increases the sequence number by one.

Now update document with ID 3:

PUT /products/_doc/3
{
  "name": "Product Three Updated"
}

Now the sequence number becomes:

"_seq_no": 3,
"_primary_term": 1

Step 3: Sequence Number and Primary Term Are Part of the Document Metadata

If you now fetch the documents using GET:

GET /products/_doc/1
GET /products/_doc/2
GET /products/_doc/3

You will notice something interesting:

  • Document 1 still has _seq_no = 0
  • Document 2 still has _seq_no = 1
  • Document 3 now has _seq_no = 3 because it was updated

This proves that:

Sequence number and primary term are not just shard-level counters. They are also part of each document’s metadata and record the operation that last modified that document.
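
For example, fetching document 3 again returns metadata roughly like this, carrying the sequence number of the update that last touched it:

GET /products/_doc/3

{
  "_index": "products",
  "_id": "3",
  "_version": 2,
  "_seq_no": 3,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "name": "Product Three Updated"
  }
}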

Step 4: Why Do We Even Need This? Understanding the Race Condition Problem

To understand the real problem, imagine a document like this:

  • ID = 1
  • Sequence number = 48
  • Primary term = 1
  • Source: { "count": 5 }

Now imagine two different applications, app1 and app2. Both want to increase this count by 1.

Here is what happens in real life:

  • Both applications send a GET request at almost the same time.
  • Both receive the same response: count = 5.
  • Both compute the new value as count = 6.
  • Both send an update request setting count = 6.

What will be the final value in the database?

It will be 6, even though two updates happened. The correct result should have been 7.

This is a classic race condition and it happens because both applications updated the document based on stale data.
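
In Dev Console terms, both applications are effectively running the same naive read-modify-write sequence against the hypothetical counter document described above (a sketch only; do not run it against the products index we just built, because document 1 there has a different source):

# Both applications read the document at (almost) the same time
GET /products/_doc/1

# Both see "count": 5 in _source and compute 5 + 1 = 6

# Both then write their result back, without any condition
PUT /products/_doc/1
{
  "count": 6
}

# The second write silently overwrites the first: the final value is 6, not 7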

Step 5: How Elasticsearch Solves This Using Optimistic Concurrency Control

Elasticsearch’s solution is very elegant and very strict.

When an application sends an update or delete request, it must say:

“Apply this update only if the document is still in the same state that I read.”

And how does Elasticsearch represent that state?

Using the sequence number and the primary term.

So the update request must look like this conceptually:

PUT /products/_doc/1?if_seq_no=48&if_primary_term=1
{
  "count": 6
}

This means:

“Update this document only if the current sequence number is 48 and the primary term is 1, because my calculation is based on that version of the document.”

Now imagine what happens:

  • The first request arrives, matches 48 / 1, so Elasticsearch updates the document.
  • Elasticsearch automatically increments the sequence number to 49.
  • The second request arrives and says: “Update only if sequence number is 48.”
  • Elasticsearch sees that the current sequence number is now 49 and immediately rejects the request with a version conflict.

So the second application is forced to:

  • Read the document again
  • Get the new value
  • Recalculate
  • Send a new update request

This is exactly how optimistic concurrency control prevents lost updates.
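
Conceptually, the recovery path for the losing application looks like this Dev Console sequence (the values are illustrative and continue the hypothetical example above; in a real application you would wrap this read-recalculate-retry cycle in a small loop):

# 1. Re-read the document to get the fresh value and metadata
GET /products/_doc/1

# Suppose the response now shows "count": 6 with _seq_no 49 and _primary_term 1

# 2. Recalculate (6 + 1 = 7) and retry with the new condition
PUT /products/_doc/1?if_seq_no=49&if_primary_term=1
{
  "count": 7
}

# 3. If this also fails with a version conflict, repeat from step 1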

Step 6: Try This in Kibana (Wrong Sequence Number)

Now let us test this in practice.

First, get document with ID 2:

GET /products/_doc/2

You will see something like:

  • _seq_no = 1
  • _primary_term = 1

Now try to update it using a wrong sequence number:

PUT /products/_doc/2?if_seq_no=10&if_primary_term=1
{
  "name": "Product Two Updated"
}

Elasticsearch will reject this with a version_conflict_engine_exception (HTTP status 409), telling you that your condition does not match the document’s current sequence number and primary term, so the update is based on the wrong version of the document.
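
The rejection looks roughly like this (trimmed; the exact wording of the reason varies slightly between versions):

{
  "error": {
    "type": "version_conflict_engine_exception",
    "reason": "[2]: version conflict, required seqNo [10], primary term [1]. current document has seqNo [1] and primary term [1]",
    "index": "products"
  },
  "status": 409
}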

Step 7: Try Again with the Correct Sequence Number

Now send the update with the correct values:

PUT /products/_doc/2?if_seq_no=1&if_primary_term=1
{
  "name": "Product Two Updated"
}

This time, the update will succeed, and Elasticsearch will automatically increase the sequence number to a new value.
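
The success response again carries the fresh metadata, roughly like this (if you followed the exact steps above, the new sequence number should be 4, but the only thing that matters is that it is higher than before):

{
  "_index": "products",
  "_id": "2",
  "_version": 2,
  "result": "updated",
  "_seq_no": 4,
  "_primary_term": 1
}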

From now on, any further update to this document must use this new sequence number and primary term.

Step 8: Now Let Us Understand the Primary Term (Failover Scenario)

So far, the sequence number solves the concurrent update problem. But what about primary shard failover?

Imagine this situation:

  • You have one primary shard and one or more replica shards.
  • The primary shard is continuously sending updates to replicas.
  • Suddenly, there is a network issue.
  • The master decides that the primary is lost and promotes one of the replicas to be the new primary.
  • The primary term is now increased from 1 to 2.

Now imagine something dangerous:

  • The old primary has not yet realized that it is no longer the primary.
  • It continues to send updates with primary term = 1.
  • The new primary sends updates with primary term = 2.

How should a replica decide which update is valid?

It accepts operations that carry the higher (newer) primary term and rejects those that arrive with the stale, older term.

This is why the primary term exists. It is a generation number of the primary shard.

Step 9: Observe Primary Term Change in Practice

Go to Kibana and check shard allocation:

GET /_cat/shards/products?v

You will see something like:

  • es01 is primary
  • es03 is replica
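
The raw listing looks roughly like this (p stands for primary, r for replica; sizes, IP addresses, and the exact node assignment will differ on your machine):

index    shard prirep state   docs store ip         node
products 0     p      STARTED    3 9.8kb 172.18.0.2 es01
products 0     r      STARTED    3 9.8kb 172.18.0.4 es03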

Now go to the terminal and stop the primary node:

docker compose stop es01

Elasticsearch will now detect that the node is gone and promote the replica on es03 to be the new primary.

Insert a new document:

PUT /products/_doc/4
{
  "name": "Product Four"
}

Now look at the response carefully.

You will see:

"_primary_term": 2

This proves:

Whenever there is a primary shard failover, the primary term is incremented.
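
As a final check, run the shard listing once more:

GET /_cat/shards/products?v

You should see that the copy on es03 is now the primary. The other copy will initially show as UNASSIGNED, or it may already have been reallocated to the remaining node. When you are done experimenting, start es01 again with docker compose start es01.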