Learnitweb

Concurrency Anomalies in Databases: Dirty Read, Non-Repeatable Read, and Phantom Read

Introduction

Concurrency control is crucial in database management systems (DBMS) to ensure data consistency and integrity when multiple transactions execute simultaneously. However, improper handling of concurrent transactions can lead to anomalies such as Dirty Read, Non-Repeatable Read, and Phantom Read. Understanding these anomalies helps in choosing the appropriate isolation level to maintain data consistency.

1. Dirty Read

Definition:

A Dirty Read occurs when a transaction reads uncommitted changes made by another transaction. If the other transaction rolls back, the read data becomes invalid or “dirty.”

Example:

Consider two transactions, T1 and T2 operating on the accounts table.

  1. T1 updates the balance of account A from 5000 to 7000 but has not yet committed.
  2. T2 reads the balance as 7000.
  3. T1 rolls back, reverting the balance back to 5000.
  4. T2 now has incorrect data, as the balance it read (7000) never actually existed in a committed state.
-- Transaction T1
BEGIN TRANSACTION;
UPDATE accounts SET balance = 7000 WHERE id = 1;
-- No COMMIT yet

-- Transaction T2
SELECT balance FROM accounts WHERE id = 1;  -- Reads 7000 (dirty read)

-- Transaction T1 ROLLBACK
ROLLBACK;

Solution:

To prevent dirty reads, use the Read Committed or higher isolation level.

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

2. Non-Repeatable Read

Definition:

A Non-Repeatable Read occurs when a transaction reads the same row twice but gets different values because another transaction modified and committed changes between the two reads.

Example:

  1. T1 reads the balance of account A as 5000.
  2. T2 updates the balance of account A to 7000 and commits.
  3. T1 reads the balance of account A again and gets 7000.
  4. The two reads in T1 return different results, leading to a non-repeatable read anomaly.
-- Transaction T1
BEGIN TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;  -- Reads 5000

-- Transaction T2
BEGIN TRANSACTION;
UPDATE accounts SET balance = 7000 WHERE id = 1;
COMMIT;

-- Transaction T1 (Continued)
SELECT balance FROM accounts WHERE id = 1;  -- Reads 7000 (non-repeatable read)

Solution:

To prevent non-repeatable reads, use the Repeatable Read or higher isolation level.

To prevent non-repeatable reads, use the Repeatable Read or higher isolation level.

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

3. Phantom Read

Definition:

A Phantom Read occurs when a transaction executes the same query multiple times and gets different results due to insertions or deletions by other transactions.

Example:

  1. T1 executes SELECT COUNT(*) FROM accounts WHERE balance > 5000 and gets 5 rows.
  2. T2 inserts a new row with a balance of 6000 and commits.
  3. T1 executes the same SELECT again and gets 6 rows.
-- Transaction T1
BEGIN TRANSACTION;
SELECT COUNT(*) FROM accounts WHERE balance > 5000;  -- Returns 5

-- Transaction T2
BEGIN TRANSACTION;
INSERT INTO accounts (id, balance) VALUES (10, 6000);
COMMIT;

-- Transaction T1 (Continued)
SELECT COUNT(*) FROM accounts WHERE balance > 5000;  -- Returns 6 (phantom read)

Solution:

To prevent phantom reads, use the Serializable isolation level, which locks the range of rows affected.

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

Conclusion

Understanding concurrency anomalies helps in selecting the right isolation level to balance data consistency and performance. While higher isolation levels prevent anomalies, they also introduce overhead and reduce concurrency. Choosing the appropriate level depends on the application’s consistency requirements and performance trade-offs.