Introduction
Isolation is a key property of database transactions, ensuring that concurrent transactions do not interfere with each other and maintain data consistency. Databases enforce isolation using different concurrency control mechanisms, including:
- Locking – Pessimistic Concurrency Control
- Multiversion Concurrency Control (MVCC) – Optimistic Concurrency Control
- Snapshot Isolation
Each method has its trade-offs between consistency, performance, and concurrency. Let’s explore them in detail.
1. Locking – Pessimistic Concurrency Control
Definition
Locking is a pessimistic concurrency control mechanism that prevents conflicts by restricting access to data using locks. It assumes that conflicts will happen and prevents them by making transactions wait for each other. The fundamental idea is that if a transaction needs to read or modify a record, it must first obtain a lock, preventing other transactions from making conflicting changes until the lock is released.
Locking ensures strict consistency but can reduce system performance due to increased contention, particularly in high-concurrency environments. Lock contention can result in blocked transactions, potentially leading to deadlocks where two or more transactions wait indefinitely for each other’s resources. To mitigate this, databases implement deadlock detection mechanisms or timeouts to resolve conflicts efficiently.
Types of Locks:
- Shared Lock (S Lock / Read Lock):
- Allows multiple transactions to read the data concurrently.
- Prevents other transactions from modifying the data.
- Typically used in read-only queries where no modifications are expected.
- Exclusive Lock (X Lock / Write Lock):
- Only one transaction can acquire an exclusive lock on a record.
- Prevents both reading and writing by other transactions.
- Ensures that modifications occur without interference from other transactions.
- Intent Locks:
- Used to indicate that a transaction intends to acquire a lock at a finer granularity.
- Helps coordinate locking strategies in hierarchical data structures.
- Reduces the likelihood of deadlocks by providing an early warning to other transactions about lock requirements.
Locking Protocols:
- Two-Phase Locking (2PL): Ensures serializability by dividing transaction execution into two phases:
- Growing Phase: Acquires locks but does not release any.
- Shrinking Phase: Releases locks but does not acquire any new ones.
- Deadlocks: When two transactions wait for each other’s locks, leading to a cycle of waiting.
- Avoided using timeouts, deadlock detection, or ordering strategies.
Pros & Cons:
- Ensures strong consistency
- Well-suited for high-conflict workloads
- Can cause deadlocks
- Reduces concurrency (transactions may be blocked)
2. Multiversion Concurrency Control (MVCC) – Optimistic Concurrency Control
Definition
MVCC is an optimistic concurrency control method that allows multiple versions of data to exist simultaneously, avoiding locks. It assumes conflicts are rare and instead of blocking transactions, it maintains multiple versions of data and checks for conflicts at commit time.
MVCC enables non-blocking reads by ensuring that readers see a snapshot of the data at the start of their transaction, even if other transactions are concurrently modifying the same data. Instead of acquiring locks, MVCC preserves multiple versions of a record and allows transactions to proceed in parallel. However, if a transaction attempts to update data that has changed since it started, a conflict occurs, requiring a rollback or retry.
MVCC is widely used in databases such as PostgreSQL, MySQL (InnoDB), and Oracle to enhance concurrency while minimizing contention. It is particularly useful in read-heavy workloads where updates are relatively infrequent, as it allows reads to proceed without being blocked by writes.
How MVCC Works:
- Each transaction gets a consistent snapshot of the database at the start.
- Instead of modifying data directly, changes are stored as new versions.
- Older versions are maintained until they are no longer needed (garbage collection).
- At commit time, the database checks if any conflicts have occurred.
- If no conflict exists, the transaction commits successfully; otherwise, it is rolled back.
Pros & Cons:
- No blocking (higher concurrency)
- Better for read-heavy workloads
- More storage required (multiple versions stored)
- Can lead to conflicts requiring rollback
3. Snapshot Isolation (SI)
Definition
Snapshot Isolation is a specific implementation of MVCC that provides a consistent view of the database at the start of a transaction. It ensures that a transaction always sees the same snapshot of data, even if other transactions commit changes. This is particularly useful in cases where repeatable reads are necessary, such as financial applications where consistent reporting is required.
Snapshot Isolation guarantees that a transaction sees a stable dataset throughout its execution, eliminating non-repeatable reads and phantom reads. However, it does not provide full serializability, meaning write conflicts may still occur, requiring rollback mechanisms.
Unlike strict locking mechanisms, Snapshot Isolation provides high concurrency by reducing blocking, making it well-suited for modern, high-throughput systems where multiple transactions operate simultaneously.
How Snapshot Isolation Works:
- When a transaction starts, it sees a snapshot of the database as it existed at that moment.
- Changes made by other transactions after the snapshot are not visible.
- A transaction writes only if no other transaction has modified the same data since the snapshot.
- Conflicting writes are detected at commit time, preventing lost updates.
Pros & Cons:
- Consistent snapshot for transactions
- No blocking reads (better concurrency)
- Write conflicts can cause rollbacks
- More storage needed (old versions retained)
Conclusion
Databases enforce isolation using different concurrency control techniques:
- Locking (Pessimistic Concurrency Control): Prevents conflicts using locks but may cause blocking and deadlocks.
- MVCC (Optimistic Concurrency Control): Allows multiple versions of data to exist and checks for conflicts at commit time.
- Snapshot Isolation: Provides a stable view of the database but can lead to write conflicts.
Choosing the right approach depends on the workload:
- High-conflict, update-heavy workloads → Use Locking (2PL).
- Read-heavy applications with fewer conflicts → Use MVCC.
- Need consistent snapshots without blocking reads → Use Snapshot Isolation.
Understanding these mechanisms helps in designing efficient database transactions that balance consistency, concurrency, and performance.