
Raft and Paxos

Introduction to Consensus Algorithms

In a distributed system, multiple nodes (servers) often need to agree on a single value or system state, even when some nodes crash or messages are delayed or lost. This is called the consensus problem, and solving it correctly is critical for building fault-tolerant, consistent systems.

Consensus is fundamental in systems like:

  • Distributed databases: Ensures that replicated data across nodes is consistent and reliable.
  • Distributed file systems: Guarantees that all replicas of a file remain synchronized.
  • Blockchain networks: Ensures all participants agree on transaction history.
  • Fault-tolerant services: Used for leader election, replicated state machines, or configuration management.

Two widely-used consensus algorithms are Paxos and Raft, each providing safety and liveness guarantees:

  1. Safety: At most one value is ever chosen, and every node that learns a value learns that same value.
  2. Liveness: Some value is eventually chosen, provided the network and a majority of nodes keep operating correctly.

Paxos Consensus Algorithm

Overview

Paxos, introduced by Leslie Lamport, is a fault-tolerant consensus algorithm designed for asynchronous distributed systems. Its primary goal is to ensure consensus on a single value, even in the presence of node failures and message delays. Paxos is widely applicable to general consensus problems but is often considered hard to implement correctly because of its multiple roles and message phases.


Roles in Paxos

  1. Proposers
    • Responsible for proposing a value to the system.
    • Can propose multiple values, but only one value can ultimately be chosen.
    • Use proposal numbers to order proposals and avoid conflicts.
  2. Acceptors
    • Act as the voting authority for proposals.
    • Decide whether to accept a proposed value based on the promises they have already made.
    • Must maintain state to ensure consistency across proposals.
  3. Learners
    • Learn the final agreed-upon value once consensus is reached.
    • Typically don’t participate in decision-making but are essential for applying the chosen value in the system.

Steps in Basic Paxos

  1. Prepare Phase
    • A proposer selects a unique proposal number n and sends a prepare(n) request to a majority of acceptors.
    • Acceptors respond with a promise not to accept proposals with numbers smaller than n.
    • They may also return the highest-numbered proposal they have accepted previously.
  2. Accept Phase
    • The proposer sends an accept(n, value) request to the acceptors.
    • Acceptors accept the proposal if they haven’t promised a higher-numbered proposal.
    • Once a majority of acceptors accept, the value is considered chosen.
  3. Learning Phase
    • Once a majority of acceptors has accepted the proposal, learners are notified of the chosen value.
    • This ensures all nodes eventually agree on the same value.
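To make the two phases concrete, below is a minimal single-decree Paxos sketch in Python. It is illustrative only: the names (Acceptor, propose, prepare, accept) are invented for this example, there is no networking, and a real acceptor must record its promise and accepted proposal on stable storage before replying.

```python
# Minimal single-decree Paxos sketch (illustrative, not production code).

class Acceptor:
    def __init__(self):
        self.promised_n = -1        # highest proposal number promised
        self.accepted_n = -1        # highest proposal number accepted
        self.accepted_value = None  # value of that accepted proposal

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised_n:
            self.promised_n = n
            # Report any previously accepted proposal so the proposer
            # must re-propose its value instead of introducing a new one.
            return ("promise", self.accepted_n, self.accepted_value)
        return ("reject", None, None)

    def accept(self, n, value):
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors, n, value):
    """Run both phases; return the chosen value, or None on failure."""
    replies = [a.prepare(n) for a in acceptors]
    promises = [r for r in replies if r[0] == "promise"]
    if len(promises) <= len(acceptors) // 2:
        return None  # no majority of promises
    # Safety rule: adopt the highest-numbered value already accepted.
    prior = [(an, av) for _, an, av in promises if av is not None]
    if prior:
        value = max(prior)[1]
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks > len(acceptors) // 2 else None
```

The key safety step is in propose: if any acceptor reports a previously accepted value, the proposer abandons its own value and re-proposes that one, which is what prevents two different values from ever being chosen.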

Example

  • Nodes: A, B, C
  • Node A proposes value X with proposal number 1.
  • A majority (A and B) respond to prepare(1) with promises not to accept any lower-numbered proposal.
  • A sends accept(1, X) → accepted by majority → X is chosen.
  • All nodes learn X as the agreed-upon value.
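Run through the sketch above, this walkthrough looks like the following (nodes A, B, and C are simply three Acceptor instances):

```python
# Nodes A, B, and C from the example above.
nodes = [Acceptor(), Acceptor(), Acceptor()]

chosen = propose(nodes, n=1, value="X")
print(chosen)  # -> X: a majority promised, then accepted proposal 1
```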

Characteristics of Paxos

  1. Fault-tolerant
    • Can handle up to f failures in a system of 2f+1 nodes (see the worked check after this list).
    • Ensures system continues to function even if some nodes fail.
  2. Safety guaranteed
    • Only one value is chosen, so nodes can never disagree on the outcome.
  3. Eventual liveness
    • As long as the network stabilizes, consensus is eventually reached.
  4. Complexity
    • Difficult to implement due to multiple roles and asynchronous communication.
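As a quick worked check of the 2f+1 rule from point 1:

```python
# A majority quorum of 2f + 1 nodes tolerates f failures, because the
# remaining f + 1 nodes still form a majority and can reach agreement.
def max_failures(cluster_size):
    return (cluster_size - 1) // 2

assert max_failures(3) == 1  # 3 nodes survive 1 failure
assert max_failures(5) == 2  # 5 nodes survive 2 failures
```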

Raft Consensus Algorithm

Overview

Raft, introduced by Diego Ongaro and John Ousterhout in 2014, is designed to simplify consensus for practical implementation. Raft provides the same guarantees as Paxos but emphasizes understandability and clarity, making it easier to implement in distributed systems.

Raft is widely used in replicated log systems, such as etcd, Consul, and RethinkDB.


Key Concepts

  1. Leader-Based Consensus
    • Raft elects a single leader to manage all client requests.
    • The leader ensures all followers replicate the same log entries in order.
    • Reduces conflicts and simplifies reasoning about the system.
  2. Terms
    • Time is divided into terms, which start with an election.
    • Term numbers increment monotonically.
    • Higher term numbers let nodes detect and ignore stale leaders.
  3. Log Replication
    • The leader receives client requests, appends them to its log, and replicates entries to followers.
    • Once a majority acknowledges the entry, it is committed and applied to the state machine.
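Terms and log replication meet in the log entry itself. A minimal sketch of what an entry carries (field names are illustrative, though the Raft paper uses similar ones):

```python
from dataclasses import dataclass

@dataclass
class LogEntry:
    term: int      # term of the leader that created this entry
    index: int     # position in the log (1-based in the Raft paper)
    command: str   # opaque client command for the state machine
```

Because every entry records the term in which it was created, a follower can detect where its log diverges from the leader's and truncate the conflicting suffix.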

Raft Roles

  1. Leader
    • Central authority for a term.
    • Handles all client interactions, ensuring log consistency.
    • Sends periodic heartbeat messages to followers to maintain leadership.
  2. Follower
    • A passive role: waits for messages from the leader or from candidates.
    • Responds to requests but does not initiate actions.
  3. Candidate
    • If a follower does not receive heartbeats within a timeout, it becomes a candidate.
    • Starts an election by requesting votes from other nodes.
    • Becomes a leader if it receives a majority of votes.

Raft Election Process

  1. Followers wait for heartbeats from the leader.
  2. If the election timeout elapses without a heartbeat, the follower becomes a candidate (timeouts are randomized so that nodes rarely time out simultaneously).
  3. Candidate requests votes from other nodes.
  4. If the candidate receives a majority, it becomes the new leader.
  5. Leader sends heartbeats to maintain authority and prevent new elections.
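A Python sketch of steps 2 through 4 follows. Node, request_vote, and start_election are invented names for this illustration; a real implementation would also refuse to vote for a candidate whose log is less up to date than the voter's.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    id: int
    current_term: int = 0
    voted_for: Optional[int] = None
    role: str = "follower"

    def request_vote(self, term, candidate_id):
        """Grant at most one vote per term (log freshness check omitted)."""
        if term > self.current_term:
            # Seeing a higher term resets the node to follower state.
            self.current_term = term
            self.voted_for = None
            self.role = "follower"
        if term == self.current_term and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def start_election(candidate, peers):
    """Become a candidate, solicit votes, and win on a majority."""
    candidate.current_term += 1          # each election starts a new term
    candidate.role = "candidate"
    candidate.voted_for = candidate.id   # a candidate votes for itself
    votes = 1 + sum(p.request_vote(candidate.current_term, candidate.id)
                    for p in peers)
    if votes > (len(peers) + 1) // 2:    # majority of the full cluster
        candidate.role = "leader"        # step 5: start sending heartbeats
```

Because each node draws its election timeout from a random interval, one follower usually times out first, wins the election, and the rest remain followers.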

Raft Log Replication Steps

  1. Client sends request to the leader.
  2. Leader appends the request as a new log entry.
  3. Leader replicates the entry to followers.
  4. Followers append the entry to their logs.
  5. Once a majority of nodes acknowledge the entry, it is committed.
  6. Leader applies the entry to its state machine and informs followers.
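A leader-side sketch of these six steps, reusing the LogEntry class from the Key Concepts section. Everything here is illustrative: append_entries stands in for Raft's AppendEntries RPC, and apply is a hypothetical hook into the state machine.

```python
def handle_client_request(leader, followers, command):
    # Steps 1-2: the leader appends the command to its own log.
    entry = LogEntry(term=leader.current_term,
                     index=len(leader.log) + 1,
                     command=command)
    leader.log.append(entry)

    # Steps 3-4: replicate the entry and count acknowledgements.
    acks = 1  # the leader counts itself
    for follower in followers:
        if follower.append_entries(leader.current_term, entry):
            acks += 1

    # Steps 5-6: commit on a majority, then apply to the state machine.
    if acks > (len(followers) + 1) // 2:
        leader.commit_index = entry.index
        leader.apply(entry)  # hypothetical state-machine hook
        return True
    return False
```

Real Raft does this asynchronously and keeps retrying slow followers; the sketch collapses replication into a single synchronous round for clarity.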

Advantages of Raft

  1. Understandable
    • Explicit roles (leader, follower, candidate) and structured phases make it easier to implement and reason about.
  2. Strong consistency
    • Provides linearizability, ensuring all committed entries are consistent across nodes.
  3. Fault tolerance
    • Tolerates failures of less than half the nodes without losing availability.
  4. Efficiency
    • Leader-based replication funnels all proposals through one node, avoiding the dueling-proposer conflicts that Basic Paxos must resolve with retries.

Differences Between Raft and Paxos

Feature            | Paxos                        | Raft
-------------------|------------------------------|----------------------------------------
Understandability  | Complex, hard to implement   | Designed to be simpler and clearer
Leader             | Optional                     | Explicit single leader
Log replication    | Needs extra logic            | Built-in replication mechanism
Use cases          | General consensus            | Distributed databases, replicated logs
Implementation     | Hard to implement correctly  | Easier due to structured phases

Real-World Use Cases

  1. Paxos
    • Google Chubby lock service
    • Some distributed database consensus protocols
  2. Raft
    • etcd (Kubernetes configuration storage)
    • Consul (service discovery and configuration)
    • RethinkDB (distributed database with replicated logs)

Summary

  • Consensus algorithms are essential for coordinating distributed systems reliably.
  • Paxos is powerful and general but complex to implement.
  • Raft prioritizes understandability and practical implementation, with leader-based replication.
  • Both algorithms provide safety, liveness, and fault tolerance, but Raft is widely preferred for production systems due to clarity.
  • Consensus algorithms enable fault-tolerant, highly available, and consistent distributed systems.