Distributed Consensus, Simplified

2026-04-02 · 2 min read · systems distributed architecture


Distributed consensus is one of those topics that sounds intimidating but is built on simple ideas. Let’s break it down.

The Problem

You have three servers. A client sends a write to server A. How do servers B and C learn about it? And what happens if server A dies before telling them?

This is the consensus problem: getting multiple machines to agree on a value, even when some of them might fail.

Raft: The Understandable Protocol

Raft was designed to be understandable. It breaks consensus into three sub-problems:

1. Leader Election

One server is the leader. It handles all writes. If the leader dies, the remaining servers hold an election.

Server A: "I haven't heard from the leader in 300ms. I'm running for election."
Server B: "Sure, you have my vote."
Server C: "You have mine too."
Server A: "I'm the new leader."

The election timeout is randomized (150-300ms) so servers don't all try to become leader simultaneously. Simple but effective.
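The randomized timeout can be sketched in a couple of lines of Python (the names and structure here are illustrative, not from any real Raft implementation):

```python
import random

# Bounds from the text above: each follower waits a random interval
# in this range before declaring the leader dead.
ELECTION_TIMEOUT_MIN_MS = 150
ELECTION_TIMEOUT_MAX_MS = 300

def new_election_timeout() -> float:
    """Draw a fresh randomized election timeout for one server."""
    return random.uniform(ELECTION_TIMEOUT_MIN_MS, ELECTION_TIMEOUT_MAX_MS)

# Each server draws independently, so they rarely time out at
# the same moment -- usually one candidate gets a head start.
timeouts = [new_election_timeout() for _ in range(3)]
```

Because each server draws its own value, split votes are unlikely, and when one does happen the next round of random timeouts breaks the tie.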

2. Log Replication

The leader receives writes and replicates them to followers:

  1. Client sends write to leader
  2. Leader appends to its log
  3. Leader sends the entry to all followers
  4. Once a majority confirms, the write is committed
  5. Leader responds to client

The key insight: you only need a majority (2 of 3, 3 of 5), not unanimity. This is what makes the system fault-tolerant.
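The majority rule is simple enough to state as code. A minimal sketch (function names are illustrative):

```python
def majority(cluster_size: int) -> int:
    """Smallest number of servers that forms a majority: 2 of 3, 3 of 5."""
    return cluster_size // 2 + 1

def is_committed(acks: int, cluster_size: int) -> bool:
    """An entry is committed once a majority has acknowledged it.
    The leader's own append counts as one ack."""
    return acks >= majority(cluster_size)
```

With 3 servers, the leader plus one follower is enough to commit, so the system keeps making progress even with one server down.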

3. Safety

Raft guarantees that once a log entry is committed, it will never be lost, even if servers crash and restart. This is ensured by the election restriction: a server grants its vote only to a candidate whose log is at least as up-to-date as its own, so any server that wins an election is guaranteed to hold every committed entry.
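Raft's "at least as up-to-date" comparison from the voting rule works by comparing the term of each log's last entry first, and falling back to log length on a tie. A sketch with illustrative names:

```python
def log_is_up_to_date(candidate_last_term: int, candidate_log_len: int,
                      voter_last_term: int, voter_log_len: int) -> bool:
    """True if the candidate's log is at least as up-to-date as the voter's.
    Raft compares the last entries' terms first; if equal, the longer log wins."""
    if candidate_last_term != voter_last_term:
        return candidate_last_term > voter_last_term
    return candidate_log_len >= voter_log_len
```

A server whose log is missing committed entries can never gather a majority of votes, because a majority of servers hold those entries and will each refuse it.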

When You Need This

Most applications don’t need to implement consensus directly. But you use it every day through:

  - etcd, the consensus store behind Kubernetes
  - ZooKeeper, which coordinates Hadoop and (historically) Kafka
  - Consul, for service discovery and distributed configuration

Understanding how it works helps you reason about consistency, availability, and the trade-offs your infrastructure makes.

The CAP Theorem Connection

You can’t have all three of:

  - Consistency: every read sees the most recent write
  - Availability: every request receives a response
  - Partition tolerance: the system keeps working when the network splits

Raft chooses CP — consistency and partition tolerance. During a network partition, the minority side stops accepting writes rather than risk inconsistency.

This is a deliberate trade-off, and it’s the right one for coordination systems. For user-facing data, you might prefer AP systems like Cassandra.