Distributed consensus is one of those topics that sounds intimidating but is built on simple ideas. Let’s break it down.
The Problem
You have three servers. A client sends a write to server A. How do servers B and C learn about it? And what happens if server A dies before telling them?
This is the consensus problem: getting multiple machines to agree on a value, even when some of them might fail.
Raft: The Understandable Protocol
Raft was designed to be understandable. It breaks consensus into three sub-problems:
1. Leader Election
One server is the leader. It handles all writes. If the leader dies, the remaining servers hold an election.
Server A: "I haven't heard from the leader in 300ms. I'm running for election."
Server B: "Sure, you have my vote."
Server C: "You have mine too."
Server A: "I'm the new leader."
The election timeout is randomized (150-300ms) so servers don’t all try to become leader simultaneously. Simple but effective.
2. Log Replication
The leader receives writes and replicates them to followers:
- Client sends write to leader
- Leader appends to its log
- Leader sends the entry to all followers
- Once a majority confirms, the write is committed
- Leader responds to client
The key insight: you only need a majority (2 of 3, 3 of 5), not unanimity. This is what makes the system fault-tolerant.
3. Safety
Raft guarantees that once a log entry is committed, it will never be lost — even if servers crash and restart. This is ensured by:
- Only electing leaders that have all committed entries
- Never overwriting committed entries
When You Need This
Most applications don’t need to implement consensus directly. But you use it every day through:
- etcd (Kubernetes uses it for cluster state)
- CockroachDB and TiKV (distributed databases)
- Consul (service discovery)
Understanding how it works helps you reason about consistency, availability, and the trade-offs your infrastructure makes.
The CAP Theorem Connection
You can’t have all three:
- Consistency — every read gets the latest write
- Availability — every request gets a response
- Partition tolerance — the system works despite network failures
Raft chooses CP — consistency and partition tolerance. During a network partition, the minority side stops accepting writes rather than risk inconsistency.
This is a deliberate trade-off, and it’s the right one for coordination systems. For user-facing data, you might prefer AP systems like Cassandra.