Distributed Consensus, Simplified

Distributed consensus is one of those topics that sounds intimidating but is built on simple ideas. Let’s break it down.

The Problem

You have three servers. A client sends a write to server A. How do servers B and C learn about it? And what happens if server A dies before telling them?

This is the consensus problem: getting multiple machines to agree on a value, even when some of them might fail.

Raft: The Understandable Protocol

Raft was designed to be understandable. It breaks consensus into three sub-problems:

1. Leader Election

One server is the leader. It handles all writes. If the leader dies, the remaining servers hold an election.

Server A: "I haven't heard from the leader in 300ms. I'm running for election."
Server B: "Sure, you have my vote."
Server C: "You have mine too."
Server A: "I'm the new leader."

The election timeout is randomized (150-300ms) so servers don’t all try to become leader simultaneously. Simple but effective.

2. Log Replication

The leader receives writes and replicates them to followers:

Client sends write to leader
Leader appends to its log
Leader sends the entry to all followers
Once a majority confirms, the write is committed
Leader responds to client

The key insight: you only need a majority (2 of 3, 3 of 5), not unanimity. This is what makes the system fault-tolerant.

3. Safety

Raft guarantees that once a log entry is committed, it will never be lost — even if servers crash and restart. This is ensured by:

Only electing leaders that have all committed entries
Never overwriting committed entries

When You Need This

Most applications don’t need to implement consensus directly. But you use it every day through:

etcd (Kubernetes uses it for cluster state)
CockroachDB and TiKV (distributed databases)
Consul (service discovery)

Understanding how it works helps you reason about consistency, availability, and the trade-offs your infrastructure makes.

The CAP Theorem Connection

You can’t have all three:

Consistency — every read gets the latest write
Availability — every request gets a response
Partition tolerance — the system works despite network failures

Raft chooses CP — consistency and partition tolerance. During a network partition, the minority side stops accepting writes rather than risk inconsistency.

This is a deliberate trade-off, and it’s the right one for coordination systems. For user-facing data, you might prefer AP systems like Cassandra.