protocol specification
raft implementation details, safety properties, and production extensions.
Phalanx strictly adheres to the Raft paper while implementing modern extensions for production stability. Every safety property listed here is covered by dedicated unit tests.
§5 & §6 — core consensus
leader election
Election timeouts are randomized in [ET, 2×ET) ticks to prevent correlated elections. The random source is seeded deterministically from the node ID via DJB2 hashing, so each node's timeout sequence is reproducible in tests while different nodes still draw different timeouts in production.
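The seeding step described above could be sketched as follows. This is a minimal illustration, not Phalanx's code: it assumes the node ID is a string, and the names djb2, newElectionRand, and randomizedTimeout are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// djb2 hashes a string with the classic DJB2 recurrence:
// start at 5381, then hash = hash*33 + byte for each byte.
func djb2(s string) uint64 {
	var h uint64 = 5381
	for i := 0; i < len(s); i++ {
		h = h*33 + uint64(s[i])
	}
	return h
}

// newElectionRand (hypothetical helper) builds a random source whose seed
// depends only on the node ID, so the same ID always yields the same sequence.
func newElectionRand(nodeID string) *rand.Rand {
	return rand.New(rand.NewSource(int64(djb2(nodeID))))
}

// randomizedTimeout returns a value in [base, 2*base) ticks.
// base must be positive, because rand.Intn panics on n <= 0.
func randomizedTimeout(rng *rand.Rand, base int) int {
	return base + rng.Intn(base)
}

func main() {
	rng := newElectionRand("node-1")
	for i := 0; i < 3; i++ {
		fmt.Println(randomizedTimeout(rng, 10))
	}
}
```

Two nodes with different IDs get different seeds and so, in general, different timeout sequences. The timeout reset as it appears in Phalanx follows.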
func (r *Raft) resetElectionTimeout() {
    r.electionTimeout = r.baseElectionTimeout +
        r.rand.Intn(r.baseElectionTimeout)
}
log replication
Leaders replicate entries via AppendEntries. Before accepting entries, followers run the consistency check: PrevLogIndex and PrevLogTerm must match their own log. On conflict, the follower truncates only from the point of conflict, not the entire log:
for i, entry := range msg.Entries {
    idx := msg.PrevLogIndex + uint64(i) + 1
    if idx <= r.lastLogIndex() {
        if r.logTerm(idx) != entry.Term {
            // Truncate ONLY from conflict point
            r.log = r.log[:idx]
            r.log = append(r.log, msg.Entries[i:]...)
            break
        }
        // Matching entry: preserve
    } else {
        r.log = append(r.log, msg.Entries[i:]...)
        break
    }
}
commit advancement
The leader advances commitIndex only when an entry from the current term has been replicated to a majority. This is the §5.4.2 safety constraint that prevents committed entries from being overwritten.
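The commit rule can be illustrated with a small, self-contained sketch. It is not taken from Phalanx: the entry type, the advanceCommit function, and the 1-indexed log with a sentinel at position 0 are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical minimal log entry; only the term matters here.
type entry struct{ Term uint64 }

// advanceCommit returns the leader's new commit index.
// log is 1-indexed (log[0] is an unused sentinel). matchIndex holds the
// highest replicated index for every node, including the leader itself.
func advanceCommit(log []entry, matchIndex []uint64, currentTerm, commitIndex uint64) uint64 {
	sorted := append([]uint64(nil), matchIndex...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })

	// In descending order, the value at position quorum-1 is the highest
	// index stored on a majority of nodes.
	quorum := len(sorted)/2 + 1
	n := sorted[quorum-1]

	// §5.4.2: commit by counting replicas only if the entry is from the
	// current term. Terms never decrease along the log, so if log[n] is
	// from an older term, no smaller index can qualify either.
	if n > commitIndex && n < uint64(len(log)) && log[n].Term == currentTerm {
		return n
	}
	return commitIndex
}

func main() {
	// Two entries from term 1, then one entry appended in term 2.
	log := []entry{{}, {Term: 1}, {Term: 1}, {Term: 2}}

	// Majority holds index 2 only, which is a term-1 entry.
	fmt.Println(advanceCommit(log, []uint64{3, 2, 2}, 2, 0))
	// Majority holds index 3, a current-term entry.
	fmt.Println(advanceCommit(log, []uint64{3, 3, 2}, 2, 0))
}
```

In the first call the majority-replicated entry is from term 1, so the commit index stays put. In the second call the term-2 entry has reached a majority, and committing it also commits everything before it.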
§8 — no-op commit safety
When a leader is elected, it immediately appends an empty no-op entry in its current term and broadcasts it to all followers:
func (r *Raft) becomeLeader() {
    r.state = Leader
    r.leaderID = r.id
    // The no-op goes directly after the current last entry
    lastIdx := r.lastLogIndex()
    // §8: Append no-op to unlock commit pipeline
    noop := &pb.LogEntry{
        Index: lastIdx + 1,
        Term:  r.currentTerm,
        Type:  pb.EntryCommand,
        Data:  nil, // empty marker
    }
    r.log = append(r.log, noop)
    r.broadcastHeartbeat()
}
Without the no-op, a new leader cannot determine which entries from prior terms are committed, because §5.4.2 allows a leader to commit by counting replicas only for entries from its current term. The no-op "unlocks" the commit pipeline: once it is replicated to a majority, all preceding entries are committed as well.
§9.6 — pre-vote extension
Before starting a real election, a candidate sends pre-vote requests. These do not increment the local term:
Follower → [election timeout expires]
  → Send MsgRequestVote with IsPreVote=true
  → term is NOT incremented
  → If quorum of pre-votes received:
      → Start REAL election (increment term, become Candidate)
  → If no quorum:
      → Stay follower (term unchanged, no disruption)
A node isolated by a network partition will repeatedly time out and attempt pre-votes. Because it never gets a quorum, it never increments its term. When it reconnects, its term has not inflated, so it does not force the current leader to step down. This eliminates the "disruptive rejoin" problem.
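The flow above can be sketched in a few lines. This is an illustrative model rather than Phalanx's implementation: the node type, its fields, and onElectionTimeout are hypothetical, and the sketch assumes a node counts its own pre-vote toward the quorum.

```go
package main

import "fmt"

type state int

const (
	follower state = iota
	candidate
)

// node is a hypothetical, stripped-down Raft node.
type node struct {
	state       state
	currentTerm uint64
	clusterSize int
}

func (n *node) quorum() int { return n.clusterSize/2 + 1 }

// onElectionTimeout models the outcome of one pre-vote round.
// grantedByPeers is the number of peers that granted the pre-vote;
// the node's own vote is added here (an assumption of this sketch).
func (n *node) onElectionTimeout(grantedByPeers int) {
	if grantedByPeers+1 < n.quorum() {
		// No quorum: stay follower and leave the term untouched.
		return
	}
	// Quorum of pre-votes: only now start the real election.
	n.currentTerm++
	n.state = candidate
}

func main() {
	n := &node{state: follower, currentTerm: 5, clusterSize: 3}

	// Partitioned: every pre-vote round fails, so the term stays at 5.
	for i := 0; i < 100; i++ {
		n.onElectionTimeout(0)
	}
	fmt.Println(n.currentTerm, n.state == follower)

	// Reconnected, one peer grants the pre-vote: a real election starts.
	n.onElectionTimeout(1)
	fmt.Println(n.currentTerm, n.state == candidate)
}
```

However long the partition lasts, the isolated node's term does not move, which is why its return does not disturb the rest of the cluster.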
lease-based linearizable reads
Standard Raft requires a round-trip to the quorum for every read (ReadIndex). Phalanx uses lease-based reads for lower latency:
| step | mechanism |
|---|---|
| 1 | leader resets heartbeatAcked = 1 (self) at start of each heartbeat round |
| 2 | each successful AppendEntriesResponse increments the counter |
| 3 | HasLeaderQuorum() returns true only if heartbeatAcked >= quorumSize() |
| 4 | reads served from FSM only when leader confirms majority lease |
func (r *Raft) HasLeaderQuorum() bool {
    return r.state == Leader &&
        r.heartbeatAcked >= r.quorumSize()
}
If the leader has lost quorum (network partition), reads are rejected with an error. No stale reads are ever served.
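To make the four steps concrete, here is a self-contained sketch of the acknowledgement counter and the read gate. The leader type, the map standing in for the FSM, the read method, and errNoQuorum are hypothetical names chosen for the example. Only the counter logic mirrors the table above.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoQuorum is a hypothetical error returned when a read is rejected.
var errNoQuorum = errors.New("leader has not confirmed a quorum")

// leader is a hypothetical, minimal stand-in for the Raft node.
type leader struct {
	isLeader       bool
	clusterSize    int
	heartbeatAcked int
	fsm            map[string]string // stand-in for the state machine
}

func (l *leader) quorumSize() int { return l.clusterSize/2 + 1 }

// startHeartbeatRound resets the counter to 1: the leader counts itself.
func (l *leader) startHeartbeatRound() { l.heartbeatAcked = 1 }

// onAppendEntriesSuccess records one successful follower response.
func (l *leader) onAppendEntriesSuccess() { l.heartbeatAcked++ }

func (l *leader) hasLeaderQuorum() bool {
	return l.isLeader && l.heartbeatAcked >= l.quorumSize()
}

// read serves from the FSM only while the quorum check passes.
func (l *leader) read(key string) (string, error) {
	if !l.hasLeaderQuorum() {
		return "", errNoQuorum
	}
	return l.fsm[key], nil
}

func main() {
	l := &leader{isLeader: true, clusterSize: 3, fsm: map[string]string{"k": "v"}}

	l.startHeartbeatRound()
	_, err := l.read("k")
	fmt.Println("before any ack:", err)

	l.onAppendEntriesSuccess()
	v, err := l.read("k")
	fmt.Println("after one ack:", v, err)
}
```

In a three-node cluster the leader's own vote plus one follower acknowledgement reaches the quorum of two, so the second read succeeds while the first is rejected.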
leader stickiness
When a follower has heard from a valid leader within the election timeout, it rejects all vote requests — both pre-vote and real. This prevents a partitioned node from triggering unnecessary elections when it rejoins.
Stickiness decays automatically: when electionElapsed >= electionTimeout without hearing from the leader, leaderActive is set to false.
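A minimal sketch of the stickiness check and its decay, assuming a tick-driven follower. The type and method names are hypothetical. The fields electionElapsed, electionTimeout, and leaderActive follow the names used above.

```go
package main

import "fmt"

// follower is a hypothetical, minimal model of the stickiness state.
type follower struct {
	electionElapsed int
	electionTimeout int
	leaderActive    bool
}

// onLeaderMessage is called for any valid message from the current leader.
func (f *follower) onLeaderMessage() {
	f.electionElapsed = 0
	f.leaderActive = true
}

// tick advances the logical clock; stickiness decays once a full
// election timeout passes without hearing from the leader.
func (f *follower) tick() {
	f.electionElapsed++
	if f.electionElapsed >= f.electionTimeout {
		f.leaderActive = false
	}
}

// shouldRejectVote applies to pre-vote and real vote requests alike.
func (f *follower) shouldRejectVote() bool { return f.leaderActive }

func main() {
	f := &follower{electionTimeout: 3}
	f.onLeaderMessage()
	fmt.Println("right after heartbeat:", f.shouldRejectVote())

	for i := 0; i < 3; i++ {
		f.tick()
	}
	fmt.Println("after a silent timeout:", f.shouldRejectVote())
}
```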
memory safety
In broadcastHeartbeat(), log entries are deep-copied via LogEntry.Clone() before being placed in outbound messages. This prevents a subtle bug where subsequent append/truncate operations on the internal log could silently corrupt in-flight messages. Verified by TestHeartbeatMemorySafety.
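The aliasing hazard is easy to reproduce with plain Go slices. The sketch below uses a hypothetical LogEntry type with a hand-written Clone and a cloneEntries helper. It does not reproduce Phalanx's pb.LogEntry or TestHeartbeatMemorySafety. It only shows why copying before sending matters.

```go
package main

import "fmt"

// LogEntry is a hypothetical stand-in for the real entry type.
type LogEntry struct {
	Index uint64
	Term  uint64
	Data  []byte
}

// Clone returns a deep copy, including the Data bytes.
func (e *LogEntry) Clone() *LogEntry {
	c := *e
	c.Data = append([]byte(nil), e.Data...)
	return &c
}

// cloneEntries copies a slice of entries into fresh storage.
func cloneEntries(src []*LogEntry) []*LogEntry {
	out := make([]*LogEntry, len(src))
	for i, e := range src {
		out[i] = e.Clone()
	}
	return out
}

func main() {
	log := []*LogEntry{
		{Index: 1, Term: 1, Data: []byte("a")},
		{Index: 2, Term: 1, Data: []byte("b")},
		{Index: 3, Term: 1, Data: []byte("c")},
	}

	shared := log[1:]               // shares the log's backing array
	copied := cloneEntries(log[1:]) // independent storage

	// Truncate and append, as a conflict resolution would. The append
	// reuses the backing array and overwrites the slot behind shared[0].
	log = log[:1]
	log = append(log, &LogEntry{Index: 2, Term: 2, Data: []byte("x")})

	fmt.Println("shared[0].Term:", shared[0].Term)
	fmt.Println("copied[0].Term:", copied[0].Term)
}
```

The in-flight view built by slicing now reports the new term-2 entry, while the cloned view still holds the term-1 entry that was current when the message was built.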