protocol specification

raft implementation details, safety properties, and production extensions.

Phalanx strictly adheres to the Raft paper while implementing modern extensions for production stability. Every safety property listed here is covered by dedicated unit tests.

§5 leader election · §5.3 log replication · §5.4.2 commit safety · §6 membership · §8 no-op entry · §9.6 pre-vote · lease reads · leader stickiness

§5 & §6 — core consensus

leader election

Election timeouts are randomized in [ET, 2×ET) ticks to prevent correlated elections. The random source is seeded deterministically from the node ID via DJB2 hashing, so a given node's timeout sequence is reproducible in tests, while nodes with different IDs still draw different sequences in production.

func (r *Raft) resetElectionTimeout() {
    r.electionTimeout = r.baseElectionTimeout +
        r.rand.Intn(r.baseElectionTimeout)
}
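The seeding described above might look roughly like the following. This is a minimal sketch, not Phalanx's actual constructor: `djb2`, `newElectionRand`, and `nextElectionTimeout` are hypothetical names, and the node ID is assumed to be a string.

```go
package main

import (
	"fmt"
	"math/rand"
)

// djb2 is the classic Bernstein string hash: h = h*33 + c, starting at 5381.
func djb2(s string) uint64 {
	h := uint64(5381)
	for i := 0; i < len(s); i++ {
		h = h*33 + uint64(s[i])
	}
	return h
}

// newElectionRand returns a PRNG whose seed depends only on the node ID.
// Hypothetical helper; the node ID is assumed to be a string here.
func newElectionRand(nodeID string) *rand.Rand {
	return rand.New(rand.NewSource(int64(djb2(nodeID))))
}

// nextElectionTimeout draws a timeout in [base, 2*base) ticks.
func nextElectionTimeout(rng *rand.Rand, base int) int {
	return base + rng.Intn(base)
}

func main() {
	a, b := newElectionRand("node-1"), newElectionRand("node-1")
	// The same node ID yields the same sequence, so tests can replay elections.
	fmt.Println(nextElectionTimeout(a, 10) == nextElectionTimeout(b, 10))
}
```

Because the seed is a pure function of the node ID, two runs of the same test see identical timeouts, while different IDs hash to different seeds.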

log replication

Leaders replicate entries via AppendEntries. Followers perform the consistency check: they accept entries only if their own log holds an entry at PrevLogIndex whose term equals PrevLogTerm. On conflict, the follower truncates only from the point of conflict, not the entire log:

for i, entry := range msg.Entries {
    idx := msg.PrevLogIndex + uint64(i) + 1
    if idx <= r.lastLogIndex() {
        if r.logTerm(idx) != entry.Term {
            // Truncate ONLY from conflict point
            r.log = r.log[:idx]
            r.log = append(r.log, msg.Entries[i:]...)
            break
        }
        // Matching entry — preserve
    } else {
        r.log = append(r.log, msg.Entries[i:]...)
        break
    }
}
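For context, here is one way the consistency check and the truncation loop could fit together. It is a self-contained sketch under stated assumptions: `Entry` and `appendEntries` are illustrative names, and the log is assumed to keep a sentinel at position 0 so that `log[i]` is the entry at Raft index `i`.

```go
package main

import "fmt"

// Entry is a simplified log entry (hypothetical type for this sketch).
type Entry struct {
	Term uint64
	Data string
}

// appendEntries applies an AppendEntries payload to log, where log[0] is a
// sentinel so that log[i] holds the entry at Raft index i. It returns the
// resulting log and whether the consistency check passed.
func appendEntries(log []Entry, prevIdx, prevTerm uint64, entries []Entry) ([]Entry, bool) {
	last := uint64(len(log) - 1)
	// Consistency check: the follower must hold the entry preceding the batch.
	if prevIdx > last || log[prevIdx].Term != prevTerm {
		return log, false
	}
	for i, e := range entries {
		idx := prevIdx + uint64(i) + 1
		if idx <= last {
			if log[idx].Term != e.Term {
				// Conflict: drop idx and everything after it, then take the rest.
				// The full slice expression forces a copy of the kept prefix.
				log = append(log[:idx:idx], entries[i:]...)
				break
			}
			continue // matching entry, keep it
		}
		log = append(log, entries[i:]...)
		break
	}
	return log, true
}

func main() {
	log := []Entry{{}, {Term: 1, Data: "a"}, {Term: 1, Data: "b"}, {Term: 2, Data: "stale"}}
	log, ok := appendEntries(log, 2, 1, []Entry{{Term: 3, Data: "c"}, {Term: 3, Data: "d"}})
	fmt.Println(ok, len(log)-1, log[3].Data, log[4].Data) // true 4 c d
}
```

Entries at indexes 1 and 2 survive untouched; only the conflicting entry at index 3 is replaced.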

commit advancement

The leader advances commitIndex only when an entry from the current term has been replicated to a majority. This is the §5.4.2 safety constraint that prevents committed entries from being overwritten.
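The rule can be sketched as follows. This is an illustrative helper rather than Phalanx's code: `advanceCommit`, the `match` slice (each voter's highest replicated index, leader included), and the `termAt` lookup are all assumed names.

```go
package main

import (
	"fmt"
	"sort"
)

// advanceCommit returns the new commit index given each voter's match index
// (the leader's own last index included), a lookup for the term stored at a
// log index, and the leader's current term. match must be non-empty.
func advanceCommit(commit uint64, match []uint64, termAt func(uint64) uint64, currentTerm uint64) uint64 {
	sorted := append([]uint64(nil), match...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
	// With matches sorted descending, the value at position quorum-1 is the
	// highest index stored on at least a majority of voters.
	quorum := len(sorted)/2 + 1
	n := sorted[quorum-1]
	// §5.4.2: only an entry from the current term may be committed by counting.
	if n > commit && termAt(n) == currentTerm {
		return n
	}
	return commit
}

func main() {
	terms := []uint64{0, 1, 1, 2} // terms[i] is the term of the entry at index i
	termAt := func(i uint64) uint64 { return terms[i] }
	fmt.Println(advanceCommit(0, []uint64{3, 2, 2}, termAt, 2)) // 0: index 2 is from term 1
	fmt.Println(advanceCommit(0, []uint64{3, 3, 2}, termAt, 2)) // 3: index 3 is from term 2
}
```

In the first call a majority holds index 2, but that entry is from an older term, so the commit index does not move. Once index 3 (current term) reaches a majority, it and everything before it commit together.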

§8 — no-op commit safety

When a leader is elected, it immediately appends an empty no-op entry in its current term and broadcasts it to all followers:

func (r *Raft) becomeLeader() {
    r.state = Leader
    r.leaderID = r.id

    // §8: Append no-op to unlock commit pipeline
    lastIdx := r.lastLogIndex()
    noop := &pb.LogEntry{
        Index: lastIdx + 1,
        Term:  r.currentTerm,
        Type:  pb.EntryCommand,
        Data:  nil,  // empty marker
    }
    r.log = append(r.log, noop)
    // Broadcast immediately so followers receive the no-op
    r.broadcastHeartbeat()
}

Without the no-op, a new leader cannot determine which entries from prior terms are committed, because §5.4.2 only allows it to commit entries from its current term by counting replicas. The no-op "unlocks" the commit pipeline: once it is replicated to a majority, all preceding entries are committed along with it.

§9.6 — pre-vote extension

Before starting a real election, a candidate sends pre-vote requests. These do not increment the local term:

Follower → [election timeout expires]
  → Send MsgRequestVote with IsPreVote=true
  → term is NOT incremented
  → If quorum of pre-votes received:
      → Start REAL election (increment term, become Candidate)
  → If no quorum:
      → Stay follower (term unchanged, no disruption)

A node isolated by a network partition will repeatedly time out and attempt pre-votes. Because it never reaches a quorum, it never increments its term. When it reconnects, its term has not inflated, so it does not force the current leader to step down. This eliminates the "disruptive rejoin" problem.
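The effect on the term can be illustrated with a small sketch. The `node` type and `campaign` method are hypothetical and model only the term bookkeeping, not message exchange.

```go
package main

import "fmt"

// node holds just enough state to show the pre-vote rule (illustrative only).
type node struct {
	term      uint64
	candidate bool
}

// campaign runs one election attempt. preVotesGranted counts pre-votes
// including the node's own; clusterSize is the number of voters.
func (n *node) campaign(preVotesGranted, clusterSize int) {
	quorum := clusterSize/2 + 1
	if preVotesGranted < quorum {
		// No quorum: stay follower and leave the term untouched.
		return
	}
	// Quorum of pre-votes: only now bump the term and run a real election.
	n.term++
	n.candidate = true
}

func main() {
	isolated := &node{term: 7}
	for i := 0; i < 100; i++ {
		isolated.campaign(1, 5) // partitioned: only its own pre-vote
	}
	fmt.Println(isolated.term, isolated.candidate) // 7 false
}
```

After a hundred failed attempts the isolated node is still at term 7, so its messages carry no higher term when the partition heals.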

lease-based linearizable reads

Standard Raft requires a round-trip to the quorum for every read (ReadIndex). Phalanx uses lease-based reads for lower latency:

| step | mechanism |
|------|-----------|
| 1 | leader resets `heartbeatAcked = 1` (self) at start of each heartbeat round |
| 2 | each successful `AppendEntriesResponse` increments the counter |
| 3 | `HasLeaderQuorum()` returns true only if `heartbeatAcked >= quorumSize()` |
| 4 | reads served from FSM only when leader confirms majority lease |

func (r *Raft) HasLeaderQuorum() bool {
    return r.state == Leader &&
           r.heartbeatAcked >= r.quorumSize()
}

If the leader has lost quorum (for example, because of a network partition), HasLeaderQuorum() returns false and reads are rejected with an error instead of being served from potentially stale state.
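The four steps of the lease check can be sketched end to end as below. The `leader` type, the `ErrNoQuorum` error, and the map-backed FSM are assumptions made for illustration; the sketch also does not deduplicate acknowledgements from the same follower, which a real implementation would need to handle.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoQuorum is a hypothetical error returned when the lease is unconfirmed.
var ErrNoQuorum = errors.New("leader has not confirmed quorum")

type leader struct {
	voters         int
	heartbeatAcked int
	fsm            map[string]string
}

func (l *leader) quorumSize() int { return l.voters/2 + 1 }

// beginHeartbeatRound is step 1: count only the leader itself.
func (l *leader) beginHeartbeatRound() { l.heartbeatAcked = 1 }

// onHeartbeatAck is step 2: one more follower confirmed this round.
func (l *leader) onHeartbeatAck() { l.heartbeatAcked++ }

// Read covers steps 3 and 4: serve locally only with a confirmed majority.
func (l *leader) Read(key string) (string, error) {
	if l.heartbeatAcked < l.quorumSize() {
		return "", ErrNoQuorum
	}
	return l.fsm[key], nil
}

func main() {
	l := &leader{voters: 3, fsm: map[string]string{"k": "v"}}
	l.beginHeartbeatRound()
	_, err := l.Read("k")
	fmt.Println(err != nil) // true: no follower has acked yet
	l.onHeartbeatAck()
	v, _ := l.Read("k")
	fmt.Println(v) // v
}
```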

leader stickiness

When a follower has heard from a valid leader within the election timeout, it rejects all vote requests — both pre-vote and real. This prevents a partitioned node from triggering unnecessary elections when it rejoins.

Stickiness decays automatically: when electionElapsed >= electionTimeout without hearing from the leader, leaderActive is set to false.
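A minimal sketch of this behaviour, assuming a tick-driven follower: the field names `electionElapsed`, `electionTimeout`, and `leaderActive` come from the text above, while the `follower` type and its methods are hypothetical.

```go
package main

import "fmt"

type follower struct {
	electionElapsed int
	electionTimeout int
	leaderActive    bool
}

// onLeaderMessage records contact with a valid leader.
func (f *follower) onLeaderMessage() {
	f.electionElapsed = 0
	f.leaderActive = true
}

// tick advances logical time; stickiness decays after a silent timeout.
func (f *follower) tick() {
	f.electionElapsed++
	if f.electionElapsed >= f.electionTimeout {
		f.leaderActive = false
	}
}

// shouldRejectVote applies to both pre-vote and real vote requests.
func (f *follower) shouldRejectVote() bool { return f.leaderActive }

func main() {
	f := &follower{electionTimeout: 3}
	f.onLeaderMessage()
	fmt.Println(f.shouldRejectVote()) // true: leader heard from recently
	for i := 0; i < 3; i++ {
		f.tick()
	}
	fmt.Println(f.shouldRejectVote()) // false: a full timeout passed in silence
}
```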

memory safety

In broadcastHeartbeat(), log entries are deep-copied via LogEntry.Clone() before being placed in outbound messages. This prevents a subtle bug where subsequent append/truncate operations on the internal log could silently corrupt in-flight messages. Verified by TestHeartbeatMemorySafety.
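The aliasing hazard is easy to reproduce in isolation. The following sketch uses a simplified `LogEntry` with only three fields; the real `LogEntry.Clone()` is not shown in this document and may copy additional fields such as the entry type.

```go
package main

import "fmt"

// LogEntry is a simplified stand-in for the protobuf type.
type LogEntry struct {
	Index uint64
	Term  uint64
	Data  []byte
}

// Clone returns a copy that shares no memory with e.
func (e *LogEntry) Clone() *LogEntry {
	c := *e
	c.Data = append([]byte(nil), e.Data...) // copy the payload bytes
	return &c
}

func main() {
	log := []*LogEntry{{Index: 1, Term: 1, Data: []byte("set x=1")}}
	shared := log[0]         // aliases the internal log
	copied := log[0].Clone() // independent copy for the outbound message
	log[0].Data[0] = 'X'     // later mutation of the internal log
	fmt.Println(string(shared.Data), string(copied.Data)) // Xet x=1 set x=1
}
```

The aliased pointer observes the mutation, while the cloned entry still carries the bytes that were current when the message was built.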