PHALANX consensus engine
v1.0.0

deployment

from local development to a 5-node global mesh on fly.io.

local development

build

go build -ldflags="-s -w" -o phalanx-server ./cmd/server
go build -ldflags="-s -w" -o phalanx ./cmd/phalanx

single node

NODE_ID=node-1 DATA_DIR=./data \
  GRPC_ADDR=127.0.0.1:9000 DEBUG_ADDR=127.0.0.1:8080 \
  TICK_MS=100 ELECTION=10 HEARTBEAT=3 \
  ./phalanx-server

local development uses faster tick values (TICK_MS=100, ELECTION=10, HEARTBEAT=3) since there's no cross-continental latency. production defaults are tuned for global RTT.

5-node local cluster

# terminal 1 (simulating Johannesburg)
NODE_ID=node-0 PEERS=node-1,node-2,node-3,node-4 DATA_DIR=./data/0 \
  GRPC_ADDR=127.0.0.1:9000 DEBUG_ADDR=127.0.0.1:8080 \
  TICK_MS=100 ELECTION=10 HEARTBEAT=3 ./phalanx-server

# terminal 2 (simulating London)
NODE_ID=node-1 PEERS=node-0,node-2,node-3,node-4 DATA_DIR=./data/1 \
  GRPC_ADDR=127.0.0.1:9001 DEBUG_ADDR=127.0.0.1:8081 \
  TICK_MS=100 ELECTION=10 HEARTBEAT=3 ./phalanx-server

# terminal 3 (simulating Chicago)
NODE_ID=node-2 PEERS=node-0,node-1,node-3,node-4 DATA_DIR=./data/2 \
  GRPC_ADDR=127.0.0.1:9002 DEBUG_ADDR=127.0.0.1:8082 \
  TICK_MS=100 ELECTION=10 HEARTBEAT=3 ./phalanx-server

# terminal 4 (simulating Singapore)
NODE_ID=node-3 PEERS=node-0,node-1,node-2,node-4 DATA_DIR=./data/3 \
  GRPC_ADDR=127.0.0.1:9003 DEBUG_ADDR=127.0.0.1:8083 \
  TICK_MS=100 ELECTION=10 HEARTBEAT=3 ./phalanx-server

# terminal 5 (simulating Frankfurt)
NODE_ID=node-4 PEERS=node-0,node-1,node-2,node-3 DATA_DIR=./data/4 \
  GRPC_ADDR=127.0.0.1:9004 DEBUG_ADDR=127.0.0.1:8084 \
  TICK_MS=100 ELECTION=10 HEARTBEAT=3 ./phalanx-server
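the five terminals above differ only in the node index; an illustrative helper (not shipped with the repo) can derive each node's environment from i, with PEERS being every node except itself:

```shell
#!/bin/sh
# illustrative helper: print the launch line for each of the 5 local nodes.
# values mirror the five terminals above; this script is not part of the repo.
all="node-0,node-1,node-2,node-3,node-4"
for i in 0 1 2 3 4; do
  # PEERS is every node ID except this node's own
  peers=$(echo "$all" | tr ',' '\n' | grep -v "^node-$i\$" | paste -sd, -)
  echo "NODE_ID=node-$i PEERS=$peers DATA_DIR=./data/$i" \
       "GRPC_ADDR=127.0.0.1:900$i DEBUG_ADDR=127.0.0.1:808$i" \
       "TICK_MS=100 ELECTION=10 HEARTBEAT=3 ./phalanx-server &"
done
```

pipe the output to `sh` (or replace `echo` with the command itself) to launch all five nodes in one terminal instead of five.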

docker

docker build -t phalanx .
docker run -v phalanx_data:/data -p 9000:9000 -p 8080:8080 phalanx
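the build command assumes a Dockerfile at the repo root. a minimal sketch of what it might look like (base images and paths are assumptions, not the repo's actual Dockerfile):

```dockerfile
# hypothetical sketch; check against the repo's actual Dockerfile
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o /phalanx-server ./cmd/server

FROM gcr.io/distroless/static
COPY --from=build /phalanx-server /phalanx-server
VOLUME /data
EXPOSE 9000 8080
ENTRYPOINT ["/phalanx-server"]
```

a static CGO_ENABLED=0 build lets the binary run on a distroless base, which keeps the image small and matches the `-v phalanx_data:/data` volume mount above.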

global deployment — fly.io

Phalanx is designed to run as a 5-node global mesh across 5 continents. Each node runs in a different Fly.io region with persistent storage and automatic peer discovery via SWIM gossip.

region        code  continent       role
Johannesburg  JNB   Africa          voter
London        LHR   Europe          voter
Chicago       ORD   North America   primary region
Singapore     SIN   Asia-Pacific    voter
Frankfurt     FRA   Central Europe  voter

step 1 — create the app

fly launch --copy-config --name phalanx --region ord
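`--copy-config` assumes a fly.toml at the repo root. a minimal sketch of what it might contain (field values are assumptions, not the shipped config):

```toml
# hypothetical fly.toml sketch; not the repo's actual config
app = "phalanx"
primary_region = "ord"

[mounts]
  source = "phalanx_data"
  destination = "/data"

[env]
  TICK_MS = "200"
  ELECTION = "20"
  HEARTBEAT = "5"
```

the `[mounts]` section is what binds the volumes created in step 2 to /data inside each VM.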

step 2 — create persistent volumes (one per region)

Each node requires its own BadgerDB volume. Volumes are region-bound and survive VM restarts and redeploys.

fly volumes create phalanx_data --size 1 --region jnb
fly volumes create phalanx_data --size 1 --region lhr
fly volumes create phalanx_data --size 1 --region ord
fly volumes create phalanx_data --size 1 --region sin
fly volumes create phalanx_data --size 1 --region fra

step 3 — deploy

fly deploy

step 4 — scale to 5 nodes

fly scale count 5

step 5 — verify global mesh

# check each region
fly proxy 8080:8080
curl http://localhost:8080/debug/status | jq .

# write from any region
phalanx put hello world -addr <fly-app>:9000

# read from any region (linearizable — leader verifies quorum)
phalanx get hello -addr <fly-app>:9000

why 5 nodes, not 6?

Raft requires a strict majority to commit writes. For odd and even cluster sizes:

nodes  quorum  fault tolerance
3      2       1 failure
4      3       1 failure
5      3       2 failures
6      4       2 failures
7      4       3 failures

4 nodes has the same fault tolerance as 3 (one failure) despite a larger quorum (Q=3 vs Q=2): the 4th node adds cost without improving availability. 6 nodes has the same fault tolerance as 5 (both survive 2 failures) but requires an extra machine and more heartbeat traffic. Odd numbers are always more efficient for consensus.
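the table follows directly from Q = floor(N/2) + 1 and F = N - Q; a quick check in shell:

```shell
# quorum and fault tolerance as a function of cluster size:
# quorum Q = floor(N/2) + 1, tolerated failures F = N - Q
for n in 3 4 5 6 7; do
  q=$(( n / 2 + 1 ))
  f=$(( n - q ))
  echo "N=$n Q=$q F=$f"
done
```

note how F stays flat from each odd size to the next even one (N=3 and N=4 both give F=1, N=5 and N=6 both give F=2), which is the whole argument for odd cluster sizes.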

cross-continental latency tuning

The worst-case RTT in the global mesh is Johannesburg ↔ Singapore (~300ms). The timing constants are tuned to prevent election “flapping” on high-latency paths:

parameter  value     effective time            rationale
TICK_MS    200       200ms per tick            absorbs cross-Atlantic RTT without wasting CPU on fast ticks
HEARTBEAT  5 ticks   1 second                  allows 3 RTTs within a heartbeat interval for reliable ack delivery
ELECTION   20 ticks  4–8 seconds (randomized)  wide window prevents false elections from transient latency spikes

rule of thumb: election timeout should be at least 4× heartbeat interval. here it's 4× minimum (20 vs 5 ticks), 8× maximum (40 vs 5 ticks).
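the effective times fall out of the tick constants; a quick derivation in shell, assuming (as the 4–8 second figure implies) that the randomized timeout is drawn from [ELECTION, 2×ELECTION] ticks:

```shell
# effective times from the production tick constants
TICK_MS=200 HEARTBEAT=5 ELECTION=20
heartbeat_ms=$(( HEARTBEAT * TICK_MS ))        # 5 ticks * 200ms = 1000ms
election_min_ms=$(( ELECTION * TICK_MS ))      # 20 ticks * 200ms = 4000ms
election_max_ms=$(( 2 * ELECTION * TICK_MS ))  # randomized up to 2x = 8000ms
echo "heartbeat=${heartbeat_ms}ms election=${election_min_ms}-${election_max_ms}ms"
```

with a ~300ms worst-case RTT, the 4000ms floor gives a follower more than 13 missed round trips before it starts an election, which is what suppresses flapping.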

how start.sh works

step  mechanism
1     derives NODE_ID from FLY_MACHINE_ID
2     detects region from FLY_REGION for operational tagging
3     discovers peers via DNS AAAA records on <app>.internal
4     builds gossip seed list from discovered IPv6 addresses
5     logs effective timing constants for operational visibility
6     starts the server with persistent storage at /data
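the steps above can be sketched as a shell script. this is a hypothetical reconstruction, not the shipped start.sh; it assumes Fly's documented runtime env vars (FLY_MACHINE_ID, FLY_REGION, FLY_APP_NAME) and its .internal private DNS:

```shell
#!/bin/sh
# hypothetical sketch of start.sh; names are illustrative, not the real script

# step 4: turn newline-separated IPv6 addresses into a gossip seed list
seeds_from_ips() {
  sed 's/.*/[&]:7946/' | paste -sd, -
}

NODE_ID="${FLY_MACHINE_ID:-$(hostname)}"   # step 1
REGION="${FLY_REGION:-local}"              # step 2

# step 3: an AAAA lookup on <app>.internal returns each machine's private IPv6
ips="$(dig +short aaaa "${FLY_APP_NAME}.internal" 2>/dev/null || true)"
SEEDS=""
[ -n "$ips" ] && SEEDS="$(printf '%s\n' "$ips" | seeds_from_ips)"

# steps 5-6: log effective timing, then hand off to the server
echo "node=$NODE_ID region=$REGION seeds=$SEEDS tick=${TICK_MS:-200}ms"
# exec /phalanx-server   (uncomment inside the real container)
```

the bracketed `[addr]:port` form is required because gossip seeds are raw IPv6 addresses.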

configuration reference

variable    default    description
NODE_ID     hostname   unique node identifier
PEERS       (none)     comma-separated peer node IDs
DATA_DIR    /data      BadgerDB storage directory
GRPC_ADDR   [::]:9000  gRPC listen address
DEBUG_ADDR  [::]:8080  debug HTTP listen address
SEEDS       (none)     comma-separated gossip seed addresses
BIND_ADDR   ::         gossip protocol bind address
BIND_PORT   7946       gossip protocol bind port
TICK_MS     200        tick interval in milliseconds (tuned for global RTT)
ELECTION    20         election timeout in ticks (= 4s effective)
HEARTBEAT   5          heartbeat interval in ticks (= 1s effective)

chaos testing

Phalanx ships with a chaos script that proves zero-downtime availability under machine restarts in the 5-node global mesh:

./scripts/chaos.sh phalanx "[fdaa::1]:9000"

The script starts a background writer, randomly restarts Fly.io machines across different regions, and verifies read-after-write consistency after each round. With Q=3 and N=5, the cluster survives any 2 simultaneous region failures.
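one round of that loop might look roughly like this. an illustrative outline only, not the shipped scripts/chaos.sh; it assumes `fly machines list --json`, `jq`, and the phalanx CLI built earlier, and that `phalanx get` prints the raw value:

```shell
#!/bin/sh
# illustrative outline of a single chaos round (not the real scripts/chaos.sh)
APP="${1:-phalanx}"
ADDR="${2:-[fdaa::1]:9000}"

round() {
  # pick one machine at random across all regions and restart it
  id="$(fly machines list --app "$APP" --json | jq -r '.[].id' | shuf -n1)"
  fly machine restart "$id" --app "$APP"

  # verify read-after-write consistency while the machine comes back
  key="chaos-$(date +%s)"
  phalanx put "$key" ok -addr "$ADDR"
  [ "$(phalanx get "$key" -addr "$ADDR")" = "ok" ] || echo "consistency violation on $key"
}
```

call `round` in a loop to keep restarting machines; with Q=3 of N=5, writes keep committing as long as no more than 2 machines are down at once.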