deployment
from local development to a 5-node global mesh on fly.io.
local development
build
```sh
go build -ldflags="-s -w" -o phalanx-server ./cmd/server
go build -ldflags="-s -w" -o phalanx ./cmd/phalanx
```

single node
```sh
NODE_ID=node-1 DATA_DIR=./data \
GRPC_ADDR=127.0.0.1:9000 DEBUG_ADDR=127.0.0.1:8080 \
TICK_MS=100 ELECTION=10 HEARTBEAT=3 \
./phalanx-server
```

local development uses faster tick values (100ms/10/3) since there's no cross-continental latency. production defaults are tuned for global RTT.
5-node local cluster
```sh
# terminal 1 (simulating Johannesburg)
NODE_ID=node-0 PEERS=node-1,node-2,node-3,node-4 DATA_DIR=./data/0 \
GRPC_ADDR=127.0.0.1:9000 DEBUG_ADDR=127.0.0.1:8080 \
TICK_MS=100 ELECTION=10 HEARTBEAT=3 ./phalanx-server
# terminal 2 (simulating London)
NODE_ID=node-1 PEERS=node-0,node-2,node-3,node-4 DATA_DIR=./data/1 \
GRPC_ADDR=127.0.0.1:9001 DEBUG_ADDR=127.0.0.1:8081 \
TICK_MS=100 ELECTION=10 HEARTBEAT=3 ./phalanx-server
# terminal 3 (simulating Chicago)
NODE_ID=node-2 PEERS=node-0,node-1,node-3,node-4 DATA_DIR=./data/2 \
GRPC_ADDR=127.0.0.1:9002 DEBUG_ADDR=127.0.0.1:8082 \
TICK_MS=100 ELECTION=10 HEARTBEAT=3 ./phalanx-server
# terminal 4 (simulating Singapore)
NODE_ID=node-3 PEERS=node-0,node-1,node-2,node-4 DATA_DIR=./data/3 \
GRPC_ADDR=127.0.0.1:9003 DEBUG_ADDR=127.0.0.1:8083 \
TICK_MS=100 ELECTION=10 HEARTBEAT=3 ./phalanx-server
# terminal 5 (simulating Frankfurt)
NODE_ID=node-4 PEERS=node-0,node-1,node-2,node-3 DATA_DIR=./data/4 \
GRPC_ADDR=127.0.0.1:9004 DEBUG_ADDR=127.0.0.1:8084 \
TICK_MS=100 ELECTION=10 HEARTBEAT=3 ./phalanx-server
```

docker
```sh
docker build -t phalanx .
docker run -v phalanx_data:/data -p 9000:9000 -p 8080:8080 phalanx
```

global deployment — fly.io
Phalanx is designed to run as a 5-node global mesh spanning four continents. Each node runs in a different Fly.io region with persistent storage and automatic peer discovery via SWIM gossip.
| region | code | continent | role |
|---|---|---|---|
| Johannesburg | JNB | Africa | voter |
| London | LHR | Europe | voter |
| Chicago | ORD | North America | primary region |
| Singapore | SIN | Asia-Pacific | voter |
| Frankfurt | FRA | Europe | voter |
step 1 — create the app
```sh
fly launch --copy-config --name phalanx --region ord
```

step 2 — create persistent volumes (one per region)
Each node requires its own BadgerDB volume. Volumes are region-bound and survive VM restarts and redeploys.
```sh
fly volumes create phalanx_data --size 1 --region jnb
fly volumes create phalanx_data --size 1 --region lhr
fly volumes create phalanx_data --size 1 --region ord
fly volumes create phalanx_data --size 1 --region sin
fly volumes create phalanx_data --size 1 --region fra
```

step 3 — deploy
```sh
fly deploy
```

step 4 — scale to 5 nodes
```sh
fly scale count 5
```

step 5 — verify global mesh
```sh
# check each region
fly proxy 8080:8080
curl http://localhost:8080/debug/status | jq .

# write from any region
phalanx put hello world -addr <fly-app>:9000

# read from any region (linearizable — leader verifies quorum)
phalanx get hello -addr <fly-app>:9000
```

why 5 nodes, not 6?
Raft requires a strict majority to commit writes. For odd and even cluster sizes:
| nodes | quorum | fault tolerance |
|---|---|---|
| 3 | 2 | 1 failure |
| 4 | 3 | 1 failure |
| 5 | 3 | 2 failures |
| 6 | 4 | 2 failures |
| 7 | 4 | 3 failures |
A 4-node cluster needs a larger quorum than a 3-node cluster (Q=3 vs Q=2) yet tolerates the same single failure, so the 4th node adds cost without improving fault tolerance. A 6-node cluster has the same fault tolerance as a 5-node cluster (both survive 2 failures) but requires a larger quorum (Q=4), an extra machine, and more heartbeat traffic. Odd cluster sizes are always more efficient for consensus.
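The quorum arithmetic is easy to check directly. A minimal Go sketch (illustrative, not part of the codebase) that reproduces the table above:

```go
package main

import "fmt"

func main() {
	for _, n := range []int{3, 4, 5, 6, 7} {
		quorum := n/2 + 1       // strict majority: floor(n/2) + 1
		tolerated := n - quorum // nodes that can fail while a quorum survives
		fmt.Printf("nodes=%d quorum=%d tolerates=%d failure(s)\n", n, quorum, tolerated)
	}
}
```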
cross-continental latency tuning
The worst-case RTT in the global mesh is Johannesburg ↔ Singapore (~300ms). The timing constants are tuned to prevent election “flapping” on high-latency paths:
| parameter | value | effective time | rationale |
|---|---|---|---|
| `TICK_MS` | 200 | 200ms per tick | absorbs cross-Atlantic RTT without wasting CPU on fast ticks |
| `HEARTBEAT` | 5 ticks | 1 second | allows 3 RTTs within a heartbeat interval for reliable ack delivery |
| `ELECTION` | 20 ticks | 4–8 seconds (randomized) | wide window prevents false elections from transient latency spikes |
rule of thumb: election timeout should be at least 4× heartbeat interval. here it's 4× minimum (20 vs 5 ticks), 8× maximum (40 vs 5 ticks).
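The effective times fall out of simple multiplication. A small illustrative Go sketch; the constant names mirror the env variables, and the per-node draw from a [ELECTION, 2×ELECTION) window is inferred from the 4–8s range quoted above:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

func main() {
	const (
		tick      = 200 * time.Millisecond // TICK_MS
		heartbeat = 5                      // HEARTBEAT, in ticks
		election  = 20                     // ELECTION, in ticks (lower bound)
	)

	fmt.Println("heartbeat interval:", heartbeat*tick)                    // 1s
	fmt.Println("election window:", election*tick, "to", 2*election*tick) // 4s to 8s

	// each node draws its own timeout in [election, 2*election) ticks so
	// that two followers rarely time out and campaign at the same moment
	timeout := election + rand.Intn(election)
	fmt.Println("this node's timeout:", time.Duration(timeout)*tick)
}
```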
how start.sh works
| step | mechanism |
|---|---|
| 1 | derives NODE_ID from FLY_MACHINE_ID |
| 2 | detects region from FLY_REGION for operational tagging |
| 3 | discovers peers via DNS AAAA records on <app>.internal |
| 4 | builds gossip seed list from discovered IPv6 addresses |
| 5 | logs effective timing constants for operational visibility |
| 6 | starts the server with persistent storage at /data |
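Steps 3–4 are where the mesh self-assembles. The same discovery logic, sketched in Go for illustration (the shipped start.sh does this in shell; `FLY_APP_NAME` is a standard Fly.io env var, and 7946 matches the `BIND_PORT` default below):

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

func main() {
	// Fly.io private DNS: <app>.internal resolves to AAAA records,
	// one IPv6 address per running machine in the app's network.
	app := os.Getenv("FLY_APP_NAME")
	ips, err := net.LookupIP(app + ".internal")
	if err != nil {
		fmt.Fprintln(os.Stderr, "peer discovery failed:", err)
		os.Exit(1)
	}

	// build the gossip seed list from the discovered IPv6 addresses
	var seeds []string
	for _, ip := range ips {
		if ip.To4() == nil { // keep IPv6 only
			seeds = append(seeds, fmt.Sprintf("[%s]:7946", ip))
		}
	}
	fmt.Println("SEEDS=" + strings.Join(seeds, ","))
}
```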
configuration reference
| variable | default | description |
|---|---|---|
| `NODE_ID` | hostname | unique node identifier |
| `PEERS` | — | comma-separated peer node IDs |
| `DATA_DIR` | /data | BadgerDB storage directory |
| `GRPC_ADDR` | [::]:9000 | gRPC listen address |
| `DEBUG_ADDR` | [::]:8080 | debug HTTP listen address |
| `SEEDS` | — | comma-separated gossip seed addresses |
| `BIND_ADDR` | :: | gossip protocol bind address |
| `BIND_PORT` | 7946 | gossip protocol bind port |
| `TICK_MS` | 200 | tick interval in milliseconds (tuned for global RTT) |
| `ELECTION` | 20 | election timeout in ticks (= 4s effective) |
| `HEARTBEAT` | 5 | heartbeat interval in ticks (= 1s effective) |
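A minimal sketch of consuming this table, assuming plain getenv-with-default semantics (the `envOr`/`envInt` helpers are hypothetical, not Phalanx's actual config loader):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envOr returns the variable's value, or def if unset. hypothetical
// helper, shown only to illustrate the defaults in the table above.
func envOr(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}

// envInt is the integer-valued counterpart of envOr.
func envInt(key string, def int) int {
	if v := os.Getenv(key); v != "" {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return def
}

func main() {
	hostname, _ := os.Hostname()
	fmt.Println("NODE_ID:  ", envOr("NODE_ID", hostname))
	fmt.Println("DATA_DIR: ", envOr("DATA_DIR", "/data"))
	fmt.Println("GRPC_ADDR:", envOr("GRPC_ADDR", "[::]:9000"))
	fmt.Println("TICK_MS:  ", envInt("TICK_MS", 200))
	fmt.Println("ELECTION: ", envInt("ELECTION", 20))
	fmt.Println("HEARTBEAT:", envInt("HEARTBEAT", 5))
}
```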
chaos testing
Phalanx ships with a chaos script that demonstrates zero-downtime availability under machine restarts in the 5-node global mesh:

```sh
./scripts/chaos.sh phalanx "[fdaa::1]:9000"
```

The script starts a background writer, randomly restarts Fly.io machines across different regions, and verifies read-after-write consistency after each round. With Q=3 and N=5, the cluster survives any 2 simultaneous region failures.
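The consistency check at the heart of that loop can be sketched in Go, shelling out to the phalanx CLI shown earlier (illustrative only; the key name and one-second cadence are arbitrary choices, not taken from the script):

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
	"time"
)

func main() {
	addr := "[fdaa::1]:9000" // same address the chaos script targets

	for round := 0; ; round++ {
		key, want := "chaos-key", fmt.Sprintf("round-%d", round)

		// write, then immediately read back; with a quorum alive both
		// must succeed and the read must observe the write
		if err := exec.Command("phalanx", "put", key, want, "-addr", addr).Run(); err != nil {
			fmt.Fprintln(os.Stderr, "put failed (quorum lost?):", err)
			continue
		}
		out, err := exec.Command("phalanx", "get", key, "-addr", addr).Output()
		if err != nil || !strings.Contains(string(out), want) {
			fmt.Fprintf(os.Stderr, "read-after-write violated at round %d\n", round)
			os.Exit(1)
		}
		time.Sleep(time.Second)
	}
}
```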