deployment
from local development to a 5-node global mesh on fly.io.
local development
build
```sh
go build -ldflags="-s -w" -o phalanx-server ./cmd/server
go build -ldflags="-s -w" -o phalanx ./cmd/phalanx
```

single node
```sh
NODE_ID=node-1 DATA_DIR=./data \
GRPC_ADDR=127.0.0.1:9000 DEBUG_ADDR=127.0.0.1:8080 \
TICK_MS=100 ELECTION=10 HEARTBEAT=3 \
./phalanx-server
```

local development uses faster tick values (100ms/10/3) since there's no cross-continental latency. production defaults are tuned for global RTT.
5-node local cluster
```sh
# terminal 1 (simulating Johannesburg)
NODE_ID=node-0 PEERS=node-1,node-2,node-3,node-4 DATA_DIR=./data/0 \
GRPC_ADDR=127.0.0.1:9000 DEBUG_ADDR=127.0.0.1:8080 \
TICK_MS=100 ELECTION=10 HEARTBEAT=3 ./phalanx-server
# terminal 2 (simulating London)
NODE_ID=node-1 PEERS=node-0,node-2,node-3,node-4 DATA_DIR=./data/1 \
GRPC_ADDR=127.0.0.1:9001 DEBUG_ADDR=127.0.0.1:8081 \
TICK_MS=100 ELECTION=10 HEARTBEAT=3 ./phalanx-server
# terminal 3 (simulating Chicago)
NODE_ID=node-2 PEERS=node-0,node-1,node-3,node-4 DATA_DIR=./data/2 \
GRPC_ADDR=127.0.0.1:9002 DEBUG_ADDR=127.0.0.1:8082 \
TICK_MS=100 ELECTION=10 HEARTBEAT=3 ./phalanx-server
# terminal 4 (simulating Singapore)
NODE_ID=node-3 PEERS=node-0,node-1,node-2,node-4 DATA_DIR=./data/3 \
GRPC_ADDR=127.0.0.1:9003 DEBUG_ADDR=127.0.0.1:8083 \
TICK_MS=100 ELECTION=10 HEARTBEAT=3 ./phalanx-server
# terminal 5 (simulating Frankfurt)
NODE_ID=node-4 PEERS=node-0,node-1,node-2,node-3 DATA_DIR=./data/4 \
GRPC_ADDR=127.0.0.1:9004 DEBUG_ADDR=127.0.0.1:8084 \
TICK_MS=100 ELECTION=10 HEARTBEAT=3 ./phalanx-server
```

docker
```sh
docker build -t phalanx .
docker run -v phalanx_data:/data -p 9000:9000 -p 8080:8080 phalanx
```

global deployment — fly.io
Phalanx is designed to run as a 5-node global mesh spanning four continents. Each node runs in a different Fly.io region with persistent storage and automatic peer discovery via SWIM gossip.
| region | code | continent | role |
|---|---|---|---|
| Johannesburg | JNB | Africa | voter |
| London | LHR | Europe | voter |
| Chicago | ORD | North America | primary region |
| Singapore | SIN | Asia-Pacific | voter |
| Frankfurt | FRA | Europe | voter |
step 1 — create the app
```sh
fly launch --copy-config --name phalanx --region ord
```

step 2 — create persistent volumes (one per region)
Each node requires its own BadgerDB volume. Volumes are region-bound and survive VM restarts and redeploys.
```sh
fly volumes create phalanx_data --size 1 --region jnb
fly volumes create phalanx_data --size 1 --region lhr
fly volumes create phalanx_data --size 1 --region ord
fly volumes create phalanx_data --size 1 --region sin
fly volumes create phalanx_data --size 1 --region fra
```

step 3 — deploy
```sh
fly deploy
```

step 4 — scale to 5 nodes
```sh
fly scale count 5
```

step 5 — verify global mesh
```sh
# check each region
fly proxy 8080:8080
curl http://localhost:8080/debug/status | jq .

# write from any region
phalanx put hello world -addr <fly-app>:9000

# read from any region (linearizable — leader verifies quorum)
phalanx get hello -addr <fly-app>:9000
```

why 5 nodes, not 6?
Raft requires a strict majority to commit writes. For odd and even cluster sizes:
| nodes | quorum | fault tolerance |
|---|---|---|
| 3 | 2 | 1 failure |
| 4 | 3 | 1 failure |
| 5 | 3 | 2 failures |
| 6 | 4 | 2 failures |
| 7 | 4 | 3 failures |
A 4-node cluster needs a larger quorum than a 3-node cluster (Q=3 vs Q=2) yet tolerates the same single failure, so the 4th node adds cost without improving fault tolerance. A 6-node cluster has the same fault tolerance as a 5-node cluster (both survive 2 failures) but requires a larger quorum (Q=4), an extra machine, and more heartbeat traffic. Odd cluster sizes are always more efficient for consensus.
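The quorum arithmetic is easy to check directly. A minimal Go sketch (illustrative, not part of the codebase) that reproduces the table above:

```go
package main

import "fmt"

func main() {
	for _, n := range []int{3, 4, 5, 6, 7} {
		quorum := n/2 + 1       // strict majority: floor(n/2) + 1
		tolerated := n - quorum // nodes that can fail while a quorum survives
		fmt.Printf("nodes=%d quorum=%d tolerates=%d failure(s)\n", n, quorum, tolerated)
	}
}
```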
cross-continental latency tuning
The worst-case RTT in the global mesh is Johannesburg ↔ Singapore (~300ms). The timing constants are tuned to prevent election “flapping” on high-latency paths:
| parameter | value | effective time | rationale |
|---|---|---|---|
| `TICK_MS` | 200 | 200ms per tick | absorbs cross-Atlantic RTT without wasting CPU on fast ticks |
| `HEARTBEAT` | 5 ticks | 1 second | allows 3 RTTs within a heartbeat interval for reliable ack delivery |
| `ELECTION` | 20 ticks | 4–8 seconds (randomized) | wide window prevents false elections from transient latency spikes |
rule of thumb: election timeout should be at least 4× heartbeat interval. here it's 4× minimum (20 vs 5 ticks), 8× maximum (40 vs 5 ticks).
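The effective times fall out of simple multiplication. A small illustrative Go sketch; the constant names mirror the env variables, and the per-node draw from a [ELECTION, 2×ELECTION) window is inferred from the 4–8s range quoted above:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

func main() {
	const (
		tick      = 200 * time.Millisecond // TICK_MS
		heartbeat = 5                      // HEARTBEAT, in ticks
		election  = 20                     // ELECTION, in ticks (lower bound)
	)

	fmt.Println("heartbeat interval:", heartbeat*tick)                    // 1s
	fmt.Println("election window:", election*tick, "to", 2*election*tick) // 4s to 8s

	// each node draws its own timeout in [election, 2*election) ticks so
	// that two followers rarely time out and campaign at the same moment
	timeout := election + rand.Intn(election)
	fmt.Println("this node's timeout:", time.Duration(timeout)*tick)
}
```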
how start.sh works
| step | mechanism |
|---|---|
| 1 | derives NODE_ID from FLY_MACHINE_ID |
| 2 | detects region from FLY_REGION for operational tagging |
| 3 | discovers peers via DNS AAAA records on <app>.internal |
| 4 | builds gossip seed list from discovered IPv6 addresses |
| 5 | logs effective timing constants for operational visibility |
| 6 | starts the server with persistent storage at /data |
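Steps 3–4 are where the mesh self-assembles. The same discovery logic, sketched in Go for illustration (the shipped start.sh does this in shell; `FLY_APP_NAME` is a standard Fly.io env var, and 7946 matches the `BIND_PORT` default below):

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

func main() {
	// Fly.io private DNS: <app>.internal resolves to AAAA records,
	// one IPv6 address per running machine in the app's network.
	app := os.Getenv("FLY_APP_NAME")
	ips, err := net.LookupIP(app + ".internal")
	if err != nil {
		fmt.Fprintln(os.Stderr, "peer discovery failed:", err)
		os.Exit(1)
	}

	// build the gossip seed list from the discovered IPv6 addresses
	var seeds []string
	for _, ip := range ips {
		if ip.To4() == nil { // keep IPv6 only
			seeds = append(seeds, fmt.Sprintf("[%s]:7946", ip))
		}
	}
	fmt.Println("SEEDS=" + strings.Join(seeds, ","))
}
```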
configuration reference
| variable | default | description |
|---|---|---|
| `NODE_ID` | hostname | unique node identifier |
| `PEERS` | — | comma-separated peer node IDs |
| `DATA_DIR` | /data | BadgerDB storage directory |
| `GRPC_ADDR` | [::]:9000 | gRPC listen address |
| `DEBUG_ADDR` | [::]:8080 | debug HTTP listen address |
| `SEEDS` | — | comma-separated gossip seed addresses |
| `BIND_ADDR` | :: | gossip protocol bind address |
| `BIND_PORT` | 7946 | gossip protocol bind port |
| `TICK_MS` | 200 | tick interval in milliseconds (tuned for global RTT) |
| `ELECTION` | 20 | election timeout in ticks (= 4s effective) |
| `HEARTBEAT` | 5 | heartbeat interval in ticks (= 1s effective) |
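A minimal sketch of consuming this table, assuming plain getenv-with-default semantics (the `envOr`/`envInt` helpers are hypothetical, not Phalanx's actual config loader):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envOr returns the variable's value, or def if unset. hypothetical
// helper, shown only to illustrate the defaults in the table above.
func envOr(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}

// envInt is the integer-valued counterpart of envOr.
func envInt(key string, def int) int {
	if v := os.Getenv(key); v != "" {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return def
}

func main() {
	hostname, _ := os.Hostname()
	fmt.Println("NODE_ID:  ", envOr("NODE_ID", hostname))
	fmt.Println("DATA_DIR: ", envOr("DATA_DIR", "/data"))
	fmt.Println("GRPC_ADDR:", envOr("GRPC_ADDR", "[::]:9000"))
	fmt.Println("TICK_MS:  ", envInt("TICK_MS", 200))
	fmt.Println("ELECTION: ", envInt("ELECTION", 20))
	fmt.Println("HEARTBEAT:", envInt("HEARTBEAT", 5))
}
```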
chaos testing
Phalanx ships with a chaos script that demonstrates zero-downtime availability under machine restarts in the 5-node global mesh:

```sh
./scripts/chaos.sh phalanx "[fdaa::1]:9000"
```

The script starts a background writer, randomly restarts Fly.io machines across different regions, and verifies read-after-write consistency after each round. With Q=3 and N=5, the cluster survives any 2 simultaneous region failures.
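The consistency check at the heart of that loop can be sketched in Go, shelling out to the phalanx CLI shown earlier (illustrative only; the key name and one-second cadence are arbitrary choices, not taken from the script):

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
	"time"
)

func main() {
	addr := "[fdaa::1]:9000" // same address the chaos script targets

	for round := 0; ; round++ {
		key, want := "chaos-key", fmt.Sprintf("round-%d", round)

		// write, then immediately read back; with a quorum alive both
		// must succeed and the read must observe the write
		if err := exec.Command("phalanx", "put", key, want, "-addr", addr).Run(); err != nil {
			fmt.Fprintln(os.Stderr, "put failed (quorum lost?):", err)
			continue
		}
		out, err := exec.Command("phalanx", "get", key, "-addr", addr).Output()
		if err != nil || !strings.Contains(string(out), want) {
			fmt.Fprintf(os.Stderr, "read-after-write violated at round %d\n", round)
			os.Exit(1)
		}
		time.Sleep(time.Second)
	}
}
```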