
ADR-0009: Availability zones

Status: proposed
Date: 2026-03-09
Group: cross-cutting
Depends-on: ADR-0002, ADR-0003

Context

Government workloads require high availability and disaster resilience. A single datacenter is a single point of failure (power, cooling, network, physical incidents). The number of availability zones determines the resilience model, the complexity of data replication, and the network architecture between sites.

Options

Option 1: Single AZ (1 datacenter)

  • Pros: simplest operations, no cross-site networking, no replication latency

  • Cons: single point of failure, unacceptable for government continuity requirements

Option 2: 2 AZs

  • Pros: survives single-site failure, simpler than 3-site

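  • Cons: no site-level majority is possible, so losing either site (or the link between them) either halts quorum-based distributed systems or forces a failover decision that risks split-brain; failover capacity requires 2x provisioning, since each site must be able to carry the full load alone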
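
To make the quorum argument concrete, the following Python sketch (an illustration only; the function names are hypothetical) tries every placement of one to seven voting members across two sites and checks whether a strict majority survives the loss of either site.

```python
from itertools import product


def majority(voters: int) -> int:
    # Smallest strict majority of `voters` (e.g. 2 of 3, 3 of 5).
    return voters // 2 + 1


def survives_any_single_site_loss(members_per_site: list[int]) -> bool:
    # True if, whichever single site is lost, the members at the
    # remaining sites still form a strict majority of all members.
    total = sum(members_per_site)
    return all(total - lost >= majority(total) for lost in members_per_site)


# Exhaustively try every placement of 1..7 voters across two sites.
two_site_ok = [
    (a, b)
    for a, b in product(range(8), repeat=2)
    if 1 <= a + b <= 7 and survives_any_single_site_loss([a, b])
]
print(two_site_ok)  # → []
```

The list comes out empty: with two sites, losing the site that holds the larger share of voters always drops the cluster below quorum. For comparison, `survives_any_single_site_loss([1, 1, 1])` returns `True`.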

Option 3: 3+ AZs

  • Pros: quorum-based consensus possible (etcd, Ceph, etc.), survives a single-site failure without split-brain, capacity can be distributed (with load spread evenly, each site runs at up to 2/3, about 67%, of its capacity instead of 50%)

  • Cons: cross-site network complexity, data replication across 3 sites, higher infrastructure investment
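
The utilization figures above follow from a simple model, sketched here in Python under the assumption that load is spread evenly and can be moved freely to the surviving sites (the helper name is hypothetical): with n sites and one tolerated site failure, each site may run at most (n - 1)/n of its capacity.

```python
from fractions import Fraction


def max_safe_utilization(sites: int, tolerated_failures: int = 1) -> Fraction:
    # Hypothetical even-load model: after `tolerated_failures` sites are lost,
    # the survivors must carry all load, so in normal operation each site may
    # use at most (sites - tolerated_failures) / sites of its capacity.
    if not 0 <= tolerated_failures < sites:
        raise ValueError("at least one site must survive")
    return Fraction(sites - tolerated_failures, sites)


for n in (2, 3, 4):
    u = max_safe_utilization(n)
    # 1 / u is the total capacity to provision relative to peak load.
    print(f"{n} sites: {float(u):.0%} per site, {float(1 / u):.2f}x total provisioning")
```

This prints 50%, 67% and 75% per site for 2, 3 and 4 sites, i.e. 2.00x, 1.50x and 1.33x total provisioning, which is where the 2x figure for two AZs and the 2/3 figure for three AZs come from.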

Decision

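Minimum 3 availability zones across physically separate government datacenters (ODCs). Three is the minimum number of sites at which a quorum-based distributed system can lose an entire site and still retain a majority. Each AZ must be independently operational (separate power, cooling, network uplinks). Gardener Seed clusters, etcd, and storage replication all rely on majority quorum for consensus, so site counts should be odd: an even count requires a larger majority without tolerating any additional site failures.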
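
The preference for odd site counts can be checked with a short Python sketch, assuming one voting member per site (the function names are illustrative, not taken from etcd or Gardener).

```python
def majority(voters: int) -> int:
    # Smallest strict majority: 2 of 3, 3 of 4, 3 of 5, ...
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    # Voters that can be lost while the rest still form a majority.
    return voters - majority(voters)


print("sites  majority  tolerated site failures")
for sites in range(1, 7):
    # Assumes one voting member per site, so site count == voter count.
    print(f"{sites:>5}  {majority(sites):>8}  {tolerated_failures(sites):>23}")
```

For 3 sites the majority is 2 and one site failure is tolerated; 4 sites still tolerate only one, while 5 tolerate two.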

Consequences

  • Cross-AZ networking must be low-latency and high-bandwidth (separate ADR)

  • Storage replication strategy must span 3 AZs (separate ADR)

  • Gardener Seed placement across AZs needs to be defined

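  • The surviving AZs must together have enough spare capacity to absorb the load of any one failed AZ (with 3 evenly sized AZs, each runs at no more than about 2/3 of its capacity)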
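
The capacity consequence can be expressed as a simple check. This Python sketch assumes capacity and load are measured in one common unit (for example vCPUs) and that workloads from a failed AZ can be rescheduled onto any surviving AZ; the AZ names and figures are hypothetical.

```python
def survives_single_az_loss(capacity: dict[str, int], total_load: int) -> bool:
    # For each AZ in turn, assume it fails and check whether the combined
    # capacity of the remaining AZs still covers the total load
    # (hypothetical model: failed workloads can move to any surviving AZ).
    for failed in capacity:
        surviving = sum(c for az, c in capacity.items() if az != failed)
        if surviving < total_load:
            return False
    return True


even = {"az-a": 100, "az-b": 100, "az-c": 100}    # hypothetical AZs, capacity in vCPUs
uneven = {"az-a": 150, "az-b": 100, "az-c": 100}

print(survives_single_az_loss(even, 180))    # 60% average utilization
print(survives_single_az_loss(even, 210))    # 70% average utilization
print(survives_single_az_loss(uneven, 210))  # 60% average, but az-a is large
```

The three calls print `True`, `False` and `False`. Because the check removes each AZ in turn, the largest AZ limits the safe total load: in the last case the average utilization is only 60%, yet losing `az-a` leaves 200 units of capacity for 210 units of load.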