ADR-0005: Bare-metal provisioning tool

Status: proposed
Date: 2026-03-09
Group: hardware
Depends-on: ADR-0003, ADR-0004

Context

With bare-metal infrastructure (ADR-0003) and spine-leaf networking (ADR-0004) chosen, we need a tool that provisions servers and manages their lifecycle. At 50,000 servers, manual provisioning is not viable.

Options

Option 1: metal-stack

  • Pros: integrated compute + network provisioning (spine-leaf BGP/EVPN built in), production-proven at scale in the German financial sector (under BaFin/ECB supervision), European origin (x-cellent technologies), API-first zero-touch provisioning, proven Gardener integration, a small team is viable at scale (estimated 10-30 FTE for 500-50,000 servers, to be validated)

  • Cons: smaller community than CAPI/Metal³, opinionated network architecture (though it matches ADR-0004), not a CNCF project, limited metal-stack expertise in the Dutch market
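The staffing estimate in the Pros can be sanity-checked with a few lines of arithmetic. This minimal Python sketch assumes the range is read as two endpoints (10 FTE at 500 servers, 30 FTE at 50,000 servers); that reading is an assumption, and the figures themselves are still to be validated.

```python
# Servers-per-FTE implied by the metal-stack staffing estimate (Option 1).
# Assumption: "10-30 FTE for 500-50,000 servers" is read as two endpoints,
# 10 FTE at 500 servers and 30 FTE at 50,000 servers. The estimate itself
# is still to be validated.

ESTIMATE = [
    (500, 10),     # (servers, FTE) at the small end
    (50_000, 30),  # (servers, FTE) at the target scale
]


def servers_per_fte(servers: int, fte: int) -> float:
    """Return how many servers each platform engineer would look after."""
    return servers / fte


def main() -> None:
    for servers, fte in ESTIMATE:
        ratio = servers_per_fte(servers, fte)
        print(f"{servers:>6} servers / {fte} FTE = {ratio:,.0f} servers per FTE")
    # Compare fleet growth with team growth between the two endpoints.
    (s0, f0), (s1, f1) = ESTIMATE
    print(f"fleet grows {s1 // s0}x while the team grows {f1 / f0:.0f}x")


if __name__ == "__main__":
    main()
```

Under that reading, the fleet grows 100x while the team grows 3x (from 50 to roughly 1,667 servers per FTE), which is the sense in which a small team stays viable at scale.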

Option 2: Cluster API + Metal³ + Ironic

  • Pros: CNCF-hosted (Metal³ is a CNCF project; Cluster API is a Kubernetes SIG Cluster Lifecycle subproject), declarative and Kubernetes-native, modular (the infrastructure provider is swappable)

  • Cons: compute-only (network provisioning remains a separate problem), Ironic unproven at 50,000-server scale, high initial complexity, the management cluster adds operational burden

Option 3: Talos Linux + CAPI

  • Pros: minimal attack surface (no SSH, no shell), immutable, fast bootstrap

  • Cons: single vendor (Sidero Labs), no network provisioning, smaller community, culture shift required (nodes are managed only through the Talos API)

Option 4: kubeadm + Tinkerbell

  • Pros: kubeadm is the most familiar cluster bootstrap tool, CNCF-aligned, maximum OS flexibility

  • Cons: Tinkerbell is a CNCF sandbox project (immature), no integrated lifecycle management, no network provisioning, all integration requires custom glue code

Decision

metal-stack. Integrated compute + network provisioning is the key differentiator at our scale. Separate network automation for 50,000 servers would be a project in itself. The proven Gardener integration provides a path to cluster lifecycle management. European governance aligns with sovereignty requirements.
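To give a sense of why separate network automation would be a project in itself, here is a rough Python sketch of how many leaf switches a 50,000-server fabric implies. The number of server-facing ports per leaf and the dual-homing of servers are hypothetical assumptions, not figures from this ADR.

```python
import math

# Rough size of the spine-leaf fabric behind 50,000 servers.
# Assumptions (hypothetical, not from this ADR): each leaf switch offers
# 48 server-facing ports, and each server is dual-homed to two leaves.

SERVERS = 50_000
SERVER_PORTS_PER_LEAF = 48  # hypothetical leaf form factor
UPLINKS_PER_SERVER = 2      # hypothetical dual-homing


def leaf_switches(servers: int, ports_per_leaf: int, uplinks_per_server: int) -> int:
    """Minimum number of leaf switches needed to terminate every server link."""
    return math.ceil(servers * uplinks_per_server / ports_per_leaf)


if __name__ == "__main__":
    leaves = leaf_switches(SERVERS, SERVER_PORTS_PER_LEAF, UPLINKS_PER_SERVER)
    print(f"{leaves} leaf switches for {SERVERS:,} servers")
```

Under those assumptions the fabric has over 2,000 leaf switches, each of which needs configuration kept in step with server provisioning; integrated network provisioning avoids building that automation as a separate effort.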

Consequences

  • Datacenter switches must be Edgecore hardware running SONiC, or another switch and OS combination supported by metal-stack

  • Cluster lifecycle management via Gardener becomes the natural next choice (separate ADR)

  • The platform team must build metal-stack expertise, which is scarce in the Dutch market