Kubernetes – Canary Deployment with Gateway API: Building a from Scratch

Introduction

Modern software delivery is no longer about simply deploying code—it’s about controlling risk. Canary deployments are one of the most effective techniques to gradually introduce new versions into production while minimizing impact.

In this article, I’ll walk through a real implementation of a canary deployment using Kubernetes Gateway API, fully managed via Helm. This is not a theoretical overview—it’s a practical, step-by-step evolution from a working single-service deployment to a controlled traffic-splitting setup.

Along the way, I’ll explain not only what was done, but more importantly why each decision was made.


Design Principles

Before diving into implementation, a few principles guided the architecture:

  • Keep it simple: Avoid unnecessary abstractions early on
  • Use native capabilities: Prefer Gateway API over custom CRDs when possible
  • Single source of truth: Everything is managed via Helm
  • Separation of concerns:
    • Application logic ≠ traffic routing
  • Deterministic naming: Avoid dynamic or duplicated naming logic


Step 1 — Making the Application Canary-Friendly

https://github.com/faustobranco/devops-db/tree/master/infrastructure/resources/devops-api

The first step was not infrastructure—it was application design.

Instead of building multiple Docker images for each version, I introduced a simple mechanism:

  • The application reads its version from an environment variable
  • A new endpoint /version exposes that value

Why?

This allows us to:

  • Avoid rebuilding images for every test
  • Change behavior purely through Helm values
  • Clearly observe traffic distribution during testing

Example behavior

GET /version

{"version":"v1.1.1"}
{"version":"v1.1.1-canary"}

Step 2 — Helm as the Control Plane

https://github.com/faustobranco/devops-db/tree/master/infrastructure/resources/devops-api/helm-canary/devops-api

Everything in this setup is managed via Helm:

  • Deployments
  • Services
  • Gateway
  • HTTPRoute

Why Helm?

  • Declarative configuration
  • Version-controlled infrastructure
  • Easy promotion via helm upgrade
  • Natural fit for progressive delivery

Step 3 — Splitting Stable and Canary

Instead of modifying an existing deployment, we create two independent releases:

helm upgrade --install devops-api-stable . \
 --values values-stable.yaml \
 -n devops-api \
 --create-namespace


helm upgrade --install devops-api-canary  . \
 --values values-canary.yaml \
 -n devops-api \
 --create-namespace

Result

We now have:

devops-api-stable
devops-api-canary

Each with:

  • its own Deployment
  • its own Service
  • its own label (role=stable / role=canary)


Step 4 — Why Separate Services?

Each version must be independently addressable.

Service → stable pods
Service → canary pods

Why this matters

Without separate services:

  • Traffic cannot be split
  • Routing becomes ambiguous
  • Observability is lost


Step 5 — Introducing Gateway API

Instead of using Ingress, we use Gateway API, which provides:

  • Explicit routing rules
  • Native traffic splitting
  • Better separation between infrastructure and routing


Step 6 — The HTTPRoute as the Brain

The canary logic lives entirely in the HTTPRoute.

backendRefs:
  - name: devops-api-stable
    port: 80
    weight: 90
  - name: devops-api-canary
    port: 80
    weight: 10

Why this approach?

  • No need for Traefik-specific CRDs
  • Uses standard Kubernetes API
  • Fully declarative
  • Easy to adjust via Helm


Step 7 — Stable Owns the Traffic

A key design decision:

Only the stable release defines the HTTPRoute.

Why?

  • Avoids conflicting routes
  • Centralizes traffic control
  • Keeps canary passive

stable → controls routing
canary → only provides capacity

Step 8 — Helm Values Structure

To keep things simple, we used two values files only:

values-stable.yaml

role: stable

traffic:
  stable: 90
  canary: 10

image:
  tag: "1.1.1"

gateway:
  enabled: true

values-canary.yaml

role: canary

image:
  tag: "1.1.1"

env:
  DEVOPSAPI_VERSION: v1.1.1-canary

gateway:
  enabled: false

Step 9 — Why Avoid Backend Names in Values?

Initially, backend names were configurable:

stableBackend: devops-api-stable
canaryBackend: devops-api-canary

This was removed.

Why?

  • Redundant information
  • Prone to mismatch errors
  • Already derivable from .Chart.Name

Final approach:

name: {{ printf "%s-stable" .Chart.Name }}

Step 10 — Observing the Canary

Testing the rollout:

for i in {1..20}; do curl -s /version; echo; done

Expected output

for i in {1..20}; do curl -s "http://devops-api.devops-db.internal/version"; echo; done
{"version":"v1.1.1"}
{"version":"v1.1.1"}
{"version":"v1.1.1"}
{"version":"v1.1.1"}
{"version":"v1.1.1"}
{"version":"v1.1.1"}
{"version":"v1.1.1"}
{"version":"v1.1.1"}
{"version":"v1.1.1-canary"}
{"version":"v1.1.1"}
{"version":"v1.1.1"}
{"version":"v1.1.1"}
{"version":"v1.1.1"}
{"version":"v1.1.1"}
{"version":"v1.1.1"}
{"version":"v1.1.1"}
{"version":"v1.1.1"}
{"version":"v1.1.1"}
{"version":"v1.1.1"}
{"version":"v1.1.1-canary"}

Interpretation

  • Majority → stable
  • Minority → canary
  • Traffic splitting confirmed

Step 11 — Promotion Strategy

Once validated, promotion is straightforward.

Step 1 — Shift traffic – values-stable.yaml

traffic:
  stable: 0
  canary: 100

Step 2 — Upgrade stable – values-stable.yaml

image:
  tag: v1.1.1-canary

Step 3 — Reset – values-stable.yaml

traffic:
  stable: 100
  canary: 0

Apply helm:

helm upgrade --install devops-api-stable . \
 --values values-stable.yaml \
 -n devops-api \
 --create-namespace

Step 12 — Why This Works Well

This architecture has several strengths:

✔ Simplicity

  • No extra controllers
  • No custom CRDs
  • Minimal moving parts

✔ Observability

  • Clear version endpoint
  • Easy traffic validation

✔ Safety

  • Gradual rollout
  • Instant rollback via weights

✔ Extensibility

  • Ready for automation
  • Compatible with Argo Rollouts

1. Gateway API is powerful enough

You don’t need TraefikService or custom routing logic for basic canary.


2. Naming matters more than expected

Many early issues came from:

  • mismatched service names
  • duplicated suffixes
  • inconsistent release naming

3. Canary is not just routing

It requires:

  • API compatibility
  • consistent endpoints
  • predictable behavior

4. Debugging requires observability

The /version endpoint was critical.

Without it:

  • impossible to confirm routing behavior
  • harder to isolate issues

Leave a Reply

Your email address will not be published. Required fields are marked *