Canary Deployments

Canary deployments let you roll out a new service version by gradually shifting traffic from the stable version to the canary. If the canary misbehaves, you shift the weight back — no redeployment, no rollback pipeline, only a configuration change.

Traefik Hub API Gateway supports canary deployments through Weighted Round Robin (WRR) services, optionally combined with mirroring for shadow testing that never affects user-facing responses before the rollout begins.

Shadow Testing with Mirroring

Before sending live traffic to the canary, mirror a percentage of production traffic to validate that the new version handles real request patterns correctly. Users are unaffected — they always receive the response from the stable service.

Monitor the canary's logs, traces, and error rates during mirroring — since responses are discarded, failures won't appear in client-facing metrics.

Configuration Example

apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: orders-mirror
  namespace: apps
spec:
  mirroring:
    name: orders-stable
    port: 80
    mirrors:
      - name: orders-canary
        port: 80
        percent: 20
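
Mirroring only applies to requests that are routed through the orders-mirror TraefikService. The following is a minimal sketch of a route that does so. The route name, entry point, host, and path are assumptions borrowed from the IngressRoute used for traffic splitting later on this page; adjust them to your environment.

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: orders                  # assumed route name
  namespace: apps
spec:
  entryPoints:
    - websecure                 # assumed entry point
  routes:
    # Assumed matching rule, same as the traffic-splitting route
    - match: Host(`api.example.com`) && PathPrefix(`/api/orders`)
      kind: Rule
      services:
        # Point the route at the mirroring TraefikService, not at a
        # Kubernetes Service directly
        - name: orders-mirror
          namespace: apps
          kind: TraefikService
  tls: {}
```

With this in place, clients still receive responses from orders-stable, while 20% of requests are copied to orders-canary.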

Once the canary handles mirrored traffic without elevated error rates or latency, transition to live traffic splitting.

Progressive Traffic Shifting

Use a WRR TraefikService to control the percentage of live traffic sent to each version. Update the weights at each phase of the rollout.

Configuration Example

Phase 1: Initial Canary (5%)

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: orders
  namespace: apps
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`api.example.com`) && PathPrefix(`/api/orders`)
      kind: Rule
      services:
        - name: orders-canary-wrr
          namespace: apps
          kind: TraefikService
  tls: {}
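
The IngressRoute above references a TraefikService named orders-canary-wrr, whose definition is not shown here. A minimal sketch of what it could look like for this phase follows. It assumes the same orders-stable and orders-canary Kubernetes services on port 80 used elsewhere on this page, and a 95/5 split to match the phase title.

```yaml
apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: orders-canary-wrr       # name referenced by the IngressRoute
  namespace: apps
spec:
  weighted:
    services:
      # Weights are relative: 95 / (95 + 5) = 95% of requests
      - name: orders-stable
        port: 80
        weight: 95
      # 5 / (95 + 5) = 5% of requests
      - name: orders-canary
        port: 80
        weight: 5
```

Keeping the weights summing to 100 is a convention rather than a requirement, but it makes each weight readable as a percentage.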

Advancing the Rollout

To shift more traffic to the canary, update the weights in the TraefikService:

spec:
  weighted:
    services:
      - name: orders-stable
        port: 80
        weight: 50
      - name: orders-canary
        port: 80
        weight: 50

Rollback

To roll back, set the canary weight to 0 and the stable weight to 100. Traffic shifts as soon as the updated configuration is applied, with no redeployment required.
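
As an illustration, the rollback state of the weighted TraefikService might look like the sketch below. It assumes the same service names and ports as the rollout examples on this page.

```yaml
spec:
  weighted:
    services:
      # All live traffic returns to the stable version
      - name: orders-stable
        port: 80
        weight: 100
      # Weight 0 keeps the canary in the configuration but sends it
      # no traffic, so the rollout can resume by raising the weight
      - name: orders-canary
        port: 80
        weight: 0
```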

Health Checks

Configure health checks on the canary service so Traefik automatically stops sending traffic to it if it becomes unhealthy. This provides an automatic safety net during the rollout.

Configuration Example

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: orders
  namespace: apps
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`api.example.com`) && PathPrefix(`/api/orders`)
      kind: Rule
      services:
        - name: orders-canary
          port: 80
          healthCheck:
            path: /health
            interval: 10s
            timeout: 3s

Note

Health checks are configured on the individual backend service, not on the TraefikService WRR wrapper. When using a canary WRR setup, configure the health check on the orders-canary Kubernetes service reference. If the canary fails its health check, Traefik removes it from the load-balancer pool and traffic falls back to the stable service automatically.
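
The following sketch shows one way the note above could translate to a WRR setup. It assumes that the healthCheck field is accepted on a Kubernetes service entry under weighted.services in the same form as on the IngressRoute service reference shown earlier; verify this against the CRD reference for your Traefik version. The weights and the /health path are illustrative.

```yaml
apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: orders-canary-wrr
  namespace: apps
spec:
  weighted:
    services:
      - name: orders-stable
        port: 80
        weight: 95
      - name: orders-canary
        port: 80
        weight: 5
        # Assumption: health check attached to the canary's service
        # reference, not to the WRR wrapper itself
        healthCheck:
          path: /health         # illustrative endpoint
          interval: 10s
          timeout: 3s
```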

Multi-Cluster Canary

When your stable version runs on different infrastructure than the canary (for example, a monolith on VMs and a new microservice on Kubernetes), you can use multi-cluster traffic distribution to canary across infrastructure boundaries.

On the parent cluster, create a weighted TraefikService that references auto-generated multi-cluster services:

apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: orders-cross-cluster-canary
  namespace: apps
spec:
  weighted:
    services:
      - name: apps-orders-workload-vm-cluster@multicluster
        kind: TraefikService
        weight: 90
      - name: apps-orders-workload-k8s-cluster@multicluster
        kind: TraefikService
        weight: 10

This sends 90% of traffic to the monolith on VMs and 10% to the new microservice on Kubernetes. Adjust weights as confidence grows.

See the multi-cluster traffic distribution reference for setup details.