Upgrade Traefik Hub API Gateway

This document explains how to upgrade an existing Traefik Hub API Gateway installation.


You may need to upgrade your Traefik Hub API Gateway installation for the following reasons:

  • A new version of Traefik Hub API Gateway is available.
  • New Custom Resource Definitions (CRDs) have been released.

To upgrade your Traefik Hub API Gateway, follow these two steps:

  1. Upgrade the CRDs.
  2. Upgrade the Helm charts.

info

The changelog provides a detailed explanation of what has changed between versions.

Upgrade the CRDs

warning

Make sure to read the release notes carefully and adapt your values accordingly before starting the upgrade.

First, use kubectl to upgrade the CRDs:

CLI
kubectl apply --server-side --force-conflicts \
  -k https://github.com/traefik/traefik-helm-chart/traefik/crds/
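
After applying, one quick sanity check, assuming your kubectl context points at the right cluster, is to list the installed Traefik CRDs:

CLI
kubectl get crds | grep traefik.io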

Upgrade the Helm Charts

Refresh the Helm repository to get the newest version, and then upgrade:

CLI
helm repo update
helm upgrade traefik traefik/traefik
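
In practice you will usually pin the chart version and pass your own values file; a sketch, assuming the release is named traefik and runs in the traefik namespace (adjust both to your setup):

CLI
helm repo update
helm upgrade traefik traefik/traefik \
  --namespace traefik \
  --values values.yaml \
  --version <chart-version>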

Best Practices for Upgrading Traefik Hub API Gateway with High Availability

Rolling Updates With Connection Draining

Traefik Hub exposes a built‑in health‑check endpoint (/ping) that can be used as a readiness and liveness probe. When combined with a rolling‑update strategy and a long enough termination grace period, the Service/LoadBalancer stops sending traffic to the old pod before it is removed, eliminating connection drops.
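
One quick way to see the endpoint respond, assuming the chart's default traefik entrypoint on port 9000 and a release in the traefik namespace (adjust both if your setup differs):

CLI
kubectl -n traefik port-forward deploy/traefik 9000:9000
curl http://localhost:9000/ping # prints OK when the instance is healthy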

Below is a sample values file that enables this:

Helm Values
deployment:
  lifecycle:
    preStop:
      exec:
        command: ["/bin/sh", "-c", "sleep 20"] # drain connections

updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0 # never pull all replicas at once
    maxSurge: 1 # add one extra pod during the rollout

terminationGracePeriodSeconds: 60 # > LB idle timeout

readinessProbe:
  initialDelaySeconds: 5
  periodSeconds: 5

  • Why 60s for the terminationGracePeriodSeconds? Many managed LBs keep connections open for 30s; doubling gives a safety margin.

  • The preStop sleep gives Traefik time to finish in‑flight requests after it is marked NotReady.

  • Pods are recycled one‑by‑one thanks to maxUnavailable: 0.
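
To watch the pods being recycled one at a time during the upgrade, assuming the Deployment is named traefik in the traefik namespace:

CLI
kubectl rollout status deployment/traefik -n traefik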

readinessProbe

You do not need to set httpGet yourself in the readinessProbe block. The chart automatically points both probes to Traefik’s /ping endpoint on the correct port. Adjust the deployment.readinessPath and deployment.healthchecksPort values if you need a different URL, and use the readinessProbe block only for timing or threshold tweaks.
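
For example, a minimal sketch of such an override, assuming you serve the health check on port 8080 instead of the chart default (the values shown are illustrative, not recommendations):

Helm Values
deployment:
  readinessPath: /ping
  healthchecksPort: 8080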

Enable Autoscaling

If you enable the built‑in HPA (autoscaling.enabled), do not set a fixed replica count. Use the following pattern, adding topology spread constraints and a priority class so that extra replicas are spread evenly across nodes and treated as critical:

Helm Values
replicas: null # let HPA control replica count

autoscaling: # Setup Autoscaling
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

# Evenly spread Traefik pods across nodes - See https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/ to learn more
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: traefik

# Ensure Traefik stays running during node pressure events
priorityClassName: system-cluster-critical

terminationGracePeriodSeconds: 60 # > LB idle timeout

readinessProbe:
  initialDelaySeconds: 5
  periodSeconds: 5
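
Once the values are applied, you can confirm the HPA is active and watch its current metrics, assuming the release runs in the traefik namespace:

CLI
kubectl get hpa -n traefik
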
How HPA “drains” Pods during scale-down

The Horizontal Pod Autoscaler (HPA) never terminates Pods directly. When it reduces spec.replicas, the Deployment / ReplicaSet controller picks the Pods to remove and Kubernetes runs the standard graceful-termination sequence for each one:

  1. The Pod is marked Terminating and immediately removed from the cluster’s EndpointSlices, so new connections stop.
  2. Kubernetes waits for the Pod’s preStop hook and the full terminationGracePeriodSeconds, letting in-flight requests finish.
  3. If the process is still running at the end of the grace period, the kubelet sends SIGKILL.

Because the Pod leaves load-balancer endpoints as soon as it enters “Terminating,” connection draining happens automatically.

You get exactly the same behaviour as during a rolling update, provided you keep a readinessProbe, an optional preStop hook, and a terminationGracePeriodSeconds in place.
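
You can watch a Pod leave the endpoint list the moment it starts terminating; one way, assuming the release runs in the traefik namespace:

CLI
kubectl get endpointslices -n traefik --watch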

Read the autoscaling documentation to learn more about autoscaling Traefik Hub.

Additional High-Availability Enhancements

  • Retry middleware - add automatic retries for idempotent requests so that a transient 502/504 error does not impact clients.

    For example:

    apiVersion: traefik.io/v1alpha1
    kind: Middleware
    metadata:
      name: retry
      namespace: traefik
    spec:
      retry:
        attempts: 4
        initialInterval: 100ms
  • nativeLBByDefault: true - Enabling this option in the Kubernetes Ingress/CRD providers forces Traefik Hub to call Service ClusterIPs instead of Pod IPs, letting kube‑proxy handle endpoint failover (see the sketch after this list).

Both settings complement rolling or HPA‑driven updates by masking short‑lived pod churn from end‑users.
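
A minimal sketch of turning on nativeLBByDefault through the Helm values, assuming you rely on the chart's Kubernetes CRD and Ingress providers:

Helm Values
providers:
  kubernetesCRD:
    nativeLBByDefault: true
  kubernetesIngress:
    nativeLBByDefault: true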