# Upgrade Traefik Hub API Gateway
This document explains how to upgrade an existing Traefik Hub API Gateway installation.
You may need to upgrade your Traefik Hub API Gateway installation for the following reasons:
- A new version of Traefik Hub API Gateway is available.
- New Custom Resource Definitions (CRDs) have been released.
To upgrade your Traefik Hub API Gateway, follow these two steps:

1. Upgrade the CRDs.
2. Upgrade the Helm charts.
The changelog provides a detailed explanation of what has changed between versions.
## Upgrade the CRDs
Please make sure to carefully read the release notes and adapt your values accordingly before starting the upgrade.
First, use `kubectl` to upgrade the CRDs:

```shell
kubectl apply --server-side --force-conflicts \
  -k https://github.com/traefik/traefik-helm-chart/traefik/crds/
```
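To confirm the apply took effect, you can list the installed Traefik CRDs. This is an optional sanity check, not part of the official procedure; the exact CRD names depend on your chart version:

```shell
# List the cluster's CRDs and keep only the Traefik ones;
# names such as ingressroutes.traefik.io and middlewares.traefik.io should appear
kubectl get crds | grep 'traefik.io'
```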
## Upgrade the Helm Charts
Refresh the Helm repository to get the newest version, and then upgrade:

```shell
helm repo update
helm upgrade traefik traefik/traefik
```
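In practice you will usually want to pin an explicit chart version and reuse your own values file, so an upgrade is reproducible. A sketch using Helm's standard flags; `<chart-version>`, the `traefik` namespace, and the `values.yaml` path are placeholders for your own setup:

```shell
# See which chart versions are available
helm search repo traefik/traefik --versions | head -n 5

# Upgrade to a pinned version, reusing your values file
helm upgrade traefik traefik/traefik \
  --namespace traefik \
  --version <chart-version> \
  --values values.yaml
```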
## Best Practices For Upgrading Traefik Hub Gateway with High Availability

### Rolling Updates With Connection Draining
Traefik Hub exposes a built-in health-check endpoint (`/ping`) that can be used as a readiness and liveness probe. When combined with a rolling-update strategy and a long enough termination grace period, the Service/LoadBalancer stops sending traffic to the old Pod before it is removed, eliminating connection drops.

Below is a sample values file that enables this:
```yaml
deployment:
  lifecycle:
    preStop:
      exec:
        command: ["/bin/sh", "-c", "sleep 20"] # drain connections
  terminationGracePeriodSeconds: 60 # > LB idle timeout

updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0 # never take all replicas down at once
    maxSurge: 1 # add one extra pod during the rollout

readinessProbe:
  initialDelaySeconds: 5
  periodSeconds: 5
```
- Why 60s for `terminationGracePeriodSeconds`? Many managed LBs keep connections open for 30s; doubling gives a safety margin.
- The `preStop` sleep gives Traefik time to finish in-flight requests after it is marked `NotReady`.
- Pods are recycled one-by-one thanks to `maxUnavailable: 0`.
You do not need to set `httpGet` yourself in the `readinessProbe` block. The chart automatically points both probes to Traefik's `/ping` endpoint on the correct port. Adjust the `deployment.readinessPath` and `deployment.healthchecksPort` values if you need a different URL, and use the `readinessProbe` block only for timing or threshold tweaks.
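For instance, such an override might look like the sketch below; the path and port are illustrative values, not defaults you must use:

```yaml
deployment:
  readinessPath: /healthz # hypothetical custom health-check path
  healthchecksPort: 9001  # hypothetical dedicated health-check port

readinessProbe:
  # timing/threshold tweaks only - the probe target is derived from the values above
  failureThreshold: 2
  periodSeconds: 10
```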
### Enable Autoscaling
If you enable the built-in HPA (`autoscaling.enabled`), do not set a fixed `replicas` count. Use the following pattern, adding topology spread constraints and a priority class so that extra replicas are evenly scheduled and treated as critical:
```yaml
replicas: null # let the HPA control the replica count

autoscaling: # set up autoscaling
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

# Evenly spread Traefik pods across nodes - see https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/ to learn more
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: traefik

# Ensure Traefik stays running during node pressure events
priorityClassName: system-cluster-critical

deployment:
  terminationGracePeriodSeconds: 60 # > LB idle timeout

readinessProbe:
  initialDelaySeconds: 5
  periodSeconds: 5
```
The Horizontal Pod Autoscaler (HPA) never terminates Pods directly. When it reduces `spec.replicas`, the Deployment/ReplicaSet controller picks the Pods to remove and Kubernetes runs the standard graceful-termination sequence for each one:

- The Pod is marked Terminating and immediately removed from the cluster's EndpointSlices, so new connections stop.
- Kubernetes waits for the Pod's `preStop` hook and the full `terminationGracePeriodSeconds`, letting in-flight requests finish.
- If the process is still running at the end of the grace period, the kubelet sends `SIGKILL`.
Because the Pod leaves load-balancer endpoints as soon as it enters “Terminating,” connection draining happens automatically.
You get exactly the same behaviour as during a rolling update, provided you keep a `readinessProbe`, an optional `preStop` hook, and a `terminationGracePeriodSeconds` in place.
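You can observe this draining behaviour directly. A sketch assuming the release is named `traefik` in the `traefik` namespace (adjust names to your install):

```shell
# In one terminal: watch Traefik's EndpointSlices update in real time.
# Terminating Pods disappear from the slice before they are deleted.
kubectl -n traefik get endpointslices -l kubernetes.io/service-name=traefik -w

# In another terminal: lower the HPA ceiling to force a scale-down (illustrative)
kubectl -n traefik patch hpa traefik --patch '{"spec":{"maxReplicas":3}}'
```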
Read the autoscaling documentation to learn more about autoscaling Traefik Hub.
### Additional High-Availability Enhancements
- Retry middleware: add automatic retries to idempotent requests so a transient 502/504 error does not impact clients. For example:

  ```yaml
  apiVersion: traefik.io/v1alpha1
  kind: Middleware
  metadata:
    name: retry
    namespace: traefik
  spec:
    retry:
      attempts: 4
      initialInterval: 100ms
  ```

- `nativeLBByDefault: true`: enabling this option in the Kubernetes Ingress/CRD providers forces Traefik Hub to call Service ClusterIPs instead of Pod IPs, letting kube-proxy handle endpoint failover.
Both settings complement rolling or HPA‑driven updates by masking short‑lived pod churn from end‑users.
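To tie the two together, here is a sketch of an IngressRoute that attaches the retry middleware above; the route name, host, and backend Service are placeholders for your own application:

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: my-app # placeholder
  namespace: traefik
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`app.example.com`) # placeholder host
      kind: Rule
      middlewares:
        - name: retry # the Middleware defined above
      services:
        - name: my-app # placeholder backend Service
          port: 80
          nativeLB: true # per-service equivalent of nativeLBByDefault
```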