Upgrading Traefik Hub API Gateway In Nomad

High‑Availability Upgrade Strategy

Nomad provides native rolling‑upgrade semantics via the update stanza. To achieve zero‑downtime upgrades you must:

  • Run ≥ 2 Traefik Hub allocations (count >= 2).
  • Add an update stanza with canary, max_parallel, and stagger.
  • Expose a health check (/ping) so Nomad only shifts traffic once the new allocation is healthy (a manual probe is shown after this list).
  • Drain existing connections gracefully with Traefik’s lifecycle gracetimeout and a kill_timeout that covers it.
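
As a quick sanity check, the readiness endpoint can be probed directly; substitute the address of a node that runs an allocation:

# a healthy Traefik Hub instance answers 200 OK on its ping endpoint
curl -fsS http://<node-ip>:8080/ping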

Below is a minimal, production‑ready job spec that fulfils those requirements:

Nomad Job
job "traefik-hub" {
datacenters = ["dc1"]
type = "service"

#––– Rolling‑upgrade policy –––––––––––––––––––––––––––––––––––––––
update {
stagger = "30s" # wait 30 s between replacements
max_parallel = 1 # replace one allocation at a time
canary = 1 # spin up a single canary allocation first
}

group "traefik" {
count = 3 # run three Hub instances for HA

# spread them across different nodes
spread {
attribute = "${node.unique.name}"
weight = 100
}

network {
mode = "bridge"
port "web" { static = 8080 }
}

    service {
      name     = "traefik"
      provider = "nomad"
      port     = "web"

      check {
        type     = "http"
        path     = "/ping" # Hub readiness endpoint
        interval = "10s"
        timeout  = "2s"
      }

      # standard tags for the Nomad provider
      tags = [
        "traefik.enable=true",
        "traefik.http.routers.api.entrypoints=web",
        "traefik.http.routers.api.rule=PathPrefix(`/api`) || PathPrefix(`/dashboard`)",
        "traefik.http.routers.api.service=api@internal",
        "traefik.http.services.dummy-svc.loadbalancer.server.port=9999",
      ]
    }

task "traefik" {
driver = "docker"

config {
image = "ghcr.io/traefik/traefik-hub:v3.16.0" # You can update the tag here if needed

args = [
"traefik-hub",
"--entrypoints.web.address=:8080/tcp",
"--entrypoints.web.transport.lifecycle.gracetimeout=20s", # connection draining
"--api.dashboard=true",
"--providers.nomad.endpoint.address=${NOMAD_ADDR}",
"--providers.nomad.exposedByDefault=false",
"--hub.token=${HUB_TOKEN}",
"--log.level=INFO",
]

ports = ["web"]
cap_add = ["NET_BIND_SERVICE"]
cap_drop = ["ALL"]
}

      resources {
        cpu    = 500
        memory = 256
      }

      # Nomad sends kill_signal and waits kill_timeout before force‑killing the
      # task; keep kill_timeout ≥ gracetimeout so in‑flight requests can finish
      kill_signal  = "SIGTERM"
      kill_timeout = "25s"
    }
  }
}
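
Before the first deployment, the spec can be checked and dry‑run against the cluster:

# syntax‑check the job file
nomad job validate traefik-hub.nomad

# dry run: shows what the scheduler would change without applying anything
nomad job plan traefik-hub.nomad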

Zero‑Downtime Upgrade Procedure

# 1 Update the image tag in the job file, e.g. v3.17.0
sed -i 's/v3\.16\.0/v3.17.0/' traefik-hub.nomad

# 2 Run a rolling upgrade – Nomad will start a canary and continue only if healthy
nomad job run -detach traefik-hub.nomad

# 3 Watch progress (Ctrl‑C to exit)
watch -n 5 nomad job status traefik-hub
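
If you prefer to gate promotion manually, drop auto_promote from the update stanza and promote the canary yourself once it reports healthy:

# manually promote the canary allocations of the latest deployment
nomad deployment promote <deployment-id>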

If the canary allocation reports an unhealthy status, Nomad aborts the deployment and, because auto_revert is set, rolls back to the last good version automatically, ensuring continuous availability.
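
To inspect a failed deployment, list the job’s deployments and drill into the one that failed its health checks:

# list deployments and show details for a specific one
nomad deployment list
nomad deployment status <deployment-id>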

Rollback:

nomad job revert traefik-hub <previous_version>
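
The version number to revert to can be looked up from the job’s stored history:

# show all stored versions of the job, newest first
nomad job history traefik-hub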

Blue‑Green With reusePort

On Linux you can combine --entrypoints.<name>.reusePort=true with --entrypoints.<name>.transport.lifecycle.gracetimeout=<seconds> to implement true blue‑green upgrades:

  • Deploy the new job in parallel on the same host/port.
  • Because reusePort lets the kernel distribute incoming connections across every socket bound to the port, traffic is automatically shared between the old and new processes.
  • After verifying the new job, stop the old one; existing connections drain gracefully. A sketch of this flow follows below.
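
A minimal sketch of that flow, assuming two otherwise identical job files traefik-hub-blue.nomad and traefik-hub-green.nomad (hypothetical names) whose entrypoints both set reusePort=true:

# start the new ("green") job next to the running "blue" one;
# SO_REUSEPORT lets both bind the same host port
nomad job run traefik-hub-green.nomad

# once green is verified, stop blue; gracetimeout drains its connections
nomad job stop traefik-hub-blue
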
Kernel caveat

reusePort relies on the SO_REUSEPORT socket option. Some older Linux kernels may trigger sporadic TCP resets (see https://lwn.net/Articles/853637/). Upgrade the kernel or disable the flag if you observe anomalies.
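
A quick way to confirm which kernel a Nomad client node is running before enabling the flag:

# print the running kernel release
uname -r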