Multi-Cluster Traffic Distribution

Multi-Cluster traffic distribution enables automatic cross-cluster service discovery and HTTP traffic routing. A parent Traefik cluster automatically discovers workloads advertised by child clusters running on different infrastructure - VMs, Kubernetes, Docker Compose, or different cloud platforms - and creates services to route traffic to them. This includes any protocol that runs over HTTP such as gRPC and WebSockets, and works with HTTP/1.1, HTTP/2, and HTTP/3.

License Requirement

Multi-Cluster is a licensed feature. The multi-cluster feature must be included in your license for both parent and child clusters. Contact the Traefik Labs sales team for access.

This guide walks you through setting up multi-cluster routing and covers common use cases like weighted load balancing, failover, canary deployments, and traffic mirroring.

Overview

In a multi-cluster setup:

  • A parent cluster acts as the entry point for all traffic and makes routing decisions
  • Child clusters advertise their workloads using Uplink resources
  • The parent automatically discovers child workloads and creates services to route traffic to them

Key Concepts

Concept                   Description
Uplink                    A resource on child clusters that advertises a workload to parent clusters
Uplink Entry Point        A specialized entry point on child clusters for inter-cluster communication
Auto-generated Services   Services automatically created on the parent when uplinks are discovered
Auto-generated Routes     Routes automatically created on the child when an uplink is associated with a route

Prerequisites

Before setting up multi-cluster routing, ensure you have:

  • At least two Traefik Hub instances (one parent, one or more children)
  • Network connectivity from the parent cluster to each child cluster on the child's uplink entry point port (e.g., 9443). Traffic flows in one direction only: parent → child. Firewall rules should allow inbound connections on the uplink port on child clusters from the parent cluster.
  • TLS certificates for secure inter-cluster communication (mTLS recommended for production)

Step 1: Configure Child Clusters

Each child cluster needs an uplink entry point and uplink resources to advertise its workloads.

Configure Uplink Entry Point

On each child cluster, configure an uplink entry point in the static configuration:

ports:
  multicluster:
    port: 9443
    uplink: true # Marks this port as an uplink entry point
    asDefault: true # Uplinks without explicit entryPoints use this one
    expose:
      default: true # Exposes this port on the existing LoadBalancer service
    http:
      tls:
        enabled: true # Enables TLS (self-signed certificate by default)

Note

None of these settings are configured by default — each must be set explicitly. If you define multiple uplink entry points and none has asDefault: true, all of them are used as defaults.
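
As a sketch of the asDefault behavior described in this note, the values below define two uplink entry points and mark only one of them as the default. The entry point names internal and partner and the port 9444 are hypothetical; the keys are the ones used in the example above, and the TLS settings are omitted for brevity.

```yaml
ports:
  internal:            # hypothetical entry point name
    port: 9443
    uplink: true
    asDefault: true    # Uplinks without explicit entryPoints use this one
  partner:             # hypothetical entry point name
    port: 9444         # hypothetical port
    uplink: true       # no asDefault: presumably used only by Uplinks that list it in entryPoints
```

If asDefault were removed from both entries, the rule stated above would apply and both entry points would act as defaults.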

Security Considerations

The uplink entry point exposes an internal discovery API and forwards traffic to child routers. If the port is publicly reachable, an attacker could discover advertised routes and send requests to backend services. To prevent unauthorized access:

  • Restrict network access to the uplink port using firewall rules or private networks so only the parent cluster can reach it
  • Use mTLS in production so the child verifies the parent's client certificate (see Securing Inter-Cluster Communication)
  • Add middlewares on child routers (e.g., IP allowlisting, rate limiting) for defense-in-depth — authentication is typically handled at the parent level, but child-side middlewares provide an additional layer of protection

Enable the Multi-Cluster Provider (Child Clusters)

Child clusters must enable the Multi-Cluster provider so the parent cluster can discover their advertised workloads.

hub:
  providers:
    multicluster:
      enabled: true

# Required when using Uplink CRDs (Kubernetes)
providers:
  kubernetesCRD:
    enabled: true

Each provider exposes the Uplink concept in its own way. On Kubernetes, uplinks are declared as CRDs and referenced from IngressRoutes via annotations. On VMs or other platforms, uplinks and routers are defined through the file provider.

Create an Uplink resource for each workload you want to advertise to the parent cluster:

apiVersion: hub.traefik.io/v1alpha1
kind: Uplink
metadata:
  name: api-workload
  namespace: apps

Then connect your router to the uplink using the hub.traefik.io/router.uplinks annotation with the fully-qualified uplink name (<namespace>-<name>):

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: api-route
  namespace: apps
  annotations:
    hub.traefik.io/router.uplinks: "apps-api-workload"
spec:
  routes:
    - match: PathPrefix(`/api`)
      kind: Rule
      services:
        - name: api-backend
          port: 8080

Important

When a router references an uplink:

  • Do not specify entryPoints on the router (inherited from the uplink)
  • Do not specify tls configuration (handled by the uplink entry point)

Step 2: Configure Parent Cluster

The parent cluster needs the Multi-Cluster provider configured to connect to child clusters. Each child address must be reachable from the parent on the uplink port (see Prerequisites and Security Considerations).

hub:
  providers:
    multicluster:
      enabled: true
      pollInterval: 5
      pollTimeout: 5
      children:
        child-1:
          address: "https://child1.example.com:9443"
        child-2:
          address: "https://child2.example.com:9443"

Self-Signed Certificates

If your child clusters use self-signed TLS certificates, the parent cluster will fail to connect with a certificate validation error (e.g., tls: failed to verify certificate: x509: cannot validate certificate for 10.38.248.230 because it doesn't contain any IP SANs).

To allow connections with self-signed certificates during development or testing, add insecureSkipVerify: true to each child's configuration:

children:
  child-1:
    address: "https://child1.example.com:9443"
    serversTransport:
      insecureSkipVerify: true

Warning: Only use insecureSkipVerify in development/testing environments. For production, use properly signed certificates and configure mutual TLS (mTLS) with certificate authorities.

Route Traffic to Child Clusters

Once configured, the parent automatically creates services for discovered uplinks. Reference these services in your routers:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: api-parent-route
  namespace: apps
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`api.example.com`)
      kind: Rule
      services:
        - name: apps-api-workload@multicluster
          kind: TraefikService
  tls: {}

The service name follows the pattern <namespace>-<uplink-name>@multicluster.

Cross-Platform Naming

When running child clusters on different platforms (e.g., Kubernetes and VMs), the service names that appear on the parent must match. Kubernetes Uplinks default to <namespace>-<name>, but file provider uplinks use the key name as-is with no namespace prefix. Use spec.exposeName on Kubernetes Uplinks to align names across platforms. See VM to Kubernetes Migration for an example.

Use Cases

Weighted Load Balancing

Distribute traffic across multiple child clusters. By default, traffic is distributed equally. If needed, you can assign a different weight to each cluster's Uplink to control the traffic proportion.

# On child cluster 1 - receives 90% of traffic when the other children's weights total 10
apiVersion: hub.traefik.io/v1alpha1
kind: Uplink
metadata:
  name: api-workload
  namespace: apps
spec:
  weight: 90
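
For child cluster 1 to receive 90% of the traffic, the remaining children have to advertise the same Uplink with the remaining weight. A minimal sketch of the counterpart, assuming exactly two child clusters that use the same Uplink name and namespace:

```yaml
# On child cluster 2 (assumed counterpart; not part of the original example)
apiVersion: hub.traefik.io/v1alpha1
kind: Uplink
metadata:
  name: api-workload
  namespace: apps
spec:
  weight: 10
```

With weights 90 and 10, child cluster 1 gets 90 / (90 + 10) = 90% of requests and child cluster 2 gets the other 10%.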

The parent cluster automatically creates a weighted round-robin service that distributes traffic according to these weights. You can also create your own service on the parent that targets the per-child services to override weights or use a different routing strategy such as failover or traffic mirroring (see the Multi-Cluster Provider Reference).

Automatic Failover

Configure failover to automatically route traffic to a backup cluster when the primary becomes unavailable.

On the parent cluster, create a failover service that references the auto-generated per-child services:

apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: api-failover
  namespace: apps
spec:
  failover:
    service: apps-api-workload-child-1@multicluster
    fallback: apps-api-workload-child-2@multicluster

Then reference this failover service in your router:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: api-route
  namespace: apps
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`api.example.com`)
      kind: Rule
      services:
        - name: api-failover
          kind: TraefikService
  tls: {}

When the primary cluster (child-1) fails health checks or becomes unreachable, traffic automatically shifts to the fallback cluster (child-2).
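
The failover described above depends on health information about the primary. The High Availability Pair example later in this guide declares health checks on the Uplink itself; assuming the same healthCheck fields apply in a parent/child setup, the Uplink on child-1 could be sketched as follows. The hostname and path are placeholders, not values from a real deployment.

```yaml
# On child-1 (sketch; healthCheck fields copied from the High Availability Pair example)
apiVersion: hub.traefik.io/v1alpha1
kind: Uplink
metadata:
  name: api-workload
  namespace: apps
spec:
  healthCheck:
    hostname: "api.child1.example.com"  # placeholder hostname
    path: /health                        # placeholder health endpoint
    interval: 10s
    timeout: 3s
    status: 200
```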

High Availability Pair

For high availability with two clusters that serve as mutual backups, each cluster can use its local service as primary with the remote cluster as fallback. This creates a bidirectional failover configuration.

On Cluster A (e.g., EU region):

First, configure an Uplink with health checks:

apiVersion: hub.traefik.io/v1alpha1
kind: Uplink
metadata:
  name: api-workload
  namespace: apps
spec:
  entryPoints:
    - uplink
  exposeName: api-workload-cluster-a
  healthCheck:
    hostname: "api.cluster-a.example.com"
    path: /health
    interval: 10s
    timeout: 3s
    status: 200
    port: 443

Then configure the multicluster provider to connect to Cluster B and define a failover service using the file provider:

http:
  services:
    api-ha-failover:
      failover:
        service: apps-api-workload@kubernetescrd # Local service
        fallback: api-workload-cluster-b@multicluster # Remote cluster

Create an IngressRoute that uses the failover service:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: api-route
  namespace: apps
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`api.example.com`)
      kind: Rule
      services:
        - name: api-ha-failover@file
          kind: TraefikService
  tls: {}

On Cluster B (e.g., US region):

Configure an Uplink with health checks:

apiVersion: hub.traefik.io/v1alpha1
kind: Uplink
metadata:
  name: api-workload
  namespace: apps
spec:
  entryPoints:
    - uplink
  exposeName: api-workload-cluster-b
  healthCheck:
    hostname: "api.cluster-b.example.com"
    path: /health
    interval: 10s
    timeout: 3s
    status: 200
    port: 443

Then configure the multicluster provider to connect to Cluster A and define the symmetric failover:

http:
  services:
    api-ha-failover:
      failover:
        service: apps-api-workload@kubernetescrd # Local service
        fallback: api-workload-cluster-a@multicluster # Remote cluster

Create the same IngressRoute configuration:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: api-route
  namespace: apps
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`api.example.com`)
      kind: Rule
      services:
        - name: api-ha-failover@file
          kind: TraefikService
  tls: {}

Each cluster serves traffic from its local service by default. When health checks detect the local service is unavailable, traffic automatically fails over to the remote cluster.

Canary Deployments

Gradually shift traffic between clusters for canary deployments. This is controlled from the parent cluster rather than the children.

Create a weighted service on the parent:

apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: api-canary
  namespace: apps
spec:
  weighted:
    services:
      - name: apps-api-workload-child-1@multicluster
        kind: TraefikService
        weight: 90
      - name: apps-api-workload-child-2@multicluster
        kind: TraefikService
        weight: 10

To shift more traffic to the new version, update the weights:

spec:
  weighted:
    services:
      - name: apps-api-workload-child-1@multicluster
        kind: TraefikService
        weight: 50
      - name: apps-api-workload-child-2@multicluster
        kind: TraefikService
        weight: 50

Cookie-Based Sticky Sessions

Cookie-based sticky sessions (client-side session affinity) are not yet available for multi-cluster load balancing. However, server-side stickiness is supported through Highest Random Weight (HRW), which deterministically routes clients to the same child cluster based on request attributes.

For use cases requiring client-side session persistence, consider using application-level session management or implementing stickiness at the child cluster level.

Traffic Mirroring

Mirror a percentage of production traffic to a secondary cluster for testing without affecting users.

apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: api-mirrored
  namespace: apps
spec:
  mirroring:
    name: apps-api-workload-child-1@multicluster
    kind: TraefikService
    mirrors:
      - name: apps-api-workload-child-2@multicluster
        kind: TraefikService
        percent: 10

This sends all traffic to child-1 while mirroring 10% to child-2. Responses from the mirror are discarded.

Consistent Hashing (HRW) for Stateful Services

For stateful services like MCP (Model Context Protocol) servers where clients must reach the same backend consistently, Traefik Hub supports Highest Random Weight (HRW), also known as rendezvous hashing, a form of consistent hashing. This provides server-side stickiness without requiring cookies, routing requests to the same child cluster based on request attributes such as source IP or headers.

HRW ensures that clients using stateful protocols like MCP maintain their session with the same child cluster, which is essential when the server maintains conversation context and state across multiple requests.

To use HRW with multi-cluster services, create a dedicated TraefikService with highestRandomWeight that references the multi-cluster services, then reference that TraefikService from your router:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: mcp-route
  namespace: apps
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`mcp.example.com`)
      kind: Rule
      services:
        - name: mymcp-hrw-service
          kind: TraefikService
      middlewares:
        - name: mcp-jwt
        - name: mcp-gateway
  tls: {}
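
The route above points at mymcp-hrw-service, which is not defined in this guide. A minimal sketch of what that TraefikService could look like follows. The uplink name mcp-server, and therefore the per-child service names, are hypothetical, and the highestRandomWeight.services layout is an assumption modeled on the weighted service shown earlier; check the TraefikService reference for the exact schema.

```yaml
apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: mymcp-hrw-service
  namespace: apps
spec:
  highestRandomWeight:       # field layout assumed, see lead-in
    services:
      - name: apps-mcp-server-child-1@multicluster   # hypothetical per-child service
        kind: TraefikService
        weight: 1
      - name: apps-mcp-server-child-2@multicluster   # hypothetical per-child service
        kind: TraefikService
        weight: 1
```

Equal weights give each child the same chance of being selected for a given client, and the selection stays stable for that client as long as the set of children does not change.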

When multiple child clusters advertise the same MCP server uplink, HRW deterministically routes each client to the same child based on the client's request characteristics. This maintains session affinity without the overhead of cookie management, making it ideal for API Gateway and MCP Gateway scenarios.

VM to Kubernetes Migration

Migrate workloads from VMs to Kubernetes by running both in parallel and gradually shifting traffic.

On VMs, uplinks are defined through the file provider. On Kubernetes, the Uplink CRD automatically prefixes the uplink name with the namespace (<namespace>-<name>). Since the file provider has no namespace concept, you must align the names so the parent sees both clusters under the same service. Use the exposeName field on the Kubernetes Uplink to match the file provider name.

# VM cluster - file provider routing configuration
http:
  uplinks:
    banking-api:
      weight: 80

  routers:
    banking:
      rule: PathPrefix(`/banking`)
      service: banking-backend
      uplinks:
        - banking-api

  services:
    banking-backend:
      loadBalancer:
        servers:
          - url: http://127.0.0.1:8080
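
The Kubernetes side of the migration is the Uplink that aligns its name with the file provider entry. A sketch of it, assuming the workload lives in a namespace called apps and takes the remaining 20% of the traffic (both are illustrative choices, not values from the original example):

```yaml
# Kubernetes cluster - Uplink aligned with the VM's file provider name
apiVersion: hub.traefik.io/v1alpha1
kind: Uplink
metadata:
  name: banking-api
  namespace: apps            # hypothetical namespace
spec:
  exposeName: banking-api    # overrides the default apps-banking-api
  weight: 20                 # assumed complement to the VM's weight of 80
```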

Both clusters now advertise under the name banking-api, and the parent creates a single banking-api@multicluster weighted service. As confidence in the new version grows, adjust weights to shift more traffic to Kubernetes.

Dedicated Infrastructure per Customer

Route specific customers to dedicated clusters based on JWT claims or other request attributes.

On the parent cluster, use Multi-Layer Routing so a parent router authenticates the request and injects claim-derived headers, and a child router makes the routing decision based on those headers. (You can't match on JWT-derived headers in the same router because middleware runs after router matching.)

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: api-parent
  namespace: apps
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`api.example.com`)
      kind: Rule
      middlewares:
        - name: jwt-auth
  tls: {}
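
The second-layer router is the piece that matches on the injected header. A sketch of it follows, assuming the IngressRoute attaches to its parent through a parentRefs list (verify the field name against the Multi-Layer Routing documentation) and that the jwt-auth middleware sets X-Customer-Tier to enterprise for enterprise customers. The resource name api-tier-routing is hypothetical.

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: api-tier-routing     # hypothetical name
  namespace: apps
spec:
  parentRefs:                # assumed field for attaching to the parent router
    - name: api-parent
      namespace: apps
  routes:
    # Enterprise customers go to the dedicated child cluster
    - match: Header(`X-Customer-Tier`, `enterprise`)
      kind: Rule
      services:
        - name: apps-api-workload-dedicated@multicluster
          kind: TraefikService
    # Catch-all: everyone else goes to the shared child cluster
    - match: PathPrefix(`/`)
      kind: Rule
      services:
        - name: apps-api-workload-shared@multicluster
          kind: TraefikService
```

This router defines no entryPoints or tls of its own, on the assumption that both are inherited from the parent router.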

In this example, both routers live on the parent cluster. The parent router authenticates the request and injects the X-Customer-Tier header, and the second-layer router makes the routing decision based on that header, sending enterprise traffic to a dedicated child cluster (apps-api-workload-dedicated@multicluster) while other traffic goes to the shared cluster (apps-api-workload-shared@multicluster).

Per-Child Service Names

This example uses specific per-child services (-dedicated and -shared suffixes) rather than the generic apps-api-workload@multicluster weighted round-robin service. If you used the generic service for the catch-all route, non-enterprise customers could randomly be routed to the dedicated cluster through load balancing, defeating the purpose of tier-based routing. The per-child services ensure traffic isolation between customer tiers.

Caution

Since JWT authentication runs on the parent cluster, the child cluster's uplink entry point does not enforce it. If the uplink port is publicly reachable, requests sent directly to the child bypass the parent's JWT middleware entirely. Use mTLS and network restrictions on the uplink entry point to ensure only the parent can reach child clusters (see Security Considerations and Securing Inter-Cluster Communication).

Securing Inter-Cluster Communication

For production deployments, secure communication between parent and child clusters using mutual TLS (mTLS). With mTLS, both sides authenticate each other during the TLS handshake:

  1. The child presents its server certificate to the parent — the parent verifies it against a trusted CA (rootCAs)
  2. The parent presents its client certificate to the child — the child verifies it against the same or a different trusted CA (clientAuth.caFiles)

This two-way verification ensures that only authorized parent clusters can communicate with child clusters, and that the parent connects to legitimate child clusters.

Certificate Files

mTLS requires the following certificates, which must be generated outside of Traefik (e.g., using openssl or a PKI tool). You can use a single CA for both purposes, or use separate CAs:

File                      Location         Purpose
ca.crt                    Parent + Child   CA certificate that signed both the parent's client cert and the child's server cert (or use separate CAs)
client.crt / client.key   Parent           Client certificate the parent presents to the child during the TLS handshake
child.crt / child.key     Child            Server certificate the child presents to the parent during the TLS handshake

Configure mTLS on Parent

On the parent, the serversTransport configures both sides of the parent's TLS behavior:

  • rootCAs: CA certificates used to verify the child's server certificate (the standard TLS direction — without this, you'd need insecureSkipVerify: true)
  • certificates: Client certificate and key that the parent presents to the child (this is what makes it mTLS; without this, it's one-way TLS/HTTPS)

hub:
  providers:
    multicluster:
      children:
        child-1:
          address: "https://child1.example.com:9443"
          serversTransport:
            rootCAs:
              - /certs/ca.crt # Verify the child's server certificate
            certificates:
              - certFile: /certs/client.crt # Present to child as client identity
                keyFile: /certs/client.key

Configure mTLS on Child Entry Point

On the child, two things must be configured:

  1. A TLS option with clientAuth — this tells the child to demand a client certificate from the parent and verify it
  2. A server certificate — the child's own certificate, presented to the parent during the TLS handshake

Reference a TLS option that requires client certificates on the child's uplink entry point:

ports:
  multicluster:
    port: 9443
    uplink: true
    asDefault: true
    expose:
      default: true
    http:
      tls:
        enabled: true
        options: strict-mtls@file

Then define the strict-mtls TLS option in a file provider configuration on the child cluster:

# Routing configuration (file provider) on the child cluster
tls:
  options:
    strict-mtls:
      clientAuth:
        caFiles:
          - /certs/ca.crt # Verify the parent's client certificate
        clientAuthType: RequireAndVerifyClientCert # Reject connections without a valid client cert
      minVersion: VersionTLS12
  stores:
    default:
      defaultCertificate:
        certFile: /certs/child.crt # Child's server certificate (presented to parent)
        keyFile: /certs/child.key

For Kubernetes, you can define the TLS option as a CRD and reference it using the format namespace-name@kubernetescrd:

apiVersion: traefik.io/v1alpha1
kind: TLSOption
metadata:
  name: mtls-uplink
  namespace: traefik
spec:
  clientAuth:
    secretNames:
      - mtlsca # Secret containing ca.crt
    clientAuthType: RequireAndVerifyClientCert
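
The mtlsca Secret referenced above has to exist in the same namespace as the TLSOption. A sketch of it, assuming the CA certificate is stored under the ca.crt key as the inline comment suggests; the certificate body is a placeholder to be replaced with your own PEM data.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mtlsca
  namespace: traefik
type: Opaque
stringData:
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    <contents of ca.crt>
    -----END CERTIFICATE-----
```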

The child's server certificate must be added to the TLS store (not in the TLSOption):

# Helm values - TLS store configuration
tlsStore:
  default:
    defaultCertificate:
      secretName: default-cert # Default certificate
    certificates:
      - secretName: uplinkcert # Child's server certificate for mTLS

Then reference the TLS option in your uplink entry point configuration:

# Helm values - Uplink entry point with mTLS
ports:
  multicluster:
    port: 9443
    uplink: true

additionalArguments:
  - --hub.uplinkEntryPoints.multicluster.http.tls.options=traefik-mtls-uplink@kubernetescrd

The key setting is clientAuthType: RequireAndVerifyClientCert — this is what enforces the "mutual" part of mTLS. Without it, the child would accept any TLS connection (only HTTPS), even from unauthorized clients. With it, only clients presenting a certificate signed by the trusted CA (i.e., the parent cluster) can connect to the uplink entry point.

Testing mTLS Connection

To verify that mTLS is properly configured, test the connection to the child's uplink endpoint using curl with client certificates:

curl --cert client.crt --key client.key -k https://child-cluster.example.com:9443/api/uplinks

This command:

  • Uses the parent's client certificate (--cert client.crt) and private key (--key client.key)
  • Skips verification of the child's server certificate (-k), which is acceptable for a quick test; pass --cacert ca.crt instead of -k to verify it against your CA
  • Connects to the child's uplink entry point on the configured port (e.g., 9443)
  • Queries the /api/uplinks endpoint to verify authentication

If mTLS is correctly configured, the child will accept the connection and return uplink information. Without valid client certificates, the connection will be rejected.

For complete TLS configuration options, see the Multi-Cluster Provider Reference.

Troubleshooting

Uplinks Not Appearing on Parent

  1. Verify the child cluster's uplink entry point is reachable from the parent
  2. Check that the child address in the parent configuration is correct
  3. Check parent logs for polling errors

Traffic Not Reaching Child Clusters

  1. Verify the service name follows the pattern <uplink-expose-name>@multicluster (the expose name defaults to <namespace>-<uplink-name>, but can be overridden by spec.exposeName)
  2. Check that the router on the child references the uplink correctly
  3. Ensure the child cluster's backend service is healthy

Connection Errors

  1. Verify TLS certificates are valid and not expired
  2. Check firewall rules allow traffic on the uplink entry point port
  3. Ensure serversTransport configuration matches the child's TLS setup