AI Gateway Failover

The AI Gateway failover service automatically switches between LLM providers when the primary returns an error. Use it to handle provider outages, rate limits (HTTP 429), and model unavailability without interrupting consumers.

Failover builds on two capabilities:

  • Failover service with error detection -- triggers a fallback when the primary responds with a configured HTTP status code.
  • Service-level middlewares -- each backend gets its own Chat Completion middleware so every provider independently handles model selection, token injection, and GenAI metrics.

How It Works

The failover chain is built from nested TraefikService resources. Each backend references an ExternalName service with its own chat-completion middleware attached at the service level (the middlewares field on the TraefikService, not on the route). This means every provider independently handles model selection, token injection, and GenAI metrics -- so switching from OpenAI to Mistral transparently rewrites the model name, swaps the API key, and records the correct provider in traces.

Failover Triggers

  • Status code -- fires when the primary responds with a status matching the configured failover.errors.status codes.
  • Health check -- fires when the primary becomes unreachable based on periodic health probes (failover.healthCheck + loadBalancer.healthCheck).

This guide focuses on status-code-based failover.
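For reference, the health-check variant can be sketched as below. This is a hedged sketch, not a verified configuration: it assumes an empty failover.healthCheck object enables probe-based failover and that each backend's load balancer carries its own healthCheck; verify the exact schema against your Traefik Hub version.

```yaml
# Hedged sketch of probe-based failover (verify field names against
# your Traefik Hub version before use).
apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: failover-healthcheck
  namespace: ai-failover
spec:
  failover:
    healthCheck: {} # assumed toggle: fail over on health-probe failures
    service:
      name: openai-external
      port: 443
    fallback:
      name: mistral-external
      port: 443
```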

Prerequisites

Before configuring failover, ensure you have:

  • Traefik Hub with AI Gateway enabled:

    helm upgrade traefik traefik/traefik -n traefik --wait \
    --reset-then-reuse-values \
    --set hub.aigateway.enabled=true
  • API keys for each LLM provider (OpenAI, Mistral, etc.)

  • A dedicated namespace for the AI Gateway resources:

    kubectl create namespace ai-failover
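The chat-completion middlewares used later in this guide need the provider API keys listed above. One common pattern is to store each key in a Kubernetes Secret in the same namespace. The Secret and key names below (openai-credentials, mistral-credentials, api-key) are illustrative assumptions; match them to whatever your middleware configuration references.

```yaml
# Illustrative Secrets for provider API keys; the names here are
# assumptions, not values required by the gateway.
apiVersion: v1
kind: Secret
metadata:
  name: openai-credentials
  namespace: ai-failover
stringData:
  api-key: <your-openai-api-key>
---
apiVersion: v1
kind: Secret
metadata:
  name: mistral-credentials
  namespace: ai-failover
stringData:
  api-key: <your-mistral-api-key>
```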

Configuration Example

This example configures a three-provider failover chain: OpenAI GPT-4o → Mistral Large → Mistral Small.

Step 1: Define Kubernetes Resources

external-services.yaml
apiVersion: v1
kind: Service
metadata:
  name: openai-external
  namespace: ai-failover
spec:
  type: ExternalName
  externalName: api.openai.com
  ports:
    - port: 443
      targetPort: 443
---
apiVersion: v1
kind: Service
metadata:
  name: mistral-external
  namespace: ai-failover
spec:
  type: ExternalName
  externalName: api.mistral.ai
  ports:
    - port: 443
      targetPort: 443
---
apiVersion: v1
kind: Service
metadata:
  name: mistral-small-external
  namespace: ai-failover
spec:
  type: ExternalName
  externalName: api.mistral.ai
  ports:
    - port: 443
      targetPort: 443

Step 2: Configure the Failover Chain

Create nested TraefikService resources that define the failover order. Each backend attaches its own chat-completion middleware so model selection, token injection, and metrics work independently per provider.

failover-traefik-services.yaml
# Top-level failover: OpenAI GPT-4o → secondary failover
apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: failover-primary
  namespace: ai-failover
spec:
  failover:
    service:
      name: openai-external
      port: 443
      middlewares:
        - name: chatcompletion-gpt4o
    fallback:
      name: failover-secondary
      kind: TraefikService
    errors:
      status:
        - "429"
        - "500-504"
      maxRequestBodyBytes: 2097152
---
# Secondary failover: Mistral Large → Mistral Small
apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: failover-secondary
  namespace: ai-failover
spec:
  failover:
    service:
      name: mistral-external
      port: 443
      middlewares:
        - name: chatcompletion-mistral-large
    fallback:
      name: mistral-small-external
      port: 443
      middlewares:
        - name: chatcompletion-mistral-small
    errors:
      status:
        - "429"
        - "500-504"
      maxRequestBodyBytes: 2097152

Step 3: Configure the IngressRoute

Route traffic to the failover chain by referencing the top-level TraefikService:

ingressroute.yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: ai-failover
  namespace: ai-failover
spec:
  entryPoints:
    - websecure
  routes:
    - kind: Rule
      match: Host(`ai.example.com`) && Path(`/v1/chat/completions`)
      services:
        - name: failover-primary
          kind: TraefikService
  tls: {}

Path Rewriting for Custom Routes

This example routes requests to /v1/chat/completions, which is the standard path for OpenAI and Mistral chat completion endpoints.

If you want to expose the failover service on a different path (for example, PathPrefix(/api/ai)), you should attach a path rewrite middleware to each service in the failover chain. This ensures the backend providers receive requests at their expected paths.

For example, add a replacePathRegex middleware to rewrite /api/ai/* to /v1/chat/completions and attach it to the service-level middlewares field alongside the chat-completion middleware. See the NVIDIA NIMs Integration guide for a complete path rewriting example.
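A minimal sketch of such a rewrite middleware (the rewrite-ai-path name and the /api/ai prefix are illustrative):

```yaml
# Rewrites any /api/ai/* request path to the providers' expected
# chat completions path before the request leaves the gateway.
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: rewrite-ai-path
  namespace: ai-failover
spec:
  replacePathRegex:
    regex: ^/api/ai.*
    replacement: /v1/chat/completions
```

Attach it ahead of the chat-completion middleware in each service-level middlewares list so every provider in the chain sees the rewritten path.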

Step 4: Test the Failover Chain

Send a request to the failover endpoint:

curl -s -X POST "https://ai.example.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Hello, which model are you?"
      }
    ]
  }' | jq .

The response comes from OpenAI GPT-4o (the primary). Check the model field in the response to confirm which provider handled the request. When the primary returns a status matching errors.status (for example, 429 or 500-504), Traefik replays the request to the next provider in the chain automatically -- no client-side retry logic required.

Circuit Breaker Integration

Integrate circuit breakers with failover to implement cooldown periods that prevent the gateway from repeatedly trying unhealthy services. When a circuit breaker trips (opens), it stops sending requests to that service for a configured duration (fallbackDuration), allowing the service to recover before attempting requests again.

This is particularly useful for AI Gateway workloads where:

  • A provider experiences an outage or rate limit
  • You want to prevent cascading failures from overwhelming degraded services
  • You need automatic recovery without manual intervention

How It Works

  1. Normal operation (circuit closed): Requests flow through the failover chain normally
  2. Service degrades: Circuit breaker detects errors/latency issues and trips (opens)
  3. Cooldown period: Circuit breaker immediately returns 503 for fallbackDuration (e.g., 30s), triggering failover to the next provider
  4. Recovery attempt: After cooldown, circuit breaker enters recovering state and gradually sends traffic back to the service
  5. Service healthy: Circuit breaker closes, resuming normal operation

Configuration Example

Attach circuit breaker middlewares to each service in the failover chain to protect against cascading failures:

circuit-breaker-middlewares.yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: circuit-breaker-primary
  namespace: ai-failover
spec:
  circuitBreaker:
    expression: ResponseCodeRatio(500, 600, 0, 600) > 0.30 || NetworkErrorRatio() > 0.10
    checkPeriod: 5s
    fallbackDuration: 30s # Cooldown period: do not retry for 30s
    recoveryDuration: 20s # Gradually restore traffic over 20s
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: circuit-breaker-secondary
  namespace: ai-failover
spec:
  circuitBreaker:
    expression: ResponseCodeRatio(500, 600, 0, 600) > 0.30 || NetworkErrorRatio() > 0.10
    checkPeriod: 5s
    fallbackDuration: 30s
    recoveryDuration: 20s

Then attach these to your failover services before the chat-completion middleware:

failover-with-circuit-breaker.yaml
apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: failover-primary
  namespace: ai-failover
spec:
  failover:
    service:
      name: openai-external
      port: 443
      middlewares:
        - name: circuit-breaker-primary # Circuit breaker first
        - name: chatcompletion-gpt4o
    fallback:
      name: failover-secondary
      kind: TraefikService
    errors:
      status:
        - "429"
        - "500-504"
      maxRequestBodyBytes: 2097152
---
apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: failover-secondary
  namespace: ai-failover
spec:
  failover:
    service:
      name: mistral-external
      port: 443
      middlewares:
        - name: circuit-breaker-secondary # Circuit breaker first
        - name: chatcompletion-mistral-large
    fallback:
      name: mistral-small-external
      port: 443
      middlewares:
        - name: chatcompletion-mistral-small # No circuit breaker on the last fallback
    errors:
      status:
        - "429"
        - "500-504"
      maxRequestBodyBytes: 2097152

Testing Circuit Breaker Behavior

Simulate a service outage to observe the circuit breaker tripping and recovery:

  1. Trigger circuit breaker on primary (e.g., by blocking the OpenAI endpoint or simulating 500 errors)
  2. Observe immediate failover to Mistral Large (secondary) - no retries to OpenAI during cooldown
  3. Trigger circuit breaker on secondary (same technique)
  4. Observe failover to Mistral Small (tertiary)
  5. Wait for recovery - after 30s cooldown + 20s recovery, services gradually come back online in reverse order

This ensures smooth degradation and recovery without overwhelming unhealthy services with retry traffic.

Circuit Breaker Expression Tuning

The example uses ResponseCodeRatio(500, 600, 0, 600) > 0.30 (trip when 30% of requests return 5xx). For AI Gateway workloads, consider also checking latency:

expression: ResponseCodeRatio(500, 600, 0, 600) > 0.30 || LatencyAtQuantileMS(95.0) > 10000

This trips the circuit breaker when either 30% of requests fail OR 95th percentile latency exceeds 10 seconds.

See the CircuitBreaker middleware reference for complete configuration options.

Configuration Reference

Failover Errors

  • errors.status -- List of HTTP status codes or ranges that trigger failover. Supports single codes ("500") and ranges ("500-504"). Default: none (failover then triggers only on health-check failures).
  • errors.maxRequestBodyBytes -- Maximum request body size (in bytes) to buffer for replay to the fallback service. Set to -1 for no limit. Default: -1.

Request Body Buffering and DoS Protection

When errors is configured, failover buffers the entire request body in memory so it can replay the request to the fallback service. If the body exceeds maxRequestBodyBytes, the gateway returns 413 Request Entity Too Large instead of attempting failover.

Security consideration: Always set maxRequestBodyBytes to a reasonable limit for your use case. The default value of -1 (unlimited) can expose your gateway to denial-of-service attacks where malicious clients send extremely large payloads to exhaust memory. For AI Gateway workloads, set this based on your maximum expected prompt size (for example, 2 MB for typical chat completions, higher for document processing with vision models).

Service Middlewares

  • middlewares -- List of middleware references to apply to this service. Each entry uses name (and optionally namespace) to reference a CRD Middleware.
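A minimal fragment showing both reference forms. The shared-middleware name and traefik-middlewares namespace are illustrative, and cross-namespace references may additionally need to be enabled on the Kubernetes CRD provider (allowCrossNamespace):

```yaml
middlewares:
  - name: chatcompletion-gpt4o # resolved in the TraefikService's own namespace
  - name: shared-middleware # hypothetical Middleware living in another namespace
    namespace: traefik-middlewares
```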