# AI Gateway Failover
The AI Gateway failover service automatically switches between LLM providers when the primary returns an error. Use it to handle provider outages, rate limits (HTTP 429), and model unavailability without interrupting consumers.
Failover builds on two capabilities:
- **Failover service with error detection**: triggers a fallback when the primary responds with a configured HTTP status code.
- **Service-level middlewares**: each backend gets its own Chat Completion middleware, so every provider independently handles model selection, token injection, and GenAI metrics.
## How It Works
The failover chain is built from nested TraefikService resources. Each backend references an ExternalName service with its own chat-completion middleware attached at the service level (the `middlewares` field on the TraefikService, not on the route). This means every provider independently handles model selection, token injection, and GenAI metrics: switching from OpenAI to Mistral transparently rewrites the model name, swaps the API key, and records the correct provider in traces.
## Failover Triggers
| Trigger | When it fires | Configuration |
|---|---|---|
| Status code | Primary responds with a status code matching the configured list | `failover.errors.status` |
| Health check | Primary becomes unreachable according to periodic health probes | `failover.healthCheck` + `loadBalancer.healthCheck` |
This guide focuses on status-code-based failover.
## Prerequisites
Before configuring failover, ensure you have:
- Traefik Hub with AI Gateway enabled:

  ```bash
  helm upgrade traefik traefik/traefik -n traefik --wait \
    --reset-then-reuse-values \
    --set hub.aigateway.enabled=true
  ```

- API keys for each LLM provider (OpenAI, Mistral, etc.)

- A dedicated namespace for the AI Gateway resources:

  ```bash
  kubectl create namespace ai-failover
  ```
## Configuration Example
This example configures a 3-provider failover chain: OpenAI GPT-4o → Mistral Large → Mistral Small.
### Step 1: Define Kubernetes Resources
**ExternalName Services**
```yaml
apiVersion: v1
kind: Service
metadata:
  name: openai-external
  namespace: ai-failover
spec:
  type: ExternalName
  externalName: api.openai.com
  ports:
    - port: 443
      targetPort: 443
---
apiVersion: v1
kind: Service
metadata:
  name: mistral-external
  namespace: ai-failover
spec:
  type: ExternalName
  externalName: api.mistral.ai
  ports:
    - port: 443
      targetPort: 443
---
apiVersion: v1
kind: Service
metadata:
  name: mistral-small-external
  namespace: ai-failover
spec:
  type: ExternalName
  externalName: api.mistral.ai
  ports:
    - port: 443
      targetPort: 443
```
**Secrets**

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: openai-key
  namespace: ai-failover
type: Opaque
data:
  token: <base64-encoded-openai-api-key>
---
apiVersion: v1
kind: Secret
metadata:
  name: mistral-key
  namespace: ai-failover
type: Opaque
data:
  token: <base64-encoded-mistral-api-key>
```
**Chat Completion Middlewares**

```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: chatcompletion-gpt4o
  namespace: ai-failover
spec:
  plugin:
    chat-completion:
      token: urn:k8s:secret:openai-key:token
      model: gpt-4o
      allowModelOverride: false
      allowParamsOverride: true
      params:
        temperature: 1
        topP: 1
        maxTokens: 2048
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: chatcompletion-mistral-large
  namespace: ai-failover
spec:
  plugin:
    chat-completion:
      token: urn:k8s:secret:mistral-key:token
      model: mistral-large-latest
      allowModelOverride: false
      allowParamsOverride: true
      params:
        temperature: 1
        topP: 1
        maxTokens: 2048
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: chatcompletion-mistral-small
  namespace: ai-failover
spec:
  plugin:
    chat-completion:
      token: urn:k8s:secret:mistral-key:token
      model: mistral-small-latest
      allowModelOverride: false
      allowParamsOverride: true
      params:
        temperature: 1
        topP: 1
        maxTokens: 2048
```
### Step 2: Configure the Failover Chain
Create nested TraefikService resources that define the failover order.
Each backend attaches its own chat-completion middleware so model selection, token injection, and metrics work independently per provider.
```yaml
# Top-level failover: OpenAI GPT-4o → secondary failover
apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: failover-primary
  namespace: ai-failover
spec:
  failover:
    service:
      name: openai-external
      port: 443
      middlewares:
        - name: chatcompletion-gpt4o
    fallback:
      name: failover-secondary
      kind: TraefikService
    errors:
      status:
        - "429"
        - "500-504"
      maxRequestBodyBytes: 2097152
---
# Secondary failover: Mistral Large → Mistral Small
apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: failover-secondary
  namespace: ai-failover
spec:
  failover:
    service:
      name: mistral-external
      port: 443
      middlewares:
        - name: chatcompletion-mistral-large
    fallback:
      name: mistral-small-external
      port: 443
      middlewares:
        - name: chatcompletion-mistral-small
    errors:
      status:
        - "429"
        - "500-504"
      maxRequestBodyBytes: 2097152
```
### Step 3: Configure the IngressRoute
Route traffic to the failover chain by referencing the top-level TraefikService:
```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: ai-failover
  namespace: ai-failover
spec:
  entryPoints:
    - websecure
  routes:
    - kind: Rule
      match: Host(`ai.example.com`) && Path(`/v1/chat/completions`)
      services:
        - name: failover-primary
          kind: TraefikService
  tls: {}
```
This example routes requests to `/v1/chat/completions`, the standard path for OpenAI and Mistral chat completion endpoints. To expose the failover service on a different path (for example, ``PathPrefix(`/api/ai`)``), attach a path rewrite middleware to each service in the failover chain so the backend providers receive requests at their expected paths: add a `replacePathRegex` middleware that rewrites `/api/ai/*` to `/v1/chat/completions` and list it in the service-level `middlewares` field alongside the chat-completion middleware.
See the NVIDIA NIMs Integration guide for a complete path rewriting example.
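Such a rewrite can be sketched as follows. The middleware name `rewrite-ai-path` and the `/api/ai` prefix are hypothetical examples, not part of the configuration above:

```yaml
# Hypothetical sketch: rewrite /api/ai/... to the provider's chat completions path.
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: rewrite-ai-path
  namespace: ai-failover
spec:
  replacePathRegex:
    regex: ^/api/ai(/.*)?$
    replacement: /v1/chat/completions
```

Attach it per service in the failover chain, before the chat-completion middleware:

```yaml
middlewares:
  - name: rewrite-ai-path          # rewrite first
  - name: chatcompletion-gpt4o
```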
### Step 4: Test the Failover Chain
Send a request to the failover endpoint:
```bash
curl -s -X POST "https://ai.example.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Hello, which model are you?"
      }
    ]
  }' | jq .
```
The response comes from OpenAI GPT-4o (the primary). Check the `model` field in the response to confirm which provider handled the request. When the primary returns a status matching `errors.status` (for example, 429 or 500-504), Traefik automatically replays the request to the next provider in the chain; no client-side retry logic is required.
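To script that check, extract the `model` field from the response body. A minimal sketch, using a canned JSON string in place of live curl output (the model string shown is illustrative):

```shell
# Canned stand-in for a live chat-completion response body.
response='{"id":"chatcmpl-1","object":"chat.completion","model":"gpt-4o-2024-08-06"}'

# Pull out the "model" value with sed (jq -r '.model' works equally well).
model=$(printf '%s' "$response" | sed -n 's/.*"model":"\([^"]*\)".*/\1/p')
echo "$model"
```

If the chain failed over, the `model` value reveals which provider actually answered.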
## Circuit Breaker Integration
Integrate circuit breakers with failover to implement cooldown periods that prevent the gateway from repeatedly trying unhealthy services.
When a circuit breaker trips (opens), it stops sending requests to that service for a configured duration (fallbackDuration),
allowing the service to recover before attempting requests again.
This is particularly useful for AI Gateway workloads where:
- A provider experiences an outage or rate limit
- You want to prevent cascading failures from overwhelming degraded services
- You need automatic recovery without manual intervention
### How It Works
1. **Normal operation (circuit closed):** requests flow through the failover chain normally.
2. **Service degrades:** the circuit breaker detects error or latency issues and trips (opens).
3. **Cooldown period:** the circuit breaker immediately returns 503 for `fallbackDuration` (e.g., 30s), triggering failover to the next provider.
4. **Recovery attempt:** after the cooldown, the circuit breaker enters the recovering state and gradually sends traffic back to the service.
5. **Service healthy:** the circuit breaker closes, resuming normal operation.
### Configuration Example
Attach circuit breaker middlewares to each service in the failover chain to protect against cascading failures:
```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: circuit-breaker-primary
  namespace: ai-failover
spec:
  circuitBreaker:
    expression: ResponseCodeRatio(500, 600, 0, 600) > 0.30 || NetworkErrorRatio() > 0.10
    checkPeriod: 5s
    fallbackDuration: 30s   # Cooldown period: don't retry for 30s
    recoveryDuration: 20s   # Gradually restore traffic over 20s
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: circuit-breaker-secondary
  namespace: ai-failover
spec:
  circuitBreaker:
    expression: ResponseCodeRatio(500, 600, 0, 600) > 0.30 || NetworkErrorRatio() > 0.10
    checkPeriod: 5s
    fallbackDuration: 30s
    recoveryDuration: 20s
```
Then attach these to your failover services before the chat-completion middleware:
```yaml
apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: failover-primary
  namespace: ai-failover
spec:
  failover:
    service:
      name: openai-external
      port: 443
      middlewares:
        - name: circuit-breaker-primary   # Circuit breaker first
        - name: chatcompletion-gpt4o
    fallback:
      name: failover-secondary
      kind: TraefikService
    errors:
      status:
        - "429"
        - "500-504"
      maxRequestBodyBytes: 2097152
---
apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: failover-secondary
  namespace: ai-failover
spec:
  failover:
    service:
      name: mistral-external
      port: 443
      middlewares:
        - name: circuit-breaker-secondary   # Circuit breaker first
        - name: chatcompletion-mistral-large
    fallback:
      name: mistral-small-external
      port: 443
      middlewares:
        - name: chatcompletion-mistral-small   # No circuit breaker on the last fallback
    errors:
      status:
        - "429"
        - "500-504"
      maxRequestBodyBytes: 2097152
```
### Testing Circuit Breaker Behavior
Simulate a service outage to observe the circuit breaker tripping and recovery:
1. **Trigger the circuit breaker on the primary** (e.g., by blocking the OpenAI endpoint or simulating 500 errors).
2. **Observe immediate failover to Mistral Large** (secondary); no retries reach OpenAI during the cooldown.
3. **Trigger the circuit breaker on the secondary** (same technique).
4. **Observe failover to Mistral Small** (tertiary).
5. **Wait for recovery:** after the 30s cooldown plus the 20s recovery window, services gradually come back online in reverse order.
This ensures smooth degradation and recovery without overwhelming unhealthy services with retry traffic.
The example uses `ResponseCodeRatio(500, 600, 0, 600) > 0.30` (trip when 30% of requests return 5xx). For AI Gateway workloads, consider also checking latency:

```yaml
expression: ResponseCodeRatio(500, 600, 0, 600) > 0.30 || LatencyAtQuantileMS(95.0) > 10000
```

This trips the circuit breaker when either 30% of requests fail or 95th percentile latency exceeds 10 seconds.
See the CircuitBreaker middleware reference for complete configuration options.
## Configuration Reference
### Failover Errors
| Field | Description | Default |
|---|---|---|
| `errors.status` | List of HTTP status codes or ranges that trigger failover. Supports single codes (`"500"`) and ranges (`"500-504"`). | None (failover triggers only on health checks) |
| `errors.maxRequestBodyBytes` | Maximum request body size (in bytes) to buffer for replay to the fallback service. Set to `-1` for no limit. | `-1` |
When `errors` is configured, failover buffers the entire request body in memory so it can replay the request to the fallback service. If the body exceeds `maxRequestBodyBytes`, the gateway returns `413 Request Entity Too Large` instead of attempting failover.

**Security consideration:** always set `maxRequestBodyBytes` to a reasonable limit for your use case. The default of `-1` (unlimited) can expose your gateway to denial-of-service attacks in which malicious clients send extremely large payloads to exhaust memory. For AI Gateway workloads, set it based on your maximum expected prompt size (for example, 2 MB for typical chat completions, higher for document processing with vision models).
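The `2097152` value used throughout the examples is simply 2 MiB expressed in bytes:

```shell
# maxRequestBodyBytes: 2 MiB in bytes
echo $((2 * 1024 * 1024))   # prints 2097152
```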
### Service Middlewares
| Field | Description |
|---|---|
| `middlewares` | List of middleware references to apply to this service. Each entry uses `name` (and optionally `namespace`) to reference a CRD Middleware. |
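For example, a service-level `middlewares` list mixing same-namespace and cross-namespace references might look like this; the `shared-rate-limit` middleware and `ai-shared` namespace are hypothetical, and cross-namespace references typically require enabling the Kubernetes CRD provider's `allowCrossNamespace` option:

```yaml
middlewares:
  - name: chatcompletion-gpt4o     # resolved in the TraefikService's own namespace
  - name: shared-rate-limit        # hypothetical middleware in another namespace
    namespace: ai-shared
```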
## Related Content
- Read the Chat Completion middleware documentation.
- Read the Traefik failover service reference.
- Read the Traefik TraefikService CRD reference.
