Traefik Hub AI Gateway

The AI Gateway adds a thin, dedicated control layer on top of Traefik Hub’s API Gateway. It focuses on chat‑completion traffic and other LLM calls and exposes a set of specialised middlewares you can attach to any route.

Why Traefik Hub AI Gateway?

| Challenge | What the AI Gateway brings |
| --- | --- |
| Multiple provider SDKs, inconsistent request formats | Route every request through one gateway using a single, OpenAI-compatible JSON schema. |
| Model & parameter sprawl | Apply governance policies in one place (lock or allow overrides for model, temperature, topP, etc.). |
| High inference costs & duplicate calls | Use the Semantic Cache middleware to reuse previous responses for semantically similar requests, cutting token spend and latency. |
| Safety, compliance, data-loss prevention | Run Content Guard rules on requests and responses before they reach the model. |
| Visibility into cost drivers | Export GenAI-specific metrics (semconv) and traces to your existing backend. |
| Run several local models behind one endpoint | Use the Model(`<pattern>`) matcher to route by the model field in the JSON body. |

Two Personas, Two Modes

| Persona | Goal | Typical topology |
| --- | --- | --- |
| API Publisher | Strictly control which external LLMs and models consumers may call. | Client → Hub AI Gateway → Provider Cloud |
| Model-as-a-Service Provider | Expose all local models with all tunable options. | Client → Hub API Gateway → Hub AI Gateway → Cluster of local models |

The same AI Gateway binary serves both use‑cases. You decide the behaviour by attaching (or omitting) middlewares and by toggling model‑override settings.

How It Works

  1. Enable the AI Gateway feature. Set the dedicated flag in your Helm values (see Enabling the AI Gateway below).

  2. Attach AI middlewares to a route. A route becomes an “AI endpoint” as soon as you add at least one AI middleware—typically chat-completion.

  3. (Optional) Match on model before routing. Use the Model(`<pattern>`) matcher to direct traffic to different services inside the cluster.
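
For example, a minimal route that becomes an AI endpoint and splits traffic by model might look like the sketch below. The host, middleware, and service names are placeholders, and the exact Model() pattern syntax is described in the matcher documentation.

```yaml
# Sketch: route chat-completion traffic by the "model" field in the JSON body.
# Names (ai-chat, chat-completion, vllm-llama) are placeholders, not fixed values.
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: ai-chat
  namespace: apps
spec:
  routes:
    - match: Host(`ai.example.com`) && PathPrefix(`/v1/chat/completions`) && Model(`llama*`)
      kind: Rule
      middlewares:
        - name: chat-completion   # the AI middleware that turns this route into an AI endpoint
      services:
        - name: vllm-llama        # local model backend
          port: 8000
```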

Supported AI Providers

Traefik Hub works with any provider that exposes an OpenAI‑compatible chat‑completion endpoint. Verified providers include:

  • OpenAI
  • Azure OpenAI
  • Anthropic
  • Cohere (Compatibility API)
  • DeepSeek
  • Gemini
  • Mistral
  • Ollama
  • Qwen
  • Amazon Bedrock
  • Local models served via KServe, vLLM, or similar

Provider Compatibility Information

Here is a list of providers and their compatibility information:

| Provider | Compatibility |
| --- | --- |
| Cohere | Compatibility API |
| Anthropic | OpenAI SDK |
| Gemini | OpenAI compatibility |
| Ollama | OpenAI compatibility |
| Mistral | Mistral API |
| Amazon Bedrock | AWS |

Enabling the AI Gateway

To enable the AI Gateway, set the dedicated flags in your Helm values or static configuration, for example:

```bash
helm install traefik -n traefik --wait \
  --set hub.aigateway.enabled=true \
  --set hub.aigateway.maxRequestBodySize=10485760  # optional, defaults to 1 MiB
```

  • hub.aigateway.enabled=true: turns on all AI features.
  • hub.aigateway.maxRequestBodySize: hard limit for the size of request bodies inspected by the gateway (protects against OOM/DoS attacks). Accepts a plain integer representing bytes, for example 10485760 for 10 MiB. The default value is 1048576 (1 MiB).
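
If you drive the chart from a values file rather than --set flags, the same settings map to the following keys (a minimal excerpt to merge into your existing values):

```yaml
# values.yaml (excerpt): enable the AI Gateway and raise the inspected body limit.
hub:
  aigateway:
    enabled: true
    maxRequestBodySize: 10485760  # bytes; optional, defaults to 1048576 (1 MiB)
```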

AI Middlewares

Traefik Hub AI Gateway ships three purpose‑built middlewares you can attach individually—or in combination—to any route:

| Middleware | What it does |
| --- | --- |
| Chat Completion | Turns any route into a chat-completion endpoint, adds GenAI metrics, and enforces (or lets clients override) model/parameter settings. |
| Semantic Cache | Stores and replays responses when a new request is semantically similar, cutting token spend and latency. |
| Content Guard | Inspects prompts and completions for policy violations before they reach or leave the model. |

Each middleware is configured in its own Kubernetes Middleware resource and can be chained like any other Traefik Hub middleware.

Chaining

The three middlewares can be chained; order matters for streaming and compression (see individual docs).
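
As a sketch, chaining works through the ordinary middlewares list on a route. The names below are placeholders for Middleware resources you have already created, and the order shown is only an example; see the individual middleware docs for ordering guidance, especially around streaming and compression.

```yaml
# Sketch: chain the three AI middlewares on one route.
# Middleware names are placeholders for your own Middleware resources,
# and the order here is illustrative only.
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: ai-endpoint
  namespace: apps
spec:
  routes:
    - match: Host(`ai.example.com`) && PathPrefix(`/v1/chat/completions`)
      kind: Rule
      middlewares:
        - name: content-guard
        - name: semantic-cache
        - name: chat-completion
      services:
        - name: llm-backend
          port: 8000
```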

Request‑Body Limits & Model Matcher Costs

The flag hub.aigateway.maxRequestBodySize protects Traefik Hub from oversized bodies and from requests that use chunked upload (no declared Content-Length).

  • If the body exceeds the limit, the gateway returns 413 Payload Too Large.
  • If the request is chunked, the gateway returns 400 Bad Request.

Model(`<pattern>`) needs to read the body during the routing phase. For best performance:

  • Put more‑selective matchers (Host(), PathPrefix(), etc.) before Model() so non‑AI traffic is routed without parsing bodies.
  • Give routes that contain Model() a lower priority than normal API routes.
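
A hedged sketch of that layout, with illustrative names and priorities: the plain API route is evaluated first, and the Model() route is consulted only for the chat-completion path.

```yaml
# Sketch: the plain API route gets a higher priority so its traffic never
# triggers body parsing; the Model() route is checked only afterwards.
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: api-and-ai
  namespace: apps
spec:
  routes:
    - match: Host(`api.example.com`) && PathPrefix(`/orders`)
      kind: Rule
      priority: 100               # evaluated first, no body parsing
      services:
        - name: orders-api
          port: 8080
    - match: Host(`api.example.com`) && PathPrefix(`/v1/chat/completions`) && Model(`gpt-*`)
      kind: Rule
      priority: 10                # lower priority, only checked if the route above does not match
      middlewares:
        - name: chat-completion   # placeholder middleware name
      services:
        - name: llm-backend
          port: 8000
```
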
Model Matcher with Body Size Limits

When using Model() matchers with hub.aigateway.maxRequestBodySize, the behavior differs:

  • The Model matcher reads the entire request body to evaluate the pattern
  • If the body exceeds the size limit, the matcher treats it as "not matching" rather than returning 413
  • The request falls through to the next matching route without raising an error
  • This allows oversized requests to be handled by non-AI routes while protecting AI endpoints

Frequently Asked Questions

I’m seeing a log error that says chat completion middleware requires an AI gateway configuration. What does this mean?

This message means that the AI Gateway feature is not enabled for the Traefik Hub instance handling your request.

  • Helm users: add hub.aigateway.enabled=true in your values.yaml or helm install command.
  • Non-Kubernetes users: start Traefik Hub with the --hub.aigateway=true flag in the static configuration. Once the flag is active, restart Traefik Hub and the error will disappear.

How do I rotate an upstream provider's API key without downtime?

Update the token or apiKey reference in your middleware resource. Because this key is only used between Traefik Hub and the upstream LLM provider, no client-side changes are required. Existing applications keep calling the same Hub endpoint while the gateway transparently switches to the new credentials.
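
One common pattern, assuming the middleware references a Kubernetes Secret for its provider key (check the middleware's own docs for the exact reference field), is to keep the key in a Secret and update it in place:

```yaml
# Sketch: keep the provider key in a Secret and rotate it by re-applying the Secret.
# How the Middleware references this Secret depends on that middleware's schema;
# this shows only the Secret side of the rotation.
apiVersion: v1
kind: Secret
metadata:
  name: openai-credentials    # placeholder name
  namespace: apps
type: Opaque
stringData:
  apiKey: sk-new-key-goes-here   # apply the updated value to rotate without downtime
```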

Can I monitor AI service performance?

Yes. The Traefik Hub AI Gateway integrates with OpenTelemetry to provide detailed metrics on token usage and operation durations. You can visualize these metrics with monitoring tools such as Prometheus and Grafana.

[Screenshot: AI Gateway metrics dashboard in Grafana]
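
As a rough sketch of the plumbing side, Traefik's OpenTelemetry exporters can ship those metrics and traces to an OTLP collector. The collector endpoint below is a placeholder; the keys follow Traefik's static-configuration layout.

```yaml
# traefik.yml (static configuration) sketch: export metrics and traces over OTLP.
# Replace the collector endpoint with the address of your own backend.
metrics:
  otlp:
    http:
      endpoint: http://otel-collector:4318/v1/metrics
tracing:
  otlp:
    http:
      endpoint: http://otel-collector:4318/v1/traces
```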

Can I still use the Content Guard Middleware on normal APIs?

Yes, as long as the AI Gateway flag is enabled. Attach the generic content-guard middleware to any route—even non‑AI traffic.

Does model‑based routing hurt performance?

Only routes that include a Model() matcher trigger body parsing. Use narrow route prefixes and set priorities so that non-AI traffic bypasses the matcher.