# Responses API

The Responses API middleware promotes any route to an OpenAI-compatible Responses API endpoint. It adds GenAI metrics, central governance of model and parameters, and (optionally) lets clients override them.
## Key Features and Benefits

- One-line enablement: attach the middleware and your route becomes an AI Responses API endpoint.
- Governance: lock or allow overrides for `model`, `temperature`, `topP`, tools, and more.
- Metrics: emits OpenTelemetry GenAI spans and counters.
- Tool control: configure and limit the number of tools clients can use.
- Works for local or cloud models: all you need is a Kubernetes `Service` pointing at the upstream host.
## Requirements

- You must have AI Gateway enabled:

  ```bash
  helm upgrade traefik traefik/traefik -n traefik --wait \
    --reset-then-reuse-values \
    --set hub.aigateway.enabled=true
  ```

- If routing to a cloud LLM provider, define a Kubernetes `ExternalName` service.
## Model Compatibility

The middleware is designed for the OpenAI Responses API format. When routing to other providers:

- Parameter names may differ (e.g., `maxOutputTokens` vs `max_tokens`)
- Parameter limits vary by model and provider
- Tool support is provider-specific
For non-OpenAI providers, you may need to use a proxy service that translates between the Responses API format and your target provider's format.
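As an illustration of what such a translation proxy would do, the sketch below renames a few camelCase Responses API parameters to the snake_case names some providers expect. The field mapping here is a hypothetical example, not a complete or official translation table.

```python
# Illustrative only: maps a few OpenAI Responses API parameter names to
# snake_case equivalents. This mapping is an assumption for the sketch,
# not an exhaustive or provider-specific translation table.
PARAM_MAP = {
    "maxOutputTokens": "max_tokens",
    "topP": "top_p",
}

def translate_params(request: dict) -> dict:
    """Return a copy of the request with known parameter names renamed."""
    return {PARAM_MAP.get(key, key): value for key, value in request.items()}

translated = translate_params({"model": "gpt-4o", "maxOutputTokens": 1024})
# translated == {"model": "gpt-4o", "max_tokens": 1024}
```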
## How It Works

1. Intercepts the request and validates it against the OpenAI Responses API schema.
2. Applies governance by rewriting `model`, param fields, or `instructions` if overrides are denied.
3. Starts a GenAI span and records the input tokens.
4. Forwards the (possibly rewritten) request to the upstream LLM.
5. Records usage metrics from the response (`model`, input/output tokens, latency).
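The governance step can be sketched as follows. This is an illustrative model of the rewrite behavior described above, not the middleware's actual implementation; `config` mirrors the Middleware spec fields.

```python
# Illustrative sketch of the governance rewrite (step 2 above).
# Not the actual plugin code: it only models the documented behavior.
def apply_governance(request: dict, config: dict) -> dict:
    governed = dict(request)
    # If model overrides are denied, force the configured model.
    if not config.get("allowModelOverride", False):
        governed["model"] = config["model"]
    # If param overrides are denied, enforce the configured params.
    if not config.get("allowParamsOverride", True):
        governed.update(config.get("params", {}))
    return governed

config = {"model": "gpt-4o", "allowModelOverride": False,
          "allowParamsOverride": False, "params": {"temperature": 1}}
request = {"model": "gpt-4", "temperature": 0.2, "input": "Hello"}
governed = apply_governance(request, config)
# governed carries the configured model and temperature; "input" is untouched
```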
## Configuration Example

**Middleware**

```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: responsesapi
spec:
  plugin:
    responses-api:
      token: urn:k8s:secret:ai-keys:openai-token
      model: gpt-4o
      allowModelOverride: false
      allowParamsOverride: true
      params:
        temperature: 1
        topP: 0.9
        maxOutputTokens: 1024
        maxToolCall: 20
        store: true
        tools:
          - type: web_search
```
**Secret**

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: ai-keys
type: Opaque
# Option 1: Plain text
stringData:
  openai-token: sk-proj-XXXXX
# Option 2: Pre-base64 encoded data
# data:
#   openai-token: c2stcHJvai1YWFhYWA==
```
**IngressRoute**

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: openai-responses
spec:
  routes:
    - kind: Rule
      match: Host(`ai.example.com`)
      middlewares:
        - name: responsesapi
      services:
        - name: openai
          port: 443
          passHostHeader: false
```
**Service**

```yaml
apiVersion: v1
kind: Service
metadata:
  name: openai
spec:
  type: ExternalName
  externalName: api.openai.com
  ports:
    - port: 443
```
## Configuration Options

| Field | Description | Required | Default |
|---|---|---|---|
| `token` | URN of a Kubernetes Secret holding the API key (for example, `urn:k8s:secret:<secretname>:<key>`) | No | |
| `model` | Default model to use (for example, `gpt-4o`, `gpt-4-turbo`) | Yes | |
| `allowModelOverride` | `true` = clients may set the `model` field; `false` = middleware rewrites to the configured `model` | No | auto (`true` if `model` empty, else `false`) |
| `allowParamsOverride` | `true` = clients may override `params`; `false` = middleware enforces `params` | No | `true` |
| `instructions` | System instructions to include in every request | No | |
| `params` | Block containing default generation parameters | No | |
| `params.temperature` | Sampling temperature between 0 and 2. Higher values make output more random | No | |
| `params.topP` | Nucleus sampling parameter. An alternative to temperature sampling | No | |
| `params.maxOutputTokens` | Maximum number of tokens to generate in the response (OpenAI Responses API format) | No | |
| `params.maxToolCall` | Maximum number of tools that can be configured in a request. Requests exceeding this limit are rejected | No | |
| `params.store` | Whether to store the conversation for future reference (OpenAI feature) | No | |
| `params.tools` | Array of tool configurations. Each tool must have a `type` field (for example, `web_search`, `file_search`, `function`) | No | |
| `params.tools[].type` | Type of tool: `web_search`, `file_search`, `code_interpreter`, `image_generation`, `function`, `mcp`, etc. | Yes | |
| `params.tools[].name` | Name of the tool (required for `function` type) | No | |
| `params.tools[].description` | Description of what the tool does (for `function` type) | No | |
| `params.tools[].parameters` | JSON Schema object describing the tool's parameters (for `function` type) | No | |
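For instance, the `maxToolCall` limit implies a request-level check along these lines. This is an illustrative sketch of the documented rejection behavior, not the plugin's actual code.

```python
# Illustrative check for the maxToolCall limit described in the table above.
def validate_tool_count(request: dict, max_tool_call: int) -> bool:
    """Return True if the request's tool list is within the configured limit."""
    return len(request.get("tools", [])) <= max_tool_call

ok = validate_tool_count({"tools": [{"type": "web_search"}]}, max_tool_call=20)
too_many = validate_tool_count({"tools": [{"type": "function"}] * 21}, max_tool_call=20)
# ok is True; too_many is False, so that request would be rejected
```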
## Parameter Override Behavior

The middleware supports two modes for handling parameters:

### Mode 1: Allow Parameters Override (`allowParamsOverride: true`)

When enabled (the default), the middleware acts as a default-value provider:

- If a client provides a value for a parameter, the client's value is used.
- If a client doesn't provide a value, the configured default is applied.
- Tools follow the same pattern: client-provided tools take precedence, and configured tools are used as fallback.
```yaml
spec:
  plugin:
    responses-api:
      model: gpt-4o
      allowParamsOverride: true # Clients can override
      params:
        temperature: 0.7
        maxOutputTokens: 1000
```
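The default-value behavior can be sketched as a simple merge in which client-supplied values win and configured values fill the gaps. This is an illustrative model of the documented semantics, not the middleware's implementation.

```python
# Illustrative merge for allowParamsOverride: true (client values take
# precedence; configured defaults fill in any missing parameters).
def merge_params(client_params: dict, defaults: dict) -> dict:
    return {**defaults, **client_params}

merged = merge_params({"temperature": 0.2},
                      {"temperature": 0.7, "maxOutputTokens": 1000})
# merged == {"temperature": 0.2, "maxOutputTokens": 1000}
```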