Responses API
The Responses API middleware promotes any route to an OpenAI-compatible Responses API endpoint. It adds GenAI metrics, centrally governs the model and generation parameters, and can optionally let clients override them.
Key Features and Benefits
- One-line enablement: attach the middleware and your route becomes an AI Responses API endpoint.
- Governance: lock or allow overrides for `model`, `temperature`, `topP`, tools, and more.
- Metrics: emits OpenTelemetry GenAI spans and counters.
- Tool control: configure and limit the number of tools clients can use.
- Works for local or cloud models: all you need is a Kubernetes `Service` pointing at the upstream host.
Requirements
- You must have AI Gateway enabled:

  ```shell
  helm upgrade traefik traefik/traefik -n traefik --wait \
    --reset-then-reuse-values \
    --set hub.aigateway.enabled=true
  ```

- If routing to a cloud LLM provider, define a Kubernetes `ExternalName` service.
The middleware is designed for the OpenAI Responses API format. When routing to other providers:
- Parameter names may differ (e.g., `maxOutputTokens` vs `max_tokens`)
- Parameter limits vary by model and provider
- Tool support is provider-specific
For non-OpenAI providers, you may need to use a proxy service that translates between the Responses API format and your target provider's format.
How It Works
- Intercepts the request and validates it against the OpenAI Responses API schema.
- Applies governance by rewriting `model`, param fields, or `instructions` if overrides are denied.
- Starts a GenAI span and records the input tokens.
- Forwards the (possibly rewritten) request to the upstream LLM.
- Records usage metrics from the response (`model`, input/output tokens, latency).
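The governance rewrite described above can be sketched as follows. This is a simplified illustration of the documented behavior, not the middleware's actual source; the `govern` function, its argument names, and the snake_case request keys are assumptions made for the sketch:

```python
# Simplified sketch of the governance step: configured values either
# fill gaps in the client request or overwrite it, depending on the
# override flags (assumed behavior, not the middleware source).
def govern(request: dict, model: str, params: dict,
           allow_model_override: bool, allow_params_override: bool) -> dict:
    governed = dict(request)
    # Model: rewrite unless clients may choose their own; also fall
    # back to the configured model when the field is omitted.
    if not allow_model_override or "model" not in governed:
        governed["model"] = model
    # Params: configured values act as defaults (client wins) or are
    # enforced (middleware wins), depending on allowParamsOverride.
    for key, value in params.items():
        if not allow_params_override or key not in governed:
            governed[key] = value
    return governed

out = govern(
    {"model": "gpt-4-turbo", "input": "Hi", "temperature": 0.2},
    model="gpt-4o",
    params={"temperature": 1, "top_p": 0.9},
    allow_model_override=False,
    allow_params_override=True,
)
# The model is rewritten to gpt-4o, the client's temperature survives,
# and top_p is filled in from the configured defaults.
```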
Configuration Example
Middleware:

```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: responsesapi
spec:
  plugin:
    responses-api:
      token: urn:k8s:secret:ai-keys:openai-token
      model: gpt-4o
      allowModelOverride: false
      allowParamsOverride: true
      params:
        temperature: 1
        topP: 0.9
        maxOutputTokens: 1024
        maxToolCall: 20
        store: true
        tools:
          - type: web_search
```

Secret:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: ai-keys
type: Opaque
# Option 1: Plain text
stringData:
  openai-token: sk-proj-XXXXX
# Option 2: Pre-base64 encoded data
# data:
#   openai-token: c2stcHJvai1YWFhYWA==
```

IngressRoute:

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: openai-responses
spec:
  routes:
    - kind: Rule
      match: Host(`ai.example.com`)
      middlewares:
        - name: responsesapi
      services:
        - name: openai
          port: 443
          passHostHeader: false
```

Service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: openai
spec:
  type: ExternalName
  externalName: api.openai.com
  ports:
    - port: 443
```
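With the example deployed, a client could call the endpoint as below. The host mirrors the IngressRoute above; the `/v1/responses` path follows OpenAI's convention and is an assumption here (the IngressRoute matches on host only), and the request is constructed but deliberately not sent, since `ai.example.com` is a documentation placeholder:

```python
import json
import urllib.request

# Build a Responses API request against the example route above.
body = json.dumps({"input": "Summarize today's AI news"}).encode()
req = urllib.request.Request(
    "https://ai.example.com/v1/responses",  # placeholder host from the example
    data=body,
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would return the governed model's response;
# it is not executed here because the host is a placeholder.
```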
Configuration Options
| Field | Description | Required | Default |
|---|---|---|---|
| `token` | URN of a Kubernetes Secret holding the API key (for example, `urn:k8s:secret:<secretname>:<key>`) | No | |
| `model` | Default model to use (for example, `gpt-4o`, `gpt-4-turbo`) | Yes | |
| `allowModelOverride` | `true` = clients may set the model field; `false` = middleware rewrites to `model` | No | auto (`true` if `model` empty, else `false`) |
| `allowParamsOverride` | `true` = clients may override params; `false` = middleware enforces `params` | No | `true` |
| `instructions` | System instructions to include in every request | No | |
| `params` | Block containing default generation parameters | No | |
| `params.temperature` | Sampling temperature between 0 and 2. Higher values make output more random | No | |
| `params.topP` | Nucleus sampling parameter. An alternative to temperature sampling | No | |
| `params.maxOutputTokens` | Maximum number of tokens to generate in the response (OpenAI Responses API format) | No | |
| `params.maxToolCall` | Maximum number of tools that can be configured in a request. Requests exceeding this limit are rejected | No | |
| `params.store` | Whether to store the conversation for future reference (OpenAI feature) | No | |
| `params.tools` | Array of tool configurations. Each tool must have a `type` field (for example, `web_search`, `file_search`, `function`) | No | |
| `params.tools[].type` | Type of tool: `web_search`, `file_search`, `code_interpreter`, `image_generation`, `function`, `mcp`, etc. | Yes | |
| `params.tools[].name` | Name of the tool (required for `function` type) | No | |
| `params.tools[].description` | Description of what the tool does (for `function` type) | No | |
| `params.tools[].parameters` | JSON Schema object describing the tool's parameters (for `function` type) | No | |
Parameter Override Behavior
The middleware supports two modes for handling parameters:
Mode 1: Allow Parameters Override (`allowParamsOverride: true`)
When enabled (default), the middleware acts as a default value provider:
- If a client provides a value for a parameter, the client's value is used.
- If a client doesn't provide a value, the configured default is applied.
- Tools follow the same pattern: client-provided tools take precedence, configured tools are used as fallback.
```yaml
spec:
  plugin:
    responses-api:
      model: gpt-4o
      allowParamsOverride: true # Clients can override
      params:
        temperature: 0.7
        maxOutputTokens: 1000
```
Mode 2: Force Parameters (`allowParamsOverride: false`)
When turned off, the middleware enforces the configured values:
- All configured parameters override client values.
- Clients cannot change these settings.
- Useful for strict governance and cost control.
```yaml
spec:
  plugin:
    responses-api:
      model: gpt-4o
      allowParamsOverride: false # Enforce configured values
      params:
        temperature: 0.5
        maxOutputTokens: 500
        tools:
          - type: web_search
```
Model Override Behavior
The `allowModelOverride` setting controls whether clients can specify their own model:
- `allowModelOverride: true`: clients can use any model they specify. The configured `model` acts as a fallback when clients omit the model field entirely.
- `allowModelOverride: false` (default when `model` is set): clients must use the configured model. If they omit the model field or specify a different model, the configured model is used.
```yaml
spec:
  plugin:
    responses-api:
      model: gpt-4o
      allowModelOverride: true # Client can request gpt-4-turbo instead
```
Tool Control
The middleware provides governance over tool usage:
Configure Default Tools
Provide a list of tools that will be available by default:
```yaml
params:
  tools:
    - type: web_search
    - type: file_search
    - type: function
      name: get_weather
      description: Get current weather for a location
      parameters:
        type: object
        properties:
          location:
            type: string
            description: City name
        required:
          - location
```
Different tool types support additional configuration options:
- `function` tools: support `name`, `description`, and `parameters` fields
- `mcp` tools: support extensive configuration options including server connections, authentication, and tool-specific parameters
- Built-in tools (`web_search`, `file_search`, `code_interpreter`, `image_generation`): have their own specific configuration options
Refer to the respective tool documentation for complete configuration options.
Limit Tool Count
Use `maxToolCall` to prevent clients from requesting too many tools:
```yaml
params:
  maxToolCall: 5 # Maximum 5 tools per request
  tools:
    - type: web_search
```
When a client request exceeds this limit, it will be rejected with a 400 Bad Request response.
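The check can be pictured like this; a minimal sketch of the assumed behavior, not the middleware's implementation (the error-message wording mirrors the troubleshooting section below):

```python
# Sketch of the tool-count gate: requests carrying more tools than
# maxToolCall are rejected before being forwarded upstream.
MAX_TOOL_CALL = 5  # mirrors params.maxToolCall in the example above

def check_tool_count(request: dict, limit: int = MAX_TOOL_CALL):
    tools = request.get("tools", [])
    if len(tools) > limit:
        # Rejected with 400 Bad Request.
        return 400, f"Maximum {limit} tools allowed, got {len(tools)}"
    return 200, "ok"

status, msg = check_tool_count({"tools": [{"type": "web_search"}] * 6})
```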
Request Body Size Limits
The middleware enforces a maximum request body size based on the AI Gateway configuration:
```shell
helm upgrade traefik traefik/traefik -n traefik --wait \
  --set hub.aigateway.enabled=true \
  --set hub.aigateway.maxRequestBodySize=10485760 # 10MB
```
Requests exceeding this size will receive a 413 Request Entity Too Large response.
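As a sanity check on the flag's units, the limit is expressed in bytes. This sketch (an assumption about the enforcement, not the gateway's source) shows the arithmetic and the rejection:

```python
# 10 MiB expressed in bytes, matching the Helm value above.
MAX_REQUEST_BODY_SIZE = 10 * 1024 * 1024  # 10485760

def check_body_size(body: bytes, limit: int = MAX_REQUEST_BODY_SIZE) -> int:
    # Oversized bodies are rejected with 413 before any processing.
    return 413 if len(body) > limit else 200
```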
Streaming Support
The middleware fully supports streaming responses. When a client sets `"stream": true` in the request, the response is streamed back as server-sent events (SSE).
When streaming is enabled, the middleware will not record detailed usage metrics (token counts) since the full response is not buffered. Duration metrics will still be recorded.
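On the client side, the stream arrives as standard SSE `data:` lines. A minimal parser sketch follows; the event payloads are made up for illustration, as real Responses API events carry typed JSON objects:

```python
def iter_sse_data(raw: str):
    # Yield the payload of each "data:" line in an SSE stream.
    for line in raw.splitlines():
        if line.startswith("data: "):
            yield line[len("data: "):]

# Illustrative stream: two delta events followed by a terminator.
sample = 'data: {"delta": "Hel"}\n\ndata: {"delta": "lo"}\n\ndata: [DONE]\n'
chunks = list(iter_sse_data(sample))
```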
Metrics
The middleware emits OpenTelemetry GenAI metrics when metrics are enabled.
Working with Other AI Middlewares
The Responses API middleware can be combined with other AI Gateway middlewares for enhanced functionality. See the Adapting AI Middlewares for Responses API guide for more details.
OpenAI Responses API vs Chat Completions
The Responses API is OpenAI's successor to the Chat Completions API, designed for agent-like applications:
| Feature | Chat Completions | Responses API |
|---|---|---|
| Request format | `messages[]` array | `input` string + optional `instructions` |
| Response format | `choices[].message.content` | `output[]` array |
| Built-in tools | Function calling only | Web search, file search, code interpreter, image generation |
| State management | Client-managed | Server-managed (via `previous_response_id`) |
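The request-format row can be made concrete with two minimal bodies (field names follow OpenAI's published shapes; the values are illustrative):

```python
# Chat Completions: conversation passed as a messages[] array.
chat_completions_body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
}

# Responses API: a plain input string plus optional instructions.
responses_body = {
    "model": "gpt-4o",
    "instructions": "You are a helpful assistant.",
    "input": "Hello",
}
```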
Limitations
- Streaming and Generic Middlewares: Content Guard, LLM Guard, and Semantic Cache do not support true streaming mode. When streaming is enabled, these middlewares wait for the complete response to arrive, process it, and then send the entire response as a single chunk to the client.
- State Management: The middleware does not currently manage conversation state (`previous_response_id`). Each request is treated as stateless.
- Metrics with Streaming: Token usage metrics are not recorded for streaming requests since the full response is not buffered.
Troubleshooting
Request Entity Too Large
Problem: Receiving 413 Request Entity Too Large errors.
Solution: Increase the AI Gateway's max request body size:
```shell
helm upgrade traefik traefik/traefik -n traefik --wait \
  --reset-then-reuse-values \
  --set hub.aigateway.maxRequestBodySize=20971520 # 20MB
```
Tool Count Exceeded
Problem: Receiving `400 Bad Request: Maximum X tools allowed, got Y`.
Solution: Either reduce the number of tools in your request or increase `maxToolCall`:

```yaml
params:
  maxToolCall: 50 # Increase limit
```
Model Override Denied
Problem: Client's model selection is being overridden.
Solution: Enable model override in the middleware:
```yaml
spec:
  plugin:
    responses-api:
      model: gpt-4o
      allowModelOverride: true # Allow client to choose model
```
Metrics Not Being Recorded
Problem: No metrics are being recorded.
Solution: Ensure that:
- AI Gateway is enabled with metrics
- The request includes the required headers (`Hub-App-Name`, `Hub-App-Id`)
- You're not using streaming mode (which doesn't record token metrics)
Next Steps
- Learn how to use the Responses API with other middlewares in our Responses API Guide
- Configure Content Guard to protect sensitive data
- Set up Semantic Cache to reduce costs
- Use LLM Guard for custom content policies
