
Adapting AI Middlewares for Responses API

This guide demonstrates how to configure Traefik Hub's AI Gateway middlewares to work with the OpenAI Responses API format. While these middlewares were originally designed for generic HTTP/JSON payloads, they can be adapted to handle the Responses API's specific request and response structure.

Overview

The OpenAI Responses API uses a different request/response format compared to the Chat Completions API:

| Aspect          | Chat Completions           | Responses API                          |
|-----------------|----------------------------|----------------------------------------|
| Request input   | messages[] array           | input string + optional instructions   |
| Response output | choices[].message.content  | output[] array with structured items   |
| Request path    | /v1/chat/completions       | /v1/responses                          |

To work with this format, you need to configure your AI middlewares to target the correct JSON paths in requests and responses.

Prerequisites

Before starting, ensure you have:

  1. AI Gateway enabled in your Traefik Hub installation:

    helm upgrade traefik traefik/traefik -n traefik --wait \
    --reset-then-reuse-values \
    --set hub.aigateway.enabled=true
  2. An OpenAI API key stored in a Kubernetes Secret:

    apiVersion: v1
    kind: Secret
    metadata:
      name: ai-keys
      namespace: apps
    type: Opaque
    stringData:
      openai-token: sk-proj-XXXXX
  3. An ExternalName service pointing to OpenAI:

    apiVersion: v1
    kind: Service
    metadata:
      name: openai
      namespace: apps
    spec:
      type: ExternalName
      externalName: api.openai.com
      ports:
        - port: 443

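You also need an IngressRoute that exposes the Responses API path and forwards traffic to the openai service. Below is a minimal sketch; the ai.localhost host and websecure entry point match the testing examples and the Troubleshooting IngressRoute later in this guide, and the middlewares list is filled in as you work through the following sections.

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: responses-routes
  namespace: apps
spec:
  entryPoints:
    - websecure
  routes:
    - kind: Rule
      match: Host(`ai.localhost`) && Path(`/v1/responses`)
      services:
        - name: openai
          port: 443
      middlewares: [] # filled in with the middlewares defined in the sections below
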
Adapting Content Guard

The Content Guard middleware detects and masks PII in requests and responses using JSON path queries. For Responses API, configure it to inspect the input and instructions fields, since the Responses API uses a flat structure instead of the nested messages[].content array used in Chat Completions.

Configuration

apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: content-guard-responses
  namespace: apps
spec:
  plugin:
    content-guard:
      engine:
        presidio:
          host: http://presidio-analyzer.presidio.svc.cluster.local:5002
          language: en
          timeout: 30s
      request:
        rules:
          # Rule 1: Detect and mask PII in all request fields
          - entities:
              - PERSON
              - EMAIL_ADDRESS
              - PHONE_NUMBER
              - SSN
              - CREDIT_CARD
            block: false
            mask:
              char: "*"
              unmaskFromLeft: 2
              unmaskFromRight: 2

          # Rule 2: Specifically target input and instructions fields
          - jsonPaths:
              - .input
              - .instructions
            entities:
              - EMAIL_ADDRESS
              - PHONE_NUMBER
            block: false
            mask:
              char: "X"
      response:
        rules:
          # Mask PII in response output
          - entities:
              - PERSON
              - EMAIL_ADDRESS
              - PHONE_NUMBER
            block: false
            mask:
              char: "*"
              unmaskFromLeft: 1
              unmaskFromRight: 1

Path Rewriting

If the path exposed on your router differs from the path the upstream expects, add a path rewriting middleware. For example, to rewrite incoming requests to /v1/responses:

# First, create the middleware
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: replacepath-responses
spec:
  replacePath:
    path: "/v1/responses" # Rewrite to the correct upstream path

# Then reference it in your IngressRoute
middlewares:
  - name: replacepath-responses

Key Configuration Points

  • jsonPaths: Set to [".input", ".instructions"] to target Responses API request fields.
  • Multiple Rules: You can have a global rule for all fields and specific rules for certain JSON paths.
  • Engine Setup: Presidio must be deployed; a minimal deployment sketch follows this list. See the Presidio documentation for full setup instructions.
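
If Presidio is not yet running in your cluster, a minimal analyzer deployment might look like the following. The mcr.microsoft.com/presidio-analyzer image and its default container port 3000 are assumptions based on the upstream Presidio project; check the Presidio documentation for the authoritative setup.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: presidio-analyzer
  namespace: presidio
spec:
  replicas: 1
  selector:
    matchLabels:
      app: presidio-analyzer
  template:
    metadata:
      labels:
        app: presidio-analyzer
    spec:
      containers:
        - name: presidio-analyzer
          image: mcr.microsoft.com/presidio-analyzer:latest # assumed upstream image
          ports:
            - containerPort: 3000 # assumed default analyzer port
---
apiVersion: v1
kind: Service
metadata:
  name: presidio-analyzer
  namespace: presidio
spec:
  selector:
    app: presidio-analyzer
  ports:
    - port: 5002       # matches the host URL used in the middleware above
      targetPort: 3000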

Testing

Create a test request with PII:

curl -X POST https://ai.localhost/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"input": "say the following back to me: My email is [email protected] and phone is 555-123-4567"
}'

The middleware masks the PII before the request is forwarded to the LLM, and a response similar to the following is returned:

{
  // ...
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "text": "My email is XXXXXXXXXXXXXXXXXXXX and phone is XXXXXXXXXXXX"
    }
  ]
  // ...
}

You can verify the PII is masked by checking the response body.

Adapting LLM Guard

The LLM Guard middleware performs custom content analysis using external services or LLMs. For Responses API, adjust the template to extract the input field.

Configuration

This example uses a sentiment analysis service to block negative content:

apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: sentiment-guard-responses
  namespace: apps
spec:
  plugin:
    llm-guard-custom:
      endpoint: http://sentiment-analyzer.sentiment.svc.cluster.local:5000/predict
      clientConfig:
        timeout: 30s
        headers:
          Content-Type: application/json
      request:
        # Extract the input field for sentiment analysis
        template: '{"text":"{{ .input }}"}'
        # Block if negative sentiment exceeds 60%
        blockConditions:
          - condition: 'JSONGt(".predictions[0].NEGATIVE", 0.6)'
            reason: 'Negative sentiment detected'

Key Configuration Points

  • template: Use {{ .input }} to extract the input field from the Responses API request.
  • blockConditions: Define conditions using JSON query syntax to determine when to block requests.
  • External Service: You need to deploy your own sentiment analysis or content filtering service.

Example Sentiment Analyzer

Here's a Kubernetes Deployment and Service for a Python-based sentiment analyzer that you can deploy in your cluster:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sentiment-analyzer
  namespace: sentiment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sentiment-analyzer
  template:
    metadata:
      labels:
        app: sentiment-analyzer
    spec:
      containers:
        - name: sentiment-analyzer
          image: newa/sentiment-analyzer:v1.0.0
          ports:
            - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: sentiment-analyzer
  namespace: sentiment
spec:
  selector:
    app: sentiment-analyzer
  ports:
    - port: 5000
      targetPort: 5000

The sentiment analyzer service uses the multilingual DistilBERT sentiment model lxyuan/distilbert-base-multilingual-cased-sentiments-student. It accepts a JSON body with a text field and returns prediction scores, which the blockConditions in the middleware evaluate.

Testing

Test with negative content:

curl -k -X POST https://ai.localhost/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"input": "I hate everything and everyone"
}'

The middleware will block this request with a Forbidden response.

Test with positive content:

curl -k -X POST https://ai.localhost/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"input": "I love learning new things"
}'

This request will be allowed through.

Adapting Semantic Cache

The Semantic Cache middleware caches responses based on semantic similarity. For Responses API, use the generic semantic-cache plugin (not the chat-specific variant) and configure a contentTemplate.

Configuration

apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: semantic-cache-responses
  namespace: apps
spec:
  plugin:
    semantic-cache:
      # Use generic semantic-cache plugin
      vectorDB:
        redis:
          endpoints:
            - redis-stack.apps.svc.cluster.local:6379
          database: 0
        collectionName: ai_responses_cache
        maxDistance: 0.6
      ttl: 3600 # 1 hour
      vectorizer:
        openai:
          model: text-embedding-3-small
          token: urn:k8s:secret:ai-keys:openai-token
          dimensions: 1536
      readOnly: false
      allowBypass: true
      # Extract input and instructions for cache key
      contentTemplate: '{{ .input }} {{ .instructions }}'

Key Configuration Points

  • Plugin Name: Use semantic-cache (generic) not chat-completion-semantic-cache.
  • contentTemplate: Extract text from input and instructions fields using Go template syntax.
  • Separate Database: Use a different database number or collection name to isolate Responses API cache from other caches.
  • Vector Database: Deploy Redis Stack, Milvus, or Weaviate for vector storage.

Setting Up Redis Stack

Deploy Redis Stack to Kubernetes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-stack
  namespace: apps
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-stack
  template:
    metadata:
      labels:
        app: redis-stack
    spec:
      containers:
        - name: redis-stack
          image: redis/redis-stack:latest
          ports:
            - containerPort: 6379
          volumeMounts:
            - name: redis-data
              mountPath: /data
      volumes:
        - name: redis-data
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: redis-stack
  namespace: apps
spec:
  selector:
    app: redis-stack
  ports:
    - port: 6379
      targetPort: 6379

Testing

Make the same request twice:

First request (cache miss):

curl -k -X POST https://ai.localhost/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"input": "What is the capital of France?"
}' -i

In the response headers you will see:

X-Cache-Status: Miss

Second request (cache hit):

curl -k -X POST https://ai.localhost/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"input": "What is the capital of France?"
}' -i

In the response headers you will see:

X-Cache-Distance: 0.000000
X-Cache-Status: Hit

The second request is served from the cache, so it returns significantly faster and consumes no LLM tokens.

Complete Integration Example

Here's a complete example combining the three adapted middlewares with the Responses API middleware:

---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: content-guard-responses
  namespace: apps
spec:
  plugin:
    content-guard:
      engine:
        presidio:
          host: http://presidio-analyzer.presidio.svc.cluster.local:5002
      request:
        rules:
          - jsonPaths:
              - .input
              - .instructions
            entities:
              - EMAIL_ADDRESS
              - PHONE_NUMBER
              - SSN
            mask:
              char: "*"
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: sentiment-guard-responses
  namespace: apps
spec:
  plugin:
    llm-guard-custom:
      endpoint: http://sentiment-analyzer.sentiment.svc.cluster.local:5000/predict
      clientConfig:
        timeout: 30s
        headers:
          Content-Type: application/json
      request:
        template: '{"text":"{{ .input }}"}'
        blockConditions:
          - condition: 'JSONGt(".predictions[0].NEGATIVE", 0.8)'
            reason: 'Negative sentiment detected'
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: responsesapi
  namespace: apps
spec:
  plugin:
    responses-api:
      token: urn:k8s:secret:ai-keys:openai-token
      model: gpt-4o
      allowModelOverride: false
      params:
        temperature: 0.7
        maxOutputTokens: 1024
        tools:
          - type: web_search
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: semantic-cache-responses
  namespace: apps
spec:
  plugin:
    semantic-cache:
      vectorDB:
        redis:
          endpoints:
            - redis-stack.apps.svc.cluster.local:6379
          database: 0
        collectionName: ai_responses_cache
      vectorizer:
        openai:
          model: text-embedding-3-small
          token: urn:k8s:secret:ai-keys:openai-token
      contentTemplate: '{{ .input }} {{ .instructions }}'

Middleware Order

The order of middlewares is critical:

  1. Content Guard: Masks PII before any other processing
  2. LLM Guard: Analyzes and potentially blocks content
  3. Responses API: Applies governance and records metrics
  4. Semantic Cache: Caches the final response

Middleware Order Matters

In most cases, you want to place Content Guard and LLM Guard before the Responses API middleware to inspect the original request. Place Semantic Cache after to cache the final response.

However, middleware order depends on your use case:

  • Request-only checks: Place guards before the Semantic Cache middleware
  • Response-only checks: Place guards after the Semantic Cache middleware
  • Comprehensive protection: Use multiple guards around the cache (request checks → cache → response checks)
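
For the common case, the middlewares list on your IngressRoute (see the sketch in the prerequisites and the full example in the Troubleshooting section) follows the recommended order:

middlewares:
  - name: content-guard-responses   # 1. Mask PII
  - name: sentiment-guard-responses # 2. Block negative content
  - name: responsesapi              # 3. Apply governance and record metrics
  - name: semantic-cache-responses  # 4. Cache the final response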

Testing the Complete Stack

Make a request with PII and sentiment:

curl -X POST https://ai.localhost/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"input": "Hi, my email is [email protected]. Can you help me with something?"
}' -v

Expected behavior:

  1. Email address is masked: j***@***.***
  2. Sentiment is checked (positive, so allowed)
  3. Model is set to gpt-4o (governance applied)
  4. Metrics are recorded
  5. Response is cached for future similar requests

Comparison: Chat Completions vs Responses API

Here's a side-by-side comparison of middleware configurations:

# Content Guard for Chat Completions
spec:
  plugin:
    chat-completion-content-guard: # Chat-specific variant
      engine:
        presidio:
          host: http://presidio-analyzer.apps.svc.cluster.local:5002
      request:
        rules:
          - entities: [EMAIL_ADDRESS, PHONE_NUMBER]
            mask:
              char: "*"

# Semantic Cache for Chat Completions
spec:
  plugin:
    chat-completion-semantic-cache: # Chat-specific variant
      ignoreSystem: false
      ignoreAssistant: true
      messageHistory: 5
      vectorDB:
        redis:
          endpoints: [redis-stack.apps.svc.cluster.local:6379]
        collectionName: ai_chat_cache
      vectorizer:
        openai:
          model: text-embedding-3-small
          token: urn:k8s:secret:ai-keys:openai-token

# LLM Guard for Chat Completions
spec:
  plugin:
    llm-guard-custom:
      endpoint: http://analyzer.apps.svc.cluster.local:5000/predict
      request:
        template: '{"text":"{{ (index .messages 0).content }}"}'

Streaming Support

Content Guard, LLM Guard, and Semantic Cache do not support true streaming mode. When using "stream": true in Responses API requests:

  • These middlewares wait for the complete response to arrive, process it, and then return it to the client as a single chunk rather than a true stream
  • Token usage metrics will not be recorded for streaming requests

Workaround: Use separate routes for streaming and non-streaming requests:

# Non-streaming with full middleware stack
- kind: Rule
  match: Host(`ai.localhost`) && Path(`/v1/responses/standard`)
  middlewares:
    - name: content-guard-responses
    - name: responsesapi
    - name: semantic-cache-responses

# Streaming with minimal middlewares
- kind: Rule
  match: Host(`ai.localhost`) && Path(`/v1/responses/stream`)
  middlewares:
    - name: responsesapi
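
Because /v1/responses/standard and /v1/responses/stream differ from the /v1/responses path the upstream serves, you may also need to rewrite the path on these routes. A sketch for the streaming route, reusing the replacepath-responses middleware defined earlier (whether this is required depends on how your Responses API middleware and upstream are set up):

# Streaming route with the client-facing path rewritten to /v1/responses
- kind: Rule
  match: Host(`ai.localhost`) && Path(`/v1/responses/stream`)
  middlewares:
    - name: replacepath-responses # rewrites /v1/responses/stream to /v1/responses
    - name: responsesapi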

Troubleshooting

Content Guard Not Masking PII

PII is not being masked in requests.

Solutions:

  1. Verify Presidio is running and accessible:

    kubectl get pods -n presidio
  2. Check the JSON paths are correct:

    jsonPaths:
    - .input # Not .messages
    - .instructions
  3. Enable debug logging in Traefik to see middleware execution:

    helm upgrade traefik traefik/traefik -n traefik --wait \
    --reset-then-reuse-values \
    --set "additionalArguments={--log.level=DEBUG}"
  4. Test Presidio directly to verify it's working:

    curl -X POST http://presidio-analyzer.presidio.svc.cluster.local:5002/analyze \
    -H "Content-Type: application/json" \
    -d '{"text":"My email is [email protected]","language":"en"}'

LLM Guard Template Errors

You receive errors about template execution, or the guard service is not being called.

Solutions:

  1. Verify the template syntax uses the correct field for Responses API:

    # ✅ Correct - Use .input for Responses API
    template: '{"text":"{{ .input }}"}'

    # ❌ Incorrect - Don't use .messages (that's for Chat Completions)
    template: '{"text":"{{ (index .messages 0).content }}"}'
  2. Test the external service directly to ensure it's accessible:

    curl -X POST http://sentiment-analyzer.sentiment.svc.cluster.local:5000/predict \
    -H "Content-Type: application/json" \
    -d '{"text":"test message"}'
  3. Check service name resolution in your cluster:

    kubectl get svc -n sentiment
    kubectl get endpoints -n sentiment
  4. Verify the template output matches what your service expects by enabling debug logging

Semantic Cache Not Working

All requests show X-Cache-Status: Miss or cache is not being populated.

Solutions:

  1. Verify your vector database is running and has vector search support. For example, check that Redis Stack has the search module loaded:

    kubectl exec -it deploy/redis-stack -n apps -- redis-cli
    > MODULE LIST
    # Should show the "search" module loaded
  2. Check the contentTemplate extracts content correctly:

    # ✅ Correct - Extract both input and instructions
    contentTemplate: '{{ .input }} {{ .instructions }}'

    # ❌ Incorrect - Chat Completions syntax
    contentTemplate: '{{ .messages }}'
  3. Verify the vectorizer is accessible and has valid credentials:

    # Test OpenAI vectorizer connectivity
    kubectl get secret ai-keys -n apps -o jsonpath='{.data.openai-token}' | base64 -d
  4. Check Redis logs for any errors:

    kubectl logs -n apps deploy/redis-stack
  5. Verify the vector database configuration:

    vectorDB:
      redis:
        endpoints:
          - redis-stack.apps.svc.cluster.local:6379 # Full FQDN
        database: 0
      collectionName: ai_responses_cache # Unique collection name
  6. Test cache manually by making the same request twice and checking response headers

Wrong Middleware Order

Middlewares are not behaving as expected, or PII is reaching the LLM or the cache.

Solution:

Ensure correct middleware order in IngressRoute. Order matters because:

  • Content Guard must run first to mask PII before other middlewares see it
  • LLM Guard should run after PII masking but before the API middleware
  • Responses API middleware applies governance
  • Semantic Cache should run last to cache the final response

middlewares:
  - name: content-guard-responses   # 1. Mask PII first
  - name: sentiment-guard-responses # 2. Block negative content
  - name: responsesapi              # 3. Apply governance + metrics
  - name: semantic-cache-responses  # 4. Cache final response

Test the order:

  1. Make a request with PII:

    curl -X POST https://ai.localhost/v1/responses \
    -H "Content-Type: application/json" \
    -d '{"model":"gpt-4o","input":"My email is [email protected]"}'
  2. Check Traefik logs to see middleware execution order:

    kubectl logs -n traefik -l app.kubernetes.io/name=traefik --tail=100

Streaming Requests Not Working

Streaming responses are not being returned or middlewares are blocking streaming.

Issue:

Content Guard, LLM Guard, and Semantic Cache do not support true streaming mode. When "stream": true is set, these middlewares wait for the complete response, process it, and send it as a single chunk to the client.

Solution:

Create separate routes for streaming and non-streaming requests:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: responses-routes
  namespace: apps
spec:
  entryPoints:
    - websecure
  routes:
    # Non-streaming with full middleware stack
    - kind: Rule
      match: Host(`ai.localhost`) && Path(`/v1/responses`)
      services:
        - name: openai
          port: 443
      middlewares:
        - name: content-guard-responses
        - name: sentiment-guard-responses
        - name: responsesapi
        - name: semantic-cache-responses

    # Streaming with minimal middlewares (only governance)
    - kind: Rule
      match: Host(`ai.localhost`) && Path(`/v1/responses/stream`)
      services:
        - name: openai
          port: 443
      middlewares:
        - name: responsesapi # Only governance, no content processing

Clients should use different endpoints based on their needs:

  • Standard requests: POST /v1/responses
  • Streaming requests: POST /v1/responses/stream

Responses API Middleware Configuration Errors

Configuration validation errors or unexpected behavior from the Responses API middleware.

Common issues:

  1. Model override not working:

    # ✅ Correct - Allow clients to override model
    spec:
      plugin:
        responses-api:
          model: gpt-4o
          allowModelOverride: true # Clients can specify their own model

    # ❌ Incorrect - Model is enforced
    spec:
      plugin:
        responses-api:
          model: gpt-4o
          allowModelOverride: false # Client's model is ignored
  2. Too many tools error:

    If receiving "Maximum X tools allowed, got Y" errors:

    params:
      maxToolCall: 20 # Increase limit or remove tools from request
      tools:
        - type: web_search
  3. Missing token:

    Ensure the token secret exists and is referenced correctly:

    # Check secret exists
    kubectl get secret ai-keys -n apps

    # Check secret content
    kubectl get secret ai-keys -n apps -o yaml

    Reference in middleware:

    token: urn:k8s:secret:ai-keys:openai-token

Next Steps