CircuitBreaker
The HTTP circuit breaker prevents stacking requests to unhealthy Services, resulting in cascading failures.
When your system is healthy, the circuit is closed (normal operations). When your system becomes unhealthy, the circuit opens, and the requests are no longer forwarded, but instead are handled by a fallback mechanism.
To assess if your system is healthy, the circuit breaker constantly monitors the services.
The CircuitBreaker only analyzes what happens after its position within the middleware chain. What happens before has no impact on its state.
Each router gets its own instance of a given circuit breaker. One circuit breaker instance can be open while the other remains closed: their state is not shared.
This is the expected behavior, we want you to be able to define what makes a service healthy without having to declare a circuit breaker for each route.
Configuration Example
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
name: latency-check
spec:
circuitBreaker:
expression: LatencyAtQuantileMS(50.0) > 100
Configuration Options
| Field | Description | Default | Required |
|---|---|---|---|
expression | Condition to open the circuit breaker and applies the fallback mechanism instead of calling your services. More information here | 100ms | No |
checkPeriod | The interval between successive checks of the circuit breaker condition (when in standby state). | 100ms | No |
fallbackDuration | The duration for which the circuit breaker will wait before trying to recover (from a tripped state). | 10s | No |
recoveryDuration | The duration for which the circuit breaker will try to recover (as soon as it is in recovering state). | 10s | No |
responseCode | The status code that the circuit breaker will return while it is in the open state. | 503 | No |
expression
The expression option can check three different metrics:
| Metrics | Description | Example |
|---|---|---|
NetworkErrorRatio | The network error ratio to open the circuit breaker. | NetworkErrorRatio() > 0.30 opens the circuit breaker at a 30% ratio of network errors |
ResponseCodeRatio | The status code ratio to open the circuit breaker. More information below | ResponseCodeRatio(500, 600, 0, 600) > 0.25 opens the circuit breaker if 25% of the requests returned a 5XX status (amongst the request that returned a status code from 0 to 5XX) |
LatencyAtQuantileMS | The latency at a quantile in milliseconds to open the circuit breaker when a given proportion of your requests become too slow. Only floating point number (with the trailing .0) for the quantile value. | LatencyAtQuantileMS(50.0) > 100 opens the circuit breaker when the median latency (quantile 50) reaches 100ms. |
ResponseCodeRatio
ResponseCodeRatio(from, to, dividedByFrom, dividedByTo) calculates the ratio of error responses to total responses by counting HTTP status codes within specified ranges.
Parameters (all are HTTP status codes):
from: Start of the error code range to count (inclusive)to: End of the error code range to count (exclusive)dividedByFrom: Start of the total response range for the denominator (inclusive)dividedByTo: End of the total response range for the denominator (exclusive)
How it works:
The circuit breaker tracks counters for all HTTP status codes during each checkPeriod. The ratio is calculated as:
count(codes in [from, to)) / count(codes in [dividedByFrom, dividedByTo))
If the denominator is 0 (no responses in that range), the function returns 0.
Example: ResponseCodeRatio(500, 600, 0, 600) > 0.30
- Numerator: Count responses with status codes 500-599 (5XX errors)
- Denominator: Count all responses with status codes 0-599 (all responses including 2XX, 3XX, 4XX, 5XX)
- Trips when: 30% or more of all requests return a 5XX error
This means if you received 100 requests total, and 30 or more returned 5XX errors, the circuit breaker opens.
Using Multiple Metrics
You can combine multiple metrics using operators in your expression.
Supported operators are:
- AND (
&&) - OR (
||)
For example, ResponseCodeRatio(500, 600, 0, 600) > 0.30 || NetworkErrorRatio() > 0.10 triggers the circuit breaker when 30% of the requests return a 5XX status code, or when the ratio of network errors reaches 10%.
Operators
Here is the list of supported operators:
- Greater than (
>) - Greater or equal than (
>=) - Lesser than (
<) - Lesser or equal than (
<=) - Equal (
==) - Not Equal (
!=)
Fallback mechanism
The fallback mechanism returns a HTTP 503 Service Unavailable to the client instead of calling the target service.
This behavior cannot be configured.
Status
There are three possible status for your circuit breaker:
Closed(your service operates normally).Open(the fallback mechanism takes over your service).Recovering(the circuit breaker tries to resume normal operations by progressively sending requests to your service).
Closed
While the circuit is closed, the circuit breaker only collects metrics to analyze the behavior of the requests.
At specified intervals (checkPeriod), the circuit breaker evaluates expression to decide if its state must change.
Open
While open, the fallback mechanism takes over the normal service calls for a duration of FallbackDuration.
The fallback mechanism returns a HTTP 503 (or ResponseCode) to the client.
After this duration, it enters the recovering state.
Recovering
While recovering, the circuit breaker sends linearly increasing amounts of requests to your service (for RecoveryDuration).
If your service fails during recovery, the circuit breaker opens again.
If the service operates normally during the entire recovery duration, then the circuit breaker closes.
