linkerd · adleong · Jun 22, 2026 · Jun 10, 2026 · Jun 11, 2026 · Jun 15, 2026
diff --git a/linkerd.io/content/2-edge/reference/circuit-breaking.md b/linkerd.io/content/2-edge/reference/circuit-breaking.md
@@ -37,24 +37,54 @@ including endpoints made unavailable by failure accrual.
 
 A _failure accrual policy_ determines how failures are tracked for endpoints,
 and what criteria result in an endpoint becoming unavailable ("tripping the
-circuit breaker"). Currently, the Linkerd proxy implements one failure accrual
-policy, _consecutive failures_. Additional failure accrual policies may be added
-in the future.
-
-{{< note >}}
-
-HTTP responses are classified as _failures_ if their status code is a [5xx
-server error]. Future Linkerd releases may add support for configuring what
-status codes are classified as failures.
-
-{{< /note >}}
+circuit breaker").
 
 ### Consecutive Failures
 
 In this failure accrual policy, an endpoint is marked as failing after a
 configurable number of failures occur _consecutively_ (i.e., without any
 successes). For example, if the maximum number of failures is 7, the endpoint is
-made unavailable once 7 failures occur in a row with no successes.
+made unavailable once 7 failures occur in a row with no successes. For the
+purpose of this failure accrual policy, a _failure_ is an HTTP response with
+a [5xx server error] status code or a gRPC response with one of the following
+gRPC status codes:
+
+- DATA_LOSS
+- DEADLINE_EXCEEDED
+- INTERNAL
+- PERMISSION_DENIED
+- UNAVAILABLE
+
+### Unified
+
+In this failure accrual policy, an endpoint is marked as failing after _either_
+of the following conditions is met:
+
+- Success rate drops below a configured threshold. For the purposes of
+  calculating success rate, a failure is any HTTP response with a
+  [5xx server error] or 429 status code or a gRPC response with one of the
+  following gRPC status codes:
+  - DATA_LOSS
+  - DEADLINE_EXCEEDED
+  - INTERNAL
+  - PERMISSION_DENIED
+  - UNAVAILABLE
+  - RESOURCE_EXHAUSTED
+- A configured number of failures occur _consecutively_. For the purpose of
+  tracking consecutive failures, a _failure_ is an HTTP response with a
+  [5xx server error] status code or a gRPC response with one of the following
+  gRPC status codes:
+  - DATA_LOSS
+  - DEADLINE_EXCEEDED
+  - INTERNAL
+  - PERMISSION_DENIED
+  - UNAVAILABLE
+
+For more information on the Unified failure accrual policy, see
+[Rate Limit Aware Load Balancing](../tasks/rate-limit-aware-load-balancing.md).
 
 ## Probation and Backoffs
 
@@ -123,8 +153,7 @@ breaking when sending traffic to that Service:
 - `balancer.linkerd.io/failure-accrual`: Selects the
   [failure accrual policy](#failure-accrual-policies) used when communicating
   with this Service. If this is not present, no failure accrual is performed.
-  Currently, the only supported value for this annotation is `"consecutive"`, to
-  perform [consecutive failures failure accrual](#consecutive-failures).
+  Supported values for this annotation are `consecutive` and `unified`.
 
 When the failure accrual mode is `"consecutive"`, the following annotations
 configure parameters for the consecutive-failures failure accrual policy:
@@ -150,6 +179,29 @@ configure parameters for the consecutive-failures failure accrual policy:
   floating-point number, and must be between 0.0 and 100.0. If this annotation
   is not present, the default value is 0.5.
 
+When the failure accrual mode is `"unified"`, the following annotations
+configure parameters for the unified failure accrual policy:
+
+- `balancer.alpha.linkerd.io/failure-accrual-success-rate-threshold`: If the
+  success rate of responses in the window drops below this threshold, then the
+  endpoint will be made unavailable. Must be between `0.0` and `1.0`.
+  Rate-limited responses such as HTTP 429 and gRPC RESOURCE_EXHAUSTED count as
+  failures for this calculation. If this annotation is not present, the default
+  value is `0.8` (80% success rate).
+- `balancer.alpha.linkerd.io/failure-accrual-success-rate-window`: The window of
+  time over which success rate is calculated. If this annotation is not present,
+  the default value is `10s`.
+- `balancer.alpha.linkerd.io/failure-accrual-success-rate-min-requests`: The
+  minimum number of responses which must be in the window before this breaker
+  can trip. This acts as a "cold start" protection to ensure we have a
+  sufficient number of responses for the success rate calculation to be
+  meaningful before tripping. If this annotation is not present, the default
+  value is `5`.
+- `balancer.linkerd.io/failure-accrual-consecutive-max-failures`: See above.
+- `balancer.linkerd.io/failure-accrual-consecutive-min-penalty`: See above.
+- `balancer.linkerd.io/failure-accrual-consecutive-max-penalty`: See above.
+- `balancer.linkerd.io/failure-accrual-consecutive-jitter-ratio`: See above.
+
 [^1]:
     The part of the proxy which handles connections from within the pod to the
     rest of the cluster.
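
The unified policy documented in the hunks above can be summarized with a small model. This is a minimal sketch, not the Linkerd proxy's implementation: the class name `UnifiedBreaker` and its method are hypothetical, the defaults mirror the annotation defaults listed above, and two behaviors are assumptions rather than documented facts: that the window slides with each response, and that a rate-limited response resets the consecutive-failure count because it is not a failure for that check.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class UnifiedBreaker:
    """Hypothetical model of the unified failure accrual policy."""

    threshold: float = 0.8   # failure-accrual-success-rate-threshold
    window_s: float = 10.0   # failure-accrual-success-rate-window, in seconds
    min_requests: int = 5    # failure-accrual-success-rate-min-requests
    max_failures: int = 7    # failure-accrual-consecutive-max-failures
    _responses: deque = field(default_factory=deque, init=False)
    _consecutive: int = field(default=0, init=False)

    def record(self, now_s: float, failure: bool, rate_limited: bool = False) -> bool:
        """Record one response; return True if the breaker should trip.

        `failure` means a 5xx or one of the listed gRPC codes.
        `rate_limited` means HTTP 429 or gRPC RESOURCE_EXHAUSTED.
        """
        # Rate-limited responses count against the success rate only.
        ok_for_rate = not (failure or rate_limited)
        self._responses.append((now_s, ok_for_rate))

        # Assumption: a sliding window; drop responses older than the window.
        while self._responses and self._responses[0][0] <= now_s - self.window_s:
            self._responses.popleft()

        # Assumption: anything that is not a failure resets the streak.
        self._consecutive = self._consecutive + 1 if failure else 0
        if self._consecutive >= self.max_failures:
            return True

        # Cold-start protection: too few responses to judge the success rate.
        total = len(self._responses)
        if total < self.min_requests:
            return False

        successes = sum(1 for _, ok in self._responses if ok)
        return successes / total < self.threshold
```

With the defaults, four successes followed by one rate-limited response leave the success rate at exactly the threshold, so the breaker stays closed; a second rate-limited response pushes the rate below the threshold and trips it.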

diff --git a/linkerd.io/content/2-edge/tasks/configuring-rate-limiting.md b/linkerd.io/content/2-edge/tasks/configuring-rate-limiting.md
@@ -10,6 +10,9 @@ For more information about Linkerd's rate limiting check the
 [Rate Limiting feature doc](../features/rate-limiting/) and the
 [HTTPLocalRateLimitPolicy reference doc](../reference/rate-limiting/).
 
+To see how clients can be configured to load balance around Services which
+are rate-limited, see [Rate Limit Aware Load Balancing](../tasks/rate-limit-aware-load-balancing.md).
+
 ## Prerequisites
 
 To use this guide you'll only need a Kubernetes cluster running a Linkerd

diff --git a/linkerd.io/content/2-edge/tasks/rate-limit-aware-load-balancing.md b/linkerd.io/content/2-edge/tasks/rate-limit-aware-load-balancing.md
@@ -0,0 +1,102 @@
+---
+title: Handling Rate-Limited Endpoints
+description: Automatically route traffic away from rate-limited endpoints
+---
+
+When backends implement rate limiting and return
+[HTTP 429](https://www.rfc-editor.org/rfc/rfc6585.html#page-3) or
+[gRPC RESOURCE_EXHAUSTED](https://grpc.github.io/grpc/core/md_doc_statuscodes.html),
+the proxy by default treats these as successful responses from a load
+balancing perspective. Since these types of responses are typically very fast,
+Linkerd's [EWMA load balancing](../features/load-balancing.md) may actually
+send _more_ traffic to these rate-limited endpoints. This can create a feedback
+loop where clients experience high 429 or RESOURCE_EXHAUSTED rates.
+
+Linkerd has two experimental features to help route traffic away from endpoints
+which are in a rate-limited state.
+
+{{< docs/production-note >}}
+
+{{< warning >}}
+
+Rate Limit Aware Load Balancing is an experimental, opt-in feature.
+
+{{< /warning >}}
+
+## Load Biaser
+
+Linkerd can be configured to use a more sophisticated version of the EWMA
+load balancing algorithm which takes rate-limit responses (HTTP 429 or gRPC
+RESOURCE_EXHAUSTED) into account. This algorithm is called the Load Biaser
+because it biases traffic away from endpoints which have returned rate-limit
+responses recently.
+
+The Load Biaser works exactly the same as [EWMA](../features/load-balancing.md)
+except that when it receives a rate-limited response, it substitutes a fixed
+penalty value for the response's actual latency (unless the latency is
+higher). For example, if the penalty is configured to be `5s` and the Load
+Biaser receives a 429 response in `10ms`, it will treat the latency of that
+response as `5s` for load balancing purposes.
+
+In this way, the load balancer will not favor endpoints which return
+rate-limited responses quickly.
+
+The penalty value can be further refined if the server sets the `Retry-After`
+HTTP response header or the `grpc-retry-pushback-ms` gRPC trailer. If one of
+these values is present and is higher than the configured penalty, it will be
+used in place of the penalty. This allows servers to request more pushback
+than the configured penalty.
+
+To enable Linkerd to use the Load Biaser for a Service, set the following
+annotation on the Service resource:
+
+| Annotation                                    | Type | Default | Notes                                    |
+|-----------------------------------------------|------|---------|------------------------------------------|
+| `balancer.alpha.linkerd.io/penalize-failures` | bool | `false` | Enables the Load Biaser for this Service |
+
+The Load Biaser can be further configured with these annotations on the Service
+resource:
+
+| Annotation                                                    | Type     | Default | Notes                                                                                  |
+|---------------------------------------------------------------|----------|---------|----------------------------------------------------------------------------------------|
+| `balancer.alpha.linkerd.io/load-biaser-penalty`               | duration | `5s`    | The latency value to inject for rate-limited responses and failures                    |
+| `balancer.alpha.linkerd.io/load-biaser-max-retry-after`       | duration | `300s`  | The maximum allowed value of a Retry-After header                                      |
+
+## Unified Circuit Breaker
+
+Linkerd can be configured to use a more sophisticated version of
+[consecutive failures failure accrual](../tasks/circuit-breakers.md) called
+Unified failure accrual.
+
+The Unified failure accrual can be configured with a success rate threshold.
+If the success rate of responses within a fixed time window drops below this
+threshold, the circuit breaker will trip, temporarily cutting off traffic to
+this endpoint and giving it time to recover. Critically, any rate-limited
+responses will count as failures for this success rate calculation.
+
+The Unified failure accrual will ALSO trip if it encounters a configured
+number of consecutive failures, just like the consecutive failures accrual.
+
+To enable the Unified failure accrual circuit breaker on a Service, set the
+following annotation to `"unified"` on the Service resource:
+
+| Annotation                            | Type     | Default | Notes                                                                        |
+|---------------------------------------|----------|---------|------------------------------------------------------------------------------|
+| `balancer.linkerd.io/failure-accrual` | string   | None    | The failure-accrual mode. Set to `unified` to enable Unified failure accrual |
+
+The Unified failure accrual can be further configured with these annotations on
+the Service resource:
+
+| Annotation                                                            | Type                         | Default | Notes                                                            |
+|-----------------------------------------------------------------------|------------------------------|---------|------------------------------------------------------------------|
+| `balancer.alpha.linkerd.io/failure-accrual-success-rate-threshold`    | number between 0 and 1       | `0.8`   | The success rate threshold at which to trip the breaker          |
+| `balancer.alpha.linkerd.io/failure-accrual-success-rate-window`       | duration                     | `10s`   | The window over which the success rate is calculated             |
+| `balancer.alpha.linkerd.io/failure-accrual-success-rate-min-requests` | number                       | `5`     | Only trip if there are at least this many requests in the window |
+| `balancer.linkerd.io/failure-accrual-consecutive-max-failures`        | number                       | `7`     | Trip if we encounter this many consecutive failures              |
+| `balancer.linkerd.io/failure-accrual-consecutive-min-penalty`         | duration                     | `1s`    | The minimum duration for which to cut off traffic                |
+| `balancer.linkerd.io/failure-accrual-consecutive-max-penalty`         | duration                     | `1m`    | The maximum duration for which to cut off traffic                |
+| `balancer.linkerd.io/failure-accrual-consecutive-jitter-ratio`        | number between 0.0 and 100.0 | `0.5`   | The amount of randomness to inject into the backoff              |
+
+See the
+[reference documentation](../reference/circuit-breaking/#configuring-failure-accrual)
+for details on failure accrual configuration.
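
The latency substitution described in the Load Biaser section of this file can be illustrated with a short function. This is a sketch under stated assumptions, not the proxy's code: the function name `biased_latency` and its parameters are hypothetical, the defaults mirror the `load-biaser-penalty` and `load-biaser-max-retry-after` defaults, and clamping an oversized `Retry-After` value to the maximum (rather than ignoring it) is an assumption.

```python
from typing import Optional


def biased_latency(
    latency_s: float,
    rate_limited: bool,
    penalty_s: float = 5.0,            # load-biaser-penalty
    retry_after_s: Optional[float] = None,
    max_retry_after_s: float = 300.0,  # load-biaser-max-retry-after
) -> float:
    """Return the latency the balancer would record for one response."""
    if not rate_limited:
        # Ordinary responses are recorded with their real latency.
        return latency_s

    penalty = penalty_s
    if retry_after_s is not None:
        # Assumption: server hints above the maximum are clamped to it.
        hint = min(retry_after_s, max_retry_after_s)
        # A server hint only replaces the penalty when it is higher.
        penalty = max(penalty, hint)

    # The penalty is substituted unless the real latency is already higher.
    return max(latency_s, penalty)
```

For the example in the text, a 429 received in `10ms` with a `5s` penalty is recorded as `5s`, so `biased_latency(0.010, True)` returns `5.0`.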