Add docs for rate limit aware load balancing #2126

Open
adleong wants to merge 5 commits into main from alex/rlalb

@adleong adleong commented Jun 10, 2026

No description provided.

Signed-off-by: Alex Leong <alex@buoyant.io>
Signed-off-by: Alex Leong <alex@buoyant.io>
@cratelyn cratelyn requested a review from unleashed June 11, 2026 20:11

@raykroeker raykroeker left a comment

Some minor language changes, and 1 technical issue (ratio at the bottom).

Comment thread linkerd.io/content/2-edge/reference/circuit-breaking.md Outdated
failures for this calculation. If this annotation is not present, the default
value is `0.8` (80% success-rate).
- `balancer.alpha.linkerd.io/failure-accrual-success-rate-window`: The window of
time over success-rate is calculated. If this annotation is not present, the

Instead of:
The window of time over success-rate is calculated...

use:
The window of time over which the success-rate is calculated...

Comment thread linkerd.io/content/2-edge/tasks/rate-limit-aware-load-balancing.md Outdated
Comment thread linkerd.io/content/2-edge/tasks/rate-limit-aware-load-balancing.md Outdated
Comment thread linkerd.io/content/2-edge/tasks/rate-limit-aware-load-balancing.md Outdated
| `balancer.linkerd.io/failure-accrual-consecutive-max-failures` | number | `7` | Trip if we encounter this many consecutive failures |
| `balancer.linkerd.io/failure-accrual-consecutive-min-penalty` | duration | `1s` | The minimum duration for which to cut off traffic |
| `balancer.linkerd.io/failure-accrual-consecutive-max-penalty` | duration | `1m` | The maximum duration for which to cut off traffic |
| `balancer.linkerd.io/failure-accrual-consecutive-jitter-ratio` | number between 0.0 and 100.0 | `0.5` | The amount of randomness to inject into the backoff |

Instead of:
number between 0.0 and 100.0

use:
number between 0.0 and 1.0
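
To illustrate why a jitter ratio reads naturally as a fraction between 0.0 and 1.0, here is a minimal Python sketch of exponential backoff with jitter. It is not the proxy's implementation: the function name `jittered_backoff`, the doubling schedule, and the cap applied after adding jitter are assumptions made for illustration. Only the defaults (`1s`, `1m`, `0.5`) come from the table quoted above.

```python
import random


def jittered_backoff(attempt, min_penalty=1.0, max_penalty=60.0,
                     jitter_ratio=0.5, rng=random):
    """Return a penalty in seconds for a 0-based trip count (hypothetical helper)."""
    if not 0.0 <= jitter_ratio <= 1.0:
        raise ValueError("jitter_ratio must be between 0.0 and 1.0")
    # Assumed schedule: double the penalty on each trip, capped at the maximum.
    exponent = min(attempt, 32)  # clamp so the multiplication cannot overflow
    base = min(max_penalty, min_penalty * (2 ** exponent))
    # Add up to `jitter_ratio` of the base penalty as random jitter.
    jitter = base * jitter_ratio * rng.random()
    # Assumption: the jittered value is still capped at the maximum penalty.
    return min(max_penalty, base + jitter)
```

With a ratio of `0.5`, the first penalty falls between 1.0 and 1.5 seconds. A value such as `100.0` would mean adding up to one hundred times the base penalty, which is why the sketch rejects it.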

Signed-off-by: Alex Leong <alex@buoyant.io>

@unleashed unleashed left a comment

Maybe add a note on backwards compatibility to remove these annotations if downgrading?

| Annotation | Type | Default | |
|---------------------------------------------------------------|----------|---------|----------------------------------------------------------------------------------------|
| `balancer.alpha.linkerd.io/load-biaser-penalty` | duration | `5s` | The latency value to inject for rate-limited responses |
| `balancer.alpha.linkerd.io/failure-accrual-honor-retry-after` | boolean | `false` | If Retry-After response headers or grpc-retry-pushback-ms gRPC trailers are respected. |

This became a no-op after removing support for retry-after/grpc-retry-pushback-ms hints in breakers.

Comment on lines +48 to +50
amount of pushback. Note that this requires setting the
`balancer.alpha.linkerd.io/failure-accrual-honor-retry-after=true` annotation on
the Service in order for these response hints to be used.

I don't think we are optionally enabling this in the biaser, but always respecting it whenever it is enabled.

number of consecutive failures, just like the consecutive failures accrual.

To enable the Unified failure accrual circuit breaker on a Service, set the
following annotation to `"unified"` on the Server resource:

Suggested change
- following annotation to `"unified"` on the Server resource:
+ following annotation to `"unified"` on the Service resource:


| Annotation | Type | Default | |
|---------------------------------------------------------------|----------|---------|----------------------------------------------------------------------------------------|
| `balancer.alpha.linkerd.io/load-biaser-penalty` | duration | `5s` | The latency value to inject for rate-limited responses |

Suggested change
- | `balancer.alpha.linkerd.io/load-biaser-penalty` | duration | `5s` | The latency value to inject for rate-limited responses |
+ | `balancer.alpha.linkerd.io/load-biaser-penalty` | duration | `5s` | The latency value to inject for rate-limited responses and failures |

Comment on lines +29 to +30
load balancing algorithm which takes rate-limit responses (HTTP 429 or gRPC
RESOURCE_EXHAUSTED) into account. This algorithm is called the Load Biaser

This also takes failures into account -- some gRPC (server error ones) and all HTTP 5xx responses.
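
As a rough illustration of the behavior discussed in this thread, the following Python sketch shows one way a latency-based balancer could be biased away from endpoints that return rate-limited or failed responses, by recording a penalty latency in place of the measured one. Everything here is an assumption for illustration (the class `BiasedLatency`, the simple moving average, the `pick` helper); the proxy's actual estimator and selection logic are not shown in this PR. Only the `5s` default penalty comes from the quoted table. gRPC statuses are omitted for brevity.

```python
class BiasedLatency:
    """Hypothetical per-endpoint latency estimate with a failure penalty."""

    def __init__(self, penalty=5.0, decay=0.5):
        self.penalty = penalty  # seconds injected for penalized responses
        self.decay = decay      # weight given to the previous estimate
        self.ewma = 0.0

    def observe(self, latency, http_status):
        # Rate-limited (429) and server-error (5xx) responses are penalized.
        penalized = http_status == 429 or 500 <= http_status <= 599
        sample = max(latency, self.penalty) if penalized else latency
        self.ewma = self.decay * self.ewma + (1 - self.decay) * sample
        return self.ewma


def pick(endpoints):
    """Return the name of the endpoint with the lowest estimated latency."""
    return min(endpoints, key=lambda name: endpoints[name].ewma)
```

In this sketch, an endpoint that answers quickly with a 429 ends up looking slower than one that answers with a 200, so `pick` prefers the latter.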

- `balancer.alpha.linkerd.io/failure-accrual-success-rate-threshold`: If the
success-rate of responses in the window drops below this threshold, then the
endpoint will be made unavailable. Must be between `0.0` and `1.0`.
Rate-limited responses such as HTTP 429 and gRPC RESOURCE_EXHAUSATED count as

Suggested change
- Rate-limited responses such as HTTP 429 and gRPC RESOURCE_EXHAUSATED count as
+ Rate-limited responses such as HTTP 429 and gRPC RESOURCE_EXHAUSTED count as

@@ -0,0 +1,105 @@
---
title: Rate Limit Aware Load Balacing

Suggested change
- title: Rate Limit Aware Load Balacing
+ title: Rate Limit Aware Load Balancing


{{< warning >}}

Rate Limit Aware Load Balacing is an experimental, opt-in feature.

Suggested change
- Rate Limit Aware Load Balacing is an experimental, opt-in feature.
+ Rate Limit Aware Load Balancing is an experimental, opt-in feature.

Comment on lines 40 to 42

Suggested change
circuit breaker").

(i.e. just delete the last two sentences)

### Unified

In this failure accrual policy, an endpoint is marked as failing after _either_
success-rate drops below a configured threshold _or_ a configured number of

"success-rate" => "success rate" throughout this PR
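
For readers following the Unified policy described in the hunk above, here is a minimal Python sketch of the "either/or" trip condition. It is a simplification under stated assumptions, not the proxy's code: the class name `UnifiedFailureAccrual` is hypothetical, the window is a fixed count of recent responses rather than the time window the docs describe, and the default of 7 consecutive failures is borrowed from the consecutive policy's table. The `0.8` threshold default comes from the quoted docs.

```python
from collections import deque


def is_failure(http_status):
    # Per the quoted docs, rate-limited responses count as failures.
    return http_status == 429 or 500 <= http_status <= 599


class UnifiedFailureAccrual:
    """Hypothetical breaker: trips on low success rate OR consecutive failures."""

    def __init__(self, success_rate_threshold=0.8, window_size=10,
                 max_consecutive_failures=7):
        self.threshold = success_rate_threshold
        self.max_consecutive_failures = max_consecutive_failures
        # Assumption: count-based window standing in for a time-based one.
        self.window = deque(maxlen=window_size)
        self.consecutive = 0

    def record(self, ok):
        self.window.append(ok)
        self.consecutive = 0 if ok else self.consecutive + 1

    def tripped(self):
        if self.consecutive >= self.max_consecutive_failures:
            return True
        if len(self.window) < self.window.maxlen:
            return False  # not enough samples to judge the success rate
        return sum(self.window) / len(self.window) < self.threshold
```

A caller would feed each response in with `breaker.record(not is_failure(status))` and stop routing to the endpoint while `breaker.tripped()` is true.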

Comment on lines 133 to 134

Suggested change
Supported values for this annotation are `consecutive` and `unified`.

The new unified policy respects 429s and gRPC rate-limited responses. I am not sure whether that is extended to the consecutive policy, but either way we should clarify in this note.

When backends implement rate limiting and return
[HTTP 429](https://www.rfc-editor.org/rfc/rfc6585.html#page-3) or
[gRPC RESOURCE_EXHAUSTED](https://grpc.github.io/grpc/core/md_doc_statuscodes.html)
responses, the proxy currently treats these as successful responses from a load

Suggested change
- responses, the proxy currently treats these as successful responses from a load
+ by default, the proxy treats these as successful responses from a load

Comment on lines +2 to +3
title: Rate Limit Aware Load Balacing
description: Routing traffic away from rate limited endpoints

Suggested change
- title: Rate Limit Aware Load Balacing
- description: Routing traffic away from rate limited endpoints
+ title: Handling Rate-Limited Endpoints
+ description: Automatically route traffic away from rate-limited endpoints

adleong added 2 commits June 19, 2026 21:52
Signed-off-by: Alex Leong <alex@buoyant.io>
Signed-off-by: Alex Leong <alex@buoyant.io>
5 participants