fix(load-biaser): add test assertions to load function test#4551
Open
cratelyn wants to merge 43 commits into
Open
fix(load-biaser): add test assertions to load function test#4551cratelyn wants to merge 43 commits into
cratelyn wants to merge 43 commits into
Conversation
Introduce linkerd-ewma, a general-purpose exponentially-weighted moving average crate. The crate provides five public methods on an Ewma struct: new (initializes with INFINITY sentinel), get (returns stored value), add (blends a new sample using exponential decay), add_peak (replaces stored value when the new sample exceeds it), and add_rate (derives a rate from the inverse of the elapsed interval and feeds it through add). This is being added in spite of tower::PeakEwma because this is not limited to middleware-based RTT computing. We specifically plan to use this implementation for a load biasing feature and a success-rate circuit breaker policy, which would otherwise not be possible. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Extend linkerd-ewma with the API surface needed for success-rate circuit breaking. A MIN_DECAY constant (1 ms) is now applied in both constructors so that a zero-duration decay never produces division-by-zero or NaN results in downstream arithmetic. New methods: new_with_value sets an explicit initial sample instead of the INFINITY sentinel, reset overwrites both value and timestamp for breaker recovery, and get_at projects the stored value forward through exponential decay without mutating internal state. Also add_peak is now decay-aware: it projects the stored value to the candidate timestamp before deciding whether to replace it, and it unconditionally replaces INFINITY so that the first real sample always takes effect even at the construction timestamp. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Add a retry_after module to linkerd-http-classify with shared parsing functions for extracting backoff hints from HTTP and gRPC responses. parse_retry_after handles 429/503 responses with both delay-seconds and HTTP-date formats per RFC 7231, capping the returned duration at a caller-specified maximum. parse_grpc_retry_pushback reads the grpc-retry-pushback-ms header per the gRPC A6 spec, rejecting negative values and capping positive ones. We use the httpdate crate for the actual RFC 7231 HTTP-date parsing. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
…re penalties Introduce the linkerd-load-biaser crate, which wraps any tower::Service to provide per-endpoint load metrics for P2C balancing. The crate tracks request latency via EWMA and injects penalties when failure responses are detected, steering traffic away from unhealthy endpoints. Penalty injection covers HTTP 429/503/5xx and gRPC RESOURCE_EXHAUSTED/UNAVAILABLE trailers-only responses (not streaming gRPC failures since we can only access headers here). For responses with backoff hints, Retry-After on HTTP 429/503 or grpc-retry-pushback-ms on gRPC trailers-only errors, the penalty is amplified so that the EWMA value remains meaningful through the server-requested backoff window. The amplification is clamped to prevent infinity from permanently disabling the endpoint. The load metric is computed as `max(rtt * (pending + 1), penalty)`, where `rtt` is the peak-EWMA latency, and `pending` is the number of in-flight requests. This is returned via tower::load::Load for direct P2C integration. The load biaser is disabled by default, preserving RTT-only behavior (PeakEwma equivalent), unless explicitly activated. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
These cover the complete load biasing lifecycle, including penalty injection, hint parsing, cancellation safety via PinnedDrop, and backwards-compatible behavior when disabled (ie. RTT-only behavior equivalent to PeakEwma). Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Co-authored-by: katelyn martin <kate@buoyant.io>
Co-authored-by: katelyn martin <kate@buoyant.io>
…_rate_limit_hint The _max parameter was accepted for API symmetry with rate_limit_hint(max) but intentionally unused: the method always caches the uncapped raw value so each consumer can apply its own cap via rate_limit_hint(max). Removing the parameter for now since we probably won't need it in the future, and if so we can always put it back in place. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
…or and accessor Make the inner Duration field private and provide CachedRateLimitHint::new() for construction and duration_capped(max) for reads. This prevents consumers from bypassing the per-caller cap that rate_limit_hint(max) enforces, since the cached value is intentionally uncapped. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Explain why a standalone EWMA crate exists instead of using Tower's RttEstimate: it is private, mutates on read, and cannot support the penalty dimension that failure-aware load balancing requires. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The crate only uses tokio::time, so disable the default feature set to avoid pulling unnecessary features into the dependency declaration. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The cancellation test uses tokio::sync::oneshot which requires the sync feature. This compiled only because workspace feature unification pulled it in from other crates. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Replace raw string literals with the module-level constant for consistency with how HTTP tests use http::header::RETRY_AFTER. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Consistent with Ewma::new which already has this attribute. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Inspect the grpc-status header only on HTTP 200 responses whose content-type starts with application/grpc. Without this a non-gRPC upstream that happens to include a grpc-status header would be considered a gRPC failure and penalized by the load biaser. The same check is applied to the gRPC retry-pushback-ms parsing in the ReponseFailureHint trait implementation. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Up until now we mapped every non-zero gRPC status code to FailureHint::InternalError, penalizing client errors like CANCELLED, INVALID_ARGUMENT, NOT_FOUND, etc. These don't indicate server health issues and should not steer traffic away from the endpoint. Restrict penalty injection to server-side error codes that indicate endpoint problems: UNKNOWN (2), DEADLINE_EXCEEDED (4), INTERNAL (13), and DATA_LOSS (15), alongside the existing RESOURCE_EXHAUSTED (8) and UNAVAILABLE (14) statuses. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Ensure only those gRPC status codes indicating server-side errors inject penalties. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Verify that consecutive 429 responses at 1s intervals keep the penalty at the configured level, confirming the EWMA peak resets the decayed value rather than accumulating. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Add a `last_update()` getter that returns the timestamp of the most recent EWMA update. Callers that need to detect staleness (ie. idle periods where the EWMA has decayed to the point that a single sample dominates) can compare this against the current time to detect this exact circumstance (and, for example, require more samples before taking decisions). Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Co-authored-by: katelyn martin <git@katelyn.world>
1653593 to
67cb4d6
Compare
- Drop unused add_rate, last_update - Correct MIN_DECAY enforcement comment - Note on ignoring negative do-not-retry pushbacks Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
…A RTT We now now keep a single RTT EWMA and a load of `rtt * (pending + 1)`, exactly like Tower's PeakEwma. A success records its measured RTT, while a failure now records a computed effective RTT through the same peak-EWMA logic, using the Retry-After or grpc-retry-pushback hint when present, or otherwise penalizing the RTT with a base value. In-flight requests are now counted the way Tower's PeakEwma counts them, using Arc's strong count and measuring on cancellation. Finally an explicit completion tracker can use `PendingUntilFirstData` for measurement to more closely match previous behavior. `linkerd-ewma` is still a separate crate because we feed it a penalty value rather than a measured RTT, and since Tower's `RttEstimate` is private (at the moment) and advances its decay clock on read, it can't accept an injected observation nor be read under a shared lock. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
…fault Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The gRPC A6 spec defines grpc-retry-pushback-ms as an i32. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
We can now trim our tokio flags and drop the tokio-test dependency. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
penalty_ms on SharedState is millisecond. Remove references to types that are not integrated yet. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Conveys meaning without coupling type nor constant. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Add a test exercising the case of a sample at or below the still undecayed peak. It should not replace the peak, but compute the value blending in the new sample. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Ensure that add() discards a sample whose timestamp is at or before the stored one. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
A hint below the base penalty such as 0 records a low effective RTT and can make a failing endpoint look healthier than it should. Ensure a failure's recorded measurement is at least the base penalty, so that retry hints take effect only when they exceed that penalty. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
test_rtt_tracked_after_request resolved instantly under paused time and only checked that the RTT moved minimally. Drive a request that takes a measurable delay and assert the recorded RTT reflects it. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Existing tests raised the pending count with disabled handles or a single request. Try now with two concurrent requests and assert the strong count reports two pending, then assert the count falls when they resolve. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
this commit adds `assert_eq!` statements to confirm that the load values claimed in the comments here are accurate. the values of `load_idle` and `load_one_pending` are not reflected in the comments above. `TODO` comments are left here to note this. Signed-off-by: katelyn martin <kate@buoyant.io>
67cb4d6 to
e0f4993
Compare
Member
Author
|
i've rebased this branch atop the latest state of the |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
this commit adds
assert_eq!statements to confirm that the load valuesclaimed in the comments here are accurate.
the values of
load_idleandload_one_pendingare not reflected inthe comments above.
TODOcomments are left here to note this.Signed-off-by: katelyn martin kate@buoyant.io