Fix race conditions in RunningEndpointInstance.Stop and BaseEndpointLifecycle where concurrent or early-cancelled Stop calls leave the endpoint in a broken state #7750
Merged
… callers return immediately; note BaseEndpointLifecycle.Stop calls StopCore not Stop
andreasohlund
approved these changes
May 13, 2026
DavidBoike
approved these changes
May 13, 2026
Problem

When `Stop` is called with an already-cancelled token (e.g., during host shutdown when the cancellation token has already triggered), `stopSemaphore.WaitAsync(cancelledToken)` throws `OperationCanceledException` before entering the critical section. This leaves `status` as `Running` and `endpointInstance` non-null. The framework then calls `DisposeAsync`, which re-enters `Stop(CancellationToken.None)` and attempts full shutdown against a DI container that is already being torn down by the host, causing `ObjectDisposedException` on `stoppingTokenSource` or when accessing a disposed `ILoggerFactory`.

The same race exists in `BaseEndpointLifecycle.Stop`: `lifeCycleSemaphore.WaitAsync(cancelledToken)` can fail, leaving `endpointInstance` non-null, and the subsequent `DisposeAsync` re-enters stop against an already-disposed container.

Solution
RunningEndpointInstance is refactored into three methods:

- `StopCore(CancellationToken)` performs shutdown only (cancel stopping token, stop components/transport). It uses `CancellationToken.None` on the semaphore wait because the semaphore is an internal serialization mechanism, not a cancellation point. The first caller to enter `StopCore` owns shutdown; later callers that observe `Stopping` or `Stopped` return immediately without waiting.
- `Stop(CancellationToken)` is the legacy `IEndpointInstance` API. It calls `StopCore` then `DisposeAsync` in a try/finally, so the public contract still covers full shutdown and cleanup.
- `DisposeAsync()` handles cleanup only (unregister log slot, clear settings, dispose CTS/service provider lease). It uses `Interlocked.Exchange` for idempotency and calls `StopCore` as a safety net in case `Stop` was never called.

BaseEndpointLifecycle changes:
- `Stop` uses `CancellationToken.None` on `lifeCycleSemaphore.WaitAsync` and calls `endpointInstance.StopCore(cancellationToken)` (not `Stop`), so cleanup is left to the separate `DisposeAsync` call.
- `DisposeAsync` uses `Interlocked.Exchange` on `isDisposed` for idempotency, reads/nulls `endpointInstance` (safe as a reference-type atomic read), then calls `instance.DisposeAsync()` and `providerLease.DisposeAsync()`. No semaphore is needed because `Interlocked.Exchange` already prevents concurrent entry.

EndpointHostedService is unchanged. The .NET Generic Host calls `StopAsync(token)` then `DisposeAsync()`, which maps to `lifecycle.Stop(token)` then `lifecycle.DisposeAsync()`. This is the correct two-step flow.

Internally managed mode (legacy `IEndpointInstance.Stop()`): calls `StopCore` then `DisposeAsync`, maintaining the original "stop and clean up everything" contract. Double-dispose of the service provider is idempotent.

Key decisions
- `CancellationToken.None` on semaphore waits: the semaphore is an internal serialization mechanism. The caller's token must not abort the wait, because a failed wait leaves the state as `Running`, and a subsequent `DisposeAsync` re-entry would then attempt full shutdown against an already-torn-down DI container.
- `StopCore` early return: later callers that observe `Stopping` or `Stopped` return immediately. Only the first caller owns shutdown. This is intentional; waiting would add no value since the outcome is predetermined.
- Separate `Stop` and `DisposeAsync` in `BaseEndpointLifecycle`: allows the hosted service to control the lifecycle with a clean two-step flow (stop, then dispose).

Acceptance test
`When_stop_is_called_with_cancelled_token` cancels the scenario token after the endpoint starts, causing `StopEndpoints` to pass an already-cancelled token to `BaseEndpointLifecycle.Stop`. Without the fix, this exposes `ObjectDisposedException` during disposal.
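
The stop/dispose split described in this PR can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `SketchEndpoint`, its `Status` enum, and the `ShutdownCount`/`CleanupCount` counters are hypothetical names invented for the example, and the real shutdown and cleanup work is replaced by counter increments.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Simulate the host passing an already-cancelled token, then disposing.
var endpoint = new SketchEndpoint();
using var cts = new CancellationTokenSource();
cts.Cancel();

await endpoint.Stop(cts.Token);   // does not throw; shutdown still runs
await endpoint.DisposeAsync();    // second dispose is a no-op

Console.WriteLine($"{endpoint.ShutdownCount} {endpoint.CleanupCount}"); // prints "1 1"

// Hypothetical stand-in for the pattern; not the actual NServiceBus types.
public sealed class SketchEndpoint : IAsyncDisposable
{
    enum Status { Running, Stopping, Stopped }

    readonly SemaphoreSlim stopSemaphore = new(1, 1);
    volatile Status status = Status.Running;
    int disposed; // 0 = not disposed, 1 = disposed

    public int ShutdownCount { get; private set; }
    public int CleanupCount { get; private set; }

    // Shutdown only. The first caller owns shutdown; later callers return early.
    public async Task StopCore(CancellationToken cancellationToken = default)
    {
        if (status != Status.Running)
        {
            return; // Stopping or Stopped: nothing left for this caller to do
        }

        // The caller's token is deliberately not passed to the wait: a cancelled
        // wait would leave status as Running and invite a late re-entry.
        await stopSemaphore.WaitAsync(CancellationToken.None);
        try
        {
            if (status != Status.Running)
            {
                return;
            }

            status = Status.Stopping;
            // Assumption: real code would stop components/transport here and
            // could hand cancellationToken to that work. The sketch only counts.
            ShutdownCount++;
            await Task.Yield();
            status = Status.Stopped;
        }
        finally
        {
            stopSemaphore.Release();
        }
    }

    // Legacy-style API: full shutdown followed by cleanup.
    public async Task Stop(CancellationToken cancellationToken = default)
    {
        try
        {
            await StopCore(cancellationToken);
        }
        finally
        {
            await DisposeAsync();
        }
    }

    // Cleanup only; Interlocked.Exchange lets exactly one caller through.
    public async ValueTask DisposeAsync()
    {
        if (Interlocked.Exchange(ref disposed, 1) == 1)
        {
            return;
        }

        await StopCore(CancellationToken.None); // safety net if Stop was never called
        CleanupCount++;
    }
}
```

In this sketch an already-cancelled token cannot strand the instance in `Running`, and a later `DisposeAsync` finds nothing left to shut down, which is the property the acceptance test exercises.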