Fix race conditions in RunningEndpointInstance.Stop and BaseEndpointLifecycle where concurrent or early-cancelled Stop calls leave the endpoint in a broken state by danielmarbach · Pull Request #7750 · Particular/NServiceBus

danielmarbach · 2026-05-12T08:58:37Z

Problem

When Stop is called with an already-cancelled token (for example, during host shutdown after the cancellation token has already fired), stopSemaphore.WaitAsync(cancelledToken) throws OperationCanceledException before the critical section is entered. This leaves status as Running and endpointInstance non-null. The framework then calls DisposeAsync, which re-enters Stop(CancellationToken.None) and attempts a full shutdown against a DI container that the host is already tearing down. The result is an ObjectDisposedException on stoppingTokenSource or an access to a disposed ILoggerFactory.

The same race exists in BaseEndpointLifecycle.Stop: lifeCycleSemaphore.WaitAsync(cancelledToken) can throw, leaving endpointInstance non-null, and the subsequent DisposeAsync re-enters Stop against an already-disposed container.
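A minimal sketch of the failure mode, assuming nothing beyond the BCL. StopBeforeFix, the string status, and the local semaphore are hypothetical stand-ins and not the NServiceBus code. It only shows that SemaphoreSlim.WaitAsync with an already-cancelled token throws before the critical section runs, so the state never leaves Running.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for the endpoint state (illustrative only).
var semaphore = new SemaphoreSlim(1, 1);
var status = "Running";

async Task StopBeforeFix(CancellationToken cancellationToken)
{
    // With an already-cancelled token this await throws OperationCanceledException,
    // so the state transition below is never reached.
    await semaphore.WaitAsync(cancellationToken);
    try
    {
        status = "Stopped";
    }
    finally
    {
        semaphore.Release();
    }
}

using var cts = new CancellationTokenSource();
cts.Cancel();

try
{
    await StopBeforeFix(cts.Token);
}
catch (OperationCanceledException)
{
    // prints "Stop was cancelled, status is still Running"
    Console.WriteLine($"Stop was cancelled, status is still {status}");
}
```

Any later cleanup path that treats Running as "shutdown still pending" will then try to run the full shutdown, which is the re-entry described above.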

Solution

Shutdown in RunningEndpointInstance is split across three methods:

StopCore(CancellationToken) performs shutdown only (cancel stopping token, stop components/transport). It uses CancellationToken.None on the semaphore wait because the semaphore is an internal serialization mechanism, not a cancellation point. The first caller to enter StopCore owns shutdown; later callers that observe Stopping or Stopped return immediately without waiting.
Stop(CancellationToken) is the legacy IEndpointInstance API. It calls StopCore then DisposeAsync in a try/finally, so the public contract still covers full shutdown and cleanup.
DisposeAsync() handles cleanup only (unregister log slot, clear settings, dispose CTS/service provider lease). It uses Interlocked.Exchange for idempotency and calls StopCore as a safety net in case Stop was never called.
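The three-method split can be sketched roughly as follows. SketchEndpointInstance, EndpointStatus, and CleanupCount are hypothetical names used for illustration, and the shutdown and cleanup bodies are placeholders. The actual RunningEndpointInstance does considerably more.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var instance = new SketchEndpointInstance();
using var cts = new CancellationTokenSource();
cts.Cancel();

// Legacy API with an already-cancelled token: shutdown and cleanup both still run.
await instance.Stop(cts.Token);
Console.WriteLine($"{instance.Status}, cleanups: {instance.CleanupCount}"); // Stopped, cleanups: 1

// A later DisposeAsync finds nothing left to do.
await instance.DisposeAsync();
Console.WriteLine($"{instance.Status}, cleanups: {instance.CleanupCount}"); // Stopped, cleanups: 1

enum EndpointStatus { Running, Stopping, Stopped }

// Hypothetical stand-in, not the NServiceBus type.
sealed class SketchEndpointInstance : IAsyncDisposable
{
    readonly SemaphoreSlim stopSemaphore = new(1, 1);
    volatile EndpointStatus status = EndpointStatus.Running;
    int disposed;

    public EndpointStatus Status => status;
    public int CleanupCount { get; private set; }

    // Shutdown only. Callers that observe Stopping or Stopped return without waiting.
    public async Task StopCore(CancellationToken cancellationToken = default)
    {
        if (status != EndpointStatus.Running) return;

        // The caller's token is deliberately not used for the semaphore wait.
        await stopSemaphore.WaitAsync(CancellationToken.None);
        try
        {
            if (status != EndpointStatus.Running) return;
            status = EndpointStatus.Stopping;
            // Placeholder for stopping components and transport,
            // which is where cancellationToken would be observed.
            await Task.Yield();
            status = EndpointStatus.Stopped;
        }
        finally
        {
            stopSemaphore.Release();
        }
    }

    // Legacy contract: stop, then clean up, even if stopping throws.
    public async Task Stop(CancellationToken cancellationToken = default)
    {
        try { await StopCore(cancellationToken); }
        finally { await DisposeAsync(); }
    }

    // Cleanup only, idempotent via Interlocked.Exchange.
    public async ValueTask DisposeAsync()
    {
        if (Interlocked.Exchange(ref disposed, 1) == 1) return;
        await StopCore(CancellationToken.None); // safety net if Stop was never called
        CleanupCount++; // placeholder for releasing log slot, settings, CTS, provider lease
    }
}
```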

BaseEndpointLifecycle changes:

Stop uses CancellationToken.None on lifeCycleSemaphore.WaitAsync and calls endpointInstance.StopCore(cancellationToken) (not Stop), so cleanup is left to the separate DisposeAsync call.
DisposeAsync uses Interlocked.Exchange on isDisposed for idempotency, reads/nulls endpointInstance (safe as a reference-type atomic read), then calls instance.DisposeAsync() and providerLease.DisposeAsync(). No semaphore is needed because Interlocked.Exchange already prevents concurrent entry.
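The idempotent DisposeAsync described above might look like the sketch below. SketchLifecycle and CountingDisposable are hypothetical types invented for this example. Only the Interlocked.Exchange guard and the read-then-null of endpointInstance are taken from the description.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var inner = new CountingDisposable();
var lease = new CountingDisposable();
var lifecycle = new SketchLifecycle(inner, lease);

// Ten concurrent DisposeAsync calls; only one should get past the guard.
await Task.WhenAll(Enumerable.Range(0, 10)
    .Select(_ => Task.Run(async () => await lifecycle.DisposeAsync())));

// prints "instance disposals: 1, lease disposals: 1"
Console.WriteLine($"instance disposals: {inner.Count}, lease disposals: {lease.Count}");

// Hypothetical helper that counts how often it is disposed.
sealed class CountingDisposable : IAsyncDisposable
{
    int count;
    public int Count => Volatile.Read(ref count);

    public ValueTask DisposeAsync()
    {
        Interlocked.Increment(ref count);
        return default; // already-completed ValueTask
    }
}

// Hypothetical stand-in for the lifecycle, not the NServiceBus type.
sealed class SketchLifecycle : IAsyncDisposable
{
    readonly IAsyncDisposable providerLease;
    IAsyncDisposable? endpointInstance;
    int isDisposed;

    public SketchLifecycle(IAsyncDisposable instance, IAsyncDisposable providerLease)
    {
        endpointInstance = instance;
        this.providerLease = providerLease;
    }

    public async ValueTask DisposeAsync()
    {
        // Only the first caller sees 0, so no semaphore is needed here.
        if (Interlocked.Exchange(ref isDisposed, 1) == 1) return;

        var instance = endpointInstance;
        endpointInstance = null;

        if (instance is not null) await instance.DisposeAsync();
        await providerLease.DisposeAsync();
    }
}
```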

EndpointHostedService is unchanged. The .NET Generic Host calls StopAsync(token) then DisposeAsync(), which maps to lifecycle.Stop(token) then lifecycle.DisposeAsync(). This is the correct two-step flow.

Internally managed mode (legacy IEndpointInstance.Stop()): calls StopCore then DisposeAsync, maintaining the original "stop and clean up everything" contract. Double-dispose of the service provider is idempotent.

Key decisions

CancellationToken.None on semaphore waits: the semaphore is an internal serialization mechanism; the caller token must not abort the wait because a failed wait leaves state as Running, allowing a subsequent DisposeAsync re-entry to attempt full shutdown against an already-torn-down DI container.
StopCore early return: later callers that observe Stopping or Stopped return immediately. Only the first caller owns shutdown. This is intentional; waiting would add no value since the outcome is predetermined.
Separate Stop and DisposeAsync in BaseEndpointLifecycle: allows the hosted service to control the lifecycle with a clean two-step flow (stop, then dispose).
"Stopper" keyed singleton is intentionally a hidden backdoor for acceptance testing, not exposed in Core.

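To illustrate the early-return decision, here is a small single-threaded sketch in which a TaskCompletionSource holds the first caller inside shutdown. All names are hypothetical, and the string status stands in for the real state. The second caller observes Stopping and completes at once instead of queueing on the semaphore.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var shutdownGate = new TaskCompletionSource<bool>(); // keeps the first caller "in shutdown"
var semaphore = new SemaphoreSlim(1, 1);
var status = "Running";
var shutdownRuns = 0;

async Task StopCore()
{
    if (status != "Running") return; // later callers leave here without waiting

    await semaphore.WaitAsync(CancellationToken.None);
    try
    {
        if (status != "Running") return;
        status = "Stopping";
        shutdownRuns++;
        await shutdownGate.Task; // placeholder for the actual shutdown work
        status = "Stopped";
    }
    finally
    {
        semaphore.Release();
    }
}

var first = StopCore();  // runs synchronously up to the gate and owns shutdown
var second = StopCore(); // observes Stopping and returns immediately

// prints "second completed early: True, status: Stopping"
Console.WriteLine($"second completed early: {second.IsCompleted}, status: {status}");

shutdownGate.SetResult(true);
await first;

// prints "status: Stopped, shutdown runs: 1"
Console.WriteLine($"status: {status}, shutdown runs: {shutdownRuns}");
```

Note the consequence of this choice: a caller that returns early cannot assume shutdown has finished, only that some other caller owns it.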
Acceptance test

When_stop_is_called_with_cancelled_token cancels the scenario token after the endpoint starts, causing StopEndpoints to pass an already-cancelled token to BaseEndpointLifecycle.Stop. Without the fix, this exposes ObjectDisposedException during disposal.

danielmarbach added 3 commits May 12, 2026 10:56

Add acceptance test for graceful shutdown with cancelled token

46b6f23

Refactor lifecycle semaphore usage for endpoint management and improve shutdown handling

1c5d031

Refactor comment in test for Stop with cancelled token to clarify scenario

c880c05

internalautomation Bot assigned danielmarbach May 12, 2026

danielmarbach added 4 commits May 12, 2026 11:03

Remove redundant check for isStopped in Stop method to simplify endpoint shutdown logic

b058589

Implement double-check locking in DisposeAsync to prevent multiple disposals

f7ac9e7

Stopping over public legacy API must still stop and dispose in one step

861162c

Multiple dispose test

285e50c

danielmarbach requested review from DavidBoike, andreasohlund and mattmercurio May 12, 2026 09:14

Clarify StopCore and Stop comments: first caller owns shutdown, later callers return immediately; note BaseEndpointLifecycle.Stop calls StopCore not Stop

f358584

danielmarbach changed the title ~~Refactor and improve shutdown handling with cancelled tokens~~ Fix race conditions in RunningEndpointInstance.Stop and BaseEndpointLifecycle where concurrent or early-cancelled Stop calls leave the endpoint in a broken state May 12, 2026

andreasohlund approved these changes May 13, 2026

DavidBoike approved these changes May 13, 2026

danielmarbach merged commit 41e65c0 into master May 13, 2026
4 checks passed

danielmarbach deleted the stopping-race branch May 13, 2026 14:00

danielmarbach mentioned this pull request May 13, 2026

Fix method usage #7754

Merged

danielmarbach merged 8 commits into master from stopping-race
