fix: agentcore dev orphaned processes#1438
Conversation
|
Claude Security Review: no high-confidence findings. (run) |
Package TarballHow to installgh release download pr-1438-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.16.0.tgz |
agentcore-cli-automation
left a comment
There was a problem hiding this comment.
Thanks for tackling the orphaned process bug. The fix on Linux/macOS is sound — detached: true puts the child in its own process group and process.kill(-pid, signal) signals the whole tree, which is exactly what's needed for uvicorn --reload's reloader/worker pair.
However, this approach is broken on Windows, which src/lib/utils/platform.ts and AGENTS.md indicate is a supported platform. Two issues called out inline:
detached: trueon Windows spawns the child in a new console window (visible flash, separate console).- Windows has no POSIX process groups —
process.kill(-pid, ...)will throwEINVAL/ESRCH, thecatchfalls back tochild.kill(signal), andchild.killon Windows does not kill the process tree either. So on Windows we'd be back to orphaning subprocesses — same bug this PR is fixing.
The usual fix is to gate the new behavior on !isWindows and use taskkill /pid <pid> /T /F (or the tree-kill package) for Windows process tree termination.
|
Claude Security Review: no high-confidence findings. (run) |
agentcore-cli-automation
left a comment
There was a problem hiding this comment.
Nice fix — the POSIX-side approach (detached: true + process.kill(-pid, sig)) is exactly right for uvicorn --reload's reloader/worker pair, and the Windows path via taskkill /T /F covers the cross-platform gap raised in earlier review threads.
Most of my concerns were already covered in those threads (Windows handling, the this.child! footgun, the SIGTERM-only handler in TUI mode). One additional test-correctness item below — non-blocking for CI (since CI only runs on ubuntu-latest) but worth fixing for parity with codezip-dev-server.test.ts, which already mocks isWindows.
|
Claude Security Review: no high-confidence findings. (run) |
71ef549 to
5b4a8f4
Compare
|
Claude Security Review: no high-confidence findings. (run) |
Description
When
agentcore devis killed programmatically (SIGTERM/SIGKILL), the underlying uvicorn reloader and worker processes are not terminated. They get reparented to PID 1 and continue holding the port. Subsequentagentcore devruns then fall back to a different port, leaking processes with each run.Interactive Ctrl+C (SIGINT) was unaffected since it hits the entire process group and was already handled. This fix addresses only the programmatic
kill $PID(SIGTERM) path used by scripts, CI, and agent tooling.Root cause: Two issues combined:
kill $PID) exited without triggering cleanup.child.kill('SIGTERM')only signals the direct child PID.uvicorn --reloadforks a reloader that spawns a worker, so grandchild processes survived.Fix:
detached: true(POSIX) so it gets its own process group.process.kill(-pid, signal)on POSIX,taskkill /pid <pid> /T /Fon Windows.--logsand browser/web-UI useonShutdownSignal()(SIGINT + SIGTERM), TUI adds SIGTERM only (Ink handles SIGINT).onShutdownSignal()helper to avoid duplicating signal registration.Related Issue
Closes #1440
Documentation PR
N/A
Type of Change
Testing
Manually verified on macOS and Windows:
Manually tested on windows by running agentcore dev and then Stop-Process on the node process, making sure no child processes remained.
npm run test:unitandnpm run test:integnpm run typechecknpm run lintsrc/assets/, I rannpm run test:update-snapshotsand committed the updated snapshotsChecklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the
terms of your choice.