fix(atelet): set OCI CgroupsPath so each actor gets its own cgroup by ArnaudBger · Pull Request #161 · agent-substrate/substrate

ArnaudBger (ArnaudBger) · 2026-06-03T06:26:03Z

Fixes #50

Issue

When the OCI spec leaves Linux.CgroupsPath empty, runsc uses whatever cgroup it inherited from its parent process and does not record it as one it owns. At teardown, runsc only attempts Rmdir on paths it owns:

// runsc/cgroup/cgroup_v2.go — Uninstall
for i := len(c.Own) - 1; i >= 0; i-- {
    current := c.Own[i]
    // unix.Rmdir(current) with backoff on EBUSY
}

That makes the failure mode asymmetric, which is why it is hard to spot:

Non-owner actors hit Uninstall with an empty c.Own: the loop body never runs and the function returns nil, with no log and no error.
The owner actor actually calls Rmdir on the shared cgroup and gets EBUSY because the non-owners' processes are still in cgroup.procs. This is the only log line you ever see.

From the logs alone, it looks like one specific actor is broken, when in reality the bug is shared cgroup ownership across actors.

How to reproduce

1. Start actor A with no Linux.CgroupsPath. runsc creates the pause cgroup; A owns it.
2. Start actor B in the same sandbox; B attaches processes to A's pause cgroup.
3. Suspend or delete A. runsc calls Rmdir on the pause path.
4. Rmdir returns EBUSY because B is still in cgroup.procs. The error surfaces as: removing cgroup path "/sys/fs/cgroup/pause": device or resource busy. Stale state is left behind, matching the failure in "Actor stuck in STATUS_SUSPENDING" (#50).

Solution

Set a relative, per-actor path:

CgroupsPath: path.Join("actors", actorTemplateNamespace, actorTemplateName, actorID, containerName),

Relative → runsc resolves it under its own current cgroup (the sandbox pod's cgroup), so per-actor usage rolls up into the pod's kubelet/cAdvisor accounting.
Unique per actor → each container lives in its own directory, ends up in c.Own, and is created/cleaned up cleanly with no cross-actor EBUSY.
Layout follows the kubectl <namespace>/<name> convention (actors/<ns>/<template>/<actor>/<container>), so per-namespace and per-template totals are queryable at each level.

Spec construction is extracted into buildSpec so the new behavior is unit-testable.

google-cla · 2026-06-03T06:26:13Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

ArnaudBger (ArnaudBger) force-pushed the fix/atelet-actor-cgroups-path branch from e662fc3 to 7d3e5cd · June 3, 2026 06:44

fix(atelet): set OCI CgroupsPath so each actor gets its own cgroup

3164926

ArnaudBger (ArnaudBger) force-pushed the fix/atelet-actor-cgroups-path branch from 7d3e5cd to 3164926 · June 4, 2026 05:47

ArnaudBger (ArnaudBger) wants to merge 1 commit into agent-substrate:main from ArnaudBger:fix/atelet-actor-cgroups-path
