feat(remediation): add quarantine action (deny-all NetworkPolicy + dry-run)#388
feat(remediation): add quarantine action (deny-all NetworkPolicy + dry-run)#388yugal07 wants to merge 1 commit into
Conversation
…y-run) Signed-off-by: yugal07 <yashsadhwani544@gmail.com>
📝 WalkthroughWalkthroughAdds a ChangesQuarantine Remediator and Action Handler Refactor
Sequence DiagramsequenceDiagram
participant Client
participant handleOperatorAction
participant QuarantineRemediator
participant KubeAPI
Client->>handleOperatorAction: OperatorActionQuarantine(target, dryRun)
handleOperatorAction->>QuarantineRemediator: Plan(ctx, Request)
QuarantineRemediator->>KubeAPI: Get workload object
KubeAPI-->>QuarantineRemediator: Deployment/Pod
QuarantineRemediator-->>handleOperatorAction: Plan{NetworkPolicy JSON}
handleOperatorAction->>QuarantineRemediator: Apply(ctx, plan, dryRun)
QuarantineRemediator->>KubeAPI: Create NetworkPolicy [dryRun?]
KubeAPI-->>QuarantineRemediator: ok / AlreadyExists
QuarantineRemediator-->>handleOperatorAction: Result{Applied}
handleOperatorAction-->>Client: result recorded
Client->>handleOperatorAction: OperatorActionRevert(target)
handleOperatorAction->>QuarantineRemediator: Revert(ctx, target, dryRun)
QuarantineRemediator->>KubeAPI: Delete NetworkPolicy
KubeAPI-->>QuarantineRemediator: ok / NotFound
QuarantineRemediator-->>handleOperatorAction: Result{Applied}
handleOperatorAction-->>Client: aggregated revert result
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
mainhandler/actionhandler_test.go (1)
239-250: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAssert that excluded namespaces never issue a create.
Line 248 only checks the error path, so this test does not actually prove the “before any write” guarantee. Capture
createonnetworkpolicies(or assert noNetworkPolicyexists afterward) so a regression that writes before failing cannot slip through.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mainhandler/actionhandler_test.go` around lines 239 - 250, The quarantine test only verifies the error path and does not prove that no write occurs before rejecting an excluded namespace. Update TestHandleOperatorAction_QuarantineExcludedNamespace in actionhandler_test.go to also observe writes through the fake clientset, either by asserting no NetworkPolicy create happens on networkpolicies or by checking that no NetworkPolicy exists after handleOperatorAction returns. Use the existing handleOperatorAction and newActionHandlerForTest setup to ensure the rejection happens before any create for the excluded kube-system target.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@mainhandler/remediators/quarantine.go`:
- Line 105: The quarantine NetworkPolicy name is not unique enough because
quarantineNPName currently keys only on target.Name, so different target kinds
can collide on the same policy. Update the naming logic used from
Quarantine/Remediate and the related delete/revert path to include the target
identity kind as part of the base name, and add a short hash of the full
identity when truncation is needed. Make sure the new unique name is used
consistently wherever the policy name is generated or looked up so different
targets do not share or delete each other’s NetworkPolicy.
- Around line 140-152: The workload selector handling in quarantine planning
only preserves MatchLabels, so Deployment/StatefulSet/DaemonSet selectors with
matchExpressions can be lost or applied incorrectly. Update the selector flow in
the remediator logic around the workload lookup branches and the
NetworkPolicySpec.PodSelector assignment to carry the full metav1.LabelSelector
through planning instead of converting it to labels-only, and make sure the
planning path uses the original selector from the workload object when building
the quarantine policy.
---
Nitpick comments:
In `@mainhandler/actionhandler_test.go`:
- Around line 239-250: The quarantine test only verifies the error path and does
not prove that no write occurs before rejecting an excluded namespace. Update
TestHandleOperatorAction_QuarantineExcludedNamespace in actionhandler_test.go to
also observe writes through the fake clientset, either by asserting no
NetworkPolicy create happens on networkpolicies or by checking that no
NetworkPolicy exists after handleOperatorAction returns. Use the existing
handleOperatorAction and newActionHandlerForTest setup to ensure the rejection
happens before any create for the excluded kube-system target.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: cab49b84-5e06-443f-b548-0bf737de4bf6
📒 Files selected for processing (5)
mainhandler/actionhandler.gomainhandler/actionhandler_test.gomainhandler/remediators/quarantine.gomainhandler/remediators/quarantine_test.gomainhandler/remediators/remediator.go
| if err := validateTarget(t); err != nil { | ||
| return Result{}, err | ||
| } | ||
| name := quarantineNPName(t.Name) |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Make the quarantine policy name unique per target identity.
quarantineNPName() keys only on target.Name, so Deployment/payments/api and StatefulSet/payments/api collide on the same NetworkPolicy. The second apply is then reported as success via AlreadyExists, and reverting either target deletes the other's policy. Include at least the kind in the name, and preferably a short hash of the full target identity when truncation is involved.
Also applies to: 181-181, 204-212
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mainhandler/remediators/quarantine.go` at line 105, The quarantine
NetworkPolicy name is not unique enough because quarantineNPName currently keys
only on target.Name, so different target kinds can collide on the same policy.
Update the naming logic used from Quarantine/Remediate and the related
delete/revert path to include the target identity kind as part of the base name,
and add a short hash of the full identity when truncation is needed. Make sure
the new unique name is used consistently wherever the policy name is generated
or looked up so different targets do not share or delete each other’s
NetworkPolicy.
| labels = matchLabels(o.Spec.Selector) | ||
| case "statefulset": | ||
| o, err := r.client.AppsV1().StatefulSets(t.Namespace).Get(ctx, t.Name, metav1.GetOptions{}) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| labels = matchLabels(o.Spec.Selector) | ||
| case "daemonset": | ||
| o, err := r.client.AppsV1().DaemonSets(t.Namespace).Get(ctx, t.Name, metav1.GetOptions{}) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| labels = matchLabels(o.Spec.Selector) |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Preserve the full workload selector.
This only carries MatchLabels forward. A Deployment/StatefulSet/DaemonSet with matchExpressions will either fail planning (len(labels) == 0) or quarantine the wrong pods when the selector mixes labels and expressions. Carry the full metav1.LabelSelector through planning and assign it directly to NetworkPolicySpec.PodSelector.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mainhandler/remediators/quarantine.go` around lines 140 - 152, The workload
selector handling in quarantine planning only preserves MatchLabels, so
Deployment/StatefulSet/DaemonSet selectors with matchExpressions can be lost or
applied incorrectly. Update the selector flow in the remediator logic around the
workload lookup branches and the NetworkPolicySpec.PodSelector assignment to
carry the full metav1.LabelSelector through planning instead of converting it to
labels-only, and make sure the planning path uses the original selector from the
workload object when building the quarantine policy.
Related issue: kubescape/kubescape#1770 — Kubescape CLI control over cluster operations
Design: kubescape/designs-and-proposals#5 — Kubescape CLI Control over Cluster Operations (merged)
Contract: armosec/armoapi-go#655 —
operatorActioncommand contract (merged,v0.0.720)Builds on: kubescape/operator#383 — Phase 1 (annotate + dry-run) (merged) · RBAC from kubescape/helm-charts#855 (merged)
Overview
This is Phase 2 of the merged design: it teaches the operator to execute the
quarantineaction. Phase 1 (#383)shipped
annotateand the extensibleRemediatorframework; this PR plugs thenext action into that framework with no pipeline or transport changes — a new
Remediatoradded to the registry, exactly as Phase 1 was designed to allow.quarantineisolates a workload by creating a deny-allNetworkPolicythatselects the workload's pods (both ingress and egress denied). Per the design's
resolved open question #3,
the default is non-destructive: the pods are not mutated or recreated, so
container state (memory, process trees) is preserved for forensic investigation.
Scale-to-zero / container-stop remains a future, explicit opt-in.
Signed Commits
What changed
mainhandler/remediators/quarantine.go(new) —QuarantineRemediatorimplementing
Plan/Apply/Revert:Planreads the target's pod selector from the live object(
Deployment/StatefulSet/DaemonSetspec.selector.matchLabels, or aPod's own labels) and computes a deny-allNetworkPolicywithout touchingthe cluster. Reading the live selector isolates the existing pods without
recreating them.
Applycreates theNetworkPolicy(deterministic namekubescape-quarantine-<target>); server-side dry-run whendryRun=true;idempotent (
AlreadyExists→ success, the workload is already isolated).Revertdeletes thatNetworkPolicy; a missing policy is a clean no-opsuccess; honors server-side dry-run.
reason,findingRef, the target) is recorded on theNetworkPolicy's labels/annotations.mainhandler/remediators/remediator.go— registerquarantineinNewRegistry(one line;cordonremains the only stubbed action).mainhandler/actionhandler.go— dispatchquarantinethrough a sharedapplyRemediation(Plan → Apply → record), and extendrevertto undoevery reversible action on the target (annotate + quarantine), idempotently.
This is required because the contract's
revertverb does not name which actionit undoes; each
Revertis a safe no-op when there is nothing to undo, and atarget object that no longer exists (
NotFound) is skipped so a leftoverNetworkPolicyis still cleaned up.mainhandler/remediators/quarantine_test.go(11) and additions tomainhandler/actionhandler_test.go(5); the existing "unimplemented actions"test is narrowed to
cordononly.Safe-by-default behaviour (unchanged from Phase 1)
dryRun=false(theCLI's
--confirm) creates theNetworkPolicywithmetav1.DryRunAll— validatedagainst admission, never persisted.
(
config.SkipNamespace) are rejected before any client call.NetworkPolicyrequires the opt-inmutating RBAC from helm-charts#855
(
capabilities.remediation=enable), which the default install does not grant —so a mistakenly-triggered quarantine fails closed with
Forbidden, mutatingnothing. No new RBAC is needed: #855 already provisioned
networking.k8s.io/networkpoliciescreate/delete/get/list.armoapi-go v0.0.720already carries thequarantineverb;this PR needs no
armoapi-gochange.Additional Information
scaling to zero destroys the container and wipes forensic evidence. A deny-all
policy cuts the workload off the network while leaving it running for
investigation. Destructive modes are a later, explicit opt-in.
NetworkPolicy.spec.podSelectoris the workload's ownpod-selector labels, read live at
Plantime. A workload with no selector labelsis rejected with a clear error rather than producing a policy that would match
every pod in the namespace.
succeeds;
revertis safe to call without knowing what was applied and cleans upeven if the original workload was deleted out from under it.
quarantineonly. Findings-driven selector targeting(
--control/--min-severity, reading the stored scan-result CRDs) and themulti-result status aggregation it requires are the focused next operator PR;
cordonand operator-native auto-remediation are Phase 3.How to Test
Manual end-to-end (cluster with the operator installed and
capabilities.remediation=enable), via the CLI once the pairedkubescapesubcommand lands, or by writing an
OperatorCommandCRD:# after a confirmed quarantine of Deployment payments/api: kubectl get networkpolicy -n payments kubescape-quarantine-api -o yaml kubectl get events -n payments --field-selector reason=KubescapeRemediationThe
NetworkPolicyshould select the deployment's pod labels withpolicyTypes: [Ingress, Egress]and no allow rules.Related issues/PRs
v0.0.720).Checklist before requesting a review
Summary by CodeRabbit