Skip to content

feat(remediation): add quarantine action (deny-all NetworkPolicy + dry-run)#388

Open
yugal07 wants to merge 1 commit into
kubescape:mainfrom
yugal07:feat/operator-action-quarantine
Open

feat(remediation): add quarantine action (deny-all NetworkPolicy + dry-run)#388
yugal07 wants to merge 1 commit into
kubescape:mainfrom
yugal07:feat/operator-action-quarantine

Conversation

@yugal07

@yugal07 yugal07 commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Related issue: kubescape/kubescape#1770 — Kubescape CLI control over cluster operations
Design: kubescape/designs-and-proposals#5 — Kubescape CLI Control over Cluster Operations (merged)
Contract: armosec/armoapi-go#655 — operatorAction command contract (merged, v0.0.720)
Builds on: kubescape/operator#383 — Phase 1 (annotate + dry-run) (merged) · RBAC from kubescape/helm-charts#855 (merged)

Overview

This is Phase 2 of the merged design: it teaches the operator to execute the
quarantine action. Phase 1 (#383)
shipped annotate and the extensible Remediator framework; this PR plugs the
next action into that framework with no pipeline or transport changes — a new
Remediator added to the registry, exactly as Phase 1 was designed to allow.

quarantine isolates a workload by creating a deny-all NetworkPolicy that
selects the workload's pods (both ingress and egress denied). Per the design's
resolved open question #3,
the default is non-destructive: the pods are not mutated or recreated, so
container state (memory, process trees) is preserved for forensic investigation.
Scale-to-zero / container-stop remains a future, explicit opt-in.

Signed Commits

  • Yes, I signed my commits.

What changed

  • mainhandler/remediators/quarantine.go (new)QuarantineRemediator
    implementing Plan/Apply/Revert:
    • Plan reads the target's pod selector from the live object
      (Deployment/StatefulSet/DaemonSet spec.selector.matchLabels, or a
      Pod's own labels) and computes a deny-all NetworkPolicy without touching
      the cluster. Reading the live selector isolates the existing pods without
      recreating them.
    • Apply creates the NetworkPolicy (deterministic name
      kubescape-quarantine-<target>); server-side dry-run when dryRun=true;
      idempotent (AlreadyExists → success, the workload is already isolated).
    • Revert deletes that NetworkPolicy; a missing policy is a clean no-op
      success; honors server-side dry-run.
    • Audit context (reason, findingRef, the target) is recorded on the
      NetworkPolicy's labels/annotations.
  • mainhandler/remediators/remediator.go — register quarantine in
    NewRegistry (one line; cordon remains the only stubbed action).
  • mainhandler/actionhandler.go — dispatch quarantine through a shared
    applyRemediation (Plan → Apply → record), and extend revert to undo
    every reversible action on the target (annotate + quarantine), idempotently.
    This is required because the contract's revert verb does not name which action
    it undoes; each Revert is a safe no-op when there is nothing to undo, and a
    target object that no longer exists (NotFound) is skipped so a leftover
    NetworkPolicy is still cleaned up.
  • Testsmainhandler/remediators/quarantine_test.go (11) and additions to
    mainhandler/actionhandler_test.go (5); the existing "unimplemented actions"
    test is narrowed to cordon only.

Safe-by-default behaviour (unchanged from Phase 1)

  • Dry-run is the default. A command without an explicit dryRun=false (the
    CLI's --confirm) creates the NetworkPolicy with metav1.DryRunAll — validated
    against admission, never persisted.
  • Namespace safety rail. Targets in excluded/protected namespaces
    (config.SkipNamespace) are rejected before any client call.
  • No default-install power. Creating NetworkPolicy requires the opt-in
    mutating RBAC from helm-charts#855
    (capabilities.remediation=enable), which the default install does not grant —
    so a mistakenly-triggered quarantine fails closed with Forbidden, mutating
    nothing. No new RBAC is needed: #855 already provisioned
    networking.k8s.io/networkpolicies create/delete/get/list.
  • No new contract. armoapi-go v0.0.720 already carries the quarantine verb;
    this PR needs no armoapi-go change.

Additional Information

  • Why deny-all NetworkPolicy and not scale-to-zero. The design resolved this:
    scaling to zero destroys the container and wipes forensic evidence. A deny-all
    policy cuts the workload off the network while leaving it running for
    investigation. Destructive modes are a later, explicit opt-in.
  • Selector source. The NetworkPolicy.spec.podSelector is the workload's own
    pod-selector labels, read live at Plan time. A workload with no selector labels
    is rejected with a clear error rather than producing a policy that would match
    every pod in the namespace.
  • Idempotency & revert symmetry. Re-quarantining an already-isolated workload
    succeeds; revert is safe to call without knowing what was applied and cleans up
    even if the original workload was deleted out from under it.
  • Scope. Explicit-target quarantine only. Findings-driven selector targeting
    (--control / --min-severity, reading the stored scan-result CRDs) and the
    multi-result status aggregation it requires are the focused next operator PR;
    cordon and operator-native auto-remediation are Phase 3.

How to Test

go build ./...                       # clean
go vet ./mainhandler/...             # clean
go test ./mainhandler/...            # pass (remediators + mainhandler)

Manual end-to-end (cluster with the operator installed and
capabilities.remediation=enable), via the CLI once the paired kubescape
subcommand lands, or by writing an OperatorCommand CRD:

# after a confirmed quarantine of Deployment payments/api:
kubectl get networkpolicy -n payments kubescape-quarantine-api -o yaml
kubectl get events -n payments --field-selector reason=KubescapeRemediation

The NetworkPolicy should select the deployment's pod labels with
policyTypes: [Ingress, Egress] and no allow rules.

Related issues/PRs

Checklist before requesting a review

  • My code follows the style guidelines of this project
  • I have commented on my code, particularly in hard-to-understand areas
  • I have performed a self-review of my code
  • If it is a core feature, I have added thorough tests.
  • New and existing unit tests pass locally with my changes

Summary by CodeRabbit

  • New Features
    • Added a new quarantine action that isolates workloads by creating a deny-all network policy.
    • Revert now attempts to undo both annotation and quarantine actions when available.
  • Bug Fixes
    • Improved revert behavior to handle missing resources gracefully.
    • Quarantine actions now support dry-run and confirm modes consistently.
  • Tests
    • Added coverage for quarantine planning, apply/revert flows, excluded namespaces, and revert behavior.

…y-run)

Signed-off-by: yugal07 <yashsadhwani544@gmail.com>
@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a QuarantineRemediator that isolates workloads by creating deny-all NetworkPolicy objects. Registers it in NewRegistry, and refactors handleOperatorAction to dispatch annotate/quarantine through a shared applyRemediation helper and revert through a new revertTarget that iterates both reversible action types.

Changes

Quarantine Remediator and Action Handler Refactor

Layer / File(s) Summary
QuarantineRemediator: types, Plan, Apply, Revert
mainhandler/remediators/quarantine.go
Defines QuarantineRemediator with exported constants (LabelQuarantine, AnnotationQuarantineTarget), deterministic NetworkPolicy naming with truncation, pod selector resolution for Deployment/StatefulSet/DaemonSet/Pod, deny-all NetworkPolicy construction with audit annotations, and Plan/Apply/Revert methods treating AlreadyExists/NotFound as success.
Registry registration and action handler dispatch
mainhandler/remediators/remediator.go, mainhandler/actionhandler.go
Registers QuarantineRemediator in NewRegistry. Refactors handleOperatorAction switch to route annotate/quarantine through applyRemediation and revert through revertTarget, which iterates both reversible action types, aggregates descriptions, and ignores NotFound errors during cleanup.
Tests
mainhandler/remediators/quarantine_test.go, mainhandler/actionhandler_test.go
Adds unit tests for Plan (selector derivation, unsupported kinds, missing selector labels), Apply (confirm, dry-run, Pod labels, idempotency), Revert (found, not-found, dry-run), name truncation, and handler-level quarantine/revert integration tests including excluded namespace rejection.

Sequence Diagram

sequenceDiagram
  participant Client
  participant handleOperatorAction
  participant QuarantineRemediator
  participant KubeAPI

  Client->>handleOperatorAction: OperatorActionQuarantine(target, dryRun)
  handleOperatorAction->>QuarantineRemediator: Plan(ctx, Request)
  QuarantineRemediator->>KubeAPI: Get workload object
  KubeAPI-->>QuarantineRemediator: Deployment/Pod
  QuarantineRemediator-->>handleOperatorAction: Plan{NetworkPolicy JSON}
  handleOperatorAction->>QuarantineRemediator: Apply(ctx, plan, dryRun)
  QuarantineRemediator->>KubeAPI: Create NetworkPolicy [dryRun?]
  KubeAPI-->>QuarantineRemediator: ok / AlreadyExists
  QuarantineRemediator-->>handleOperatorAction: Result{Applied}
  handleOperatorAction-->>Client: result recorded

  Client->>handleOperatorAction: OperatorActionRevert(target)
  handleOperatorAction->>QuarantineRemediator: Revert(ctx, target, dryRun)
  QuarantineRemediator->>KubeAPI: Delete NetworkPolicy
  KubeAPI-->>QuarantineRemediator: ok / NotFound
  QuarantineRemediator-->>handleOperatorAction: Result{Applied}
  handleOperatorAction-->>Client: aggregated revert result
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • kubescape/operator#383: Implements the Phase-1 operator action dispatch and remediator registry that this PR extends with the quarantine remediator and multi-action revert logic.

Suggested reviewers

  • matthyx

🐇 A network door slammed shut with care,
deny-all rules float in the air.
Plan, Apply, Revert — three hops of grace,
quarantine policies locked in place.
No pods shall pass without a trace! 🌐

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 34.48% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: adding quarantine remediation with deny-all NetworkPolicy and dry-run support.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
mainhandler/actionhandler_test.go (1)

239-250: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Assert that excluded namespaces never issue a create.

Line 248 only checks the error path, so this test does not actually prove the “before any write” guarantee. Capture create on networkpolicies (or assert no NetworkPolicy exists afterward) so a regression that writes before failing cannot slip through.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mainhandler/actionhandler_test.go` around lines 239 - 250, The quarantine
test only verifies the error path and does not prove that no write occurs before
rejecting an excluded namespace. Update
TestHandleOperatorAction_QuarantineExcludedNamespace in actionhandler_test.go to
also observe writes through the fake clientset, either by asserting no
NetworkPolicy create happens on networkpolicies or by checking that no
NetworkPolicy exists after handleOperatorAction returns. Use the existing
handleOperatorAction and newActionHandlerForTest setup to ensure the rejection
happens before any create for the excluded kube-system target.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@mainhandler/remediators/quarantine.go`:
- Line 105: The quarantine NetworkPolicy name is not unique enough because
quarantineNPName currently keys only on target.Name, so different target kinds
can collide on the same policy. Update the naming logic used from
Quarantine/Remediate and the related delete/revert path to include the target
identity kind as part of the base name, and add a short hash of the full
identity when truncation is needed. Make sure the new unique name is used
consistently wherever the policy name is generated or looked up so different
targets do not share or delete each other’s NetworkPolicy.
- Around line 140-152: The workload selector handling in quarantine planning
only preserves MatchLabels, so Deployment/StatefulSet/DaemonSet selectors with
matchExpressions can be lost or applied incorrectly. Update the selector flow in
the remediator logic around the workload lookup branches and the
NetworkPolicySpec.PodSelector assignment to carry the full metav1.LabelSelector
through planning instead of converting it to labels-only, and make sure the
planning path uses the original selector from the workload object when building
the quarantine policy.

---

Nitpick comments:
In `@mainhandler/actionhandler_test.go`:
- Around line 239-250: The quarantine test only verifies the error path and does
not prove that no write occurs before rejecting an excluded namespace. Update
TestHandleOperatorAction_QuarantineExcludedNamespace in actionhandler_test.go to
also observe writes through the fake clientset, either by asserting no
NetworkPolicy create happens on networkpolicies or by checking that no
NetworkPolicy exists after handleOperatorAction returns. Use the existing
handleOperatorAction and newActionHandlerForTest setup to ensure the rejection
happens before any create for the excluded kube-system target.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cab49b84-5e06-443f-b548-0bf737de4bf6

📥 Commits

Reviewing files that changed from the base of the PR and between d425e08 and d0ef10f.

📒 Files selected for processing (5)
  • mainhandler/actionhandler.go
  • mainhandler/actionhandler_test.go
  • mainhandler/remediators/quarantine.go
  • mainhandler/remediators/quarantine_test.go
  • mainhandler/remediators/remediator.go

if err := validateTarget(t); err != nil {
return Result{}, err
}
name := quarantineNPName(t.Name)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Make the quarantine policy name unique per target identity.

quarantineNPName() keys only on target.Name, so Deployment/payments/api and StatefulSet/payments/api collide on the same NetworkPolicy. The second apply is then reported as success via AlreadyExists, and reverting either target deletes the other's policy. Include at least the kind in the name, and preferably a short hash of the full target identity when truncation is involved.

Also applies to: 181-181, 204-212

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mainhandler/remediators/quarantine.go` at line 105, The quarantine
NetworkPolicy name is not unique enough because quarantineNPName currently keys
only on target.Name, so different target kinds can collide on the same policy.
Update the naming logic used from Quarantine/Remediate and the related
delete/revert path to include the target identity kind as part of the base name,
and add a short hash of the full identity when truncation is needed. Make sure
the new unique name is used consistently wherever the policy name is generated
or looked up so different targets do not share or delete each other’s
NetworkPolicy.

Comment on lines +140 to +152
labels = matchLabels(o.Spec.Selector)
case "statefulset":
o, err := r.client.AppsV1().StatefulSets(t.Namespace).Get(ctx, t.Name, metav1.GetOptions{})
if err != nil {
return nil, err
}
labels = matchLabels(o.Spec.Selector)
case "daemonset":
o, err := r.client.AppsV1().DaemonSets(t.Namespace).Get(ctx, t.Name, metav1.GetOptions{})
if err != nil {
return nil, err
}
labels = matchLabels(o.Spec.Selector)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Preserve the full workload selector.
This only carries MatchLabels forward. A Deployment/StatefulSet/DaemonSet with matchExpressions will either fail planning (len(labels) == 0) or quarantine the wrong pods when the selector mixes labels and expressions. Carry the full metav1.LabelSelector through planning and assign it directly to NetworkPolicySpec.PodSelector.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mainhandler/remediators/quarantine.go` around lines 140 - 152, The workload
selector handling in quarantine planning only preserves MatchLabels, so
Deployment/StatefulSet/DaemonSet selectors with matchExpressions can be lost or
applied incorrectly. Update the selector flow in the remediator logic around the
workload lookup branches and the NetworkPolicySpec.PodSelector assignment to
carry the full metav1.LabelSelector through planning instead of converting it to
labels-only, and make sure the planning path uses the original selector from the
workload object when building the quarantine policy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants