feat(remediation): add quarantine action (deny-all NetworkPolicy + dry-run) by yugal07 · Pull Request #388 · kubescape/operator

yugal07 · 2026-06-28T16:50:23Z

Related issue: kubescape/kubescape#1770 — Kubescape CLI control over cluster operations
Design: kubescape/designs-and-proposals#5 — Kubescape CLI Control over Cluster Operations (merged)
Contract: armosec/armoapi-go#655 — operatorAction command contract (merged, v0.0.720)
Builds on: kubescape/operator#383 — Phase 1 (annotate + dry-run) (merged) · RBAC from kubescape/helm-charts#855 (merged)

Overview

This is Phase 2 of the merged design: it teaches the operator to execute the
quarantine action. Phase 1 (#383)
shipped annotate and the extensible Remediator framework; this PR plugs the
next action into that framework with no pipeline or transport changes — a new
Remediator added to the registry, exactly as Phase 1 was designed to allow.

quarantine isolates a workload by creating a deny-all NetworkPolicy that
selects the workload's pods (both ingress and egress denied). Per the design's
resolved open question #3,
the default is non-destructive: the pods are not mutated or recreated, so
container state (memory, process trees) is preserved for forensic investigation.
Scale-to-zero / container-stop remains a future, explicit opt-in.

Signed Commits

Yes, I signed my commits.

What changed

mainhandler/remediators/quarantine.go (new) — QuarantineRemediator
implementing Plan/Apply/Revert:
- Plan reads the target's pod selector from the live object
  (Deployment/StatefulSet/DaemonSet spec.selector.matchLabels, or a
  Pod's own labels) and computes a deny-all NetworkPolicy without touching
  the cluster. Reading the live selector isolates the existing pods without
  recreating them.
- Apply creates the NetworkPolicy (deterministic name
  kubescape-quarantine-<target>); server-side dry-run when dryRun=true;
  idempotent (AlreadyExists → success, the workload is already isolated).
- Revert deletes that NetworkPolicy; a missing policy is a clean no-op
  success; honors server-side dry-run.
- Audit context (reason, findingRef, the target) is recorded on the
  NetworkPolicy's labels/annotations.
mainhandler/remediators/remediator.go — register quarantine in
NewRegistry (one line; cordon remains the only stubbed action).
mainhandler/actionhandler.go — dispatch quarantine through a shared
applyRemediation (Plan → Apply → record), and extend revert to undo
every reversible action on the target (annotate + quarantine), idempotently.
This is required because the contract's revert verb does not name which action
it undoes; each Revert is a safe no-op when there is nothing to undo, and a
target object that no longer exists (NotFound) is skipped so a leftover
NetworkPolicy is still cleaned up.
Tests — mainhandler/remediators/quarantine_test.go (11) and additions to
mainhandler/actionhandler_test.go (5); the existing "unimplemented actions"
test is narrowed to cordon only.

Safe-by-default behaviour (unchanged from Phase 1)

Dry-run is the default. A command without an explicit dryRun=false (the
CLI's --confirm) creates the NetworkPolicy with metav1.DryRunAll — validated
against admission, never persisted.
Namespace safety rail. Targets in excluded/protected namespaces
(config.SkipNamespace) are rejected before any client call.
No default-install power. Creating NetworkPolicy requires the opt-in
mutating RBAC from helm-charts#855
(capabilities.remediation=enable), which the default install does not grant —
so a mistakenly-triggered quarantine fails closed with Forbidden, mutating
nothing. No new RBAC is needed: #855 already provisioned
networking.k8s.io/networkpolicies create/delete/get/list.
No new contract. armoapi-go v0.0.720 already carries the quarantine verb;
this PR needs no armoapi-go change.

Additional Information

Why deny-all NetworkPolicy and not scale-to-zero. The design resolved this:
scaling to zero destroys the container and wipes forensic evidence. A deny-all
policy cuts the workload off the network while leaving it running for
investigation. Destructive modes are a later, explicit opt-in.
Selector source. The NetworkPolicy.spec.podSelector is the workload's own
pod-selector labels, read live at Plan time. A workload with no selector labels
is rejected with a clear error rather than producing a policy that would match
every pod in the namespace.
Idempotency & revert symmetry. Re-quarantining an already-isolated workload
succeeds; revert is safe to call without knowing what was applied and cleans up
even if the original workload was deleted out from under it.
Scope. Explicit-target quarantine only. Findings-driven selector targeting
(--control / --min-severity, reading the stored scan-result CRDs) and the
multi-result status aggregation it requires are the focused next operator PR;
cordon and operator-native auto-remediation are Phase 3.

How to Test

go build ./...                       # clean
go vet ./mainhandler/...             # clean
go test ./mainhandler/...            # pass (remediators + mainhandler)

Manual end-to-end (cluster with the operator installed and
capabilities.remediation=enable), via the CLI once the paired kubescape
subcommand lands, or by writing an OperatorCommand CRD:

# after a confirmed quarantine of Deployment payments/api:
kubectl get networkpolicy -n payments kubescape-quarantine-api -o yaml
kubectl get events -n payments --field-selector reason=KubescapeRemediation

The NetworkPolicy should select the deployment's pod labels with
policyTypes: [Ingress, Egress] and no allow rules.

Related issues/PRs

Implements (in part): #1770 — CLI control over cluster operations (Phase 2).
Design: kubescape/designs-and-proposals#5 (merged).
Contract: armosec/armoapi-go#655 (merged, v0.0.720).
Builds on: kubescape/operator#383 (merged) — Phase 1 framework.
RBAC/install: kubescape/helm-charts#855 (merged) — already grants the NetworkPolicy RBAC.

Checklist before requesting a review

My code follows the style guidelines of this project
I have commented on my code, particularly in hard-to-understand areas
I have performed a self-review of my code
If it is a core feature, I have added thorough tests.
New and existing unit tests pass locally with my changes

Summary by CodeRabbit

New Features
- Added a new quarantine action that isolates workloads by creating a deny-all network policy.
- Revert now attempts to undo both annotation and quarantine actions when available.
Bug Fixes
- Improved revert behavior to handle missing resources gracefully.
- Quarantine actions now support dry-run and confirm modes consistently.
Tests
- Added coverage for quarantine planning, apply/revert flows, excluded namespaces, and revert behavior.

…y-run) Signed-off-by: yugal07 <yashsadhwani544@gmail.com>

coderabbitai · 2026-06-28T16:50:38Z

📝 Walkthrough

Walkthrough

Adds a QuarantineRemediator that isolates workloads by creating deny-all NetworkPolicy objects. Registers it in NewRegistry, and refactors handleOperatorAction to dispatch annotate/quarantine through a shared applyRemediation helper and revert through a new revertTarget that iterates both reversible action types.

Changes

Quarantine Remediator and Action Handler Refactor

Layer / File(s)	Summary
QuarantineRemediator: types, Plan, Apply, Revert `mainhandler/remediators/quarantine.go`	Defines `QuarantineRemediator` with exported constants (`LabelQuarantine`, `AnnotationQuarantineTarget`), deterministic `NetworkPolicy` naming with truncation, pod selector resolution for Deployment/StatefulSet/DaemonSet/Pod, deny-all `NetworkPolicy` construction with audit annotations, and `Plan`/`Apply`/`Revert` methods treating `AlreadyExists`/`NotFound` as success.
Registry registration and action handler dispatch `mainhandler/remediators/remediator.go`, `mainhandler/actionhandler.go`	Registers `QuarantineRemediator` in `NewRegistry`. Refactors `handleOperatorAction` switch to route annotate/quarantine through `applyRemediation` and revert through `revertTarget`, which iterates both reversible action types, aggregates descriptions, and ignores `NotFound` errors during cleanup.
Tests `mainhandler/remediators/quarantine_test.go`, `mainhandler/actionhandler_test.go`	Adds unit tests for `Plan` (selector derivation, unsupported kinds, missing selector labels), `Apply` (confirm, dry-run, Pod labels, idempotency), `Revert` (found, not-found, dry-run), name truncation, and handler-level quarantine/revert integration tests including excluded namespace rejection.

Sequence Diagram

sequenceDiagram
  participant Client
  participant handleOperatorAction
  participant QuarantineRemediator
  participant KubeAPI

  Client->>handleOperatorAction: OperatorActionQuarantine(target, dryRun)
  handleOperatorAction->>QuarantineRemediator: Plan(ctx, Request)
  QuarantineRemediator->>KubeAPI: Get workload object
  KubeAPI-->>QuarantineRemediator: Deployment/Pod
  QuarantineRemediator-->>handleOperatorAction: Plan{NetworkPolicy JSON}
  handleOperatorAction->>QuarantineRemediator: Apply(ctx, plan, dryRun)
  QuarantineRemediator->>KubeAPI: Create NetworkPolicy [dryRun?]
  KubeAPI-->>QuarantineRemediator: ok / AlreadyExists
  QuarantineRemediator-->>handleOperatorAction: Result{Applied}
  handleOperatorAction-->>Client: result recorded

  Client->>handleOperatorAction: OperatorActionRevert(target)
  handleOperatorAction->>QuarantineRemediator: Revert(ctx, target, dryRun)
  QuarantineRemediator->>KubeAPI: Delete NetworkPolicy
  KubeAPI-->>QuarantineRemediator: ok / NotFound
  QuarantineRemediator-->>handleOperatorAction: Result{Applied}
  handleOperatorAction-->>Client: aggregated revert result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

kubescape/operator#383: Implements the Phase-1 operator action dispatch and remediator registry that this PR extends with the quarantine remediator and multi-action revert logic.

Suggested reviewers

matthyx

🐇 A network door slammed shut with care,
deny-all rules float in the air.
Plan, Apply, Revert — three hops of grace,
quarantine policies locked in place.
No pods shall pass without a trace! 🌐

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 34.48% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: adding quarantine remediation with deny-all NetworkPolicy and dry-run support.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

mainhandler/actionhandler_test.go (1)
239-250: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Assert that excluded namespaces never issue a create.

Line 248 only checks the error path, so this test does not actually prove the “before any write” guarantee. Capture create on networkpolicies (or assert no NetworkPolicy exists afterward) so a regression that writes before failing cannot slip through.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mainhandler/actionhandler_test.go` around lines 239 - 250, The quarantine
test only verifies the error path and does not prove that no write occurs before
rejecting an excluded namespace. Update
TestHandleOperatorAction_QuarantineExcludedNamespace in actionhandler_test.go to
also observe writes through the fake clientset, either by asserting no
NetworkPolicy create happens on networkpolicies or by checking that no
NetworkPolicy exists after handleOperatorAction returns. Use the existing
handleOperatorAction and newActionHandlerForTest setup to ensure the rejection
happens before any create for the excluded kube-system target.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@mainhandler/remediators/quarantine.go`:
- Line 105: The quarantine NetworkPolicy name is not unique enough because
quarantineNPName currently keys only on target.Name, so different target kinds
can collide on the same policy. Update the naming logic used from
Quarantine/Remediate and the related delete/revert path to include the target
identity kind as part of the base name, and add a short hash of the full
identity when truncation is needed. Make sure the new unique name is used
consistently wherever the policy name is generated or looked up so different
targets do not share or delete each other’s NetworkPolicy.
- Around line 140-152: The workload selector handling in quarantine planning
only preserves MatchLabels, so Deployment/StatefulSet/DaemonSet selectors with
matchExpressions can be lost or applied incorrectly. Update the selector flow in
the remediator logic around the workload lookup branches and the
NetworkPolicySpec.PodSelector assignment to carry the full metav1.LabelSelector
through planning instead of converting it to labels-only, and make sure the
planning path uses the original selector from the workload object when building
the quarantine policy.

---

Nitpick comments:
In `@mainhandler/actionhandler_test.go`:
- Around line 239-250: The quarantine test only verifies the error path and does
not prove that no write occurs before rejecting an excluded namespace. Update
TestHandleOperatorAction_QuarantineExcludedNamespace in actionhandler_test.go to
also observe writes through the fake clientset, either by asserting no
NetworkPolicy create happens on networkpolicies or by checking that no
NetworkPolicy exists after handleOperatorAction returns. Use the existing
handleOperatorAction and newActionHandlerForTest setup to ensure the rejection
happens before any create for the excluded kube-system target.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cab49b84-5e06-443f-b548-0bf737de4bf6

📥 Commits

Reviewing files that changed from the base of the PR and between d425e08 and d0ef10f.

📒 Files selected for processing (5)

mainhandler/actionhandler.go
mainhandler/actionhandler_test.go
mainhandler/remediators/quarantine.go
mainhandler/remediators/quarantine_test.go
mainhandler/remediators/remediator.go

coderabbitai · 2026-06-28T16:57:25Z

+	if err := validateTarget(t); err != nil {
+		return Result{}, err
+	}
+	name := quarantineNPName(t.Name)


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Make the quarantine policy name unique per target identity.

quarantineNPName() keys only on target.Name, so Deployment/payments/api and StatefulSet/payments/api collide on the same NetworkPolicy. The second apply is then reported as success via AlreadyExists, and reverting either target deletes the other's policy. Include at least the kind in the name, and preferably a short hash of the full target identity when truncation is involved.

Also applies to: 181-181, 204-212

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mainhandler/remediators/quarantine.go` at line 105, The quarantine NetworkPolicy name is not unique enough because quarantineNPName currently keys only on target.Name, so different target kinds can collide on the same policy. Update the naming logic used from Quarantine/Remediate and the related delete/revert path to include the target identity kind as part of the base name, and add a short hash of the full identity when truncation is needed. Make sure the new unique name is used consistently wherever the policy name is generated or looked up so different targets do not share or delete each other’s NetworkPolicy.

coderabbitai · 2026-06-28T16:57:25Z

+		labels = matchLabels(o.Spec.Selector)
+	case "statefulset":
+		o, err := r.client.AppsV1().StatefulSets(t.Namespace).Get(ctx, t.Name, metav1.GetOptions{})
+		if err != nil {
+			return nil, err
+		}
+		labels = matchLabels(o.Spec.Selector)
+	case "daemonset":
+		o, err := r.client.AppsV1().DaemonSets(t.Namespace).Get(ctx, t.Name, metav1.GetOptions{})
+		if err != nil {
+			return nil, err
+		}
+		labels = matchLabels(o.Spec.Selector)


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Preserve the full workload selector.
This only carries MatchLabels forward. A Deployment/StatefulSet/DaemonSet with matchExpressions will either fail planning (len(labels) == 0) or quarantine the wrong pods when the selector mixes labels and expressions. Carry the full metav1.LabelSelector through planning and assign it directly to NetworkPolicySpec.PodSelector.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mainhandler/remediators/quarantine.go` around lines 140 - 152, The workload selector handling in quarantine planning only preserves MatchLabels, so Deployment/StatefulSet/DaemonSet selectors with matchExpressions can be lost or applied incorrectly. Update the selector flow in the remediator logic around the workload lookup branches and the NetworkPolicySpec.PodSelector assignment to carry the full metav1.LabelSelector through planning instead of converting it to labels-only, and make sure the planning path uses the original selector from the workload object when building the quarantine policy.

feat(remediation): add quarantine action (deny-all NetworkPolicy + dr…

d0ef10f

…y-run) Signed-off-by: yugal07 <yashsadhwani544@gmail.com>

coderabbitai Bot reviewed Jun 28, 2026

View reviewed changes

matthyx added this to KS PRs tracking Jun 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(remediation): add quarantine action (deny-all NetworkPolicy + dry-run)#388

feat(remediation): add quarantine action (deny-all NetworkPolicy + dry-run)#388
yugal07 wants to merge 1 commit into
kubescape:mainfrom
yugal07:feat/operator-action-quarantine

yugal07 commented Jun 28, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 28, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Uh oh!

coderabbitai Bot Jun 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

yugal07 commented Jun 28, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

What changed

Safe-by-default behaviour (unchanged from Phase 1)

Additional Information

How to Test

Related issues/PRs

Checklist before requesting a review

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

yugal07 commented Jun 28, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 28, 2026 •

edited

Loading