feat: handle operatorAction commands (annotate + dry run) #383
Conversation
…+ dry-run) Signed-off-by: yugal07 <yashsadhwani544@gmail.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (6)
🚧 Files skipped from review as they are similar to previous changes (4)
📝 WalkthroughWalkthroughImplements operator action execution with Phase-1 safety constraints, defining remediator abstractions and an annotation-based remediator for plan/apply/revert workflows with server-side dry-run support. Integrates action routing into the request handler, enforces explicit targeting with excluded namespace checks, records results to command status as JSON payloads, emits audit events, and adds comprehensive tests plus a dependency bump. ChangesOperator Action Remediation Implementation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)Command failed Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Signed-off-by: yugal07 <yashsadhwani544@gmail.com>
matthyx
left a comment
There was a problem hiding this comment.
Reviewed Phase-1 remediation. Pulled the branch: go build ./..., go vet ./... and the new mainhandler / mainhandler/remediators tests all pass. The design is clean — the Remediator registry, two-layer server-side dry-run, the TTL/selector/quarantine/cordon fences, and the namespace rail are all solid, and the test coverage is good.
One issue rises to blocker level (the revert path silently breaks the PR's own safe-by-default contract), plus a couple of minor notes below. Everything else looks good to merge once the revert concern is addressed or explicitly scoped out.
| if err != nil { | ||
| return Result{}, err | ||
| } | ||
| if err := r.patch(ctx, t, patch, false); err != nil { |
There was a problem hiding this comment.
Blocker: Revert performs a real cluster write regardless of the dry-run flag, breaking the safe-by-default contract.
handleOperatorAction computes dryRun := args.IsDryRun() but never passes it down the revert path — Revert hard-codes r.patch(ctx, t, patch, false). The doc comment on handleOperatorAction states the invariant explicitly: "a missing flag can never silently perform a real cluster write." That holds for annotate but not for revert: a caller that sends a revert action without dryRun=false (i.e. the default, where IsDryRun() returns true) expects a preview but gets a real mutation.
Blast radius is low (it only deletes the three kubescape.io/* annotations the operator added), but the contract violation is real and the path is reachable today by any command sender, even though the CLI subcommand ships later.
Suggested fix — honor the flag for revert too, e.g. give Revert a dryRun bool parameter and pass dryRun from the switch:
case apis.OperatorActionRevert:
result, err := r.Revert(ctx, target, dryRun)and inside Revert: r.patch(ctx, t, patch, dryRun). If revert is intentionally always-write for Phase 1, please drop the blanket safety claim from the handleOperatorAction doc comment and note the exception, so the contract and the code agree.
| } | ||
|
|
||
| // Safety rail: never act on excluded / protected namespaces. | ||
| if target.Namespace != "" && actionHandler.config.SkipNamespace(target.Namespace) { |
There was a problem hiding this comment.
Minor: the namespace safety rail is skipped when target.Namespace == "".
The guard only runs SkipNamespace when a namespace is present, while validateTarget requires only Kind and Name. A target with an empty namespace therefore bypasses the excluded-namespace check. In practice the four supported kinds (Deployment/StatefulSet/DaemonSet/Pod) are all namespaced, so the subsequent typed patch with namespace "" would fail at the API server rather than touch a protected namespace — so this is self-limiting today, not exploitable. Still worth tightening: require a non-empty namespace for namespaced kinds (reject up front), so the rail is enforced rather than relying on an API-server 404.
…t/operator-action-remediation
Signed-off-by: yugal07 <yashsadhwani544@gmail.com>
matthyx
left a comment
There was a problem hiding this comment.
Re-reviewed after 8525ec6 ("fix(remediation): make revert respect dry-run and enforce namespace rail"). Both findings from my previous review are resolved:
- Revert now honors dry-run.
Reverttakes adryRunparam threaded fromhandleOperatorAction, the interface andResult(DryRun/Applied) are updated accordingly, and a default (no--confirm) revert now issues a server-side dry-run instead of a real write — the safe-by-default contract holds for revert too. Covered byTestHandleOperatorAction_RevertDefaultsToDryRunandTestAnnotateRevertDryRun. - Namespace rail enforced up front. New
IsNamespacedKindcheck rejects a namespaced target with no namespace before the excluded-namespace check, so nothing slips past the rail to fail late at the API server. Covered byTestHandleOperatorAction_NamespacedKindRequiresNamespace.
Pulled the branch: go build ./..., go vet ./..., and the mainhandler / mainhandler/remediators tests all pass. No remaining blockers from me — LGTM for Phase 1.
operator: handle
operatorActioncommands — Phase 1 remediation (annotate + dry-run)Related issue: kubescape/kubescape#1770 — Kubescape CLI control over cluster operations
Design: kubescape/designs-and-proposals#5 — Kubescape CLI Control over Cluster Operations (merged)
Contract: armosec/armoapi-go#655 —
operatorActioncommand contract (merged, shipped inv0.0.720)Pairs with: kubescape/helm-charts#855
What this PR does
This is Phase 1 of the merged design: it teaches the in-cluster operator to
execute a
TypeOperatorActioncommand end to end, with the lowest-blast-radiusaction (
annotate) and dry-run as the default. It proves theCLI → operator →
OperatorCommandstatus pipeline with essentially zero risk,and lays the extensible
Remediatorframework thatquarantine/cordonplug into in later phases.
No new transport, no new endpoint: remediation is a new verb on the existing
apis.Commandpipeline that already powerskubescape operator scan.go.mod/go.sum— bumparmosec/armoapi-gov0.0.673→v0.0.720topick up the merged
operatorActioncontract (TypeOperatorAction,OperatorActionArgs,IsDryRun,ToArgs/OperatorActionArgsFromMap).mainhandler/handlerequests.go— wire dispatch in two places:case apis.TypeOperatorActioninrunCommand(→handleOperatorAction), andin the itemize switch so an action is handled as a single request (it carries
its own target) rather than fanned out per pod.
mainhandler/actionhandler.go(new) —handleOperatorAction: parses thetyped args, enforces the Phase-1 safety rails, dispatches to the matching
Remediator, writes the result to theOperatorCommandstatus payload, andemits a best-effort Kubernetes Event for audit.
mainhandler/remediators/remediator.go(new) — theRemediatorinterface(
Plan/Apply/Revert), theTarget/Request/Plan/Resulttypes, andNewRegistry(annotate only for now; new actions are added by extending the map,not the pipeline).
mainhandler/remediators/annotate.go(new) —AnnotateRemediatoroverDeployment/StatefulSet/DaemonSet/Pod: a JSON merge patch that sets the
kubescape.io/remediated,…/remediation-reason,…/remediation-finding-refannotations;
Revertdeletes them (null-merge); server-side dry-run support.mainhandler/remediators/annotate_test.go,mainhandler/actionhandler_test.go(new) — 16 unit tests.
Safe-by-default behaviour
OperatorActionArgs.IsDryRun()returnstrueunless the caller set
dryRun=falseexplicitly (the CLI's--confirm), so amissing flag can never silently perform a real write.
with server-side
metav1.DryRunAllso the change is validated againstadmission controllers, never persisted.
(
config.SkipNamespace) are rejected before any client call.default install does not grant (see the paired helm-charts PR). Without it,
annotatereturns a cleanForbiddenthat is recorded on the command status —nothing is mutated.
quarantine/cordonreturn a clear"not implemented yet (later phase)" error; selector-driven (findings) targeting
returns a clear "phase 2" error; unknown actions error explicitly. Phase 1 acts
only on an explicit
target.Audit trail
Every action records its
Plan/Result(action, target, dryRun, applied) on theOperatorCommandstatuspayload(the GitOps- and Headlamp-friendly channel) andemits a
KubescapeRemediationKubernetes Event on the target's namespace.Testing
go build ./...andgo vet ./...— clean.mainhandler,mainhandler/remediators); dry-run assertions usea pass-through reactor to verify the
PatchOptions.DryRunactually sent —necessary because the fake clientset's tracker ignores server-side dry-run and
always persists, so "did not persist" cannot be asserted directly.
capabilities.json(now carrying the new
remediationkey) parses cleanly through the operator'sreal
LoadCapabilitiesConfig; unknown keys are tolerated (viper default), exactlyas the existing
riskAcceptancekey already is.Phasing
This is Phase 1 of the merged design. Next: the
kubescapeCLIoperator remediate annotatesubcommand (completes the user-facing loop), thenPhase 2
quarantine+ findings-driven targeting, then Phase 3cordon+operator-native auto-remediation. The
Remediatorregistry is designed so eachadds an implementation without touching the command pipeline.
Summary by CodeRabbit