Skip to content

feat: handle operatorAction commands (annotate + dry run) #383

Merged
matthyx merged 4 commits into
kubescape:mainfrom
yugal07:feat/operator-action-remediation
Jun 22, 2026
Merged

feat: handle operatorAction commands (annotate + dry run) #383
matthyx merged 4 commits into
kubescape:mainfrom
yugal07:feat/operator-action-remediation

Conversation

@yugal07

@yugal07 yugal07 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

operator: handle operatorAction commands — Phase 1 remediation (annotate + dry-run)

Related issue: kubescape/kubescape#1770 — Kubescape CLI control over cluster operations
Design: kubescape/designs-and-proposals#5 — Kubescape CLI Control over Cluster Operations (merged)
Contract: armosec/armoapi-go#655 — operatorAction command contract (merged, shipped in v0.0.720)
Pairs with: kubescape/helm-charts#855

What this PR does

This is Phase 1 of the merged design: it teaches the in-cluster operator to
execute a TypeOperatorAction command end to end, with the lowest-blast-radius
action (annotate) and dry-run as the default. It proves the
CLI → operator → OperatorCommand status pipeline with essentially zero risk,
and lays the extensible Remediator framework that quarantine/cordon
plug into in later phases.

No new transport, no new endpoint: remediation is a new verb on the existing
apis.Command pipeline that already powers kubescape operator scan.

  • go.mod / go.sum — bump armosec/armoapi-go v0.0.673v0.0.720 to
    pick up the merged operatorAction contract (TypeOperatorAction,
    OperatorActionArgs, IsDryRun, ToArgs/OperatorActionArgsFromMap).
  • mainhandler/handlerequests.go — wire dispatch in two places:
    case apis.TypeOperatorAction in runCommand (→ handleOperatorAction), and
    in the itemize switch so an action is handled as a single request (it carries
    its own target) rather than fanned out per pod.
  • mainhandler/actionhandler.go (new)handleOperatorAction: parses the
    typed args, enforces the Phase-1 safety rails, dispatches to the matching
    Remediator, writes the result to the OperatorCommand status payload, and
    emits a best-effort Kubernetes Event for audit.
  • mainhandler/remediators/remediator.go (new) — the Remediator interface
    (Plan/Apply/Revert), the Target/Request/Plan/Result types, and
    NewRegistry (annotate only for now; new actions are added by extending the map,
    not the pipeline).
  • mainhandler/remediators/annotate.go (new)AnnotateRemediator over
    Deployment/StatefulSet/DaemonSet/Pod: a JSON merge patch that sets the
    kubescape.io/remediated, …/remediation-reason, …/remediation-finding-ref
    annotations; Revert deletes them (null-merge); server-side dry-run support.
  • mainhandler/remediators/annotate_test.go, mainhandler/actionhandler_test.go
    (new) — 16 unit tests.

Safe-by-default behaviour

  • Dry-run is the default. OperatorActionArgs.IsDryRun() returns true
    unless the caller set dryRun=false explicitly (the CLI's --confirm), so a
    missing flag can never silently perform a real write.
  • Two-layer dry-run, matching the design. A dry-run still issues the patch
    with server-side metav1.DryRunAll so the change is validated against
    admission controllers, never persisted.
  • Namespace safety rail. Targets in excluded/protected namespaces
    (config.SkipNamespace) are rejected before any client call.
  • No default-install power. The operator gates real writes on RBAC, which the
    default install does not grant (see the paired helm-charts PR). Without it,
    annotate returns a clean Forbidden that is recorded on the command status —
    nothing is mutated.
  • Explicit scope fences. quarantine/cordon return a clear
    "not implemented yet (later phase)" error; selector-driven (findings) targeting
    returns a clear "phase 2" error; unknown actions error explicitly. Phase 1 acts
    only on an explicit target.

Audit trail

Every action records its Plan/Result (action, target, dryRun, applied) on the
OperatorCommand status payload (the GitOps- and Headlamp-friendly channel) and
emits a KubescapeRemediation Kubernetes Event on the target's namespace.

Testing

  • go build ./... and go vet ./... — clean.
  • New tests pass (mainhandler, mainhandler/remediators); dry-run assertions use
    a pass-through reactor to verify the PatchOptions.DryRun actually sent —
    necessary because the fake clientset's tracker ignores server-side dry-run and
    always persists, so "did not persist" cannot be asserted directly.
  • Cross-component check with helm-charts: the chart-rendered capabilities.json
    (now carrying the new remediation key) parses cleanly through the operator's
    real LoadCapabilitiesConfig; unknown keys are tolerated (viper default), exactly
    as the existing riskAcceptance key already is.

Phasing

This is Phase 1 of the merged design. Next: the kubescape CLI
operator remediate annotate subcommand (completes the user-facing loop), then
Phase 2 quarantine + findings-driven targeting, then Phase 3 cordon +
operator-native auto-remediation. The Remediator registry is designed so each
adds an implementation without touching the command pipeline.

Summary by CodeRabbit

  • New Features
    • Added Phase‑1 operator action support for Annotate and Revert on workloads, including server-side dry-run support and requiring an explicit target.
    • Added action result recording with best-effort event emission for visibility.
    • Added safety validations: excluded namespaces blocked, namespaced kinds must include a namespace, selector targeting not supported, and TTL values are rejected as unsupported.
  • Bug Fixes
    • Improved request routing so operator actions are handled as a single action request.
  • Tests
    • Expanded coverage for dry-run vs confirm behavior, revert annotation removal, and input/error cases.
  • Chores
    • Updated a pinned dependency version.

…+ dry-run)

Signed-off-by: yugal07 <yashsadhwani544@gmail.com>
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6ac0e345-f5b8-46f5-a980-023948bba300

📥 Commits

Reviewing files that changed from the base of the PR and between 85df9a1 and 8525ec6.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (6)
  • go.mod
  • mainhandler/actionhandler.go
  • mainhandler/actionhandler_test.go
  • mainhandler/remediators/annotate.go
  • mainhandler/remediators/annotate_test.go
  • mainhandler/remediators/remediator.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • mainhandler/remediators/remediator.go
  • mainhandler/remediators/annotate_test.go
  • mainhandler/actionhandler.go
  • mainhandler/actionhandler_test.go

📝 Walkthrough

Walkthrough

Implements operator action execution with Phase-1 safety constraints, defining remediator abstractions and an annotation-based remediator for plan/apply/revert workflows with server-side dry-run support. Integrates action routing into the request handler, enforces explicit targeting with excluded namespace checks, records results to command status as JSON payloads, emits audit events, and adds comprehensive tests plus a dependency bump.

Changes

Operator Action Remediation Implementation

Layer / File(s) Summary
Remediator abstractions and registry
mainhandler/remediators/remediator.go
Target, Request, Plan, Result types and Remediator interface with Plan, Apply(dryRun), and Revert(dryRun) methods; NewRegistry registers remediators by action type.
AnnotateRemediator implementation and tests
mainhandler/remediators/annotate.go, mainhandler/remediators/annotate_test.go
Annotate remediator builds JSON merge patches for metadata.annotations with remediation markers, reason, and finding reference; Plan returns patch payload, Apply sends patches with optional metav1.DryRunAll, Revert deletes remediation keys; tests cover planning, dry-run vs confirm writes, supported kinds (Deployment/StatefulSet/DaemonSet/Pod), unsupported-kind errors, and revert preservation of unrelated annotations.
Action handler execution with Phase-1 constraints and audit
mainhandler/actionhandler.go, mainhandler/actionhandler_test.go
Parses operator action args, requires non-empty action and explicit target, rejects selector targeting and TTL/auto-revert inputs, blocks excluded namespaces via config, dispatches Annotate/Revert to remediators with Reason/FindingRef, records JSON payload into OperatorCommand status, emits best-effort Kubernetes Events with outcome messages, and logs completion details; tests cover dry-run defaulting, confirm writes, revert, targeting errors, action validation, and TTL validation.
Request routing for operator actions
mainhandler/handlerequests.go
Treats operator actions as non-itemized requests (disables scope itemization) and routes apis.TypeOperatorAction to the action handler.
Dependency update
go.mod
Bumps github.com/armosec/armoapi-go from v0.0.719 to v0.0.720.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • kubescape/operator#353: Both PRs update the same dependency github.com/armosec/armoapi-go to provide operator action APIs.

Poem

🐰 I hop through patches and marshal the plan,
I nudge annotations with a careful hand,
Phase-one guards watch every little hop,
Events whisper audits that never stop,
Remediations taped — then I softly stand.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: implementing operator action command handling with annotate and dry-run support.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

Command failed


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: yugal07 <yashsadhwani544@gmail.com>
@matthyx matthyx moved this to Needs Reviewer in KS PRs tracking Jun 19, 2026

@matthyx matthyx left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed Phase-1 remediation. Pulled the branch: go build ./..., go vet ./... and the new mainhandler / mainhandler/remediators tests all pass. The design is clean — the Remediator registry, two-layer server-side dry-run, the TTL/selector/quarantine/cordon fences, and the namespace rail are all solid, and the test coverage is good.

One issue rises to blocker level (the revert path silently breaks the PR's own safe-by-default contract), plus a couple of minor notes below. Everything else looks good to merge once the revert concern is addressed or explicitly scoped out.

Comment thread mainhandler/remediators/annotate.go Outdated
if err != nil {
return Result{}, err
}
if err := r.patch(ctx, t, patch, false); err != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocker: Revert performs a real cluster write regardless of the dry-run flag, breaking the safe-by-default contract.

handleOperatorAction computes dryRun := args.IsDryRun() but never passes it down the revert path — Revert hard-codes r.patch(ctx, t, patch, false). The doc comment on handleOperatorAction states the invariant explicitly: "a missing flag can never silently perform a real cluster write." That holds for annotate but not for revert: a caller that sends a revert action without dryRun=false (i.e. the default, where IsDryRun() returns true) expects a preview but gets a real mutation.

Blast radius is low (it only deletes the three kubescape.io/* annotations the operator added), but the contract violation is real and the path is reachable today by any command sender, even though the CLI subcommand ships later.

Suggested fix — honor the flag for revert too, e.g. give Revert a dryRun bool parameter and pass dryRun from the switch:

case apis.OperatorActionRevert:
    result, err := r.Revert(ctx, target, dryRun)

and inside Revert: r.patch(ctx, t, patch, dryRun). If revert is intentionally always-write for Phase 1, please drop the blanket safety claim from the handleOperatorAction doc comment and note the exception, so the contract and the code agree.

}

// Safety rail: never act on excluded / protected namespaces.
if target.Namespace != "" && actionHandler.config.SkipNamespace(target.Namespace) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: the namespace safety rail is skipped when target.Namespace == "".

The guard only runs SkipNamespace when a namespace is present, while validateTarget requires only Kind and Name. A target with an empty namespace therefore bypasses the excluded-namespace check. In practice the four supported kinds (Deployment/StatefulSet/DaemonSet/Pod) are all namespaced, so the subsequent typed patch with namespace "" would fail at the API server rather than touch a protected namespace — so this is self-limiting today, not exploitable. Still worth tightening: require a non-empty namespace for namespaced kinds (reject up front), so the rail is enforced rather than relying on an API-server 404.

@matthyx matthyx left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed after 8525ec6 ("fix(remediation): make revert respect dry-run and enforce namespace rail"). Both findings from my previous review are resolved:

  • Revert now honors dry-run. Revert takes a dryRun param threaded from handleOperatorAction, the interface and Result (DryRun/Applied) are updated accordingly, and a default (no --confirm) revert now issues a server-side dry-run instead of a real write — the safe-by-default contract holds for revert too. Covered by TestHandleOperatorAction_RevertDefaultsToDryRun and TestAnnotateRevertDryRun.
  • Namespace rail enforced up front. New IsNamespacedKind check rejects a namespaced target with no namespace before the excluded-namespace check, so nothing slips past the rail to fail late at the API server. Covered by TestHandleOperatorAction_NamespacedKindRequiresNamespace.

Pulled the branch: go build ./..., go vet ./..., and the mainhandler / mainhandler/remediators tests all pass. No remaining blockers from me — LGTM for Phase 1.

@matthyx matthyx merged commit d425e08 into kubescape:main Jun 22, 2026
11 checks passed
@matthyx matthyx moved this from Needs Reviewer to To Archive in KS PRs tracking Jun 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: To Archive

Development

Successfully merging this pull request may close these issues.

2 participants