feat(policy): add agentic approval loop #1528
e135a1c to a68b370
/ok to test a68b370
proposal_approval_mode is currently registered as a free-form string setting, so …
Can we add key-specific validation for …
Places I’d expect coverage: …
I’d still keep …
a68b370 to 8984c37
Signed-off-by: Alexander Watson <zredlined@gmail.com>
…roval

Run the prover on every proposal regardless of analysis_mode. Auto-approve proposals whose merged-policy delta is empty (proposer-agnostic, with the global-policy gate respected).

Calibrate prover findings to a single HIGH severity emitted on link-local hosts, L4+credential-in-scope, and bypass-L7-binary+credential-in-scope.

Add implicit supersede on (host, port, binary): newer submissions auto-reject older pending chunks, and incoming mechanistic chunks auto-reject when an approved agent_authored chunk already covers the same endpoint.

Audit auto-approvals via CONFIG:APPROVED OCSF events carrying auto=true, source=<mode>, prover_delta=empty as unmapped fields, with message text "auto-approved: no new prover findings".

Build credential set from sandbox-attached providers (presence only; no scope modeling in v1).

Signed-off-by: Alexander Watson <zredlined@gmail.com>
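The implicit supersede rule in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the gateway's code: `Chunk`, `Source`, `Status`, and `submit` are hypothetical names, and the choice that a rejected incoming chunk does not itself supersede older pending chunks is an assumption the commit message does not confirm.

```rust
// Hypothetical types for illustration; the real chunk model may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Source {
    Mechanistic,
    AgentAuthored,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Pending,
    Approved,
    Rejected,
}

#[derive(Debug, Clone)]
struct Chunk {
    id: u32,
    host: String,
    port: u16,
    binary: String,
    source: Source,
    status: Status,
}

impl Chunk {
    fn new(id: u32, host: &str, port: u16, binary: &str, source: Source) -> Self {
        Chunk {
            id,
            host: host.to_string(),
            port,
            binary: binary.to_string(),
            source,
            status: Status::Pending,
        }
    }

    /// Two chunks overlap when their (host, port, binary) triples match.
    fn same_endpoint(&self, other: &Chunk) -> bool {
        self.host == other.host && self.port == other.port && self.binary == other.binary
    }
}

/// Stores `incoming` and returns the status it ends up with.
fn submit(store: &mut Vec<Chunk>, mut incoming: Chunk) -> Status {
    // An approved agent_authored chunk on the same endpoint rejects a mechanistic one.
    let covered_by_agent = store.iter().any(|c| {
        c.same_endpoint(&incoming)
            && c.source == Source::AgentAuthored
            && c.status == Status::Approved
    });
    if incoming.source == Source::Mechanistic && covered_by_agent {
        incoming.status = Status::Rejected;
    } else {
        // A newer submission auto-rejects older pending chunks on the same endpoint.
        for older in store.iter_mut() {
            if older.status == Status::Pending && older.same_endpoint(&incoming) {
                older.status = Status::Rejected;
            }
        }
        incoming.status = Status::Pending;
    }
    let status = incoming.status;
    store.push(incoming);
    status
}

fn main() {
    let mut store = Vec::new();
    submit(&mut store, Chunk::new(1, "api.github.com", 443, "/usr/bin/gh", Source::Mechanistic));
    submit(&mut store, Chunk::new(2, "api.github.com", 443, "/usr/bin/gh", Source::AgentAuthored));
    for c in &store {
        println!("chunk {} -> {:?}", c.id, c.status);
    }
}
```

Keying on the full triple means a proposal for a different binary against the same host and port is treated as a separate endpoint and is not superseded.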
Signed-off-by: Alexander Watson <zredlined@gmail.com>
The prover now answers four formal questions about a proposed policy change and emits one finding per "yes" answer:

- link_local_reach
- l7_bypass_credentialed
- credential_reach_expansion
- capability_expansion

There is no severity grade. The category name is the signal; the per-path evidence carries the structured detail. The auto-approval gate is binary: empty delta or not. This removes the previous HIGH/MEDIUM/CRITICAL severity tiers and the narrowness classifier that was inconsistent across the access-shorthand / explicit-rules boundary.

Gateway-side finding_delta gains category suppression: capability_expansion paths whose (binary, host, port) appears in the credential_reach_expansion delta are suppressed, so a brand-new credentialed reach surfaces as one finding rather than one reach plus N method findings.

The github provider profile now defaults api.github.com to read-only (was: read-write). Writes flow through the agentic loop; the prover audits each capability change rather than treating broad write access as the default.

Demo, sandbox skill, and architecture docs updated to describe the four-category model. Prover gains a README.md documenting the formal queries, evidence shape, and how to add a new category.

Signed-off-by: Alexander Watson <zredlined@gmail.com>
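The category suppression described above could look roughly like this. It is a sketch under assumptions: `Finding`, its fields, and this `finding_delta` signature are hypothetical stand-ins rather than the gateway's actual types, and "delta" is modeled simply as findings present for the proposed policy but absent from the baseline.

```rust
use std::collections::HashSet;

// Hypothetical model of the four categories named in the commit message.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    LinkLocalReach,
    L7BypassCredentialed,
    CredentialReachExpansion,
    CapabilityExpansion,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Finding {
    category: Category,
    binary: String,
    host: String,
    port: u16,
    detail: String, // e.g. the HTTP method for a capability finding (assumed)
}

impl Finding {
    fn endpoint(&self) -> (String, String, u16) {
        (self.binary.clone(), self.host.clone(), self.port)
    }
}

/// Findings that are new relative to the baseline, with capability_expansion
/// entries suppressed when the same endpoint is already a new credentialed reach.
fn finding_delta(baseline: &[Finding], proposed: &[Finding]) -> Vec<Finding> {
    let mut delta: Vec<Finding> = proposed
        .iter()
        .filter(|f| !baseline.contains(*f))
        .cloned()
        .collect();

    let new_reach: HashSet<(String, String, u16)> = delta
        .iter()
        .filter(|f| f.category == Category::CredentialReachExpansion)
        .map(Finding::endpoint)
        .collect();

    delta.retain(|f| {
        !(f.category == Category::CapabilityExpansion && new_reach.contains(&f.endpoint()))
    });
    delta
}

fn finding(category: Category, host: &str, detail: &str) -> Finding {
    Finding {
        category,
        binary: "/usr/bin/gh".to_string(),
        host: host.to_string(),
        port: 443,
        detail: detail.to_string(),
    }
}

fn main() {
    let proposed = vec![
        finding(Category::CredentialReachExpansion, "api.github.com", ""),
        finding(Category::CapabilityExpansion, "api.github.com", "PUT"),
        finding(Category::CapabilityExpansion, "api.github.com", "POST"),
    ];
    // One reach finding remains; the two method findings are suppressed.
    println!("{:?}", finding_delta(&[], &proposed));
}
```

An empty result from a function like this is what the binary auto-approval gate would key on.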
Signed-off-by: Alexander Watson <zredlined@gmail.com>
…iasing

Move proposal_approval_mode out of SandboxSpec and into the existing runtime-mutable settings model so it can be flipped on a running sandbox and pinned fleet-wide via gateway scope. Precedence matches the rest of the settings model: gateway wins over sandbox, default is manual. The CLI's --approval-mode flag on `sandbox create` is now a shorthand that writes the sandbox-scoped setting post-create. Auto-approval audit events carry resolved_from=<gateway|sandbox|default>.

Reject agent proposals whose rule_name starts with `_provider_`. That namespace is reserved for provider-profile-synthesized rules; allowing agents to address them by name would bypass the merge guard that splits agent contributions into their own rule so the prover sees them honestly.

Refs #1097

Signed-off-by: Alexander Watson <zredlined@gmail.com>
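A minimal sketch of the precedence rule and the reserved-prefix check described above. The names `resolve_mode`, `check_rule_name`, `ApprovalMode`, and `ResolvedFrom` are hypothetical, and the scopes are modeled as plain optional strings rather than the real settings store.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ApprovalMode {
    Manual,
    Auto,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ResolvedFrom {
    Gateway,
    Sandbox,
    Default,
}

/// Only the exact string "auto" enables auto-approval; anything else is manual.
fn parse_mode(raw: &str) -> ApprovalMode {
    if raw == "auto" {
        ApprovalMode::Auto
    } else {
        ApprovalMode::Manual
    }
}

/// Gateway scope wins over sandbox scope; with neither set, the default is manual.
fn resolve_mode(gateway: Option<&str>, sandbox: Option<&str>) -> (ApprovalMode, ResolvedFrom) {
    match (gateway, sandbox) {
        (Some(g), _) => (parse_mode(g), ResolvedFrom::Gateway),
        (None, Some(s)) => (parse_mode(s), ResolvedFrom::Sandbox),
        (None, None) => (ApprovalMode::Manual, ResolvedFrom::Default),
    }
}

const RESERVED_PREFIX: &str = "_provider_";

/// Agent proposals may not address provider-synthesized rules by name.
fn check_rule_name(rule_name: &str) -> Result<(), String> {
    if rule_name.starts_with(RESERVED_PREFIX) {
        Err(format!(
            "rule_name {rule_name:?} uses the reserved prefix {RESERVED_PREFIX:?}"
        ))
    } else {
        Ok(())
    }
}

fn main() {
    // A gateway-scoped "manual" pins the fleet even if the sandbox asks for auto.
    println!("{:?}", resolve_mode(Some("manual"), Some("auto")));
    println!("{:?}", check_rule_name("_provider_github"));
}
```

Returning the scope alongside the mode is what would let an audit event carry a `resolved_from` value without a second lookup.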
Signed-off-by: Alexander Watson <zredlined@gmail.com>
Previously the setting was a free-form string, so `openshell settings set ... proposal_approval_mode autom` was accepted and silently resolved to manual at runtime. Operators got no signal that they had fat-fingered the value.

Extend RegisteredSetting with an optional allowed_string_values whitelist and apply it at every operator entry point:

- Server-side proto_setting_to_stored rejects out-of-whitelist values with Status::invalid_argument listing the allowed set, so all gRPC callers get consistent validation.
- CLI parse_cli_setting_value rejects client-side before the round-trip.
- TUI global and sandbox setting editors surface the same error inline.

Runtime resolve_proposal_approval_mode is intentionally unchanged: it still treats any value other than exact "auto" as manual, so stale storage or future-mode values never enable auto-approval on older gateways.

Also documents the approval-mode loop in docs/sandboxes/policy-advisor.mdx with new Approval Modes and What Auto-Approval Checks sections covering mode precedence, the --approval-mode create shorthand, the audit-event fields, and the four categorical prover findings.

Refs #1528

Signed-off-by: Alexander Watson <zredlined@gmail.com>
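The whitelist check described above might be shaped like this. It is a hedged sketch: the struct below borrows only the `RegisteredSetting` and `allowed_string_values` names from the commit message, while its layout, the `validate_setting` helper, the `some_free_form_key` setting, and the exact allowed set `["manual", "auto"]` are assumptions.

```rust
// Simplified stand-in for the registered-setting metadata (layout assumed).
struct RegisteredSetting {
    key: &'static str,
    // None means the setting stays free-form.
    allowed_string_values: Option<&'static [&'static str]>,
}

const PROPOSAL_APPROVAL_MODE: RegisteredSetting = RegisteredSetting {
    key: "proposal_approval_mode",
    allowed_string_values: Some(&["manual", "auto"]),
};

// Hypothetical free-form setting, included only to show the None branch.
const FREE_FORM: RegisteredSetting = RegisteredSetting {
    key: "some_free_form_key",
    allowed_string_values: None,
};

/// Rejects values outside the whitelist and lists the allowed set in the error.
fn validate_setting(setting: &RegisteredSetting, value: &str) -> Result<(), String> {
    match setting.allowed_string_values {
        Some(allowed) if !allowed.iter().any(|a| *a == value) => Err(format!(
            "invalid value {value:?} for {}: allowed values are {}",
            setting.key,
            allowed.join(", ")
        )),
        _ => Ok(()),
    }
}

fn main() {
    // The typo from the commit message is now rejected instead of stored.
    match validate_setting(&PROPOSAL_APPROVAL_MODE, "autom") {
        Ok(()) => println!("accepted"),
        Err(e) => println!("rejected: {e}"),
    }
    println!("{:?}", validate_setting(&FREE_FORM, "anything"));
}
```

Keeping one validation function behind every entry point is one way to get the consistent errors the commit describes; the runtime resolver still has to fail closed on its own, since stored values can predate the whitelist.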
8984c37 to 716d436
🌿 Preview your docs: https://nvidia-preview-pr-1528.docs.buildwithfern.com/openshell
Thanks for catching this; addressed in 716d436. Validation now lands at every operator entry point you called out:
Mechanism:
Runtime fail-closed contract preserved:
The opening claim, the loop description, and the Review Proposals section all predated auto-approval mode and read as if a developer always sat in the loop. Update them to reflect the prover-gated auto-approval path:

- Opening: preserve the default-deny framing but acknowledge opt-in auto mode lets the gateway approve empty-delta proposals.
- Loop: now seven steps. Step 5 mentions the prover. Step 6 splits manual vs auto behavior. Step 7 covers the agent wait/retry path.
- Review Proposals: note that under auto mode, only flagged proposals show as pending; empty-delta ones are visible under --status approved with the audit fields documented in Approval Modes.

Refs #1528 #1480

Signed-off-by: Alexander Watson <zredlined@gmail.com>
Output of running
/ok to test d851129
Summary
Ships the agentic policy approval loop end-to-end. When the sandbox denies a network request, an agent inside the sandbox can propose a narrow policy refinement; the gateway runs a formal prover against the merged-policy delta; safe proposals (no new findings) auto-approve in ~1s; risky ones land in `pending` with structured evidence the reviewer can act on. The agent waits on a socket, so zero LLM tokens burn during human review.

This is the loop the platform has been building toward: agents do the narrowing work, the prover catches changes the operator should know about, and the audit trail makes every approval reconstructable.
Closes #1097
Refs #1062
Refs #1532
What this PR ships
- The loop. Sandbox denial → agent reads `/etc/openshell/skills/policy_advisor.md` → agent POSTs a narrow proposal to `policy.local` → gateway runs the prover → either auto-approve (empty delta) or `pending` (any finding) → on approval, sandbox hot-reloads → agent retries.
- Prover wired in as the auto-approval referee. Every proposal (mechanistic and agent-authored alike) runs through `openshell-prover`. The prover answers four categorical questions about the proposed change; see What the prover decides. The gateway computes the delta vs the baseline policy and the auto-approval gate fires only when the delta is empty.
- Providers-v2 in the loop. The prover validates against the effective policy: provider profiles composed in via providers-v2 are part of the model the prover reasons over. Agent-authored chunks for endpoints a provider profile covers land as their own rules (Fix A in `merge.rs`) instead of getting silently absorbed into the provider rule, so the prover sees the agent's narrow contribution honestly.
- Default-deny posture preserved. Auto-approval is opt-in via the `proposal_approval_mode` runtime setting: gateway scope (`openshell settings set --global proposal_approval_mode auto`) or sandbox scope (`openshell settings set <name> proposal_approval_mode auto`), with gateway scope winning. Default (`"manual"`, the absence of any setting) routes every proposal to human review regardless of prover verdict. CLI exposes a shorthand at create time: `openshell sandbox create --approval-mode <manual|auto>`, which writes the sandbox-scoped setting post-create. The audit event carries `resolved_from=<gateway|sandbox|default>` so operators can see why a given approval was auto vs manual.
- Demo that walks the full loop. `examples/agent-driven-policy-management/demo.sh` runs a Codex agent through a two-path flow against a local gateway: one un-credentialed action auto-approves silently; one credentialed action escalates with a categorical finding, demo.sh approves on behalf, the agent retries and the file lands in GitHub. End-to-end in ~50–110s with one human-visible escalation, exactly the kind the prover cannot decide unilaterally.
- Reconstructable audit. Every auto-approval emits a `CONFIG:APPROVED` OCSF event with unmapped fields `auto=true`, `source=<mechanistic|agent_authored>`, `prover_delta=empty`, and `resolved_from=<gateway|sandbox|default>`. The chunk's persisted `validation_result` carries the categorical finding lines for human-reviewed approvals.
- Provider profile tightening. `providers/github.yaml` defaults `api.github.com` from `read-write` to `read-only`. Writes (gh / git via REST) now flow through the agentic loop: the loop becomes the on-ramp to write access, and the prover audits each capability change.

What the prover decides
The prover answers four formal questions about each proposed change. Each "yes" is its own categorical finding — no severity grade. Any finding blocks auto-approval; empty delta means the change is provably safe under the model.
- `link_local_reach`: … `169.254.0.0/16` or `fe80::/10` (cloud-metadata range, serves credentials).
- `l7_bypass_credentialed`: … `git-remote-https`, `ssh`, `nc`) reaches a host where a credential is in scope.
- `credential_reach_expansion`: …
- `capability_expansion`: …

Detail in `crates/openshell-prover/README.md`.

What the demo shows
Acceptance criteria (deterministic, in tests)
- … `approved`).
- … `pending` with `credential_reach_expansion` in `validation_result`.
- … `pending` with `capability_expansion` citing the new method.
- … `pending` unconditionally with `link_local_reach`.
- … `pending` with `l7_bypass_credentialed`.
- … `(host, port, binary)` overlap.
- … `manual`: empty delta does NOT auto-approve when the `proposal_approval_mode` setting is unset at both scopes, `"manual"`, or any unknown future value. Gateway scope wins over sandbox scope.
- … `--approval-mode auto` writes the sandbox-scoped setting after create.
- … `auto=true`, `source=<mode>`, `prover_delta=empty`, and `resolved_from=<gateway|sandbox|default>` as unmapped OCSF fields.
- … `_provider_` prefix are rejected at submit time.
- … `validation_result`.

All covered by unit and integration tests in `crates/openshell-server/src/grpc/policy.rs::tests`.

Testing
- `cargo test --workspace --lib`: 534 gateway tests, all 16 crates green.
- `cargo clippy -p openshell-server -p openshell-cli -p openshell-core --all-targets -- -D warnings`: clean.
- `cargo fmt --check`: clean.
- `./examples/agent-driven-policy-management/demo.sh` runs end-to-end against the local Docker gateway and writes the demo file to GitHub.

Explicitly deferred (follow-up PRs)
- … `CONFIG:AUTO_APPROVED` OCSF event class (today reuses `CONFIG:APPROVED` with `auto=true` unmapped).
- … `docs/` for the agentic loop.

Checklist