Skip to content

fix(CubeMaster): preserve template network rules and resource defaults#581

Open
zyl1121 wants to merge 1 commit into
TencentCloud:masterfrom
zyl1121:fix/cubemaster-template-resource-egress
Open

fix(CubeMaster): preserve template network rules and resource defaults#581
zyl1121 wants to merge 1 commit into
TencentCloud:masterfrom
zyl1121:fix/cubemaster-template-resource-egress

Conversation

@zyl1121

@zyl1121 zyl1121 commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Motivation

The template / sandbox request materialization path currently has a few correctness gaps where accepted or configured fields are not preserved consistently.

The most user-visible issue is in template creation from image: cube_network_config.rules can be submitted successfully, but the stored template create_request drops the rules array while keeping the rest of cube_network_config.

This PR fixes that confirmed regression and also tightens two closely related resource-handling cases in the same request materialization path:

  • preserve the first validation error when both CPU and memory exceed configured limits
  • preserve template CPU / memory defaults when applying them to a request whose Resources field is initially nil

Reproduction

I first observed the network rules regression on a one-click deployment running v0.4.0.

Steps:

  1. Create a template from image with both allowInternetAccess and an egress rules entry inside cube_network_config.
  2. Wait for the template image job to succeed.
  3. Inspect the stored template request:
GET /cube/template?template_id=...&include_request=true

Before this fix, the stored create_request.cube_network_config keeps:

{
  "allowInternetAccess": true
}

but silently drops the submitted rules array.

That means the request is accepted, but the resulting template no longer carries the intended egress rule set forward into later sandbox creation.

What This PR Changes

  • Preserve cube_network_config.rules when cloning a template create request from CreateTemplateFromImageReq into CreateCubeSandboxReq.

    • pkg/templatecenter/template_request.go
    • adds deep-copy helpers for egress rules
  • Preserve the first resource validation error when both CPU and memory exceed configured limits.

    • pkg/service/sandbox/util.go
    • prevents the later memory validation check from overwriting an earlier CPU validation error
  • Preserve template CPU / memory defaults when merging template resources into a request with Resources == nil.

    • pkg/service/httpservice/cube/cubeboxutil.go
    • makes applyTemplateResources() return the effective *types.Resource, and callers keep the returned pointer
  • Add focused regression tests for all three cases.

Validation

Manual validation on the deployed cluster:

  • Confirmed that POST /cube/template/from-image accepts cube_network_config.rules, but GET /cube/template?...&include_request=true drops rules while keeping allowInternetAccess.
  • Confirmed that POST /cube/sandbox with cpu=101, mem=2000Mi is rejected with a CPU validation error.
  • Confirmed that POST /cube/sandbox with both cpu=101 and mem=301Gi returns the later memory validation error, which matches the validation-order issue fixed by this PR.

Focused regression tests on this branch:

cd CubeMaster

go test -vet=off ./pkg/templatecenter -run TestGenerateTemplateCreateRequestClonesCubeNetworkRules -count=1
go test -vet=off ./pkg/service/sandbox -run 'TestGetReqResourceRejectsCPUOverflowWhenMemIsValid|TestGetReqResourceRejectsCPUOverflowBeforeMemOverflow' -count=1
go test -vet=off ./pkg/service/httpservice/cube -run TestApplyTemplateToContainerSetsResourcesWhenRequestResourceIsNil -count=1

The focused pkg/service/sandbox tests cover both the cpu overflow + mem valid case and the cpu overflow + mem overflow case.

Scope

This PR is limited to request correctness in the template / sandbox materialization path.

It does not change scheduler limits, template policy semantics, or introduce any new API fields.

The main behavior change is that already accepted request fields and already configured template resource defaults are now preserved consistently when requests are cloned, validated, and merged.

Comment on lines 196 to 198
if resourceIn.Limit != nil {
resourceOut.Limit = resourceIn.Limit
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shallow copy of Resource.Limit — pointer aliasing with template

This assigns the pointer directly (line 198), not a deep copy:

if resourceIn.Limit != nil {
    resourceOut.Limit = resourceIn.Limit
}

After this, the container's Resources.Limit and the template's Resources.Limit point to the same RequestLimit struct. If the template's Limit is ever mutated downstream (even unintentionally), the container's Limit would change too.

The rest of this PR carefully deep-copies egress rule fields (using cloneStringPtr, struct-value copies, etc.), so this inconsistency stands out. A deep copy would be:

if resourceIn.Limit != nil {
    resourceOut.Limit = &types.RequestLimit{
        Cpu: resourceIn.Limit.Cpu,
        Mem: resourceIn.Limit.Mem,
    }
}

How to fix: Replace the pointer assignment with a struct copy on line 198. This is a pre-existing issue, but since applyTemplateResources is being refactored in this PR anyway, it's the right time to close the gap.

Comment on lines +174 to +186
if len(in.Action.Inject) > 0 {
action.Inject = make([]*types.EgressRuleInject, 0, len(in.Action.Inject))
for _, inj := range in.Action.Inject {
if inj == nil {
continue
}
cloned := &types.EgressRuleInject{
Header: inj.Header,
Secret: inj.Secret,
Format: cloneStringPtr(inj.Format),
}
action.Inject = append(action.Inject, cloned)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Field-by-field construction vs struct-copy pattern diverges from the sibling implementation

This clones EgressRuleInject fields one by one:

cloned := &types.EgressRuleInject{
    Header: inj.Header,
    Secret: inj.Secret,
    Format: cloneStringPtr(inj.Format),
}

The identical function in cubeboxutil.go:355-367 uses a struct-value copy instead:

cp := *inj
if inj.Format != nil {
    format := *inj.Format
    cp.Format = &format
}
action.Inject = append(action.Inject, &cp)

The struct-copy pattern is more resilient: if a new field is added to EgressRuleInject (which carries credentials), the struct-copy version would automatically pick it up, while the field-by-field version would silently drop it. This is a maintenance hazard — someone adding a field to the type might not know to update all the clone functions.

How to fix: Align this with the cubeboxutil.go pattern by using a struct-value copy instead of field-by-field assignment.

@cubesandboxbot

cubesandboxbot Bot commented Jun 16, 2026

Copy link
Copy Markdown

Review: PR #581

Summary

Three correctness fixes in the template/sandbox materialization path. Deep-copy helpers, regression tests, and validation-order fix are all appropriate.

Inline comment items

  • template_request.go cloneCubeNetworkConfig missing AllowPublicTraffic field
  • mergeEgressRules name-based dedup removes template rules entirely
  • Double space in error message

Additional

  • cloneEgressRule functions duplicated across two files with different deep-copy strategies
  • ensureSandboxTestConfig mutates global config singleton (unsafe for parallel tests)
  • Test coverage gap: cubeboxutil_test.go missing nested pointer identity assertions

}
if resourceIn.Limit != nil {
resourceOut.Limit = resourceIn.Limit
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shallow copy of Resource.Limit — pointer aliasing with template

This assigns the pointer directly (line 198), not a deep copy:

if resourceIn.Limit != nil {
    resourceOut.Limit = resourceIn.Limit
}

After this, the container's Resources.Limit and the template's Resources.Limit point to the same RequestLimit struct. If the template's Limit is ever mutated downstream (even unintentionally), the container's Limit would change too.

The rest of this PR carefully deep-copies egress rule fields (using cloneStringPtr, struct-value copies, etc.), so this inconsistency stands out. A deep copy would be:

if resourceIn.Limit != nil {
    resourceOut.Limit = &types.RequestLimit{
        Cpu: resourceIn.Limit.Cpu,
        Mem: resourceIn.Limit.Mem,
    }
}

How to fix: Replace the pointer assignment with a struct copy on line 198. This is a pre-existing issue, but since applyTemplateResources is being refactored in this PR anyway, it's the right time to close the gap.

Comment thread CubeMaster/pkg/service/sandbox/util.go Outdated
cpu.MilliValue())
}
if mem.Cmp(config.GetConfig().Scheduler.MaxMvmMemoryRes()) >= 0 {
if err == nil && mem.Cmp(config.GetConfig().Scheduler.MaxMvmMemoryRes()) >= 0 {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's return early instead of nil check here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's return early instead of nil check here.

Done, I updated this in a new commit to return early.

ctr.Resources = &types.Resource{}
}
applyTemplateResources(templateCtr.Resources, ctr.Resources)
ctr.Resources = applyTemplateResources(templateCtr.Resources, ctr.Resources)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like changes to this file is unnecessary.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like changes to this file is unnecessary.

Good point. I added this as defensive hardening for the case where the destination Resources is nil, but the current call sites already handle allocation. To keep this PR focused, I will drop this change.

out := &types.CubeNetworkConfig{
AllowOut: append([]string(nil), in.AllowOut...),
DenyOut: append([]string(nil), in.DenyOut...),
Rules: cloneEgressRules(in.Rules),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, there is no way to pass egress rules when create template.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, there is no way to pass egress rules when create template.

I think this is true for cubemastercli today, but not for the HTTP API.

CreateTemplateFromImageReq already includes cube_network_config.rules, and the /cube/template/from-image handler unmarshals the request body directly into that type.

I reproduced this on a v0.4.0 one-click deployment by sending cube_network_config.rules in the raw POST body. The request was accepted, allowInternetAccess was preserved, but rules was missing from the stored template create_request.

So the issue here seems to be that raw API callers can already pass rules, but the template request clone path drops them.

If the intended scope is to only support fields currently exposed by cubemastercli, I can split the rules preservation into a follow-up and keep this PR focused on other fixes.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I think we should also revise mergeEgressRules(). The per sandbox rules should prepend to templates rules.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I think we should also revise mergeEgressRules(). The per sandbox rules should prepend to templates rules.

Updated this part.

mergeEgressRules() now prepends per-sandbox/request rules before template rules, so first-match-wins evaluation gives request-side rules precedence.

I also added focused tests for name-based overrides, no-conflict merge order, empty-side handling, and nil rule entries.

@zyl1121 zyl1121 force-pushed the fix/cubemaster-template-resource-egress branch from 22e1921 to 9bc295d Compare June 16, 2026 11:37
// first-match-wins, per-sandbox/request rules must come before template rules.
// Template rules whose Name is overridden by the request side are skipped.
func mergeEgressRules(base []*types.EgressRule, extra []*types.EgressRule) []*types.EgressRule {
if len(extra) == 0 {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: mergeEgressRules now always clones (defensive, correct)

The new code on both early-return paths (len(extra)==0 → returns cloneEgressRules(base), and len(base)==0 → returns cloneEgressRules(extra)) always returns a deep copy. The old code returned base directly when no extra rules were provided.

This pairs with mergeCubeNetworkConfigs (line 267+279): out is already a clone of templateCfg, but mergeEgressRules re-clones it. When both sides have rules, template rules are cloned twice — once in cloneCubeNetworkConfig and again inside mergeEgressRules. Consider passing templateCfg.Rules directly to mergeEgressRules to avoid the redundant clone.

// first-match-wins, per-sandbox/request rules must come before template rules.
// Template rules whose Name is overridden by the request side are skipped.
func mergeEgressRules(base []*types.EgressRule, extra []*types.EgressRule) []*types.EgressRule {
if len(extra) == 0 {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

This PR fixes three data-loss bugs in the template/sandbox request materialization path. The changes are well-scoped and the reasoning is clearly articulated. Below are the findings I consider noteworthy.


Pre-existing bug: parse errors silently dropped in getReqResource (Not PR-introduced)

File: CubeMaster/pkg/service/sandbox/util.go:90-98

The resource.ParseQuantity calls use := which declares a new for-loop-scoped err, shadowing the named return err from the function signature. After break, the for-loop-scoped err goes out of scope and line 104 reads the named return err which is still nil.

Suggestion: Use = instead of := to write to the named return err.


Docs describe wrong rule merge ordering

Files:

  • docs/guide/egress-network-policy.md (English, line 197)
  • docs/zh/guide/egress-network-policy.md (Chinese, line 197)

Both state that request rules with new names are appended after template rules. The mergeEgressRules implementation puts all request rules first, then template rules minus overridden ones. The code is correct; the docs should match.


Minor: redundant clone when both sides have egress rules

When both templateCfg and requestCfg contain rules (cubeboxutil.go:267,279), template rules are deep-cloned twice — once in cloneCubeNetworkConfig and again in mergeEgressRules.


Positive observations

  • The getReqResource early-return fix is correct and unambiguous.
  • Test coverage verifies pointer independence at all nesting levels.
  • Nil entry handling in mergeEgressRules is robust.

@zyl1121 zyl1121 force-pushed the fix/cubemaster-template-resource-egress branch 2 times, most recently from 152d6af to f632d73 Compare June 17, 2026 07:08
@chenhengqi

Copy link
Copy Markdown
Collaborator

@zyl1121 Please resolve conflicts and squash your commits.

Preserve egress rules when cloning template create requests so accepted network policy is not lost during template creation from image.

Also preserve the first resource validation error when both CPU and memory exceed configured limits.

Update egress rule merge order to honor request-side overrides under first-match-wins semantics, and keep the English and Chinese network policy docs aligned with the implemented behavior.

Signed-off-by: zhengyilei <zheng_yilei@qq.com>
@zyl1121 zyl1121 force-pushed the fix/cubemaster-template-resource-egress branch from f632d73 to d4d91c4 Compare June 26, 2026 07:44
@zyl1121

zyl1121 commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

@chenhengqi Done. The merge conflict is resolved, and the commits have been squashed into one.

// mergeEgressRules combines template + request rules. Because egress rules are
// first-match-wins, per-sandbox/request rules must come before template rules.
// Template rules whose Name is overridden by the request side are skipped.
func mergeEgressRules(base []*types.EgressRule, extra []*types.EgressRule) []*types.EgressRule {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design consideration: name-based dedup removes template rules entirely, not just deprioritizes them

Request rules are placed before template rules, which is correct for first-match-wins precedence. However, the additional name-based dedup (overridden map, line ~341) removes template rules from the output entirely when their name matches a request rule, rather than just letting them be deprioritized by position.

This means a sandbox user who knows a template rule's name (e.g., "block-metadata") can craft a request rule with the same name and a permissive action to bypass that template security rule. The template rule would be gone from the evaluated set — not just out-prioritized.

The old code (in-place overwrite in the base list) had the same concern, and at minimum the behavior should be documented as a known trait. If template rules are meant to be mandatory security boundaries, consider using position-only precedence and removing the name-based dedup.

// Option: position-only precedence, no name-based removal
out := make([]*types.EgressRule, 0, len(extra)+len(base))
for _, r := range extra {
    out = append(out, cloneEgressRule(r))
}
for _, r := range base {
    out = append(out, cloneEgressRule(r))
}

@@ -134,6 +134,7 @@ func cloneCubeNetworkConfig(in *types.CubeNetworkConfig) *types.CubeNetworkConfi
out := &types.CubeNetworkConfig{

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing AllowPublicTraffic field — silent data loss risk

This cloneCubeNetworkConfig deep-copies AllowOut, DenyOut, and Rules, plus AllowInternetAccess, but does not clone AllowPublicTraffic. The equivalent code in cubeboxutil.go:cloneCubeNetworkConfigBase (lines 316-319) does copy it:

if in.AllowPublicTraffic != nil {
    allowPublicTraffic := *in.AllowPublicTraffic
    out.AllowPublicTraffic = &allowPublicTraffic
}

If template creation from image ever starts using AllowPublicTraffic, this omission would silently drop that field. Either copy AllowPublicTraffic too, or add a comment explaining why it's intentionally excluded.

}
if mem.Cmp(config.GetConfig().Scheduler.MaxMvmMemoryRes()) >= 0 {
err = ret.Errorf(errorcode.ErrorCode_MasterParamsError, "request Resources mem[%dKB] is invalid",
return cpu, mem, ret.Errorf(errorcode.ErrorCode_MasterParamsError, "request Resources mem[%dKB] is invalid",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: double space in error message

"request Resources mem[%dKB] is invalid" has an extra space between "Resources" and "mem". The CPU message above (line 110) uses a single space: "request Resources cpu[%dm] is invalid".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants