cp: b200 DSv3 better cfg (3368) into r0.4.0 #3401

Open
svcnvidia-nemo-ci wants to merge 1 commit into r0.4.0 from cherry-pick-3368-r0.4.0

svcnvidia-nemo-ci commented Apr 18, 2026

beep boop [🤖]: Hi @malay-nagda 👋,

we've cherry picked #3368 into r0.4.0 for you! 🚀

Please review and approve this cherry pick at your convenience!

Summary by CodeRabbit

  • Configuration Updates
    • Enhanced DeepSeek V3 pretraining configurations for B200 GPUs with optimized MoE parallelism, memory allocation, and CUDA graph handling.
    • Extended environment variable settings for B200 hardware to improve CUDA memory allocation and graph registration performance.

Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>

@svcnvidia-nemo-ci

/ok to test 8786864

copy-pr-bot bot commented Apr 18, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai bot commented Apr 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 95c366a9-b6b3-4607-a621-f936c474a6fa

📥 Commits

Reviewing files that changed from the base of the PR and between 9c2b3e4 and 8786864.

📒 Files selected for processing (2)
  • scripts/performance/configs/deepseek/deepseek_workload_base_configs.py
  • scripts/performance/perf_plugins.py

📝 Walkthrough

Two DeepSeek V3 pretrain configurations are converted from simple aliases to explicit replace() variants with distinct parameter overrides for MoE, parallelism, and CUDA graph settings. Environment variables for b200 GPU support are added to existing conditions in the performance plugin.
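
For readers unfamiliar with the alias-versus-replace() distinction described in the walkthrough, here is a minimal sketch. It assumes the configs are dataclasses and that replace() behaves like `dataclasses.replace`, which this page does not confirm. The `WorkloadConfig` class, the constant names, and every value below are hypothetical; only the field names `moe_a2a_overlap`, `cuda_graph_impl`, and `recompute_modules` are taken from the review summary.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class WorkloadConfig:
    """Hypothetical stand-in for the repository's workload config type."""

    pipeline_model_parallel_size: int = 1  # hypothetical field name
    moe_a2a_overlap: bool = False
    cuda_graph_impl: Optional[str] = None
    recompute_modules: Optional[Tuple[str, ...]] = None


# Hypothetical base config; the values are placeholders, not the PR's numbers.
BASE_CFG = WorkloadConfig(pipeline_model_parallel_size=4)

# Before: a plain alias. Both names point at the same object, so the
# "variant" cannot differ from the base in any field.
VARIANT_ALIAS = BASE_CFG

# After: explicit replace() calls. Each variant is a new object that
# copies the base and overrides only the listed fields.
VARIANT_A = replace(BASE_CFG, pipeline_model_parallel_size=8, moe_a2a_overlap=True)
VARIANT_B = replace(
    BASE_CFG,
    cuda_graph_impl="placeholder_impl",
    recompute_modules=("placeholder_module",),
)
```

With an alias, any later change to the base definition silently applies to the variant as well; with replace(), each variant states its overrides explicitly and leaves the base untouched.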

Changes

| Cohort / File(s) | Summary |
|---|---|
| DeepSeek V3 Config Updates (scripts/performance/configs/deepseek/deepseek_workload_base_configs.py) | Converted DEEPSEEK_V3_PRETRAIN_CONFIG_B200_FP8_MX_V2 and DEEPSEEK_V3_PRETRAIN_CONFIG_B200_NVFP4_V2 from aliases to explicit replace() calls with distinct parameter sets (moe_flex_dispatcher_backend, pipeline parallelism sizes, moe_a2a_overlap, pp_layout, cuda_graph_impl, and recompute_modules). |
| GPU Environment Variables (scripts/performance/perf_plugins.py) | Added b200 to GPU-specific environment variable conditions in PerfEnvPlugin._set_model_specific_environment_variables, enabling PYTORCH_CUDA_ALLOC_CONF and NCCL_GRAPH_REGISTER settings for B200 GPUs. |
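
The environment variable change can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in perf_plugins.py: the function `model_specific_env`, the `"other_gpu"` entry, and both variable values are illustrative guesses, since this page names only the two variables and the addition of b200 to the condition.

```python
from typing import Dict, FrozenSet

# Hypothetical GPU sets: "other_gpu" stands in for whatever the condition
# already matched; this page does not list those GPUs.
GPUS_BEFORE: FrozenSet[str] = frozenset({"other_gpu"})
GPUS_AFTER: FrozenSet[str] = GPUS_BEFORE | {"b200"}


def model_specific_env(gpu: str, enabled_gpus: FrozenSet[str]) -> Dict[str, str]:
    """Return extra environment variables for GPUs in the enabled set.

    The values below are assumptions chosen for illustration; the PR page
    names the variables but not the values assigned to them.
    """
    env: Dict[str, str] = {}
    if gpu.lower() in enabled_gpus:
        env["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"  # assumed value
        env["NCCL_GRAPH_REGISTER"] = "0"  # assumed value
    return env


# Before the change b200 falls through with no extra variables;
# after it, b200 receives the same settings as the existing GPUs.
print(model_specific_env("b200", GPUS_BEFORE))
print(model_specific_env("b200", GPUS_AFTER))
```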

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Test Results For Major Changes | ⚠️ Warning | PR contains significant performance-critical changes to DeepSeek V3 B200 GPU configurations without documented test results, performance benchmarks, convergence validation, or regression testing in the PR description. | Update PR description with performance benchmark results, convergence validation, and test execution results from existing DeepSeek test suite confirming no regressions. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title references a cherry-pick of PR #3368 with specific improvements to B200 DeepSeek V3 configurations, which aligns with the changeset's focus on updating MX V2 and NVFP4 V2 configs and environment variables for B200. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
