Add trainerv2 with mindspeed #201
Walkthrough
This PR introduces documentation for fine-tuning Qwen3 on Huawei Ascend NPUs using Kubeflow Trainer v2 and MindSpeed-LLM. It adds a Jupyter notebook with Kubeflow manifests (TrainingRuntime and TrainJob), environment setup, dataset preparation, checkpoint conversion, and training commands, plus a reference section in the existing tutorial guide.
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 Ruff (0.15.12)
docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb: Unexpected end of JSON input
Actionable comments posted: 2
🧹 Nitpick comments (1)
docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb (1)
292-300: Container `securityContext` duplicates pod-level fields.
`runAsNonRoot`, `runAsUser`, and `runAsGroup` are already set on the pod spec at lines 109-113 with the same values, so the container-level copies are redundant; only the truly container-scoped settings (`allowPrivilegeEscalation`, `capabilities`, `seccompProfile`) need to live here. Removing the copies reduces the chance of the two blocks drifting apart later.
🧹 Proposed cleanup
 securityContext:
   allowPrivilegeEscalation: true
   capabilities:
     add: ["IPC_LOCK", "SYS_PTRACE"]
-  runAsNonRoot: true
-  runAsUser: 1001
-  runAsGroup: 0
   seccompProfile:
     type: RuntimeDefault
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb` around lines 292 - 300, The container-level securityContext duplicates pod-level fields runAsNonRoot, runAsUser, and runAsGroup (same values set earlier); remove those three keys from the container's securityContext block and leave only container-scoped keys (allowPrivilegeEscalation, capabilities, seccompProfile) so the pod-level runAs* settings remain authoritative; locate the container securityContext in the YAML snippet under the container spec (the block containing allowPrivilegeEscalation/capabilities/seccompProfile) and delete runAsNonRoot, runAsUser, and runAsGroup entries there.
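The duplication check described in this nitpick can be sketched in Python. This is a minimal illustration, not the reviewer's tooling: the two dictionaries are hand-written from the values quoted in the comment rather than parsed from the notebook, the pod-level dictionary lists only the three fields the comment names, and `redundant_keys` is a hypothetical helper name.

```python
def redundant_keys(pod_ctx: dict, container_ctx: dict) -> list[str]:
    """Return container-level keys that repeat a pod-level key with the same value."""
    return sorted(
        key
        for key, value in container_ctx.items()
        if key in pod_ctx and pod_ctx[key] == value
    )


# Assumed inputs: only the fields named in the review comment.
pod_security_context = {"runAsNonRoot": True, "runAsUser": 1001, "runAsGroup": 0}
container_security_context = {
    "allowPrivilegeEscalation": True,
    "capabilities": {"add": ["IPC_LOCK", "SYS_PTRACE"]},
    "runAsNonRoot": True,
    "runAsUser": 1001,
    "runAsGroup": 0,
    "seccompProfile": {"type": "RuntimeDefault"},
}

# Keys that could be dropped from the container block without changing behavior.
print(redundant_keys(pod_security_context, container_security_context))
```

Keys that appear only at container level, such as `capabilities`, are never reported, which matches the comment's point that those settings must stay where they are.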
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb`:
- Around line 158-163: The YAML block scalar fails because the JSONL and PYCHECK
heredocs are not indented to the required column; keep the printf change for
JSONL and apply the same approach to PYCHECK: either replace the PYCHECK heredoc
with a printf that emits the content (like the JSONL fix) or indent every
non-empty line inside the PYCHECK heredoc by at least 20 spaces to match the
surrounding block scalar (ensure symbols RAW_DATA_FILE, JSONL, and PYCHECK are
updated accordingly and that set -o pipefail block indentation is preserved).
In `@docs/en/kubeflow/how_to/fine-tune-with-trainer-v2.mdx`:
- Line 84: The GitHub URL
"https://github.com/alauda/aml-docs/docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb"
is malformed and 404s; update that link (and the other two occurrences using the
same pattern) to either a proper GitHub blob URL including the branch (e.g., add
"/blob/main/" between "alauda/aml-docs" and the path) or convert it to a
site-relative link to the notebook within the docs (so it resolves on the
rendered site); search for the same broken pattern on the page (lines near where
the current link appears) and apply the same fix to each occurrence.
---
Nitpick comments:
In `@docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb`:
- Around line 292-300: The container-level securityContext duplicates pod-level
fields runAsNonRoot, runAsUser, and runAsGroup (same values set earlier); remove
those three keys from the container's securityContext block and leave only
container-scoped keys (allowPrivilegeEscalation, capabilities, seccompProfile)
so the pod-level runAs* settings remain authoritative; locate the container
securityContext in the YAML snippet under the container spec (the block
containing allowPrivilegeEscalation/capabilities/seccompProfile) and delete
runAsNonRoot, runAsUser, and runAsGroup entries there.
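For the malformed GitHub link flagged in the inline comments above, the "add `/blob/main/`" option can be sketched as a small URL rewrite. This is only an illustration under assumptions: `to_blob_url` is a hypothetical helper, the branch name `main` is the example value from the comment and may not be the branch the docs should point at, and the function handles only `owner/repo/<file path>` URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def to_blob_url(url: str, branch: str = "main") -> str:
    """Insert /blob/<branch>/ after owner/repo when the URL lacks it."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Leave the URL alone if it is too short or already points at blob/tree/raw.
    if len(segments) < 3 or segments[2] in ("blob", "tree", "raw"):
        return url
    owner, repo, *file_path = segments
    new_path = "/" + "/".join([owner, repo, "blob", branch, *file_path])
    return urlunsplit(parts._replace(path=new_path))


broken = (
    "https://github.com/alauda/aml-docs/docs/en/kubeflow/how_to/"
    "fine-tune-with-trainer-v2-mindspeed-npu.ipynb"
)
print(to_blob_url(broken))
```

The other option named in the comment, a site-relative link, avoids pinning a branch at all and may be the better choice for links that should resolve on the rendered site.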
📒 Files selected for processing (2)
docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb
docs/en/kubeflow/how_to/fine-tune-with-trainer-v2.mdx
" if [ ! -s \"$RAW_DATA_FILE\" ]; then\n",
" cat >\"$RAW_DATA_FILE\" <<'JSONL'\n",
"{\"instruction\":\"Who are you?\",\"input\":\"\",\"output\":\"I am XiaoLing, an AI assistant from Alauda AI Platform.\",\"system\":\"\"}\n",
"{\"instruction\":\"What is Alauda AI Platform?\",\"input\":\"\",\"output\":\"Alauda AI Platform helps teams build, train, and serve AI workloads on Kubernetes.\",\"system\":\"\"}\n",
"JSONL\n",
" fi\n",
🧩 Analysis chain
🏁 Script executed:
find . -name "*fine-tune-with-trainer-v2-mindspeed-npu.ipynb" -type f
Repository: alauda/aml-docs
Length of output: 131
🏁 Script executed:
cat ./docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb | python3 << 'EOF'
import json
import sys
# Load notebook
nb = json.load(sys.stdin)
# Find cells with the heredoc content
for i, cell in enumerate(nb.get('cells', [])):
    content = cell.get('source', [])
    content_str = ''.join(content) if isinstance(content, list) else content
    if 'JSONL' in content_str and 'cat >' in content_str:
        print(f"\n=== Cell {i} ===")
        print(content_str)
        print("\n")
EOF
Repository: alauda/aml-docs
Length of output: 780
🏁 Script executed:
python3 << 'EOF'
import json
# Load notebook directly from file
with open('./docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb', 'r') as f:
    nb = json.load(f)
# Find cells with the heredoc content
for i, cell in enumerate(nb.get('cells', [])):
    content = cell.get('source', [])
    content_str = ''.join(content) if isinstance(content, list) else content
    if 'JSONL' in content_str and 'cat >' in content_str:
        print(f"\n=== Cell {i} ===")
        print(content_str)
        print("\n")
EOF
Repository: alauda/aml-docs
Length of output: 9773
🏁 Script executed:
python3 << 'EOF'
import json
import yaml
# Load notebook
with open('./docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb', 'r') as f:
    nb = json.load(f)
# Find the cell with writefile
for cell in nb.get('cells', []):
    content = cell.get('source', [])
    content_str = ''.join(content) if isinstance(content, list) else content
    if '%%writefile kf-trainingruntime-mindspeed-npu.yaml' in content_str:
        # Extract the YAML part (after %%writefile directive)
        yaml_content = '\n'.join(content_str.split('\n')[1:])
        print("=== Attempting to parse the YAML ===")
        try:
            parsed = yaml.safe_load(yaml_content)
            print("✓ YAML parsed successfully!")
            # Check the args section
            if parsed and 'spec' in parsed:
                template = parsed['spec'].get('template', {})
                spec = template.get('spec', {})
                replicated = spec.get('replicatedJobs', [{}])[0]
                job_template = replicated.get('template', {})
                job_spec = job_template.get('spec', {})
                pod_spec = job_spec.get('template', {})
                containers = pod_spec.get('spec', {}).get('containers', [{}])[0]
                args = containers.get('args', [])
                if args:
                    arg_string = args[0]
                    # Check if the heredoc content is present and intact
                    if 'JSONL' in arg_string and 'Who are you?' in arg_string:
                        print("✓ JSONL data is intact in the parsed content")
                        lines = arg_string.split('\n')
                        for i, line in enumerate(lines):
                            if 'JSONL' in line or 'Who are you' in line or 'Alauda AI Platform' in line:
                                print(f" Line {i}: {repr(line[:80])}")
        except yaml.YAMLError as e:
            print(f"✗ YAML parsing failed with error:")
            print(f" {type(e).__name__}: {e}")
            import traceback
            traceback.print_exc()
        break
EOF
Repository: alauda/aml-docs
Length of output: 2613
🏁 Script executed:
python3 << 'EOF'
import json
import yaml
# Load notebook
with open('./docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb', 'r') as f:
    nb = json.load(f)
# Find the cell with writefile
for cell in nb.get('cells', []):
    content = cell.get('source', [])
    content_str = ''.join(content) if isinstance(content, list) else content
    if '%%writefile kf-trainingruntime-mindspeed-npu.yaml' in content_str:
        # Extract the YAML part (after %%writefile directive)
        yaml_content = '\n'.join(content_str.split('\n')[1:])
        # Now test the proposed fix by replacing the heredoc with printf
        # Find and replace the problematic heredoc
        fixed_yaml = yaml_content.replace(
            ''' cat >"$RAW_DATA_FILE" <<'JSONL'
{"instruction":"Who are you?","input":"","output":"I am XiaoLing, an AI assistant from Alauda AI Platform.","system":""}
{"instruction":"What is Alauda AI Platform?","input":"","output":"Alauda AI Platform helps teams build, train, and serve AI workloads on Kubernetes.","system":""}
JSONL''',
            ''' {
printf '%s\\n' '{"instruction":"Who are you?","input":"","output":"I am XiaoLing, an AI assistant from Alauda AI Platform.","system":""}'
printf '%s\\n' '{"instruction":"What is Alauda AI Platform?","input":"","output":"Alauda AI Platform helps teams build, train, and serve AI workloads on Kubernetes.","system":""}'
} >"$RAW_DATA_FILE"'''
        )
        print("=== Testing PROPOSED FIX (printf-based) ===")
        try:
            parsed = yaml.safe_load(fixed_yaml)
            print("✓ YAML with proposed fix parses successfully!")
            # Verify the args section still has the script intact
            if parsed and 'spec' in parsed:
                template = parsed['spec'].get('template', {})
                spec = template.get('spec', {})
                replicated = spec.get('replicatedJobs', [{}])[0]
                job_template = replicated.get('template', {})
                job_spec = job_template.get('spec', {})
                pod_spec = job_spec.get('template', {})
                containers = pod_spec.get('spec', {}).get('containers', [{}])[0]
                args = containers.get('args', [])
                if args:
                    arg_string = args[0]
                    if 'printf' in arg_string and 'Who are you?' in arg_string:
                        print("✓ Fixed script content is intact and properly formatted")
        except yaml.YAMLError as e:
            print(f"✗ Fixed YAML still fails to parse:")
            print(f" {type(e).__name__}: {e}")
        break
EOF
Repository: alauda/aml-docs
Length of output: 372
🏁 Script executed:
python3 << 'EOF'
import json
import yaml
# Load notebook
with open('./docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb', 'r') as f:
    nb = json.load(f)
# Find the cell with writefile
for cell in nb.get('cells', []):
    content = cell.get('source', [])
    content_str = ''.join(content) if isinstance(content, list) else content
    if '%%writefile kf-trainingruntime-mindspeed-npu.yaml' in content_str:
        # Extract the YAML part
        yaml_content = '\n'.join(content_str.split('\n')[1:])
        # Find the lines with the heredoc to check indentation
        lines = yaml_content.split('\n')
        for i, line in enumerate(lines):
            if 'set -o pipefail' in line:
                indent = len(line) - len(line.lstrip())
                print(f"'set -o pipefail' at line {i}, indentation: {indent} spaces")
            if 'cat >"$RAW_DATA_FILE"' in line:
                indent = len(line) - len(line.lstrip())
                print(f"'cat >' line at line {i}, indentation: {indent} spaces")
            if '{"instruction":"Who are you?' in line:
                indent = len(line) - len(line.lstrip())
                print(f"JSONL data line at line {i}, indentation: {indent} spaces")
            if line.strip() == 'JSONL':
                indent = len(line) - len(line.lstrip())
                print(f"JSONL terminator at line {i}, indentation: {indent} spaces")
        # Show the exact section
        print("\n=== Showing exact lines around heredoc (with visible spaces) ===")
        for i, line in enumerate(lines[65:75], start=65):
            print(f"{i}: {repr(line)}")
EOF
Repository: alauda/aml-docs
Length of output: 765
🏁 Script executed:
python3 << 'EOF'
import json
import yaml
# Load notebook
with open('./docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb', 'r') as f:
    nb = json.load(f)
# Find the cell with writefile
for cell in nb.get('cells', []):
    content = cell.get('source', [])
    content_str = ''.join(content) if isinstance(content, list) else content
    if '%%writefile kf-trainingruntime-mindspeed-npu.yaml' in content_str:
        yaml_content = '\n'.join(content_str.split('\n')[1:])
        lines = yaml_content.split('\n')
        # Show lines 74-82 to see the exact problem
        print("=== Current problematic section (lines 74-82) ===")
        for i in range(74, min(82, len(lines))):
            print(f"{i}: {repr(lines[i])}")
        # The issue is clear: lines starting at column 0 break the block scalar
        # The fix needs to maintain 20+ space indentation
        # Let me create a proper fix
        print("\n=== Creating fixed version ===")
        fixed_lines = lines.copy()
        # Replace lines 76-79 (the problematic heredoc) with printf statements
        # keeping proper indentation (22 spaces like the cat line)
        fixed_lines[76] = ' {'
        fixed_lines[77] = ' printf \'%s\\n\' \'{"instruction":"Who are you?","input":"","output":"I am XiaoLing, an AI assistant from Alauda AI Platform.","system":""}\''
        fixed_lines[78] = ' printf \'%s\\n\' \'{"instruction":"What is Alauda AI Platform?","input":"","output":"Alauda AI Platform helps teams build, train, and serve AI workloads on Kubernetes.","system":""}\''
        fixed_lines[79] = ' } >"$RAW_DATA_FILE"'
        fixed_yaml = '\n'.join(fixed_lines)
        print("=== Testing PROPOSED FIX with correct indentation ===")
        try:
            parsed = yaml.safe_load(fixed_yaml)
            print("✓ YAML with proposed fix parses successfully!")
        except yaml.YAMLError as e:
            print(f"✗ Fixed YAML still fails:")
            print(f" {type(e).__name__}: {str(e)[:200]}")
        break
EOF
Repository: alauda/aml-docs
Length of output: 1005
🏁 Script executed:
python3 << 'EOF'
import json
import yaml
# Load notebook
with open('./docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.yaml', 'r') as f:
    nb = json.load(f)
# Find the cell with writefile
for cell in nb.get('cells', []):
    content = cell.get('source', [])
    content_str = ''.join(content) if isinstance(content, list) else content
    if '%%writefile kf-trainingruntime-mindspeed-npu.yaml' in content_str:
        yaml_content = '\n'.join(content_str.split('\n')[1:])
        lines = yaml_content.split('\n')
        # Check for ALL heredocs in the block
        for i, line in enumerate(lines):
            if "<<'" in line:
                indent = len(line) - len(line.lstrip())
                heredoc_marker = line.split("<<'")[1].split("'")[0]
                print(f"Line {i}: Heredoc with marker '{heredoc_marker}', indentation: {indent} spaces")
                # Find the terminator
                for j in range(i+1, min(i+20, len(lines))):
                    if lines[j].strip() == heredoc_marker:
                        term_indent = len(lines[j]) - len(lines[j].lstrip())
                        print(f" Terminator at line {j}, indentation: {term_indent} spaces")
                        break
EOF
Repository: alauda/aml-docs
Length of output: 261
🏁 Script executed:
python3 << 'EOF'
import json
# Load notebook
with open('./docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb', 'r') as f:
    nb = json.load(f)
# Find the cell with writefile
for cell in nb.get('cells', []):
    content = cell.get('source', [])
    content_str = ''.join(content) if isinstance(content, list) else content
    if '%%writefile kf-trainingruntime-mindspeed-npu.yaml' in content_str:
        yaml_content = '\n'.join(content_str.split('\n')[1:])
        lines = yaml_content.split('\n')
        # Check for ALL heredocs in the block
        print("=== All heredocs in the YAML ===")
        for i, line in enumerate(lines):
            if "<<'" in line:
                indent = len(line) - len(line.lstrip())
                heredoc_marker = line.split("<<'")[1].split("'")[0]
                print(f"Line {i}: Heredoc with marker '{heredoc_marker}', indentation: {indent} spaces")
                # Find the terminator
                for j in range(i+1, min(i+50, len(lines))):
                    if lines[j].strip() == heredoc_marker:
                        term_indent = len(lines[j]) - len(lines[j].lstrip())
                        print(f" Terminator at line {j}, indentation: {term_indent} spaces")
                        print(f" Content between: {j-i-1} lines")
                        break
EOF
Repository: alauda/aml-docs
Length of output: 365
🏁 Script executed:
python3 << 'EOF'
import json
import yaml
# Load notebook
with open('./docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb', 'r') as f:
    nb = json.load(f)
# Find the cell with writefile
for cell in nb.get('cells', []):
    content = cell.get('source', [])
    content_str = ''.join(content) if isinstance(content, list) else content
    if '%%writefile kf-trainingruntime-mindspeed-npu.yaml' in content_str:
        yaml_content = '\n'.join(content_str.split('\n')[1:])
        lines = yaml_content.split('\n')
        # The block scalar starts at line 43 with "- |" and "set -o pipefail" at line 44
        # Minimum indentation is 20 spaces
        # Fix line 79 (JSONL terminator) - move from column 0 to column 20
        lines[79] = ' JSONL'
        fixed_yaml = '\n'.join(lines)
        print("=== Testing fix for JSONL terminator indentation only ===")
        try:
            parsed = yaml.safe_load(fixed_yaml)
            print("✓ YAML parses after indenting JSONL terminator!")
        except yaml.YAMLError as e:
            error_msg = str(e)
            if 'PYCHECK' in error_msg:
                print(f"✗ Still fails, but now at PYCHECK: {error_msg[:100]}")
            else:
                print(f"✗ Failed: {error_msg[:150]}")
        break
EOF
Repository: alauda/aml-docs
Length of output: 278
🏁 Script executed:
python3 << 'EOF'
import json
import yaml
# Load notebook
with open('./docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb', 'r') as f:
    nb = json.load(f)
# Find the cell with writefile
for cell in nb.get('cells', []):
    content = cell.get('source', [])
    content_str = ''.join(content) if isinstance(content, list) else content
    if '%%writefile kf-trainingruntime-mindspeed-npu.yaml' in content_str:
        yaml_content = '\n'.join(content_str.split('\n')[1:])
        lines = yaml_content.split('\n')
        # The block scalar's minimum indentation is 20 spaces (set -o pipefail)
        # Fix lines 77, 78, 79 - indent to at least 20 spaces
        lines[77] = ' {"instruction":"Who are you?","input":"","output":"I am XiaoLing, an AI assistant from Alauda AI Platform.","system":""}'
        lines[78] = ' {"instruction":"What is Alauda AI Platform?","input":"","output":"Alauda AI Platform helps teams build, train, and serve AI workloads on Kubernetes.","system":""}'
        lines[79] = ' JSONL'
        # Also fix the PYCHECK terminator at line 95
        for i, line in enumerate(lines):
            if line.strip() == 'PYCHECK':
                lines[i] = ' PYCHECK'
        fixed_yaml = '\n'.join(lines)
        print("=== Testing fix for indenting ALL heredoc content ===")
        try:
            parsed = yaml.safe_load(fixed_yaml)
            print("✓ YAML parses successfully after indenting heredoc content!")
            # Verify the parsed data
            if parsed and 'spec' in parsed:
                print("✓ YAML structure is valid")
        except yaml.YAMLError as e:
            error_msg = str(e)
            print(f"✗ Failed: {error_msg[:150]}")
        break
EOF
Repository: alauda/aml-docs
Length of output: 274
🏁 Script executed:
python3 << 'EOF'
import json
import yaml
# Load notebook
with open('./docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb', 'r') as f:
    nb = json.load(f)
# Find the cell with writefile
for cell in nb.get('cells', []):
    content = cell.get('source', [])
    content_str = ''.join(content) if isinstance(content, list) else content
    if '%%writefile kf-trainingruntime-mindspeed-npu.yaml' in content_str:
        yaml_content = '\n'.join(content_str.split('\n')[1:])
        lines = yaml_content.split('\n')
        # Find all lines with column 0 indentation that should be indented
        # The minimum indentation for the block scalar is 20 spaces
        print("=== Lines at column 0 that are inside heredocs ===")
        in_heredoc = False
        heredoc_marker = None
        for i, line in enumerate(lines):
            if "<<'" in line:
                in_heredoc = True
                heredoc_marker = line.split("<<'")[1].split("'")[0]
                print(f"Line {i}: Starting heredoc '{heredoc_marker}'")
            if in_heredoc and len(line) > 0 and line[0] != ' ' and line.strip() != '':
                print(f" Line {i}: Column 0: {repr(line[:60])}")
            if in_heredoc and line.strip() == heredoc_marker:
                in_heredoc = False
                print(f" Line {i}: End heredoc (at column {len(line) - len(line.lstrip())})")
EOF
Repository: alauda/aml-docs
Length of output: 1158
🏁 Script executed:
python3 << 'EOF'
import json
import yaml
# Load notebook
with open('./docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb', 'r') as f:
    nb = json.load(f)
# Find the cell with writefile
for cell in nb.get('cells', []):
    content = cell.get('source', [])
    content_str = ''.join(content) if isinstance(content, list) else content
    if '%%writefile kf-trainingruntime-mindspeed-npu.yaml' in content_str:
        yaml_content = '\n'.join(content_str.split('\n')[1:])
        lines = yaml_content.split('\n')
        # Apply the EXACT fix from the reviewer's proposal
        # Replace lines 76-79 with the printf-based approach
        lines[76] = ' {'
        lines[77] = ' printf \'%s\\n\' \'{"instruction":"Who are you?","input":"","output":"I am XiaoLing, an AI assistant from Alauda AI Platform.","system":""}\''
        lines[78] = ' printf \'%s\\n\' \'{"instruction":"What is Alauda AI Platform?","input":"","output":"Alauda AI Platform helps teams build, train, and serve AI workloads on Kubernetes.","system":""}\''
        lines[79] = ' } >"$RAW_DATA_FILE"'
        # Also need to fix the PYCHECK heredoc - indent all its content
        # Lines 83-94 need to be indented to at least 20 spaces
        for i in range(83, 95):
            if lines[i].strip() and not lines[i].startswith(' '):
                lines[i] = ' ' + lines[i]
        # Line 95 is the PYCHECK terminator
        lines[95] = ' PYCHECK'
        fixed_yaml = '\n'.join(lines)
        print("=== Testing reviewer's fix + indenting PYCHECK heredoc ===")
        try:
            parsed = yaml.safe_load(fixed_yaml)
            print("✓ YAML parses successfully!")
            print("✓ Proposed fix works when BOTH heredocs are properly handled")
        except yaml.YAMLError as e:
            error_msg = str(e)
            print(f"✗ Failed: {error_msg[:150]}")
        break
EOF
Repository: alauda/aml-docs
Length of output: 279
🏁 Script executed:
python3 << 'EOF'
import json
import yaml
# Load notebook
with open('./docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb', 'r') as f:
    nb = json.load(f)
# Find the cell with writefile
for cell in nb.get('cells', []):
    content = cell.get('source', [])
    content_str = ''.join(content) if isinstance(content, list) else content
    if '%%writefile kf-trainingruntime-mindspeed-npu.yaml' in content_str:
        yaml_content = '\n'.join(content_str.split('\n')[1:])
        lines = yaml_content.split('\n')
        # Show lines 82-95 to see the PYCHECK heredoc structure
        print("=== PYCHECK heredoc section (lines 82-96) ===")
        for i in range(82, min(97, len(lines))):
            line = lines[i]
            leading_spaces = len(line) - len(line.lstrip())
            print(f"{i}: [{leading_spaces:2d} spaces] {repr(line[:70])}")
EOF
Repository: alauda/aml-docs
🏁 Script executed:
python3 << 'EOF'
import json
import yaml
# Load notebook
with open('./docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb', 'r') as f:
nb = json.load(f)
# Find the cell with writefile
for cell in nb.get('cells', []):
content = cell.get('source', [])
content_str = ''.join(content) if isinstance(content, list) else content
if '%%writefile kf-trainingruntime-mindspeed-npu.yaml' in content_str:
yaml_content = '\n'.join(content_str.split('\n')[1:])
lines = yaml_content.split('\n')
# Apply ONLY the reviewer's proposed fix for JSONL heredoc
# (lines 76-79)
lines[76] = ' {'
lines[77] = ' printf \'%s\\n\' \'{"instruction":"Who are you?","input":"","output":"I am XiaoLing, an AI assistant from Alauda AI Platform.","system":""}\''
lines[78] = ' printf \'%s\\n\' \'{"instruction":"What is Alauda AI Platform?","input":"","output":"Alauda AI Platform helps teams build, train, and serve AI workloads on Kubernetes.","system":""}\''
lines[79] = ' } >"$RAW_DATA_FILE"'
fixed_yaml = '\n'.join(lines)
print("=== Testing ONLY the reviewer's proposed fix (printf for JSONL) ===")
print("NOTE: This leaves the PYCHECK heredoc at column 0, which should also fail")
try:
parsed = yaml.safe_load(fixed_yaml)
print("✓ YAML parses successfully!")
except yaml.YAMLError as e:
error_msg = str(e)
if 'PYCHECK' in error_msg or 'import' in error_msg:
print(f"✗ Failed at PYCHECK heredoc (same problem as JSONL): {error_msg[:100]}")
else:
print(f"✗ Failed: {error_msg[:150]}")
break
EOFRepository: alauda/aml-docs
Length of output: 355
🏁 Script executed:
python3 << 'EOF'
import json
import yaml
# Load notebook
with open('./docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb', 'r') as f:
nb = json.load(f)
# Find the cell with writefile
for cell in nb.get('cells', []):
content = cell.get('source', [])
content_str = ''.join(content) if isinstance(content, list) else content
if '%%writefile kf-trainingruntime-mindspeed-npu.yaml' in content_str:
yaml_content = '\n'.join(content_str.split('\n')[1:])
lines = yaml_content.split('\n')
# Apply the reviewer's fix for JSONL
lines[76] = ' {'
lines[77] = ' printf \'%s\\n\' \'{"instruction":"Who are you?","input":"","output":"I am XiaoLing, an AI assistant from Alauda AI Platform.","system":""}\''
lines[78] = ' printf \'%s\\n\' \'{"instruction":"What is Alauda AI Platform?","input":"","output":"Alauda AI Platform helps teams build, train, and serve AI workloads on Kubernetes.","system":""}\''
lines[79] = ' } >"$RAW_DATA_FILE"'
# Now also fix the PYCHECK heredoc - indent all lines to 20+ spaces
# Lines 83-94 are Python code at column 0, line 95 is the terminator
for i in range(83, 96):
if lines[i].strip(): # Only indent non-empty lines
lines[i] = ' ' + lines[i]
fixed_yaml = '\n'.join(lines)
print("=== Testing comprehensive fix (printf for JSONL + indent PYCHECK) ===")
try:
parsed = yaml.safe_load(fixed_yaml)
print("✓ YAML parses successfully with comprehensive fix!")
# Check that the args were preserved correctly
template = parsed['spec'].get('template', {})
spec = template.get('spec', {})
containers = spec.get('replicatedJobs', [{}])[0].get('template', {}).get('spec', {}).get('template', {}).get('spec', {}).get('containers', [{}])[0]
args = containers.get('args', [])
if args and 'printf' in args[0]:
print("✓ Script content was preserved correctly")
except yaml.YAMLError as e:
error_msg = str(e)
print(f"✗ Failed: {error_msg[:150]}")
break
EOFRepository: alauda/aml-docs
Length of output: 221
YAML parse failure confirmed: Both heredocs break the block scalar indentation.
The issue is real. The YAML `|` block scalar requires all non-empty lines to be indented at least 20 spaces (matching `set -o pipefail`), but both the JSONL heredoc (lines 77–79) and the PYCHECK heredoc (lines 83–95) have their content at column 0, causing `yaml.safe_load()` to fail with `ScannerError`.
The proposed `printf` fix for JSONL is correct, but the PYCHECK heredoc has the identical problem and must also be fixed by indenting all its content to at least 20 spaces.
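The indentation rule at the heart of this finding can be checked without a YAML parser. The sketch below is a simplified illustration, not how PyYAML works internally: it takes the lines that are meant to belong to one literal block scalar, treats the first non-empty line as the base indentation, and reports every non-empty line that falls below it. The helper name `find_underindented` and the 4-space sample are assumptions made for brevity; in the notebook the base is 20 spaces.

```python
def find_underindented(block_lines: list[str]) -> list[tuple[int, str]]:
    """Return (index, line) for non-empty lines indented less than the first content line."""
    def indent_of(line: str) -> int:
        return len(line) - len(line.lstrip(" "))

    content = [line for line in block_lines if line.strip()]
    if not content:
        return []
    base = indent_of(content[0])
    return [
        (i, line)
        for i, line in enumerate(block_lines)
        if line.strip() and indent_of(line) < base
    ]


# Hypothetical, shortened block: the heredoc body and terminator sit at column 0.
sample = [
    "    set -o pipefail",
    "    cat >out.jsonl <<'JSONL'",
    '{"instruction":"Who are you?"}',
    "JSONL",
    "    fi",
]

for index, line in find_underindented(sample):
    print(index, repr(line))
```

A real parser treats the first under-indented line as the end of the scalar and then fails on what follows, which is why the error surfaces at the heredoc body rather than at the `|` line.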
🐛 Complete fix: printf for JSONL + indent PYCHECK heredoc
                     mkdir -p "$(dirname "$RAW_DATA_FILE")" "$MCORE_WEIGHTS_DIR" "$OUTPUT_DIR"
                     if [ ! -s "$RAW_DATA_FILE" ]; then
-                      cat >"$RAW_DATA_FILE" <<'JSONL'
-{"instruction":"Who are you?","input":"","output":"I am XiaoLing, an AI assistant from Alauda AI Platform.","system":""}
-{"instruction":"What is Alauda AI Platform?","input":"","output":"Alauda AI Platform helps teams build, train, and serve AI workloads on Kubernetes.","system":""}
-JSONL
+                      {
+                        printf '%s\n' '{"instruction":"Who are you?","input":"","output":"I am XiaoLing, an AI assistant from Alauda AI Platform.","system":""}'
+                        printf '%s\n' '{"instruction":"What is Alauda AI Platform?","input":"","output":"Alauda AI Platform helps teams build, train, and serve AI workloads on Kubernetes.","system":""}'
+                      } >"$RAW_DATA_FILE"
                     fi
                     python - <<'PYCHECK'
-import importlib.metadata as md
-import importlib.util
-import torch
-import torch_npu
-for mod in ["torch", "torch_npu", "mindspeed", "mindspeed_llm"]:
+                    import importlib.metadata as md
+                    import importlib.util
+                    import torch
+                    import torch_npu
+                    for mod in ["torch", "torch_npu", "mindspeed", "mindspeed_llm"]:
-    assert importlib.util.find_spec(mod), f"missing {mod}"
-print("torch:", torch.__version__)
-print("torch_npu:", torch_npu.__version__)
-print("mindspeed:", md.version("mindspeed"))
-print("mindspeed_llm:", md.version("mindspeed-llm"))
-print("npu_count:", torch.npu.device_count())
-assert torch.npu.is_available(), "NPU is not available"
+                        assert importlib.util.find_spec(mod), f"missing {mod}"
+                    print("torch:", torch.__version__)
+                    print("torch_npu:", torch_npu.__version__)
+                    print("mindspeed:", md.version("mindspeed"))
+                    print("mindspeed_llm:", md.version("mindspeed-llm"))
+                    print("npu_count:", torch.npu.device_count())
+                    assert torch.npu.is_available(), "NPU is not available"
-PYCHECK
+                    PYCHECK
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb` around
lines 158 - 163, The YAML block scalar fails because the JSONL and PYCHECK
heredocs are not indented to the required column; keep the printf change for
JSONL and apply the same approach to PYCHECK: either replace the PYCHECK heredoc
with a printf that emits the content (like the JSONL fix) or indent every
non-empty line inside the PYCHECK heredoc by at least 20 spaces to match the
surrounding block scalar (ensure symbols RAW_DATA_FILE, JSONL, and PYCHECK are
updated accordingly and that set -o pipefail block indentation is preserved).
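The `printf`-based replacement for the JSONL heredoc can also be generated rather than typed by hand. The following Python sketch is one possible way to do that, under stated assumptions: `printf_lines` is a hypothetical helper, the 24-space indent is an illustrative value (the review only requires at least 20), and only the first sample record from the notebook is used.

```python
import json
import shlex


def printf_lines(records: list[dict], indent: int) -> list[str]:
    """Build one shell `printf '%s\\n' '<json>'` line per record, padded to `indent` spaces."""
    pad = " " * indent
    lines = []
    for record in records:
        # Compact separators match the single-line JSONL style used in the notebook.
        payload = json.dumps(record, separators=(",", ":"), ensure_ascii=False)
        # shlex.quote keeps the JSON safe inside the shell command.
        lines.append(f"{pad}printf '%s\\n' {shlex.quote(payload)}")
    return lines


records = [
    {
        "instruction": "Who are you?",
        "input": "",
        "output": "I am XiaoLing, an AI assistant from Alauda AI Platform.",
        "system": "",
    },
]

for line in printf_lines(records, indent=24):
    print(line)
```

Because every emitted line carries its own indentation, the result can be pasted inside the YAML block scalar without any line falling back to column 0.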
Deploying alauda-ai with Cloudflare Pages
| Latest commit: | 1e40109 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://aeca4975.alauda-ai.pages.dev |
| Branch Preview URL: | https://add-trainerv2-mindspeed.alauda-ai.pages.dev |