fix(parser): comment before next binding bleeds into previous content_position#143
fix(parser): comment before next binding bleeds into previous content_position#143thedavidmeister wants to merge 1 commit into
Conversation
…_position Use trailing trim from the modified document (where comments are spaces) for content_position[1] so a block comment immediately before the next #-binding is not included in the preceding binding's content or content_position. Leading trim still uses the original text so comments at the start of a binding's own content area (e.g. /* doc comment */ before the expression) are preserved correctly. Closes #114 Co-Authored-By: Claude <noreply@anthropic.com>
WalkthroughIn Binding content range fix
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/dotrain/src/parser/raindocument/logic.rs`:
- Around line 798-808: The malformed range is being preserved in the
`content_position` value even though `self.text.get(...)` masks it locally.
Update the range calculation in `logic.rs` where `content_position` is derived
in this branch so it is clamped/normalized before assignment, ensuring `start <=
end` for comment-only bindings after whitespace trimming. Keep the fix scoped to
the `content_position` computation used just before `content` is read, and use
the existing `parsed_binding`, `trimmed_orig`, and `raw_trimmed` values to
adjust the bounds consistently.
In `@crates/dotrain/src/parser/raindocument/mod.rs`:
- Around line 953-993: The new regression test only covers exclusion of a
comment before the next binding, but the positional contract also needs coverage
for the complementary cases. Extend the tests around RainDocument::create and
binding parsing to verify that a comment at the start of a binding’s own content
is preserved in that binding’s content/content_position, and add an edge case
for a comment-only binding to confirm it is handled correctly per the logic in
the parser. Use the existing test in raindocument/mod.rs as the location to add
these cases and align expectations with the behavior introduced in the
binding/content extraction logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 37dc1c6e-c1e5-4411-a8fc-2f0ac9fdbbad
📒 Files selected for processing (2)
crates/dotrain/src/parser/raindocument/logic.rscrates/dotrain/src/parser/raindocument/mod.rs
| } else { | ||
| [ | ||
| parsed_binding.1[0] + boundry_offset + 1 + trimmed_content.1, | ||
| parsed_binding.1[1] - trimmed_content.2, | ||
| parsed_binding.1[0] + boundry_offset + 1 + trimmed_orig.1, | ||
| parsed_binding.1[1] - raw_trimmed.2, | ||
| ] | ||
| }; | ||
| content = if trimmed_content.0.is_empty() { | ||
| slices.1.to_owned() | ||
| } else { | ||
| trimmed_content.0.to_owned() | ||
| }; | ||
| content = self | ||
| .text | ||
| .get(content_position[0]..content_position[1]) | ||
| .unwrap_or("") | ||
| .to_owned(); |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find all usages of content_position to assess underflow/slice-panic risk
rg -nP --type=rust 'content_position' -C3
# Specifically subtraction or range slicing on content_position
rg -nP --type=rust 'content_position\s*\[\s*1\s*\]\s*-\s*.*content_position\s*\[\s*0\s*\]' -C2
rg -nP --type=rust 'content_position\[0\]\s*\.\.\s*.*content_position\[1\]' -C2Repository: rainlanguage/dotrain
Length of output: 403
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate the relevant code and all uses of `content_position`
FILE="crates/dotrain/src/parser/raindocument/logic.rs"
echo "== file outline =="
ast-grep outline "$FILE" --view expanded || true
echo
echo "== relevant lines =="
sed -n '780,825p' "$FILE" | cat -n
echo
echo "== all content_position references =="
rg -n --type=rust 'content_position' crates/dotrain/src -C 2 || true
echo
echo "== range/slice usage patterns around content_position =="
rg -n --type=rust 'content_position\s*\[[01]\]' crates/dotrain/src -C 2 || true
echo
echo "== nearby binding fields =="
rg -n --type=rust 'binding.*content_position|content_position.*binding' crates/dotrain/src -C 2 || trueRepository: rainlanguage/dotrain
Length of output: 24199
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="crates/dotrain/src/parser/raindocument/logic.rs"
echo "== tracked_trim / related helpers =="
rg -n --type=rust 'fn tracked_trim|tracked_trim\(' "$FILE" -C 4 || true
echo
echo "== process_binding body around the range calculation =="
sed -n '758,860p' "$FILE" | cat -n
echo
echo "== AST for Binding and Offsets =="
sed -n '220,250p' crates/dotrain/src/types/ast.rs | cat -n
echo
echo "== any tests around comment-only or whitespace-only bindings =="
rg -n --type=rust 'comment-only|whitespace-only|content_position extends|comment before|binding.*comment' crates/dotrain/src/parser/raindocument/mod.rs -C 2 || trueRepository: rainlanguage/dotrain
Length of output: 8244
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="crates/dotrain/src/parser/raindocument/logic.rs"
echo "== tracked_trim definition =="
ast-grep outline "$FILE" --view expanded | sed -n '1,120p' || true
echo
echo "== exact tracked_trim / trim helper block =="
rg -n --type=rust 'tracked_trim|fn .*trim' "$FILE" -C 6 || true
echo
echo "== comment-related test block =="
sed -n '940,990p' crates/dotrain/src/parser/raindocument/mod.rs | cat -n
echo
echo "== other content_position consumers in parser/raindocument =="
rg -n --type=rust 'content_position' crates/dotrain/src/parser/raindocument -C 2 || trueRepository: rainlanguage/dotrain
Length of output: 19722
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== tracked_trim definition across repo =="
rg -n --type=rust 'fn tracked_trim|tracked_trim\(' crates/dotrain/src -C 4 || true
echo
echo "== bindings/tests that mention comment-only or empty binding =="
rg -n --type=rust 'InvalidEmptyBinding|comment-only|comment before|extends through the comment|content_position extends' crates/dotrain/src -C 3 || true
echo
echo "== any direct use of content_position as a start/end offset in composer/parser =="
rg -n --type=rust 'content_position\[[01]\]' crates/dotrain/src -C 2 || trueRepository: rainlanguage/dotrain
Length of output: 11446
Clamp content_position before storing it
Bindings whose content is only comments after leading whitespace can still end up with start > end: trimmed_orig preserves the comment text, while raw_trimmed.2 trims the comment-replaced slice back to the start of the content. The local .get(...).unwrap_or("") hides it here, but content_position is still stored as a malformed range.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/dotrain/src/parser/raindocument/logic.rs` around lines 798 - 808, The
malformed range is being preserved in the `content_position` value even though
`self.text.get(...)` masks it locally. Update the range calculation in
`logic.rs` where `content_position` is derived in this branch so it is
clamped/normalized before assignment, ensuring `start <= end` for comment-only
bindings after whitespace trimming. Keep the fix scoped to the
`content_position` computation used just before `content` is read, and use the
existing `parsed_binding`, `trimmed_orig`, and `raw_trimmed` values to adjust
the bounds consistently.
| #[test] | ||
| fn test_comment_before_binding_excluded_from_previous_content_position() { | ||
| let store = Store::new(); | ||
| let meta_store = Arc::new(RwLock::new(store)); | ||
|
|
||
| // Comment before #b2 must not appear in b1's content or content_position. | ||
| let text = "---\n#b1\n! elided\n\n/* comment for b2 */\n#b2\n! elided2\n"; | ||
| let rain_document = | ||
| RainDocument::create(text.to_owned(), Some(meta_store.clone()), None, None); | ||
|
|
||
| let b1 = rain_document | ||
| .bindings() | ||
| .iter() | ||
| .find(|b| b.name == "b1") | ||
| .expect("binding b1 not found"); | ||
|
|
||
| // b1 content must be the elision text only, not including the comment. | ||
| assert_eq!(b1.content, "! elided"); | ||
|
|
||
| // b1 content_position must end at the last char of "! elided", not at the comment. | ||
| let content_end = b1.content_position[1]; | ||
| let text_at_end = text.get(b1.content_position[0]..content_end).unwrap(); | ||
| assert_eq!(text_at_end, "! elided"); | ||
|
|
||
| // The comment text must not appear anywhere inside b1's content_position range. | ||
| let b1_content_range = text | ||
| .get(b1.content_position[0]..b1.content_position[1]) | ||
| .unwrap(); | ||
| assert!( | ||
| !b1_content_range.contains("/*"), | ||
| "comment leaked into b1 content_position: {b1_content_range:?}" | ||
| ); | ||
|
|
||
| // b2 must also parse correctly. | ||
| let b2 = rain_document | ||
| .bindings() | ||
| .iter() | ||
| .find(|b| b.name == "b2") | ||
| .expect("binding b2 not found"); | ||
| assert_eq!(b2.content, "! elided2"); | ||
| } |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Solid regression test for the exclusion path; consider covering the complementary cases.
This validates that a trailing comment before #b2 is excluded from b1. The PR also introduces an asymmetric rule where a comment at the start of a binding's own content is preserved (start bound uses original-text trim). That behavior, plus the comment-only binding edge case flagged in logic.rs, is currently untested. Adding cases like #b1\n/* doc */\n_: foo();\n#b2 (leading comment retained) and #b1\n\n/* note */\n#b2 (comment-only b1) would lock down the new positional contract.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/dotrain/src/parser/raindocument/mod.rs` around lines 953 - 993, The
new regression test only covers exclusion of a comment before the next binding,
but the positional contract also needs coverage for the complementary cases.
Extend the tests around RainDocument::create and binding parsing to verify that
a comment at the start of a binding’s own content is preserved in that binding’s
content/content_position, and add an edge case for a comment-only binding to
confirm it is handled correctly per the logic in the parser. Use the existing
test in raindocument/mod.rs as the location to add these cases and align
expectations with the behavior introduced in the binding/content extraction
logic.
Problem
Block comments immediately before a
#bindingdeclaration were attributed to the previous binding'scontent_positionrange. The previous binding'scontentstring andcontent_positionextended through the comment, which caused:Reproducer:
prev-binding.contentwas"! elided\n\n/* this comment belongs to next-binding, not prev */"instead of"! elided".Fix
In
process_binding,content_position[1](trailing bound) now uses trailing-trim from the modified document (where block comments have been replaced with spaces byfill_in), so trailing comment-whitespace is stripped.content_position[0](leading bound) continues to use the original text so block comments at the start of a binding's own content (e.g. a doc comment before the expression body) are preserved correctly.All 25 existing tests pass. Added a mutation-validated test that fails when the fix is reverted.
Closes #114
Co-Authored-By: Claude noreply@anthropic.com
Summary by CodeRabbit