Skip to content

fix(parser): comment before next binding bleeds into previous content_position#143

Open
thedavidmeister wants to merge 1 commit into
mainfrom
2026-06-29-issue-114-comment-position
Open

fix(parser): comment before next binding bleeds into previous content_position#143
thedavidmeister wants to merge 1 commit into
mainfrom
2026-06-29-issue-114-comment-position

Conversation

@thedavidmeister

@thedavidmeister thedavidmeister commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Problem

Block comments immediately before a #binding declaration were attributed to the previous binding's content_position range. The previous binding's content string and content_position extended through the comment, which caused:

  • LSP semantic-token highlights (elided-binding marker) to cover the comment area
  • Composer output to include the stray comment text when composing the preceding binding

Reproducer:

---
#prev-binding
! elided

/* this comment belongs to next-binding, not prev */
#next-binding
! elided2

prev-binding.content was "! elided\n\n/* this comment belongs to next-binding, not prev */" instead of "! elided".

Fix

In process_binding, content_position[1] (trailing bound) now uses trailing-trim from the modified document (where block comments have been replaced with spaces by fill_in), so trailing comment-whitespace is stripped. content_position[0] (leading bound) continues to use the original text so block comments at the start of a binding's own content (e.g. a doc comment before the expression body) are preserved correctly.

All 25 existing tests pass. Added a mutation-validated test that fails when the fix is reverted.

Closes #114

Co-Authored-By: Claude noreply@anthropic.com

Summary by CodeRabbit

  • Bug Fixes
    • Improved binding content extraction so leading whitespace and comments are preserved where expected, while comments before the next binding no longer get included in the previous binding’s content range.
    • Corrected content position handling to better match the actual text used for parsing and error reporting.
    • Added test coverage to ensure adjacent bindings are parsed with the right content boundaries.

…_position

Use trailing trim from the modified document (where comments are spaces)
for content_position[1] so a block comment immediately before the next #-binding
is not included in the preceding binding's content or content_position.

Leading trim still uses the original text so comments at the start of a
binding's own content area (e.g. /* doc comment */ before the expression)
are preserved correctly.

Closes #114

Co-Authored-By: Claude <noreply@anthropic.com>
@thedavidmeister thedavidmeister self-assigned this Jun 29, 2026
@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown

Review Change Stack

Walkthrough

In RainDocument::process_binding, the computation of content_position and content is rewritten so that the start offset uses a trim of the original slice and the end offset uses the previously computed raw_trimmed.2, excluding leading comments of the next binding. A regression test verifies the fix.

Binding content range fix

Layer / File(s) Summary
process_binding content-range rewrite and regression test
crates/dotrain/src/parser/raindocument/logic.rs, crates/dotrain/src/parser/raindocument/mod.rs
content_position start is now derived from trimmed_orig and end from raw_trimmed.2; content is re-sliced from self.text using the new range. A new test asserts that a comment before #b2 is excluded from b1's content and content_position.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly matches the main fix: comment handling around the next binding's content position.
Linked Issues check ✅ Passed The change addresses #114 by preventing comments before a binding from being attributed to the previous binding, including the elided-binding case.
Out of Scope Changes check ✅ Passed The diff stays focused on the parser fix and its regression test, with no unrelated code changes shown.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch 2026-06-29-issue-114-comment-position

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/dotrain/src/parser/raindocument/logic.rs`:
- Around line 798-808: The malformed range is being preserved in the
`content_position` value even though `self.text.get(...)` masks it locally.
Update the range calculation in `logic.rs` where `content_position` is derived
in this branch so it is clamped/normalized before assignment, ensuring `start <=
end` for comment-only bindings after whitespace trimming. Keep the fix scoped to
the `content_position` computation used just before `content` is read, and use
the existing `parsed_binding`, `trimmed_orig`, and `raw_trimmed` values to
adjust the bounds consistently.

In `@crates/dotrain/src/parser/raindocument/mod.rs`:
- Around line 953-993: The new regression test only covers exclusion of a
comment before the next binding, but the positional contract also needs coverage
for the complementary cases. Extend the tests around RainDocument::create and
binding parsing to verify that a comment at the start of a binding’s own content
is preserved in that binding’s content/content_position, and add an edge case
for a comment-only binding to confirm it is handled correctly per the logic in
the parser. Use the existing test in raindocument/mod.rs as the location to add
these cases and align expectations with the behavior introduced in the
binding/content extraction logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 37dc1c6e-c1e5-4411-a8fc-2f0ac9fdbbad

📥 Commits

Reviewing files that changed from the base of the PR and between eefd3b6 and 020fe70.

📒 Files selected for processing (2)
  • crates/dotrain/src/parser/raindocument/logic.rs
  • crates/dotrain/src/parser/raindocument/mod.rs

Comment on lines 798 to +808
} else {
[
parsed_binding.1[0] + boundry_offset + 1 + trimmed_content.1,
parsed_binding.1[1] - trimmed_content.2,
parsed_binding.1[0] + boundry_offset + 1 + trimmed_orig.1,
parsed_binding.1[1] - raw_trimmed.2,
]
};
content = if trimmed_content.0.is_empty() {
slices.1.to_owned()
} else {
trimmed_content.0.to_owned()
};
content = self
.text
.get(content_position[0]..content_position[1])
.unwrap_or("")
.to_owned();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all usages of content_position to assess underflow/slice-panic risk
rg -nP --type=rust 'content_position' -C3
# Specifically subtraction or range slicing on content_position
rg -nP --type=rust 'content_position\s*\[\s*1\s*\]\s*-\s*.*content_position\s*\[\s*0\s*\]' -C2
rg -nP --type=rust 'content_position\[0\]\s*\.\.\s*.*content_position\[1\]' -C2

Repository: rainlanguage/dotrain

Length of output: 403


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate the relevant code and all uses of `content_position`
FILE="crates/dotrain/src/parser/raindocument/logic.rs"
echo "== file outline =="
ast-grep outline "$FILE" --view expanded || true

echo
echo "== relevant lines =="
sed -n '780,825p' "$FILE" | cat -n

echo
echo "== all content_position references =="
rg -n --type=rust 'content_position' crates/dotrain/src -C 2 || true

echo
echo "== range/slice usage patterns around content_position =="
rg -n --type=rust 'content_position\s*\[[01]\]' crates/dotrain/src -C 2 || true

echo
echo "== nearby binding fields =="
rg -n --type=rust 'binding.*content_position|content_position.*binding' crates/dotrain/src -C 2 || true

Repository: rainlanguage/dotrain

Length of output: 24199


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="crates/dotrain/src/parser/raindocument/logic.rs"

echo "== tracked_trim / related helpers =="
rg -n --type=rust 'fn tracked_trim|tracked_trim\(' "$FILE" -C 4 || true

echo
echo "== process_binding body around the range calculation =="
sed -n '758,860p' "$FILE" | cat -n

echo
echo "== AST for Binding and Offsets =="
sed -n '220,250p' crates/dotrain/src/types/ast.rs | cat -n

echo
echo "== any tests around comment-only or whitespace-only bindings =="
rg -n --type=rust 'comment-only|whitespace-only|content_position extends|comment before|binding.*comment' crates/dotrain/src/parser/raindocument/mod.rs -C 2 || true

Repository: rainlanguage/dotrain

Length of output: 8244


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="crates/dotrain/src/parser/raindocument/logic.rs"

echo "== tracked_trim definition =="
ast-grep outline "$FILE" --view expanded | sed -n '1,120p' || true

echo
echo "== exact tracked_trim / trim helper block =="
rg -n --type=rust 'tracked_trim|fn .*trim' "$FILE" -C 6 || true

echo
echo "== comment-related test block =="
sed -n '940,990p' crates/dotrain/src/parser/raindocument/mod.rs | cat -n

echo
echo "== other content_position consumers in parser/raindocument =="
rg -n --type=rust 'content_position' crates/dotrain/src/parser/raindocument -C 2 || true

Repository: rainlanguage/dotrain

Length of output: 19722


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== tracked_trim definition across repo =="
rg -n --type=rust 'fn tracked_trim|tracked_trim\(' crates/dotrain/src -C 4 || true

echo
echo "== bindings/tests that mention comment-only or empty binding =="
rg -n --type=rust 'InvalidEmptyBinding|comment-only|comment before|extends through the comment|content_position extends' crates/dotrain/src -C 3 || true

echo
echo "== any direct use of content_position as a start/end offset in composer/parser =="
rg -n --type=rust 'content_position\[[01]\]' crates/dotrain/src -C 2 || true

Repository: rainlanguage/dotrain

Length of output: 11446


Clamp content_position before storing it

Bindings whose content is only comments after leading whitespace can still end up with start > end: trimmed_orig preserves the comment text, while raw_trimmed.2 trims the comment-replaced slice back to the start of the content. The local .get(...).unwrap_or("") hides it here, but content_position is still stored as a malformed range.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/dotrain/src/parser/raindocument/logic.rs` around lines 798 - 808, The
malformed range is being preserved in the `content_position` value even though
`self.text.get(...)` masks it locally. Update the range calculation in
`logic.rs` where `content_position` is derived in this branch so it is
clamped/normalized before assignment, ensuring `start <= end` for comment-only
bindings after whitespace trimming. Keep the fix scoped to the
`content_position` computation used just before `content` is read, and use the
existing `parsed_binding`, `trimmed_orig`, and `raw_trimmed` values to adjust
the bounds consistently.

Comment on lines +953 to +993
#[test]
fn test_comment_before_binding_excluded_from_previous_content_position() {
let store = Store::new();
let meta_store = Arc::new(RwLock::new(store));

// Comment before #b2 must not appear in b1's content or content_position.
let text = "---\n#b1\n! elided\n\n/* comment for b2 */\n#b2\n! elided2\n";
let rain_document =
RainDocument::create(text.to_owned(), Some(meta_store.clone()), None, None);

let b1 = rain_document
.bindings()
.iter()
.find(|b| b.name == "b1")
.expect("binding b1 not found");

// b1 content must be the elision text only, not including the comment.
assert_eq!(b1.content, "! elided");

// b1 content_position must end at the last char of "! elided", not at the comment.
let content_end = b1.content_position[1];
let text_at_end = text.get(b1.content_position[0]..content_end).unwrap();
assert_eq!(text_at_end, "! elided");

// The comment text must not appear anywhere inside b1's content_position range.
let b1_content_range = text
.get(b1.content_position[0]..b1.content_position[1])
.unwrap();
assert!(
!b1_content_range.contains("/*"),
"comment leaked into b1 content_position: {b1_content_range:?}"
);

// b2 must also parse correctly.
let b2 = rain_document
.bindings()
.iter()
.find(|b| b.name == "b2")
.expect("binding b2 not found");
assert_eq!(b2.content, "! elided2");
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Solid regression test for the exclusion path; consider covering the complementary cases.

This validates that a trailing comment before #b2 is excluded from b1. The PR also introduces an asymmetric rule where a comment at the start of a binding's own content is preserved (start bound uses original-text trim). That behavior, plus the comment-only binding edge case flagged in logic.rs, is currently untested. Adding cases like #b1\n/* doc */\n_: foo();\n#b2 (leading comment retained) and #b1\n\n/* note */\n#b2 (comment-only b1) would lock down the new positional contract.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/dotrain/src/parser/raindocument/mod.rs` around lines 953 - 993, The
new regression test only covers exclusion of a comment before the next binding,
but the positional contract also needs coverage for the complementary cases.
Extend the tests around RainDocument::create and binding parsing to verify that
a comment at the start of a binding’s own content is preserved in that binding’s
content/content_position, and add an edge case for a comment-only binding to
confirm it is handled correctly per the logic in the parser. Use the existing
test in raindocument/mod.rs as the location to add these cases and align
expectations with the behavior introduced in the binding/content extraction
logic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix comment parsings

1 participant