Skip to content

Redact embedded credentials before persisting remote_url (DFT-13)#1249

Merged
spoorcc merged 5 commits into
mainfrom
claude/threat-model-low-hanging-fruit-cwtt3
Jun 7, 2026
Merged

Redact embedded credentials before persisting remote_url (DFT-13)#1249
spoorcc merged 5 commits into
mainfrom
claude/threat-model-low-hanging-fruit-cwtt3

Conversation

@spoorcc

@spoorcc spoorcc commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Strip user:password@ userinfo from every remote_url written to .dfetch_data.yaml — both the top-level field and each dependencies[].remote_url (git-submodule / svn:external entries) — so a manifest URL with an embedded Personal Access Token no longer leaks the credential into the on-disk metadata file (and from there into any commit / clone / CI checkout of it).
  • Redaction is local to Metadata.dump() in dfetch/project/metadata.py. The in-memory Metadata object held by the running command keeps the original URL, so in-flight authenticated fetches are unaffected — only the on-disk representation is sanitised.
  • Flip the DFT-13 entry in the usage threat model from Accept to Mitigate, document the new control as C-035 — Persisted-metadata credential redaction, and update the A-17 / A-18 asset descriptions to reflect that the credential no longer reaches disk.

Why this is safe

.dfetch_data.yaml is write-only for recording purposes: no code path reads the persisted remote_url back to drive a fetch. Metadata.__eq__ is the only comparison that references remote_url and it is defined but never called; check, diff, and update always rebuild Metadata from the live ProjectEntry (which still holds the credential), so redacting at dump() does not cause spurious "out of date" reports.

Test plan

  • pytest tests/test_metadata.py — 21 passing, including new parametrised _strip_userinfo cases (user:token@host, user@host, host-only, explicit port, query-string preserved, file:// round-trip, empty / path-only inputs) and dump-then-from_file round-trips for both remote_url and dependencies[].remote_url plus an assertion that the in-memory URL is untouched
  • pytest tests/ — full unit suite green except 6 pre-existing failures in tests/test_sbom_reporter.py caused by a missing optional cyclonedx-python-lib[json-validation] install in this environment (unrelated to this change)
  • behave features/ --tags=update to confirm end-to-end dfetch update still writes .dfetch_data.yaml correctly
  • Manual smoke: point a manifest at https://user:secret@example.com/repo.tar.gz, run dfetch update, confirm the persisted remote_url: contains neither user:secret nor user@

https://claude.ai/code/session_019iPNdP6S7SwMviFE1uEjEV


Generated by Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Credentials embedded in remote URLs are now stripped before being written to persisted metadata, preventing accidental exposure.
  • Documentation

    • Threat model and controls updated to describe the persisted-metadata credential redaction mitigation.
  • Tests

    • Tests added to verify credentials are redacted in persisted metadata while in-memory URLs remain unchanged.

claude added 2 commits June 4, 2026 14:57
Strip the user:password@ component from remote_url (and each
dependencies[].remote_url) when Metadata.dump() writes .dfetch_data.yaml,
so a manifest URL like https://user:TOKEN@host/repo.git no longer leaks
its credential into a file typically committed to VCS. Addresses DFT-13
in the usage threat model.

https://claude.ai/code/session_019iPNdP6S7SwMviFE1uEjEV
Flip the DFT-13 threat response in the usage model from Accept to
Mitigate, add control C-035 (persisted-metadata credential redaction)
pointing at Metadata.dump(), and refresh the A-17 / A-18 asset
descriptions to record that the on-disk metadata file no longer carries
embedded URL credentials. Regenerate doc/explanation/threat_model_usage.rst
and add a changelog entry for 0.14.0.

https://claude.ai/code/session_019iPNdP6S7SwMviFE1uEjEV
@coderabbitai

coderabbitai Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

This PR strips embedded URL userinfo (user:password@) from persisted remote_url fields (top-level and dependencies[]) when writing .dfetch_data.yaml, preserves in-memory URLs for fetch operations, adds tests for the behavior, and updates threat-model docs and the changelog to record the new control (C-036) and DFT-13 mitigation.

Changes

Credential redaction in persisted metadata

Layer / File(s) Summary
Credential stripping implementation
dfetch/project/metadata.py
Adds urlsplit/urlunsplit import and _strip_userinfo(url: str) helper that reconstructs URLs without user:password@, and applies this sanitizer to the project remote_url and to each dependency's remote_url during Metadata.dump().
Unit and integration tests
tests/test_metadata.py
Adds parametrized tests for _strip_userinfo, a _dump_metadata test helper, and tests asserting that Metadata.dump() redacts credentials from on-disk dfetch.remote_url and dfetch.dependencies[].remote_url while leaving the in-memory meta.remote_url unchanged and supporting round-trip load.
Threat model, docs, and changelog
security/tm_usage.py, doc/explanation/threat_model_usage.rst, CHANGELOG.rst
Introduces control C-036 documenting persisted-metadata credential redaction, updates A-17/A-18/DF-08 descriptions and changes DFT-13 from Accept→Mitigate; updates generated threat-model docs and adds a changelog entry noting credential stripping.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

documentation

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main change: redacting embedded credentials from persisted remote_url entries to address threat DFT-13.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/threat-model-low-hanging-fruit-cwtt3

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dfetch/project/metadata.py`:
- Around line 15-32: _strip_userinfo currently rebuilds the netloc using
parsed.hostname which drops IPv6 brackets, producing invalid netlocs like
"::1:8080"; update _strip_userinfo to detect IPv6 hosts (parsed.hostname
containing ":"), wrap the hostname in square brackets when reconstructing
netloc, and then append the port if parsed.port is an int; keep using
parsed.hostname and parsed.port, then call urlunsplit as before (ensure you
still return the original URL when host and parsed.netloc are empty). This
change should be made inside the _strip_userinfo function and use
parsed.hostname, parsed.port, and urlunsplit to locate the fix.

In `@security/tm_usage.py`:
- Around line 997-1014: The Control with id "C-035" (Control(... id="C-035",
name="Persisted-metadata credential redaction")) collides with an existing gap
reference ("gap C-035" mentioned elsewhere); rename one of the entries to a new
unique ID (e.g., "C-036" or another unused tag) and update all references
consistently — update the Control's id field and any textual/back-reference like
"gap C-035" to the new ID so threat-model traceability remains unique and
consistent across Metadata.dump(), dependencies[].remote_url, and the gap
reference.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7b145147-6638-4f52-9ca6-2dd395f05277

📥 Commits

Reviewing files that changed from the base of the PR and between a6e3c1c and 4b51008.

📒 Files selected for processing (5)
  • CHANGELOG.rst
  • dfetch/project/metadata.py
  • doc/explanation/threat_model_usage.rst
  • security/tm_usage.py
  • tests/test_metadata.py

Comment thread dfetch/project/metadata.py
Comment thread security/tm_usage.py
claude added 2 commits June 4, 2026 15:21
parsed.hostname drops the square brackets around IPv6 literals, so
rebuilding the netloc as f"{host}:{port}" produced an invalid form like
"::1:8080".  Detect IPv6 hosts (":" in host) and wrap them in brackets
before reconstruction.

Also tightens Metadata.dump()'s dependencies redaction to construct
each entry via the Dependency TypedDict explicitly, restoring mypy
strict-type checking on the assignment.

https://claude.ai/code/session_019iPNdP6S7SwMviFE1uEjEV
The new control collided with the pre-existing ``gap C-035`` placeholder
that points at the upstream-source-attestation gap (DF-19 description
in tm_usage.py).  Rename the implemented control to C-036, the next
free slot, keeping ``gap C-035`` available for the still-unfilled
upstream attestation control.

Updates the Control id, all back-references in tm_usage.py
(``mitigated by`` comments, A-17 description, DF-08 description) and
the three corresponding cells in the generated threat_model_usage.rst.

https://claude.ai/code/session_019iPNdP6S7SwMviFE1uEjEV
@spoorcc

spoorcc commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
doc/explanation/threat_model_usage.rst (1)

1199-1203: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add the trailing newline and commit the regenerated file.

pre-commit end-of-file-fixer changed this file in CI, so the current commit is missing the expected EOF state.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@doc/explanation/threat_model_usage.rst` around lines 1199 - 1203, The file is
missing a trailing newline which broke the pre-commit EOF fixer; open the
regenerated documentation file (the one describing
Metadata.dump()/dfetch/project/metadata.py), add a single newline at the end of
the file, save, and commit the change so CI’s pre-commit check no longer
modifies the file.

Source: Pipeline failures

tests/test_metadata.py (2)

106-128: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Commit the Black rewrite for this file.

CI reports black modified tests/test_metadata.py; please commit formatter output to clear the pre-commit gate.

As per coding guidelines, black is required for formatting, and pipeline logs show this file is not in formatted state.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_metadata.py` around lines 106 - 128, The file
tests/test_metadata.py is not formatted by Black; run the Black formatter on the
repo (or on tests/test_metadata.py) to apply the code style changes that CI
reported, then add and commit the formatted file so the pytest.mark.parametrize
block and surrounding code (e.g., the URL test cases) match Black’s output and
the pre-commit/CI gate passes.

Sources: Coding guidelines, Pipeline failures


135-146: ⚠️ Potential issue | 🟠 Major

Fix Metadata(...) dict literals in tests/test_metadata.py to include required Options keys (Pyright).

Options is a TypedDict with required fields; the two Metadata({...}) calls omit them (first at ~138, second at ~194), causing Pyright to fail.

Suggested fix
+import datetime
 import textwrap
@@
 def _dump_metadata(tmp_path, *, remote_url, dependencies=None):
@@
     meta = Metadata(
         {
+            "last_fetch": datetime.datetime(2000, 1, 1, 0, 0, 0),
             "remote_url": remote_url,
             "branch": "main",
+            "tag": "",
             "revision": "abc123",
             "hash": "deadbeef",
+            "patch": "",
             "destination": str(tmp_path),
             "dependencies": dependencies or [],
         }
     )
@@
     meta = Metadata(
         {
+            "last_fetch": datetime.datetime(2000, 1, 1, 0, 0, 0),
             "remote_url": "https://carol:hunter2@example.com/org/repo.git",
+            "branch": "",
+            "tag": "",
+            "revision": "",
+            "hash": "",
+            "patch": "",
+            "dependencies": [],
             "destination": str(tmp_path),
         }
     )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_metadata.py` around lines 135 - 146, The Metadata constructor
calls in _dump_metadata and the other test helper are passing dicts missing
required keys from the Options TypedDict; update both dict literals passed to
Metadata(...) to include the required Options fields (add all mandatory keys
from Options with suitable test values), e.g. extend the dict in _dump_metadata
and the second Metadata(...) call to include those missing keys so Pyright
accepts the TypedDict usage; search for Metadata(...) in tests/test_metadata.py
to find both locations.

Sources: Coding guidelines, Pipeline failures

security/tm_usage.py (1)

1-1: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Run and commit Black on this file.

The DevContainer pipeline reports pre-commit black reformatted security/tm_usage.py; commit that change to unblock CI.

As per coding guidelines, black formatting is required, and pipeline logs confirm this file is currently non-compliant.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@security/tm_usage.py` at line 1, Run the Black formatter on
security/tm_usage.py and commit the resulting changes so the file matches the
project's pre-commit formatting; e.g., run the project's pre-commit hook or
`black security/tm_usage.py` (or `pre-commit run --files security/tm_usage.py`)
and then stage and commit the modified file to unblock CI.

Sources: Coding guidelines, Pipeline failures

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@doc/explanation/threat_model_usage.rst`:
- Around line 1199-1203: The file is missing a trailing newline which broke the
pre-commit EOF fixer; open the regenerated documentation file (the one
describing Metadata.dump()/dfetch/project/metadata.py), add a single newline at
the end of the file, save, and commit the change so CI’s pre-commit check no
longer modifies the file.

In `@security/tm_usage.py`:
- Line 1: Run the Black formatter on security/tm_usage.py and commit the
resulting changes so the file matches the project's pre-commit formatting; e.g.,
run the project's pre-commit hook or `black security/tm_usage.py` (or
`pre-commit run --files security/tm_usage.py`) and then stage and commit the
modified file to unblock CI.

In `@tests/test_metadata.py`:
- Around line 106-128: The file tests/test_metadata.py is not formatted by
Black; run the Black formatter on the repo (or on tests/test_metadata.py) to
apply the code style changes that CI reported, then add and commit the formatted
file so the pytest.mark.parametrize block and surrounding code (e.g., the URL
test cases) match Black’s output and the pre-commit/CI gate passes.
- Around line 135-146: The Metadata constructor calls in _dump_metadata and the
other test helper are passing dicts missing required keys from the Options
TypedDict; update both dict literals passed to Metadata(...) to include the
required Options fields (add all mandatory keys from Options with suitable test
values), e.g. extend the dict in _dump_metadata and the second Metadata(...)
call to include those missing keys so Pyright accepts the TypedDict usage;
search for Metadata(...) in tests/test_metadata.py to find both locations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0b12e575-fe06-4c0d-a924-7f08b7d0a01e

📥 Commits

Reviewing files that changed from the base of the PR and between 4b51008 and f63baf2.

📒 Files selected for processing (4)
  • dfetch/project/metadata.py
  • doc/explanation/threat_model_usage.rst
  • security/tm_usage.py
  • tests/test_metadata.py

* end-of-file-fixer trimmed two stray trailing newlines from
  threat_model_usage.rst
* black wrapped two over-long comment lines in tm_usage.py and
  reformatted the parametrize block in test_metadata.py
* Both Metadata({...}) calls in test_metadata.py now include every
  field required by the Options TypedDict (last_fetch, tag, patch
  and, for the in-memory test, branch/revision/hash/dependencies),
  silencing pyright's reportArgumentType warning without resorting
  to suppression comments

https://claude.ai/code/session_019iPNdP6S7SwMviFE1uEjEV

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_metadata.py (1)

133-133: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Convert new test docstrings to Google style format.

The newly added test docstrings are sentence-style, but this repo requires Google-style docstrings for test functions under tests/**/*.py.

As per coding guidelines, “Docstrings in test functions should follow Google style convention.”

Also applies to: 158-158, 172-172, 197-197

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_metadata.py` at line 133, Multiple test docstrings were written as
sentence-style descriptions (e.g. "_strip_userinfo removes userinfo, preserves
scheme/host/port/path/query.") but must follow the repo's Google-style test
docstring convention; update each affected test function's docstring to a
Google-style format (single-line summary followed by a blank line and optional
explanatory text) for the tests in this file (replace the simple sentence
docstrings at the locations noted), ensuring you keep the same meaning and
punctuation and apply the same change to the other occurrences flagged (the
similar docstrings at the other test functions).

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@tests/test_metadata.py`:
- Line 133: Multiple test docstrings were written as sentence-style descriptions
(e.g. "_strip_userinfo removes userinfo, preserves
scheme/host/port/path/query.") but must follow the repo's Google-style test
docstring convention; update each affected test function's docstring to a
Google-style format (single-line summary followed by a blank line and optional
explanatory text) for the tests in this file (replace the simple sentence
docstrings at the locations noted), ensuring you keep the same meaning and
punctuation and apply the same change to the other occurrences flagged (the
similar docstrings at the other test functions).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: fec07e20-185b-4d51-b5d9-acf172ad6304

📥 Commits

Reviewing files that changed from the base of the PR and between f63baf2 and 6baf44b.

📒 Files selected for processing (3)
  • doc/explanation/threat_model_usage.rst
  • security/tm_usage.py
  • tests/test_metadata.py
💤 Files with no reviewable changes (1)
  • doc/explanation/threat_model_usage.rst

@spoorcc

spoorcc commented Jun 7, 2026

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@spoorcc spoorcc merged commit a7ae5b2 into main Jun 7, 2026
35 checks passed
@spoorcc spoorcc deleted the claude/threat-model-low-hanging-fruit-cwtt3 branch June 7, 2026 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants