Redact embedded credentials before persisting remote_url (DFT-13) by spoorcc · Pull Request #1249 · dfetch-org/dfetch

spoorcc · 2026-06-04T14:58:20Z

Summary

Strip user:password@ userinfo from every remote_url written to .dfetch_data.yaml — both the top-level field and each dependencies[].remote_url (git-submodule / svn:external entries) — so a manifest URL with an embedded Personal Access Token no longer leaks the credential into the on-disk metadata file (and from there into any commit / clone / CI checkout of it).
Redaction is local to Metadata.dump() in dfetch/project/metadata.py. The in-memory Metadata object held by the running command keeps the original URL, so in-flight authenticated fetches are unaffected — only the on-disk representation is sanitised.
Flip the DFT-13 entry in the usage threat model from Accept to Mitigate, document the new control as C-035 — Persisted-metadata credential redaction, and update the A-17 / A-18 asset descriptions to reflect that the credential no longer reaches disk.

Why this is safe

.dfetch_data.yaml is write-only for recording purposes: no code path reads the persisted remote_url back to drive a fetch. Metadata.__eq__ is the only comparison that references remote_url and it is defined but never called; check, diff, and update always rebuild Metadata from the live ProjectEntry (which still holds the credential), so redacting at dump() does not cause spurious "out of date" reports.

Test plan

pytest tests/test_metadata.py — 21 passing, including new parametrised _strip_userinfo cases (user:token@host, user@host, host-only, explicit port, query-string preserved, file:// round-trip, empty / path-only inputs) and dump-then-from_file round-trips for both remote_url and dependencies[].remote_url plus an assertion that the in-memory URL is untouched
pytest tests/ — full unit suite green except 6 pre-existing failures in tests/test_sbom_reporter.py caused by a missing optional cyclonedx-python-lib[json-validation] install in this environment (unrelated to this change)
behave features/ --tags=update to confirm end-to-end dfetch update still writes .dfetch_data.yaml correctly
Manual smoke: point a manifest at https://user:secret@example.com/repo.tar.gz, run dfetch update, confirm the persisted remote_url: contains neither user:secret nor user@

https://claude.ai/code/session_019iPNdP6S7SwMviFE1uEjEV

Generated by Claude Code

Summary by CodeRabbit

Bug Fixes
- Credentials embedded in remote URLs are now stripped before being written to persisted metadata, preventing accidental exposure.
Documentation
- Threat model and controls updated to describe the persisted-metadata credential redaction mitigation.
Tests
- Tests added to verify credentials are redacted in persisted metadata while in-memory URLs remain unchanged.

Strip the user:password@ component from remote_url (and each dependencies[].remote_url) when Metadata.dump() writes .dfetch_data.yaml, so a manifest URL like https://user:TOKEN@host/repo.git no longer leaks its credential into a file typically committed to VCS. Addresses DFT-13 in the usage threat model. https://claude.ai/code/session_019iPNdP6S7SwMviFE1uEjEV

Flip the DFT-13 threat response in the usage model from Accept to Mitigate, add control C-035 (persisted-metadata credential redaction) pointing at Metadata.dump(), and refresh the A-17 / A-18 asset descriptions to record that the on-disk metadata file no longer carries embedded URL credentials. Regenerate doc/explanation/threat_model_usage.rst and add a changelog entry for 0.14.0. https://claude.ai/code/session_019iPNdP6S7SwMviFE1uEjEV

coderabbitai · 2026-06-04T14:58:34Z

Walkthrough

This PR strips embedded URL userinfo (user:password@) from persisted remote_url fields (top-level and dependencies[]) when writing .dfetch_data.yaml, preserves in-memory URLs for fetch operations, adds tests for the behavior, and updates threat-model docs and the changelog to record the new control (C-036) and DFT-13 mitigation.

Changes

Credential redaction in persisted metadata

Layer / File(s)	Summary
Credential stripping implementation `dfetch/project/metadata.py`	Adds `urlsplit`/`urlunsplit` import and `_strip_userinfo(url: str)` helper that reconstructs URLs without `user:password@`, and applies this sanitizer to the project `remote_url` and to each dependency's `remote_url` during `Metadata.dump()`.
Unit and integration tests `tests/test_metadata.py`	Adds parametrized tests for `_strip_userinfo`, a `_dump_metadata` test helper, and tests asserting that `Metadata.dump()` redacts credentials from on-disk `dfetch.remote_url` and `dfetch.dependencies[].remote_url` while leaving the in-memory `meta.remote_url` unchanged and supporting round-trip load.
Threat model, docs, and changelog `security/tm_usage.py`, `doc/explanation/threat_model_usage.rst`, `CHANGELOG.rst`	Introduces control `C-036` documenting persisted-metadata credential redaction, updates A-17/A-18/DF-08 descriptions and changes DFT-13 from Accept→Mitigate; updates generated threat-model docs and adds a changelog entry noting credential stripping.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

documentation

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically summarizes the main change: redacting embedded credentials from persisted remote_url entries to address threat DFT-13.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch claude/threat-model-low-hanging-fruit-cwtt3

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dfetch/project/metadata.py`:
- Around line 15-32: _strip_userinfo currently rebuilds the netloc using
parsed.hostname which drops IPv6 brackets, producing invalid netlocs like
"::1:8080"; update _strip_userinfo to detect IPv6 hosts (parsed.hostname
containing ":"), wrap the hostname in square brackets when reconstructing
netloc, and then append the port if parsed.port is an int; keep using
parsed.hostname and parsed.port, then call urlunsplit as before (ensure you
still return the original URL when host and parsed.netloc are empty). This
change should be made inside the _strip_userinfo function and use
parsed.hostname, parsed.port, and urlunsplit to locate the fix.

In `@security/tm_usage.py`:
- Around line 997-1014: The Control with id "C-035" (Control(... id="C-035",
name="Persisted-metadata credential redaction")) collides with an existing gap
reference ("gap C-035" mentioned elsewhere); rename one of the entries to a new
unique ID (e.g., "C-036" or another unused tag) and update all references
consistently — update the Control's id field and any textual/back-reference like
"gap C-035" to the new ID so threat-model traceability remains unique and
consistent across Metadata.dump(), dependencies[].remote_url, and the gap
reference.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7b145147-6638-4f52-9ca6-2dd395f05277

📥 Commits

Reviewing files that changed from the base of the PR and between a6e3c1c and 4b51008.

📒 Files selected for processing (5)

CHANGELOG.rst
dfetch/project/metadata.py
doc/explanation/threat_model_usage.rst
security/tm_usage.py
tests/test_metadata.py

parsed.hostname drops the square brackets around IPv6 literals, so rebuilding the netloc as f"{host}:{port}" produced an invalid form like "::1:8080". Detect IPv6 hosts (":" in host) and wrap them in brackets before reconstruction. Also tightens Metadata.dump()'s dependencies redaction to construct each entry via the Dependency TypedDict explicitly, restoring mypy strict-type checking on the assignment. https://claude.ai/code/session_019iPNdP6S7SwMviFE1uEjEV

The new control collided with the pre-existing ``gap C-035`` placeholder that points at the upstream-source-attestation gap (DF-19 description in tm_usage.py). Rename the implemented control to C-036, the next free slot, keeping ``gap C-035`` available for the still-unfilled upstream attestation control. Updates the Control id, all back-references in tm_usage.py (``mitigated by`` comments, A-17 description, DF-08 description) and the three corresponding cells in the generated threat_model_usage.rst. https://claude.ai/code/session_019iPNdP6S7SwMviFE1uEjEV

spoorcc · 2026-06-06T17:12:29Z

@coderabbitai review

coderabbitai · 2026-06-06T17:12:42Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)

doc/explanation/threat_model_usage.rst (1)

1199-1203: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add the trailing newline and commit the regenerated file.

pre-commit end-of-file-fixer changed this file in CI, so the current commit is missing the expected EOF state.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@doc/explanation/threat_model_usage.rst` around lines 1199 - 1203, The file is
missing a trailing newline which broke the pre-commit EOF fixer; open the
regenerated documentation file (the one describing
Metadata.dump()/dfetch/project/metadata.py), add a single newline at the end of
the file, save, and commit the change so CI’s pre-commit check no longer
modifies the file.
Source: Pipeline failures

tests/test_metadata.py (2)

106-128: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Commit the Black rewrite for this file.

CI reports black modified tests/test_metadata.py; please commit formatter output to clear the pre-commit gate.

As per coding guidelines, black is required for formatting, and pipeline logs show this file is not in formatted state.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_metadata.py` around lines 106 - 128, The file
tests/test_metadata.py is not formatted by Black; run the Black formatter on the
repo (or on tests/test_metadata.py) to apply the code style changes that CI
reported, then add and commit the formatted file so the pytest.mark.parametrize
block and surrounding code (e.g., the URL test cases) match Black’s output and
the pre-commit/CI gate passes.

Sources: Coding guidelines, Pipeline failures

135-146: ⚠️ Potential issue | 🟠 Major

Fix Metadata(...) dict literals in tests/test_metadata.py to include required Options keys (Pyright).

Options is a TypedDict with required fields; the two Metadata({...}) calls omit them (first at ~138, second at ~194), causing Pyright to fail.

Suggested fix

+import datetime
 import textwrap
@@
 def _dump_metadata(tmp_path, *, remote_url, dependencies=None):
@@
     meta = Metadata(
         {
+            "last_fetch": datetime.datetime(2000, 1, 1, 0, 0, 0),
             "remote_url": remote_url,
             "branch": "main",
+            "tag": "",
             "revision": "abc123",
             "hash": "deadbeef",
+            "patch": "",
             "destination": str(tmp_path),
             "dependencies": dependencies or [],
         }
     )
@@
     meta = Metadata(
         {
+            "last_fetch": datetime.datetime(2000, 1, 1, 0, 0, 0),
             "remote_url": "https://carol:hunter2@example.com/org/repo.git",
+            "branch": "",
+            "tag": "",
+            "revision": "",
+            "hash": "",
+            "patch": "",
+            "dependencies": [],
             "destination": str(tmp_path),
         }
     )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_metadata.py` around lines 135 - 146, The Metadata constructor
calls in _dump_metadata and the other test helper are passing dicts missing
required keys from the Options TypedDict; update both dict literals passed to
Metadata(...) to include the required Options fields (add all mandatory keys
from Options with suitable test values), e.g. extend the dict in _dump_metadata
and the second Metadata(...) call to include those missing keys so Pyright
accepts the TypedDict usage; search for Metadata(...) in tests/test_metadata.py
to find both locations.

Sources: Coding guidelines, Pipeline failures

security/tm_usage.py (1)

1-1: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Run and commit Black on this file.

The DevContainer pipeline reports pre-commit black reformatted security/tm_usage.py; commit that change to unblock CI.

As per coding guidelines, black formatting is required, and pipeline logs confirm this file is currently non-compliant.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@security/tm_usage.py` at line 1, Run the Black formatter on
security/tm_usage.py and commit the resulting changes so the file matches the
project's pre-commit formatting; e.g., run the project's pre-commit hook or
`black security/tm_usage.py` (or `pre-commit run --files security/tm_usage.py`)
and then stage and commit the modified file to unblock CI.
Sources: Coding guidelines, Pipeline failures

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@doc/explanation/threat_model_usage.rst`:
- Around line 1199-1203: The file is missing a trailing newline which broke the
pre-commit EOF fixer; open the regenerated documentation file (the one
describing Metadata.dump()/dfetch/project/metadata.py), add a single newline at
the end of the file, save, and commit the change so CI’s pre-commit check no
longer modifies the file.

In `@security/tm_usage.py`:
- Line 1: Run the Black formatter on security/tm_usage.py and commit the
resulting changes so the file matches the project's pre-commit formatting; e.g.,
run the project's pre-commit hook or `black security/tm_usage.py` (or
`pre-commit run --files security/tm_usage.py`) and then stage and commit the
modified file to unblock CI.

In `@tests/test_metadata.py`:
- Around line 106-128: The file tests/test_metadata.py is not formatted by
Black; run the Black formatter on the repo (or on tests/test_metadata.py) to
apply the code style changes that CI reported, then add and commit the formatted
file so the pytest.mark.parametrize block and surrounding code (e.g., the URL
test cases) match Black’s output and the pre-commit/CI gate passes.
- Around line 135-146: The Metadata constructor calls in _dump_metadata and the
other test helper are passing dicts missing required keys from the Options
TypedDict; update both dict literals passed to Metadata(...) to include the
required Options fields (add all mandatory keys from Options with suitable test
values), e.g. extend the dict in _dump_metadata and the second Metadata(...)
call to include those missing keys so Pyright accepts the TypedDict usage;
search for Metadata(...) in tests/test_metadata.py to find both locations.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0b12e575-fe06-4c0d-a924-7f08b7d0a01e

📥 Commits

Reviewing files that changed from the base of the PR and between 4b51008 and f63baf2.

📒 Files selected for processing (4)

dfetch/project/metadata.py
doc/explanation/threat_model_usage.rst
security/tm_usage.py
tests/test_metadata.py

* end-of-file-fixer trimmed two stray trailing newlines from threat_model_usage.rst * black wrapped two over-long comment lines in tm_usage.py and reformatted the parametrize block in test_metadata.py * Both Metadata({...}) calls in test_metadata.py now include every field required by the Options TypedDict (last_fetch, tag, patch and, for the in-memory test, branch/revision/hash/dependencies), silencing pyright's reportArgumentType warning without resorting to suppression comments https://claude.ai/code/session_019iPNdP6S7SwMviFE1uEjEV

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/test_metadata.py (1)
133-133: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Convert new test docstrings to Google style format.

The newly added test docstrings are sentence-style, but this repo requires Google-style docstrings for test functions under tests/**/*.py.

As per coding guidelines, “Docstrings in test functions should follow Google style convention.”

Also applies to: 158-158, 172-172, 197-197
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_metadata.py` at line 133, Multiple test docstrings were written as
sentence-style descriptions (e.g. "_strip_userinfo removes userinfo, preserves
scheme/host/port/path/query.") but must follow the repo's Google-style test
docstring convention; update each affected test function's docstring to a
Google-style format (single-line summary followed by a blank line and optional
explanatory text) for the tests in this file (replace the simple sentence
docstrings at the locations noted), ensuring you keep the same meaning and
punctuation and apply the same change to the other occurrences flagged (the
similar docstrings at the other test functions).
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@tests/test_metadata.py`:
- Line 133: Multiple test docstrings were written as sentence-style descriptions
(e.g. "_strip_userinfo removes userinfo, preserves
scheme/host/port/path/query.") but must follow the repo's Google-style test
docstring convention; update each affected test function's docstring to a
Google-style format (single-line summary followed by a blank line and optional
explanatory text) for the tests in this file (replace the simple sentence
docstrings at the locations noted), ensuring you keep the same meaning and
punctuation and apply the same change to the other occurrences flagged (the
similar docstrings at the other test functions).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: fec07e20-185b-4d51-b5d9-acf172ad6304

📥 Commits

Reviewing files that changed from the base of the PR and between f63baf2 and 6baf44b.

📒 Files selected for processing (3)

doc/explanation/threat_model_usage.rst
security/tm_usage.py
tests/test_metadata.py

💤 Files with no reviewable changes (1)

doc/explanation/threat_model_usage.rst

spoorcc · 2026-06-07T08:26:57Z

@coderabbitai review

coderabbitai · 2026-06-07T08:27:02Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

claude added 2 commits June 4, 2026 14:57

spoorcc temporarily deployed to testpypi June 4, 2026 14:59 — with GitHub Actions Inactive

coderabbitai Bot reviewed Jun 4, 2026

View reviewed changes

Comment thread dfetch/project/metadata.py

Comment thread security/tm_usage.py

claude added 2 commits June 4, 2026 15:21

spoorcc temporarily deployed to testpypi June 4, 2026 15:23 — with GitHub Actions Inactive

coderabbitai Bot reviewed Jun 6, 2026

View reviewed changes

spoorcc temporarily deployed to testpypi June 6, 2026 21:03 — with GitHub Actions Inactive

coderabbitai Bot reviewed Jun 6, 2026

View reviewed changes

spoorcc merged commit a7ae5b2 into main Jun 7, 2026
35 checks passed

spoorcc deleted the claude/threat-model-low-hanging-fruit-cwtt3 branch June 7, 2026 14:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Redact embedded credentials before persisting remote_url (DFT-13)#1249

Redact embedded credentials before persisting remote_url (DFT-13)#1249
spoorcc merged 5 commits into
mainfrom
claude/threat-model-low-hanging-fruit-cwtt3

spoorcc commented Jun 4, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

spoorcc commented Jun 6, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 6, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

spoorcc commented Jun 7, 2026

Uh oh!

coderabbitai Bot commented Jun 7, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

spoorcc commented Jun 4, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why this is safe

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

spoorcc commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

spoorcc commented Jun 7, 2026

Uh oh!

coderabbitai Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

spoorcc commented Jun 4, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading

spoorcc commented Jun 6, 2026 •

edited

Loading

coderabbitai Bot commented Jun 6, 2026 •

edited

Loading

coderabbitai Bot commented Jun 7, 2026 •

edited

Loading