
Fix path-traversal vulnerability in emergency P2P checkpoint service #3105

Open
YuvalElbar6 wants to merge 2 commits into google:main from YuvalElbar6:main

@YuvalElbar6

A malicious or compromised peer on the P2P network could supply a manifest whose rel_path contained '..' segments or an absolute path, causing P2PNode.fetch_shard_from_peer() to write attacker-controlled bytes outside the staging directory (e.g. a .pth file in site-packages, yielding persistent RCE on the training host).
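
To illustrate the failure mode, here is a minimal sketch (not the project's code) of how an unchecked join behaves when `rel_path` is hostile. The staging directory `/tmp/stage`, the helper name `naive_join`, and the sample paths are hypothetical.

```python
import os.path

# Hypothetical staging directory; the real stage_dir is chosen by the service.
stage_dir = "/tmp/stage"


def naive_join(base: str, rel_path: str) -> str:
    """Join without any containment check (the vulnerable pattern)."""
    return os.path.normpath(os.path.join(base, rel_path))


# '..' segments walk back out of the staging directory.
escaped_dotdot = naive_join(stage_dir, "../../etc/evil.pth")

# An absolute rel_path makes os.path.join discard the base entirely.
escaped_abs = naive_join(stage_dir, "/usr/lib/evil.pth")

for path in (escaped_dotdot, escaped_abs):
    inside = path.startswith(stage_dir + os.sep)
    print(f"{path} inside staging dir: {inside}")
```

Both results land outside the staging directory, so any bytes written to them are written wherever the peer chose.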

  • Add _safe_path_join(), which joins a peer-supplied relative path onto a base directory only if the resolved result stays inside that base. Resolution goes through os.path.realpath so symlink-escape attempts are caught as well.
  • Apply the helper on both sides of the wire:
    • Client: fetch_shard_from_peer() validates every manifest entry against stage_dir and aborts the whole fetch on any unsafe entry.
    • Server: handle_download() replaces the substring '..' check with the same resolve-based containment check against self.directory.
  • Log every rejection with peer and request context.
  • Add regression tests for the helper and both call sites.

Reported via the Google OSS VRP.

YuvalElbar6 and others added 2 commits April 17, 2026 08:45
Fix path-traversal vulnerability in emergency P2P checkpoint service