Add --trace flag for automatic file access tracing #18

Merged
kozak merged 7 commits into main from add-file-use-tracing on Apr 13, 2026

@kozak kozak commented Apr 9, 2026

Summary

Adds --trace and --trace-files flags that wrap command execution with fsatrace (LD_PRELOAD-based file system tracer) to automatically discover which files are read/written during task execution. This helps verify that snapshot inputs are complete.
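As a rough illustration of the wrapping step, the sketch below builds the argument vector for running a command under fsatrace. It assumes fsatrace's documented invocation form `fsatrace <options> <output-file> -- <command>`, with the option letters `r`, `w`, `m`, `d` selecting reads, writes, moves and deletes. The name `wrapWithFsatrace` and the choice of option letters are hypothetical, not taken from src/Trace.hs.

```haskell
-- Hypothetical sketch; not the code in src/Trace.hs.

-- | Build the argv for running a command under fsatrace.
-- "rwmd" asks fsatrace to log reads, writes, moves and deletes
-- (an assumed selection; the real module may pass different options).
wrapWithFsatrace :: FilePath -> [String] -> [String]
wrapWithFsatrace traceFile cmd =
  ["fsatrace", "rwmd", traceFile, "--"] ++ cmd
```

For example, `wrapWithFsatrace "/tmp/trace.log" ["mdbook", "build"]` yields an argv that runs `mdbook build` and leaves the access log in `/tmp/trace.log`.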

Inspired by Rattle, which uses the same mechanism for automatic dependency tracking in build systems.

Features

1. Directory summary (default with --trace)

=== File System Trace Report ===

Directories read:
  apps/ (51 files)
  docs/ (157 files)
  libs/ (4 files)
  scripts/ (2 files)

Directories written:
  docs/ (206 files)
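
A directory summary like the one above can be produced by grouping project-relative paths by their first path component. This is a minimal sketch under that assumption. `topDir`, `summarize` and `renderLine` are illustrative names, and grouping at the top level only is a guess at how the report is built.

```haskell
import Data.List (group, sort)

-- | Top-level directory of a project-relative path,
-- e.g. "docs/ADR.md" -> "docs/". Paths without a slash go under "./".
topDir :: FilePath -> FilePath
topDir path = case break (== '/') path of
  (dir, '/' : _) -> dir ++ "/"
  _              -> "./"

-- | Count files per top-level directory, sorted by directory name.
summarize :: [FilePath] -> [(FilePath, Int)]
summarize paths =
  [ (d, length ds) | ds@(d : _) <- group (sort (map topDir paths)) ]

-- | Render one summary line in the style of the report.
renderLine :: (FilePath, Int) -> String
renderLine (dir, n) = "  " ++ dir ++ " (" ++ show n ++ " files)"
```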

2. Full file listing (--trace-files)

=== File System Trace Report ===

Files read:
  apps/higgs/docs/LiveQuery.md
  apps/ros/doc/Architecture.md
  docs/ADR.md
  ...

Files written:
  docs/book/html/ADR.html
  ...

3. Snapshot discrepancy detection

Automatically compares traced reads against declared snapshot inputs and flags gaps:

=== Snapshot Discrepancies ===
Files read but NOT covered by snapshot inputs:
  apps/ (51 files)
  libs/ (4 files)

  apps/analyse-menu/README.md
  apps/delivery-optimization/README.md
  apps/dsl/README.md
  ...
  libs/js/client/README.md
  libs/js/semtree/README.md
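
The comparison can be thought of as a prefix check of each traced read against the declared inputs. The sketch below assumes inputs are project-relative directories written with a trailing slash (e.g. `docs/`). `coveredBy` and `uncoveredReads` are hypothetical names, and the real module may handle file inputs or globs differently.

```haskell
import Data.List (isPrefixOf)

-- | True when a project-relative file lies under a declared input directory.
-- Inputs are assumed to carry a trailing slash, e.g. "docs/".
coveredBy :: FilePath -> FilePath -> Bool
coveredBy input file = input `isPrefixOf` file

-- | Files that were read but fall outside every declared snapshot input.
uncoveredReads :: [FilePath] -> [FilePath] -> [FilePath]
uncoveredReads inputs readFiles =
  [ f | f <- readFiles, not (any (`coveredBy` f) inputs) ]
```

With inputs `["docs/"]`, a read of `docs/ADR.md` is covered, while `apps/higgs/docs/LiveQuery.md` is reported as a discrepancy.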

Changes

  • --trace CLI flag: directory-level summary + discrepancy detection
  • --trace-files CLI flag: individual file listing + discrepancy detection
  • New src/Trace.hs module: fsatrace detection, command wrapping, output parsing, filtering, report formatting, discrepancy analysis
  • Filtered out: .git/, .gitignore files, .taskrunner/, system paths
  • Only the outermost --trace invocation reports — nested taskrunner calls don't produce duplicate sections (parent fsatrace traces the entire process tree via LD_PRELOAD)
  • Test infrastructure: # fsatrace directive in golden tests, auto-skip when fsatrace isn't installed
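
The filtering rule in the list above might look roughly like the predicate below. It is a sketch only: `isIgnored` is a hypothetical name, paths are assumed to be project-relative already, and system paths are assumed to have been dropped earlier because they fall outside the project root.

```haskell
import Data.List (isPrefixOf, isSuffixOf)

-- | Decide whether a project-relative path should be left out of the report.
isIgnored :: FilePath -> Bool
isIgnored p =
     p == ".git"                      -- bare .git path
  || ".git/" `isPrefixOf` p           -- anything inside .git/
  || ".taskrunner/" `isPrefixOf` p    -- taskrunner's internal state
  || p == ".gitignore"                -- top-level .gitignore
  || "/.gitignore" `isSuffixOf` p     -- .gitignore at any depth
```

Matching on `.git/` with the trailing slash, plus the exact `.git` case, keeps unrelated paths such as `.github/` in the report.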

Real-world example: docs BUILD discrepancy

The docs BUILD script declares `snapshot .`, which covers only the docs/ directory, but `mdbook build` actually reads 55 files from apps/ and libs/ (READMEs and docs gathered from across the monorepo). As a result, a change to e.g. apps/higgs/docs/LiveQuery.md would not invalidate the cache.

Test plan

  • stack build — compiles cleanly, no warnings
  • stack test — all 38 tests pass (37 existing + 1 new trace-basic)
  • Fsatrace tests auto-skip when fsatrace is not installed
  • Manual test on restaumatic docs/scripts/BUILD — found real discrepancy
  • Manual test on restaumatic higgs/marketing-calendar — single report section, no duplicates

🤖 Generated with Claude Code

kozak and others added 6 commits April 9, 2026 14:29
When running with `taskrunner --trace`, the subprocess is wrapped with
fsatrace (LD_PRELOAD-based file system tracer) to discover which files
are actually read and written during execution. After the command
finishes, a report is printed to stderr showing project-relative file
paths categorized as reads and writes.
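
To make the parsing step concrete, here is a minimal sketch that turns fsatrace output into project-relative reads and writes. It assumes the fsatrace log format of one operation per line, `r|<path>` for reads and `w|<path>` for writes, and a project root given with a trailing slash. `Access`, `parseLine`, `relativize` and `parseTrace` are illustrative names, and other operations (moves, deletes, queries) are simply ignored here.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (mapMaybe)

data Access = Read FilePath | Write FilePath
  deriving (Eq, Show)

-- | Parse one trace line of the form "r|/abs/path" or "w|/abs/path".
-- Any other line is skipped in this sketch.
parseLine :: String -> Maybe Access
parseLine ('r' : '|' : path) = Just (Read path)
parseLine ('w' : '|' : path) = Just (Write path)
parseLine _                  = Nothing

-- | Make an access project-relative; paths outside the root are dropped.
relativize :: FilePath -> Access -> Maybe Access
relativize root (Read p)  = Read  <$> stripPrefix root p
relativize root (Write p) = Write <$> stripPrefix root p

-- | Parse a whole trace log against a project root such as "/repo/".
parseTrace :: FilePath -> String -> [Access]
parseTrace root = mapMaybe (\l -> parseLine l >>= relativize root) . lines
```

Dropping everything that does not start with the project root is one simple way to exclude system paths such as shared libraries.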

This helps verify that snapshot inputs are complete — e.g. running
`--trace --force` on the docs BUILD script revealed that mdbook reads
56 files from apps/ and libs/ that aren't declared in the snapshot.

Trace mode propagates to nested taskrunner calls via _taskrunner_trace
env var. Requires fsatrace to be installed (clear error if missing).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Internal git metadata files aren't meaningful project inputs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…etection

--trace now shows a directory-level summary by default (e.g. "apps/ (51 files)")
instead of listing every individual file. Use --trace-files for the full file list.

Both modes also compare traced reads against declared snapshot inputs and report
discrepancies — files actually read but not covered by the snapshot. For example,
docs BUILD declares `snapshot .` (= docs/) but mdbook reads from apps/ and libs/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

These are not meaningful project inputs:
- .gitignore files at any depth (e.g. libs/hs/re-geo/.gitignore)
- .git bare path (was only filtering .git/ prefix)
- .taskrunner/ internal state directory

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The parent fsatrace already traces the entire process tree via
LD_PRELOAD inheritance, so nested taskrunner calls don't need to
independently wrap with fsatrace. This was causing duplicate trace
report sections in the output — one per nested taskrunner invocation.

Now only the outermost --trace/--trace-files invocation wraps and
reports. The _taskrunner_trace env var is still propagated but no
longer triggers tracing on its own.
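
One possible way to express the "outermost only" rule is sketched below. It is an assumption, not the merged logic: it treats the presence of `_taskrunner_trace` in the environment as a sign that an ancestor is already tracing, so a nested invocation stays quiet even when it is passed `--trace` itself. `shouldTrace` and `decideTrace` are hypothetical names.

```haskell
import System.Environment (lookupEnv)

-- | Wrap with fsatrace and report only when the flag was given on the
-- command line AND no ancestor has already marked the environment.
-- (Assumed semantics; the env var alone never triggers tracing.)
shouldTrace :: Bool -> Maybe String -> Bool
shouldTrace traceFlag Nothing  = traceFlag
shouldTrace _         (Just _) = False

-- | Read the marker from the environment and apply the rule.
decideTrace :: Bool -> IO Bool
decideTrace traceFlag = shouldTrace traceFlag <$> lookupEnv "_taskrunner_trace"
```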

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@kozak kozak requested a review from zyla April 11, 2026 11:35

@zyla zyla left a comment

Neat.

Maybe add a test for a nested task?

…alls

Ensures that when an outer taskrunner with --trace invokes a nested
taskrunner, only one File System Trace Report section is produced, and
it captures file operations from both outer and inner processes (since
fsatrace's LD_PRELOAD inherits to all children).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@kozak kozak merged commit e6959b6 into main Apr 13, 2026
2 checks passed