Skip to content

fix(efficiency): report reclaimable bytes, not the sum of every copy#696

Closed
c-tonneslan wants to merge 1 commit into
wagoodman:mainfrom
c-tonneslan:fix-wasted-bytes-double-counting
Closed

fix(efficiency): report reclaimable bytes, not the sum of every copy#696
c-tonneslan wants to merge 1 commit into
wagoodman:mainfrom
c-tonneslan:fix-wasted-bytes-double-counting

Conversation

@c-tonneslan

Copy link
Copy Markdown

Filed in #684. The "Potential wasted space" / wastedBytes aggregate sums CumulativeSize across every duplicated path, which counts all copies. The user-facing meaning is what you could reclaim by collapsing duplicates: total copies minus the smallest one we'd keep. That's what the efficiency score already uses (minSizeSum / cumulativeSizeSum), so the two numbers were inconsistent.

In the linked issue's example the image reports ~1.8 GB wasted while the 71% efficiency implies only ~900 MB recoverable. The summary just adds up CumulativeSize, so it was double-counting.

This adds an EfficiencyData.WastedSize() helper (returns CumulativeSize - minDiscoveredSize) and uses it in:

  • image.Analyze (the WastedBytes aggregate that drives userWastedPercent and the JSON export)
  • the CI evaluator's per-file "Wasted Space" column
  • the v1 TUI's "Potential wasted space" summary line

Test snapshots for the test docker image (3 copies of /root/saved.txt and 2 each of two other files) were updated to match the new accounting (97 B wasted vs the prior 131 B). The per-row "Wasted Space" column for /root/saved.txt (3 copies) now reads 63 B (one min copy of ~27 B kept) vs the old 80 B (all three).

Closes #684

WastedBytes was a sum of CumulativeSize across duplicate paths, which
counts every copy of a file. The user-facing meaning of "potential
wasted space" is what you'd reclaim by collapsing duplicates: total
copies minus the smallest one we'd keep. So for an image with a 970
MB pnpm-install layer and a 900 MB chown layer that duplicates it,
the report claimed ~1.8 GB recoverable when only ~900 MB actually
was, contradicting the efficiency percentage shown alongside.

The score itself was already computed as
minSize/cumSize, so cumSize - minSize is the right quantity.

Adds an EfficiencyData.WastedSize() helper and uses it in the
analysis aggregate, the CI evaluator's per-file column, the v1 TUI's
"Potential wasted space" summary, and the JSON export's
inefficientBytes field.

Closes wagoodman#684

Signed-off-by: Charlie Tonneslan <cst0520@gmail.com>
@c-tonneslan

Copy link
Copy Markdown
Author

Closing this, #685 already fixes #684 and has been open since February. I should've checked the existing PRs first, sorry for the duplicate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: Double counting of potential space savings in inefficiency report

1 participant