fix(efficiency): report reclaimable bytes, not the sum of every copy #696
Closed
c-tonneslan wants to merge 1 commit into
WastedBytes was a sum of CumulativeSize across duplicate paths, which counts every copy of a file. The user-facing meaning of "potential wasted space" is what you'd reclaim by collapsing duplicates: total copies minus the smallest one we'd keep.

So for an image with a 970 MB pnpm-install layer and a 900 MB chown layer that duplicates it, the report claimed ~1.8 GB recoverable when only ~900 MB actually was, contradicting the efficiency percentage shown alongside. The score itself was already computed as minSize/cumSize, so cumSize - minSize is the right quantity.

Adds an EfficiencyData.WastedSize() helper and uses it in the analysis aggregate, the CI evaluator's per-file column, the v1 TUI's "Potential wasted space" summary, and the JSON export's inefficientBytes field.

Closes wagoodman#684

Signed-off-by: Charlie Tonneslan <cst0520@gmail.com>
Filed in #684. The "Potential wasted space" / `wastedBytes` aggregate sums `CumulativeSize` across every duplicated path, which counts all copies. The user-facing meaning is what you could reclaim by collapsing duplicates: total copies minus the smallest one we'd keep. That's what the efficiency score already uses (`minSizeSum / cumulativeSizeSum`), so the two numbers were inconsistent.

In the linked issue's example the image reports ~1.8 GB wasted while the 71% efficiency implies only ~900 MB recoverable. The summary just adds up `CumulativeSize`, so it was double-counting.

This adds an `EfficiencyData.WastedSize()` helper (returns `CumulativeSize - minDiscoveredSize`) and uses it in:

- `image.Analyze` (the `WastedBytes` aggregate that drives `userWastedPercent` and the JSON export)

Test snapshots for the test docker image (3 copies of `/root/saved.txt` and 2 each of two other files) were updated to match the new accounting (`97 B` wasted vs the prior `131 B`). The per-row "Wasted Space" column for `/root/saved.txt` (3 copies) now reads `63 B` (one min copy of ~27 B kept) vs the old `80 B` (all three).

Closes #684
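For readers skimming the change, the accounting can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the diff: the `EfficiencyData` struct is reduced to the two fields named above, `fromCopies` is a hypothetical constructor added only for the example, and the path and the MB-scaled sizes are assumptions borrowed from the 970 MB / 900 MB case in the commit message.

```go
package main

import "fmt"

// EfficiencyData is a simplified stand-in for the per-path record.
// Only the two fields the helper needs are modelled here.
type EfficiencyData struct {
	Path              string
	CumulativeSize    int64 // sum of the sizes of every copy of the path
	minDiscoveredSize int64 // size of the smallest copy seen
}

// fromCopies is a hypothetical helper for this sketch: it folds the
// size of each copy of a path into one EfficiencyData record.
func fromCopies(path string, sizes ...int64) EfficiencyData {
	d := EfficiencyData{Path: path}
	for i, s := range sizes {
		d.CumulativeSize += s
		if i == 0 || s < d.minDiscoveredSize {
			d.minDiscoveredSize = s
		}
	}
	return d
}

// WastedSize returns the bytes that could be reclaimed by keeping only
// the smallest copy: CumulativeSize - minDiscoveredSize.
func (d EfficiencyData) WastedSize() int64 {
	return d.CumulativeSize - d.minDiscoveredSize
}

func main() {
	// Sizes are in MB to keep the numbers readable; the path is made up.
	d := fromCopies("/app/node_modules", 970, 900)
	fmt.Println("old (sum of every copy):", d.CumulativeSize, "MB")
	fmt.Println("new (reclaimable):", d.WastedSize(), "MB")
}
```

With two copies of 970 MB and 900 MB, the old accounting reports 1870 MB while `WastedSize()` reports 970 MB, which is in line with the "~1.8 GB vs ~900 MB" discrepancy described above.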