Step 4: numpy NormalizationManagerProtein quadratic+linear normalization#87
Open
mschwoer wants to merge 7 commits into
Open
Conversation
Add regression tests before the numpy override of _normalize_quadratic_and_linear: an executable-spec characterization test (validated against the baseline pandas implementation) over mixed-NaN rows that pins the stable k-fewest-NaN quadratic split and the linear nanmedian shift, plus a noise-free behavioral test asserting scaled-copy profiles collapse onto one profile. Both pass against the unmodified baseline. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n (step 4) Override _normalize_quadratic_and_linear with a numpy implementation that drops the label-based .loc[list_of_tuples,:]=df round-trips on a MultiIndex (the dominant estimate-stage cost for >num_samples_quadratic-ion proteins). Reuses get_normfacts/apply_sampleshifts on the stable k-fewest-NaN quadratic subset, takes the column nanmedian as the reference, and shifts the linear rows by nanmedian(reference - row). Bit-identical protein + ion output (max_abs_diff=0); estimate stage drops to ~18s. Suite passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r stage comments --- # Conversation that produced these changes --- ## User prompt "apply the 4-step refactor and run the tests" ## User prompt "I don't like the introduction of new instance variables. please revert. Instead add the four stages as comments to the method" Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…y-normalize-quadratic-linear
mschwoer
added a commit
that referenced
this pull request
Jun 16, 2026
--- # Conversation that produced these changes --- ## User prompt "Now, the NormalizationManagerProtein: it is no longer used. This means the refactoring done in #87 was obsolete?" ## Clarifying round 1 Q: How do you want to resolve the duplicated quadratic+linear numpy logic between #87's class override and #88's _normalize_protein_values? - Single source in normalization.py <-- chosen - Revert #87's override - Leave both as-is - Just answer, no change Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Overrides
_normalize_quadratic_and_linearwith numpy, dropping the label-based.loc[tuples,:]=dfround-trips on a MultiIndex (the dominant per-protein cost for >num_samples_quadratic-ion proteins).Estimated speedup: estimate stage 102.8 s → 16.2 s (~6.3× — the single biggest win), single-core HYE benchmark.
Bit-identical protein + ion output (
max_abs_diff=0); suite passes.🤖 Generated with Claude Code
@ammarcsj I chose to overwrite the whole method, not the four individual steps, as this would have introduced 3 more instance variables and made it harder to read overall. I added references as comments ("
cf. ..") instead