Step 4: numpy NormalizationManagerProtein quadratic+linear normalization by mschwoer · Pull Request #87 · MannLabs/directlfq

mschwoer · 2026-06-15T12:28:00Z

Overrides _normalize_quadratic_and_linear with numpy, dropping the label-based .loc[tuples,:]=df round-trips on a MultiIndex (the dominant per-protein cost for >num_samples_quadratic-ion proteins).

Estimated speedup: estimate stage 102.8 s → 16.2 s (~6.3× — the single biggest win), single-core HYE benchmark.

Bit-identical protein + ion output (max_abs_diff=0); suite passes.

🤖 Generated with Claude Code

@ammarcsj I chose to overwrite the whole method, not the four individual steps, as this would have introduced 3 more instance variables and made it harder to read overall. I added references as comments ("cf. ..") instead

Add regression tests before the numpy override of _normalize_quadratic_and_linear: an executable-spec characterization test (validated against the baseline pandas implementation) over mixed-NaN rows that pins the stable k-fewest-NaN quadratic split and the linear nanmedian shift, plus a noise-free behavioral test asserting scaled-copy profiles collapse onto one profile. Both pass against the unmodified baseline. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…n (step 4) Override _normalize_quadratic_and_linear with a numpy implementation that drops the label-based .loc[list_of_tuples,:]=df round-trips on a MultiIndex (the dominant estimate-stage cost for >num_samples_quadratic-ion proteins). Reuses get_normfacts/apply_sampleshifts on the stable k-fewest-NaN quadratic subset, takes the column nanmedian as the reference, and shifts the linear rows by nanmedian(reference - row). Bit-identical protein + ion output (max_abs_diff=0); estimate stage drops to ~18s. Suite passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…r stage comments --- # Conversation that produced these changes --- ## User prompt "apply the 4-step refactor and run the tests" ## User prompt "I don't like the introduction of new instance variables. please revert. Instead add the four stages as comments to the method" Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…y-normalize-quadratic-linear

--- # Conversation that produced these changes --- ## User prompt "Now, the NormalizationManagerProtein: it is no longer used. This means the refactoring done in #87 was obsolete?" ## Clarifying round 1 Q: How do you want to resolve the duplicated quadratic+linear numpy logic between #87's class override and #88's _normalize_protein_values? - Single source in normalization.py <-- chosen - Revert #87's override - Leave both as-is - Just answer, no change Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

mschwoer and others added 6 commits June 15, 2026 14:01

pre-commit

300ba17

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Merge branch 'opt/step3-vectorize-sample-shifter' into opt/step4-nump…

164a003

…y-normalize-quadratic-linear

pre-commit

c8b2a7d

mschwoer requested a review from ammarcsj June 16, 2026 09:08

remove tests reintroduced by a merge

4309078

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Step 4: numpy NormalizationManagerProtein quadratic+linear normalization#87

Step 4: numpy NormalizationManagerProtein quadratic+linear normalization#87
mschwoer wants to merge 7 commits into
opt/step3-vectorize-sample-shifterfrom
opt/step4-numpy-normalize-quadratic-linear

mschwoer commented Jun 15, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mschwoer commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

mschwoer commented Jun 15, 2026 •

edited

Loading