
Replace mip package with pulp#663

Merged
kevalmorabia97 merged 1 commit into main from kmorabia/remove-mip
Apr 18, 2026

Conversation


@kevalmorabia97 kevalmorabia97 commented Dec 8, 2025

Replace the mip package with the more popular pulp package for solving the puzzle MIP. Both use the CBC solver under the hood.

Testing

  • Results are very close for Qwen3-8B and Nemotron-Nano-12B-v2

Summary by CodeRabbit

  • Chores
    • Simplified GPU test environment setup by removing unnecessary system dependency installation
    • Updated internal optimization solver dependencies in the puzzletron module

@kevalmorabia97
Collaborator Author

Review from the author of the PuLP package:

the PR looks ok.
Not incorrect, but I would not use the variable.varValue property. I usually use pulp.value function.
Check some of the examples to get other insights on style: https://github.com/coin-or/pulp/tree/master/examples
You should get the same (optimal) solution if the model is the same. Regardless of the modeler (python-mip vs pulp) and solver (cbc, gurobi, highs, cplex, etc.).
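To illustrate the reviewer's suggestion, here is a minimal sketch (a toy model with made-up names, not this PR's actual code) showing pulp.value as the documented accessor for solved variable values:

```python
import pulp

# Toy binary-choice model: pick at most one of two options, preferring x.
problem = pulp.LpProblem("toy_choice", pulp.LpMaximize)
x = pulp.LpVariable("x", cat=pulp.LpBinary)
y = pulp.LpVariable("y", cat=pulp.LpBinary)

problem += pulp.lpSum([x, y]) <= 1  # choose at most one option
problem += 3 * x + 2 * y            # objective: x is worth more

problem.solve(pulp.PULP_CBC_CMD(msg=False))

# pulp.value(...) is the documented accessor; it returns None for an
# unsolved variable, so guard before comparing against a threshold.
chosen = pulp.value(x)
is_x_chosen = chosen is not None and chosen >= 0.99
print(is_x_chosen)
```

The guard against None also covers the case where the solver never ran or failed, which a bare .varValue comparison would not surface as clearly.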

@kevalmorabia97 kevalmorabia97 requested review from a team as code owners March 23, 2026 17:20
@kevalmorabia97 kevalmorabia97 requested a review from realAsma March 23, 2026 17:20

coderabbitai bot commented Mar 23, 2026

📝 Walkthrough

The pull request migrates the optimization solver in the puzzletron module from the mip library to PuLP. The mip dependency is removed from pyproject.toml, and all solver-related calls are replaced with PuLP equivalents. A redundant CI step for libffi installation is also removed.

Changes

  • Optimization Library Migration (modelopt/torch/puzzletron/mip/mip_with_multi_layer_replacements.py): Replaced the mip-based solver with PuLP, including changes to variable creation (LpBinary), constraint handling with pulp.lpSum() and math.isfinite() guards, objective setup, solver invocation (problem.solve() instead of optimize()), status checking, and solution extraction (.varValue instead of .x).
  • Dependency Cleanup (pyproject.toml): Removed mip from the optional puzzletron dependency list.
  • CI Workflow (.github/workflows/gpu_tests.yml): Removed the redundant apt-based libffi installation step from the GPU test job setup.
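As a rough illustration of the API mapping the walkthrough describes (a sketch with made-up variable names, not the repository's code), the mip calls and their PuLP equivalents line up like this:

```python
import pulp

# mip: model = mip.Model()                     -> PuLP:
prob = pulp.LpProblem("puzzle_sketch", pulp.LpMinimize)

# mip: v = model.add_var(var_type=mip.BINARY)  -> PuLP:
v = pulp.LpVariable("is_chosen", cat=pulp.LpBinary)

# mip: model += mip.xsum([...]) <= budget      -> PuLP:
prob += pulp.lpSum([3 * v]) <= 5

# Objective: in PuLP, adding a bare expression sets the objective.
prob += -v  # maximize v by minimizing -v

# mip: model.optimize()                        -> PuLP:
prob.solve(pulp.PULP_CBC_CMD(msg=False))

# mip: v.x                                     -> PuLP: v.varValue
print(prob.status == pulp.LpStatusOptimal, v.varValue)
```

Both libraries drive the same bundled CBC binary by default, which is why the PR expects (near-)identical solutions.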

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
  • Title check ✅ Passed: The pull request title accurately and concisely describes the main change: replacing the mip package with pulp for solving optimization problems.
  • Security Anti-Patterns ✅ Passed: No security anti-patterns detected in Python code: torch.load with weights_only=False, numpy.load with allow_pickle=True, trust_remote_code=True, eval/exec, or nosec comments.
  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.




@kevalmorabia97 kevalmorabia97 changed the base branch from feature/compress to feature/puzzletron March 23, 2026 17:21
@kevalmorabia97 kevalmorabia97 removed request for a team and realAsma March 23, 2026 17:21
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabia/remove-mip branch 2 times, most recently from c41025b to d65c324 on March 23, 2026 17:32

codecov bot commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 93.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 74.47%. Comparing base (e4b054b) to head (833ec9c).
⚠️ Report is 8 commits behind head on main.

Files with missing lines | Patch % | Lines
...uzzletron/mip/mip_with_multi_layer_replacements.py | 93.33% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #663      +/-   ##
==========================================
+ Coverage   72.54%   74.47%   +1.93%     
==========================================
  Files         459      462       +3     
  Lines       48649    51360    +2711     
==========================================
+ Hits        35290    38248    +2958     
+ Misses      13359    13112     -247     
Flag | Coverage | Δ
examples | 41.34% <6.66%> | +1.96% ⬆️
gpu | 58.75% <93.33%> | +6.43% ⬆️
unit | 52.68% <6.66%> | +0.49% ⬆️


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
modelopt/torch/puzzletron/mip/mip_with_multi_layer_replacements.py (1)

Line 115: Prefer pulp.value(...) over .varValue here.

PuLP exposes value() / .value() as the documented way to read solved variable values, so this is a good place to avoid depending on the raw varValue attribute directly. That also matches the package author's note on this PR.

🧹 Suggested cleanup
-        is_chosen = replacement["is_chosen"].varValue >= 0.99
+        chosen_value = pulp.value(replacement["is_chosen"])
+        is_chosen = chosen_value is not None and chosen_value >= 0.99
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/mip/mip_with_multi_layer_replacements.py` at line
115, Replace direct access to the PuLP variable attribute varValue with the
documented pulp.value(...) call: where the code checks
replacement["is_chosen"].varValue >= 0.99, call
pulp.value(replacement["is_chosen"]) >= 0.99 instead and ensure pulp is imported
in mip_with_multi_layer_replacements.py; update any equivalent occurrences that
read .varValue (e.g., in the loop handling replacement["is_chosen"]) to use
pulp.value or the variable's .value() accessor.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/puzzletron/mip/mip_with_multi_layer_replacements.py`:
- Around line 91-95: The model-building code skips NaN/inf bounds by checking
math.isfinite for max_cost/min_cost when adding constraints (using problem and
constraint_vars), but the later post-solve verification still treats any
non-None bound as active and can assert on non-finite values; update the
verification loop to mirror the same normalization (treat non-finite bounds as
if they were None) before performing assertions—i.e., in the post-solve check
for max_cost/min_cost (and where constraint_key is used) only enforce the bound
if math.isfinite(bound), otherwise skip that check.
- Around line 100-105: The code currently discards solutions unless
problem.status == pulp.LpStatusOptimal; after calling solver via
pulp.PULP_CBC_CMD and problem.solve(solver) (see variables solver and problem),
change the feasibility check to also accept the best incumbent reported by PuLP
by inspecting problem.sol_status for pulp.LpSolutionIntegerFeasible (in addition
to pulp.LpStatusOptimal). In other words, after problem.solve(solver) treat the
solution as usable if problem.status == pulp.LpStatusOptimal OR
problem.sol_status == pulp.LpSolutionIntegerFeasible; otherwise handle it as
infeasible/failure. Ensure you reference pulp.PULP_CBC_CMD, problem.solve,
problem.status, problem.sol_status, pulp.LpStatusOptimal and
pulp.LpSolutionIntegerFeasible when making the change.
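The two review suggestions above can be sketched together in a minimal PuLP model (hypothetical names, not the repository's code): guard non-finite bounds with math.isfinite before adding constraints, and accept either a proven-optimal solution or an integer-feasible incumbent after solving.

```python
import math

import pulp

problem = pulp.LpProblem("toy_puzzle", pulp.LpMinimize)
x = pulp.LpVariable("x", cat=pulp.LpBinary)

# Only add a cost bound when it is finite; NaN/inf bounds are treated
# the same as an absent (None) bound, both here and in any later checks.
max_cost = float("inf")  # hypothetical bound from a config
if max_cost is not None and math.isfinite(max_cost):
    problem += 2 * x <= max_cost

problem += x >= 1   # force x = 1 so the toy model has one solution
problem += x        # objective

solver = pulp.PULP_CBC_CMD(msg=False)
problem.solve(solver)

# Accept the best incumbent CBC found, not only a proven-optimal result.
usable = (
    problem.status == pulp.LpStatusOptimal
    or problem.sol_status == pulp.LpSolutionIntegerFeasible
)
print(usable, pulp.value(x))
```

Accepting LpSolutionIntegerFeasible matters when the solver hits a time limit: CBC may hold a perfectly good integer solution without having proven optimality.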


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 63a4fee5-6cb2-4957-b8ae-992797784640

📥 Commits

Reviewing files that changed from the base of the PR and between 4190275 and d65c324.

📒 Files selected for processing (3)
  • .github/workflows/gpu_tests.yml
  • modelopt/torch/puzzletron/mip/mip_with_multi_layer_replacements.py
  • pyproject.toml
💤 Files with no reviewable changes (2)
  • .github/workflows/gpu_tests.yml
  • pyproject.toml


achidiac-nv commented Mar 25, 2026

Qwen3-8B: MIP Solver Comparison (PuLP vs MIP)

Comparison of block configurations and MMLU benchmarks when solving the Puzzletron compression problem with two different MIP solver libraries.

  • Target: 80% of original memory
  • MIP (old)
  • PuLP (new)

Block Configuration Differences

Only 2 of 36 blocks differ between the two solvers.

Block | MIP (old) | PuLP (new)
block_2 | attention, kv_heads_8, ffn, intermediate_7424 | attention, kv_heads_8, ffn, intermediate_12288
block_3 | attention, kv_heads_8, ffn, intermediate_12288 | attention, kv_heads_8, ffn, intermediate_9984

Note: the PuLP solution shifts FFN capacity from block_3 to block_2. The old MIP solution put a very narrow FFN (intermediate_7424) in block_2 and a full-width FFN (intermediate_12288) in block_3; the new PuLP solution flips this: block_2 becomes full width (intermediate_12288) and block_3 uses a medium width (intermediate_9984).

Full block configuration (MIP — old)
block_0:   attention  kv_heads_8  ffn  intermediate_12288
block_1:   attention  kv_heads_8  ffn  intermediate_12288
block_2:   attention  kv_heads_8  ffn  intermediate_7424
block_3:   attention  kv_heads_8  ffn  intermediate_12288
block_4:   attention  no_op       ffn  intermediate_12288
block_5:   attention  no_op       ffn  intermediate_12288
block_6:   attention  kv_heads_8  ffn  intermediate_12288
block_7:   attention  kv_heads_8  ffn  intermediate_12288
block_8:   attention  kv_heads_8  ffn  intermediate_12288
block_9:   attention  no_op       ffn  intermediate_12288
block_10:  attention  no_op       ffn  intermediate_12288
block_11:  attention  no_op       ffn  intermediate_12288
block_12:  attention  kv_heads_8  ffn  intermediate_12288
block_13:  attention  kv_heads_8  ffn  intermediate_12288
block_14:  attention  kv_heads_8  ffn  intermediate_12288
block_15:  attention  kv_heads_8  ffn  intermediate_12288
block_16:  attention  kv_heads_8  ffn  intermediate_12288
block_17:  attention  kv_heads_8  ffn  intermediate_12288
block_18:  attention  kv_heads_8  ffn  intermediate_12288
block_19:  attention  kv_heads_8  ffn  intermediate_12288
block_20:  attention  kv_heads_8  ffn  intermediate_12288
block_21:  attention  kv_heads_8  ffn  intermediate_12288
block_22:  attention  kv_heads_8  ffn  intermediate_12288
block_23:  attention  kv_heads_8  ffn  intermediate_12288
block_24:  attention  kv_heads_8  ffn  intermediate_12288
block_25:  attention  kv_heads_8  ffn  intermediate_12288
block_26:  attention  no_op       ffn  intermediate_12288
block_27:  attention  no_op       ffn  intermediate_12288
block_28:  attention  no_op       ffn  intermediate_12288
block_29:  attention  kv_heads_8  ffn  intermediate_12288
block_30:  attention  kv_heads_8  ffn  intermediate_12288
block_31:  attention  kv_heads_8  ffn  intermediate_12288
block_32:  attention  kv_heads_8  ffn  intermediate_12288
block_33:  attention  kv_heads_8  ffn  intermediate_12288
block_34:  attention  kv_heads_8  ffn  intermediate_12288
block_35:  attention  kv_heads_8  ffn  intermediate_12288
Full block configuration (PuLP — new)
block_0:   attention  kv_heads_8  ffn  intermediate_12288
block_1:   attention  kv_heads_8  ffn  intermediate_12288
block_2:   attention  kv_heads_8  ffn  intermediate_12288
block_3:   attention  kv_heads_8  ffn  intermediate_9984
block_4:   attention  no_op       ffn  intermediate_12288
block_5:   attention  no_op       ffn  intermediate_12288
block_6:   attention  kv_heads_8  ffn  intermediate_12288
block_7:   attention  kv_heads_8  ffn  intermediate_12288
block_8:   attention  kv_heads_8  ffn  intermediate_12288
block_9:   attention  no_op       ffn  intermediate_12288
block_10:  attention  no_op       ffn  intermediate_12288
block_11:  attention  no_op       ffn  intermediate_12288
block_12:  attention  kv_heads_8  ffn  intermediate_12288
block_13:  attention  kv_heads_8  ffn  intermediate_12288
block_14:  attention  kv_heads_8  ffn  intermediate_12288
block_15:  attention  kv_heads_8  ffn  intermediate_12288
block_16:  attention  kv_heads_8  ffn  intermediate_12288
block_17:  attention  kv_heads_8  ffn  intermediate_12288
block_18:  attention  kv_heads_8  ffn  intermediate_12288
block_19:  attention  kv_heads_8  ffn  intermediate_12288
block_20:  attention  kv_heads_8  ffn  intermediate_12288
block_21:  attention  kv_heads_8  ffn  intermediate_12288
block_22:  attention  kv_heads_8  ffn  intermediate_12288
block_23:  attention  kv_heads_8  ffn  intermediate_12288
block_24:  attention  kv_heads_8  ffn  intermediate_12288
block_25:  attention  kv_heads_8  ffn  intermediate_12288
block_26:  attention  no_op       ffn  intermediate_12288
block_27:  attention  no_op       ffn  intermediate_12288
block_28:  attention  no_op       ffn  intermediate_12288
block_29:  attention  kv_heads_8  ffn  intermediate_12288
block_30:  attention  kv_heads_8  ffn  intermediate_12288
block_31:  attention  kv_heads_8  ffn  intermediate_12288
block_32:  attention  kv_heads_8  ffn  intermediate_12288
block_33:  attention  kv_heads_8  ffn  intermediate_12288
block_34:  attention  kv_heads_8  ffn  intermediate_12288
block_35:  attention  kv_heads_8  ffn  intermediate_12288

MMLU Benchmark Results (80% memory target)

Group | MIP (old) | PuLP (new) | Δ (PuLP − MIP)
mmlu | 0.5910 ± 0.0040 | 0.5914 ± 0.0040 | +0.0004
humanities | 0.5046 ± 0.0070 | 0.5069 ± 0.0070 | +0.0023
other | 0.6363 ± 0.0084 | 0.6360 ± 0.0084 | −0.0003
social sciences | 0.6831 ± 0.0083 | 0.6812 ± 0.0083 | −0.0019
stem | 0.5855 ± 0.0085 | 0.5861 ± 0.0085 | +0.0006

Summary

  • PuLP configuration differs from MIP in only 2 of 36 blocks (block_2, block_3), essentially a redistribution of FFN width between adjacent layers.
  • Overall MMLU accuracy is effectively identical between the two solvers (Δ = +0.0004, well within stderr).
  • All per-group deltas are within roughly 1× stderr, confirming the two solvers produce effectively equivalent solution quality on this benchmark.

Base automatically changed from feature/puzzletron to main April 15, 2026 19:18
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

github-actions bot commented Apr 18, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-04-18 15:50 UTC


achidiac-nv commented Apr 18, 2026

Nemotron-Nano-12B-v2 compressed from 49000MiB to 34000MiB (memory target) with Puzzle: MIP Solver Comparison (PuLP vs MIP)

Comparison of block configurations and MMLU benchmarks when solving the Mixed Integer Programming problem with two different libraries at a 34000 MiB memory target:

  • MIP (old)
  • PuLP (new)

Block Configuration Differences

Only 4 of 62 blocks differ between the two solvers.

Block | MIP (old) | PuLP (new)
block_5 | attention, no_op, ffn, intermediate_16384 | attention, no_op, ffn, intermediate_12544
block_21 | attention, no_op, ffn, intermediate_12544 | attention, no_op, ffn, intermediate_16384
block_26 | attention, no_op, ffn, intermediate_12544 | attention, no_op, ffn, intermediate_16384
block_28 | attention, no_op, ffn, intermediate_16384 | attention, no_op, ffn, intermediate_12544

Note: the PuLP solution effectively swaps intermediate_16384 and intermediate_12544 between the pairs (block_5 ↔ block_21) and (block_26 ↔ block_28), so the overall count of each FFN size across the model is preserved; only placement differs.

Full block configuration (MIP — old)
block_0:   mamba      num_heads_128  head_dim_80  ffn  no_op
block_1:   attention  no_op                       ffn  intermediate_20480
block_2:   mamba      num_heads_128  head_dim_80  ffn  no_op
block_3:   attention  no_op                       ffn  intermediate_12544
block_4:   mamba      num_heads_128  head_dim_80  ffn  no_op
block_5:   attention  no_op                       ffn  intermediate_16384
block_6:   mamba      num_heads_128  head_dim_80  ffn  no_op
block_7:   attention  no_op                       ffn  no_op
block_8:   attention  no_op                       ffn  intermediate_12544
block_9:   mamba      num_heads_128  head_dim_80  ffn  no_op
block_10:  attention  no_op                       ffn  intermediate_12544
block_11:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_12:  attention  no_op                       ffn  intermediate_16384
block_13:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_14:  attention  no_op                       ffn  intermediate_16384
block_15:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_16:  attention  no_op                       ffn  no_op
block_17:  attention  no_op                       ffn  intermediate_12544
block_18:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_19:  attention  no_op                       ffn  intermediate_16384
block_20:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_21:  attention  no_op                       ffn  intermediate_12544
block_22:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_23:  attention  no_op                       ffn  intermediate_16384
block_24:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_25:  attention  kv_heads_8                  ffn  no_op
block_26:  attention  no_op                       ffn  intermediate_12544
block_27:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_28:  attention  no_op                       ffn  intermediate_16384
block_29:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_30:  attention  no_op                       ffn  intermediate_16384
block_31:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_32:  attention  no_op                       ffn  intermediate_16384
block_33:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_34:  attention  kv_heads_8                  ffn  no_op
block_35:  attention  no_op                       ffn  intermediate_16384
block_36:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_37:  attention  no_op                       ffn  intermediate_20480
block_38:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_39:  attention  no_op                       ffn  intermediate_16384
block_40:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_41:  attention  no_op                       ffn  intermediate_20480
block_42:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_43:  attention  no_op                       ffn  no_op
block_44:  attention  no_op                       ffn  intermediate_20480
block_45:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_46:  attention  no_op                       ffn  intermediate_20480
block_47:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_48:  attention  no_op                       ffn  intermediate_20480
block_49:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_50:  attention  no_op                       ffn  intermediate_16384
block_51:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_52:  attention  no_op                       ffn  no_op
block_53:  attention  no_op                       ffn  intermediate_20480
block_54:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_55:  attention  no_op                       ffn  intermediate_20480
block_56:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_57:  attention  no_op                       ffn  intermediate_20480
block_58:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_59:  attention  no_op                       ffn  intermediate_12544
block_60:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_61:  attention  no_op                       ffn  intermediate_12544
Full block configuration (PuLP — new)
block_0:   mamba      num_heads_128  head_dim_80  ffn  no_op
block_1:   attention  no_op                       ffn  intermediate_20480
block_2:   mamba      num_heads_128  head_dim_80  ffn  no_op
block_3:   attention  no_op                       ffn  intermediate_12544
block_4:   mamba      num_heads_128  head_dim_80  ffn  no_op
block_5:   attention  no_op                       ffn  intermediate_12544
block_6:   mamba      num_heads_128  head_dim_80  ffn  no_op
block_7:   attention  no_op                       ffn  no_op
block_8:   attention  no_op                       ffn  intermediate_12544
block_9:   mamba      num_heads_128  head_dim_80  ffn  no_op
block_10:  attention  no_op                       ffn  intermediate_12544
block_11:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_12:  attention  no_op                       ffn  intermediate_16384
block_13:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_14:  attention  no_op                       ffn  intermediate_16384
block_15:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_16:  attention  no_op                       ffn  no_op
block_17:  attention  no_op                       ffn  intermediate_12544
block_18:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_19:  attention  no_op                       ffn  intermediate_16384
block_20:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_21:  attention  no_op                       ffn  intermediate_16384
block_22:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_23:  attention  no_op                       ffn  intermediate_16384
block_24:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_25:  attention  kv_heads_8                  ffn  no_op
block_26:  attention  no_op                       ffn  intermediate_16384
block_27:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_28:  attention  no_op                       ffn  intermediate_12544
block_29:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_30:  attention  no_op                       ffn  intermediate_16384
block_31:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_32:  attention  no_op                       ffn  intermediate_16384
block_33:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_34:  attention  kv_heads_8                  ffn  no_op
block_35:  attention  no_op                       ffn  intermediate_16384
block_36:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_37:  attention  no_op                       ffn  intermediate_20480
block_38:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_39:  attention  no_op                       ffn  intermediate_16384
block_40:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_41:  attention  no_op                       ffn  intermediate_20480
block_42:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_43:  attention  no_op                       ffn  no_op
block_44:  attention  no_op                       ffn  intermediate_20480
block_45:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_46:  attention  no_op                       ffn  intermediate_20480
block_47:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_48:  attention  no_op                       ffn  intermediate_20480
block_49:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_50:  attention  no_op                       ffn  intermediate_16384
block_51:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_52:  attention  no_op                       ffn  no_op
block_53:  attention  no_op                       ffn  intermediate_20480
block_54:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_55:  attention  no_op                       ffn  intermediate_20480
block_56:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_57:  attention  no_op                       ffn  intermediate_20480
block_58:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_59:  attention  no_op                       ffn  intermediate_12544
block_60:  mamba      num_heads_128  head_dim_80  ffn  no_op
block_61:  attention  no_op                       ffn  intermediate_12544

MMLU Benchmark Results (Puzzle 34000 MiB)

Group | MIP (old) | PuLP (new) | Δ (PuLP − MIP)
mmlu | 0.5427 ± 0.0040 | 0.5450 ± 0.0040 | +0.0023
humanities | 0.4489 ± 0.0070 | 0.4506 ± 0.0070 | +0.0017
other | 0.5436 ± 0.0087 | 0.5404 ± 0.0087 | −0.0032
social sciences | 0.6578 ± 0.0084 | 0.6646 ± 0.0083 | +0.0068
stem | 0.5693 ± 0.0087 | 0.5737 ± 0.0086 | +0.0044

Summary

  • PuLP configuration differs from MIP in only 4 of 62 blocks (block_5, block_21, block_26, block_28), all of which are FFN intermediate_16384 ↔ intermediate_12544 swaps between pairs of blocks.
  • Overall MMLU accuracy is slightly higher with PuLP (+0.23 pp), with gains in humanities, social sciences, and STEM, and a small regression in "other".
  • All per-group deltas are within roughly 1× stderr, so the two solvers produce effectively equivalent solution quality on this benchmark.

@kevalmorabia97 kevalmorabia97 merged commit 2b315ed into main Apr 18, 2026
45 checks passed
@kevalmorabia97 kevalmorabia97 deleted the kmorabia/remove-mip branch April 18, 2026 15:49