x86_64: Replace rej_uniform_eta2/eta4 intrinsics with hand-written assembly #1188
jakemas wants to merge 1 commit into
f789e8d to 1e78719
CBMC Results (ML-DSA-87, REDUCE-RAM): Full Results (205 proofs)
CBMC Results (ML-DSA-65, REDUCE-RAM): Full Results (205 proofs)
CBMC Results (ML-DSA-44, REDUCE-RAM): Full Results (205 proofs)
CBMC Results (ML-DSA-65): Full Results (205 proofs)
CBMC Results (ML-DSA-44): Full Results (205 proofs)
CBMC Results (ML-DSA-87): Full Results (205 proofs)
…sembly
Add hand-written x86_64 AVX2 assembly for rej_uniform_eta2 and rej_uniform_eta4 and remove the AVX2 intrinsics implementations they replace, following the rej_uniform approach in #1014: the table is passed as a parameter and all constants are built from immediates (no .rodata), enabling future HOL-Light verification.
Both eta2 and eta4 are wired to the new asm in meta.h, with contracts in arith_native_x86_64.h, bytecode dump targets in autogen and the Makefile, and a poly_uniform_eta_4x component benchmark.
The asm entry points are declared MLD_SYSV_ABI (like the other x86_64 asm routines) so they are called with the System V register convention on all platforms, including Windows/MinGW.
The endbr64 is emitted via MLD_ASM_FN_SYMBOL (CET-gated) rather than a raw mnemonic, so older assemblers (e.g. clang-6) build cleanly.
The eta2 vector path applies the centered mod-5 reduction to (2 - nibble) directly (matching the reference), rather than reducing the raw nibble and subtracting afterwards; the two are not equivalent because vpmulhrsw rounds to nearest.
Verified against the ACVP keyGen vectors for all parameter sets.
Signed-off-by: jake massimo <jakemas@amazon.com>
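The eta2 remark in the commit message can be checked with a small scalar model. The sketch below is not the PR's assembly: `mulhrs16`, `center_mod5` and the other helper names are hypothetical, and the constants 205 and -6560 are assumptions taken from the commonly used Dilithium scalar and AVX2 formulations rather than from this PR. It models one 16-bit lane of vpmulhrsw and compares the two orderings against a scalar `2 - (n mod 5)` over the accepted nibbles 0..14.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane of vpmulhrsw: ((a*b >> 14) + 1) >> 1,
 * i.e. floor((a*b + 2^14) / 2^15). Floor division is written out so the
 * code does not depend on right-shifting negative values in C. */
static int16_t mulhrs16(int16_t a, int16_t b)
{
  int32_t t = (int32_t)a * (int32_t)b + (1 << 14);
  int32_t q = t / 32768;
  if (t % 32768 < 0)
  {
    q -= 1; /* C division truncates toward zero; adjust to floor */
  }
  /* Only the corner case a = b = -32768 would overflow a lane. */
  assert(q >= INT16_MIN && q <= INT16_MAX);
  return (int16_t)q;
}

/* Centered reduction mod 5 into [-2, 2] for small inputs.
 * ASSUMPTION: -6560 (about -2^15 / 5) is the usual AVX2 constant. */
static int16_t center_mod5(int16_t x)
{
  return (int16_t)(x + 5 * mulhrs16(x, -6560));
}

/* ASSUMPTION: scalar formulation 2 - (n mod 5), valid for n in [0, 14]. */
static int16_t eta2_scalar(uint8_t n)
{
  uint32_t r = n - ((205u * n) >> 10) * 5u;
  return (int16_t)(2 - (int16_t)r);
}

/* Ordering described in the commit message: reduce (2 - nibble) directly. */
static int16_t eta2_direct(uint8_t n)
{
  return center_mod5((int16_t)(2 - n));
}

/* Alternative ordering: reduce the raw nibble, subtract afterwards. */
static int16_t eta2_reduce_then_sub(uint8_t n)
{
  return (int16_t)(2 - center_mod5((int16_t)n));
}

/* Number of nibbles in [0, 14] on which f disagrees with eta2_scalar. */
static int count_mismatches(int16_t (*f)(uint8_t))
{
  int bad = 0;
  for (uint8_t n = 0; n < 15; n++)
  {
    if (f(n) != eta2_scalar(n))
    {
      bad++;
    }
  }
  return bad;
}
```

Under these assumptions, `count_mismatches(eta2_direct)` is 0, while `count_mismatches(eta2_reduce_then_sub)` is 6: for nibble 3, the round-to-nearest reduction maps 3 to -2, so subtracting afterwards gives 4 instead of -1.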
1e78719 to 7c13c63
@hanno-becker / @mkannwischer would it be possible to get a review of dev/x86_64/src/rej_uniform_eta{2/4}_avx2_asm.S as a preliminary pass before I start the HOL-Light proofs? This should help avoid excessive proof churn should any large changes be requested in review. Ideally, the asm would be finalized in review first, and then I'll mark the PR ready once the proofs are complete.
Good idea. Thanks! Will look later today.
Thank you very much!
Mac Mini (M1, 2020) benchmarks (opt)
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 46503 cycles | 46534 cycles | 1.00 |
| ML-DSA-44 sign | 131074 cycles | 131065 cycles | 1.00 |
| ML-DSA-44 verify | 47313 cycles | 47344 cycles | 1.00 |
| ML-DSA-65 keypair | 81687 cycles | 81689 cycles | 1.00 |
| ML-DSA-65 sign | 215293 cycles | 215304 cycles | 1.00 |
| ML-DSA-65 verify | 79298 cycles | 79301 cycles | 1.00 |
| ML-DSA-87 keypair | 132402 cycles | 132414 cycles | 1.00 |
| ML-DSA-87 sign | 277403 cycles | 277299 cycles | 1.00 |
| ML-DSA-87 verify | 134049 cycles | 134048 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Mac Mini (M1, 2020) benchmarks (no-opt)
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 112805 cycles | 112744 cycles | 1.00 |
| ML-DSA-44 sign | 401076 cycles | 400857 cycles | 1.00 |
| ML-DSA-44 verify | 119486 cycles | 119422 cycles | 1.00 |
| ML-DSA-65 keypair | 193004 cycles | 192931 cycles | 1.00 |
| ML-DSA-65 sign | 650034 cycles | 649964 cycles | 1.00 |
| ML-DSA-65 verify | 192887 cycles | 192858 cycles | 1.00 |
| ML-DSA-87 keypair | 318732 cycles | 318783 cycles | 1.00 |
| ML-DSA-87 sign | 828739 cycles | 828685 cycles | 1.00 |
| ML-DSA-87 verify | 326671 cycles | 326704 cycles | 1.00 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 227496 cycles | 219616 cycles | 1.04 |
| ML-DSA-44 sign | 621038 cycles | 608061 cycles | 1.02 |
| ML-DSA-44 verify | 235419 cycles | 214195 cycles | 1.10 |
| ML-DSA-65 keypair | 388555 cycles | 394028 cycles | 0.99 |
| ML-DSA-65 sign | 1008940 cycles | 1005239 cycles | 1.00 |
| ML-DSA-65 verify | 369157 cycles | 378176 cycles | 0.98 |
| ML-DSA-87 keypair | 661599 cycles | 635525 cycles | 1.04 |
| ML-DSA-87 sign | 1369715 cycles | 1306530 cycles | 1.05 |
| ML-DSA-87 verify | 645122 cycles | 616298 cycles | 1.05 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 227496 cycles | 219616 cycles | 1.04 |
| ML-DSA-44 verify | 235419 cycles | 214195 cycles | 1.10 |
| ML-DSA-87 keypair | 661599 cycles | 635525 cycles | 1.04 |
| ML-DSA-87 sign | 1369715 cycles | 1306530 cycles | 1.05 |
| ML-DSA-87 verify | 645122 cycles | 616298 cycles | 1.05 |
oqs-bot left a comment
Intel Xeon 4th gen (c7i)
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 43382 cycles | 43353 cycles | 1.00 |
| ML-DSA-44 sign | 130806 cycles | 130679 cycles | 1.00 |
| ML-DSA-44 verify | 45264 cycles | 45263 cycles | 1.00 |
| ML-DSA-65 keypair | 75789 cycles | 75765 cycles | 1.00 |
| ML-DSA-65 sign | 214746 cycles | 214616 cycles | 1.00 |
| ML-DSA-65 verify | 74389 cycles | 74462 cycles | 1.00 |
| ML-DSA-87 keypair | 123059 cycles | 123089 cycles | 1.00 |
| ML-DSA-87 sign | 271462 cycles | 270931 cycles | 1.00 |
| ML-DSA-87 verify | 120590 cycles | 120473 cycles | 1.00 |
oqs-bot left a comment
Intel Xeon 4th gen (c7i) (no-opt)
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 91550 cycles | 91474 cycles | 1.00 |
| ML-DSA-44 sign | 352057 cycles | 352402 cycles | 1.00 |
| ML-DSA-44 verify | 99895 cycles | 99810 cycles | 1.00 |
| ML-DSA-65 keypair | 153878 cycles | 153977 cycles | 1.00 |
| ML-DSA-65 sign | 571455 cycles | 571817 cycles | 1.00 |
| ML-DSA-65 verify | 159829 cycles | 159916 cycles | 1.00 |
| ML-DSA-87 keypair | 255292 cycles | 255126 cycles | 1.00 |
| ML-DSA-87 sign | 726015 cycles | 725242 cycles | 1.00 |
| ML-DSA-87 verify | 263880 cycles | 263738 cycles | 1.00 |
oqs-bot left a comment
AMD EPYC 3rd gen (c6a)
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 55826 cycles | 55161 cycles | 1.01 |
| ML-DSA-44 sign | 159302 cycles | 159029 cycles | 1.00 |
| ML-DSA-44 verify | 57714 cycles | 57785 cycles | 1.00 |
| ML-DSA-65 keypair | 96776 cycles | 95694 cycles | 1.01 |
| ML-DSA-65 sign | 261086 cycles | 260702 cycles | 1.00 |
| ML-DSA-65 verify | 96269 cycles | 95964 cycles | 1.00 |
| ML-DSA-87 keypair | 155272 cycles | 154489 cycles | 1.01 |
| ML-DSA-87 sign | 323671 cycles | 322750 cycles | 1.00 |
| ML-DSA-87 verify | 152704 cycles | 151028 cycles | 1.01 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 312772 cycles | 294913 cycles | 1.06 |
| ML-DSA-44 sign | 1193382 cycles | 1147432 cycles | 1.04 |
| ML-DSA-44 verify | 332634 cycles | 329463 cycles | 1.01 |
| ML-DSA-65 keypair | 560095 cycles | 553429 cycles | 1.01 |
| ML-DSA-65 sign | 1913706 cycles | 1874143 cycles | 1.02 |
| ML-DSA-65 verify | 536753 cycles | 533010 cycles | 1.01 |
| ML-DSA-87 keypair | 904673 cycles | 847930 cycles | 1.07 |
| ML-DSA-87 sign | 2468732 cycles | 2393137 cycles | 1.03 |
| ML-DSA-87 verify | 906571 cycles | 874118 cycles | 1.04 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 312772 cycles | 294913 cycles | 1.06 |
| ML-DSA-44 sign | 1193382 cycles | 1147432 cycles | 1.04 |
| ML-DSA-87 keypair | 904673 cycles | 847930 cycles | 1.07 |
| ML-DSA-87 sign | 2468732 cycles | 2393137 cycles | 1.03 |
| ML-DSA-87 verify | 906571 cycles | 874118 cycles | 1.04 |
oqs-bot left a comment
AMD EPYC 3rd gen (c6a) (no-opt)
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 133408 cycles | 133059 cycles | 1.00 |
| ML-DSA-44 sign | 520236 cycles | 518626 cycles | 1.00 |
| ML-DSA-44 verify | 147060 cycles | 146355 cycles | 1.00 |
| ML-DSA-65 keypair | 225079 cycles | 225679 cycles | 1.00 |
| ML-DSA-65 sign | 847135 cycles | 848010 cycles | 1.00 |
| ML-DSA-65 verify | 235456 cycles | 235882 cycles | 1.00 |
| ML-DSA-87 keypair | 370475 cycles | 367342 cycles | 1.01 |
| ML-DSA-87 sign | 1069404 cycles | 1059451 cycles | 1.01 |
| ML-DSA-87 verify | 384222 cycles | 381076 cycles | 1.01 |
oqs-bot left a comment
Graviton2
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 112426 cycles | 112391 cycles | 1.00 |
| ML-DSA-44 sign | 353829 cycles | 353871 cycles | 1.00 |
| ML-DSA-44 verify | 117232 cycles | 117222 cycles | 1.00 |
| ML-DSA-65 keypair | 194641 cycles | 194637 cycles | 1.00 |
| ML-DSA-65 sign | 584354 cycles | 584454 cycles | 1.00 |
| ML-DSA-65 verify | 193206 cycles | 193179 cycles | 1.00 |
| ML-DSA-87 keypair | 320931 cycles | 320866 cycles | 1.00 |
| ML-DSA-87 sign | 746952 cycles | 746603 cycles | 1.00 |
| ML-DSA-87 verify | 318576 cycles | 318613 cycles | 1.00 |
oqs-bot left a comment
Graviton4
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 67239 cycles | 67494 cycles | 1.00 |
| ML-DSA-44 sign | 198398 cycles | 198222 cycles | 1.00 |
| ML-DSA-44 verify | 70268 cycles | 70154 cycles | 1.00 |
| ML-DSA-65 keypair | 119318 cycles | 119283 cycles | 1.00 |
| ML-DSA-65 sign | 325812 cycles | 326057 cycles | 1.00 |
| ML-DSA-65 verify | 116825 cycles | 116834 cycles | 1.00 |
| ML-DSA-87 keypair | 196716 cycles | 196570 cycles | 1.00 |
| ML-DSA-87 sign | 422203 cycles | 421915 cycles | 1.00 |
| ML-DSA-87 verify | 193443 cycles | 193341 cycles | 1.00 |
oqs-bot left a comment
AMD EPYC 4th gen (c7a)
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 46849 cycles | 46887 cycles | 1.00 |
| ML-DSA-44 sign | 139191 cycles | 138924 cycles | 1.00 |
| ML-DSA-44 verify | 49276 cycles | 49405 cycles | 1.00 |
| ML-DSA-65 keypair | 82925 cycles | 83264 cycles | 1.00 |
| ML-DSA-65 sign | 227212 cycles | 227398 cycles | 1.00 |
| ML-DSA-65 verify | 82484 cycles | 82718 cycles | 1.00 |
| ML-DSA-87 keypair | 129318 cycles | 129835 cycles | 1.00 |
| ML-DSA-87 sign | 281021 cycles | 284510 cycles | 0.99 |
| ML-DSA-87 verify | 128778 cycles | 130913 cycles | 0.98 |
oqs-bot left a comment
Graviton4 (no-opt)
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 127640 cycles | 127662 cycles | 1.00 |
| ML-DSA-44 sign | 441226 cycles | 441153 cycles | 1.00 |
| ML-DSA-44 verify | 136368 cycles | 136392 cycles | 1.00 |
| ML-DSA-65 keypair | 220745 cycles | 220751 cycles | 1.00 |
| ML-DSA-65 sign | 713952 cycles | 713856 cycles | 1.00 |
| ML-DSA-65 verify | 220714 cycles | 220762 cycles | 1.00 |
| ML-DSA-87 keypair | 365220 cycles | 365145 cycles | 1.00 |
| ML-DSA-87 sign | 915561 cycles | 921328 cycles | 0.99 |
| ML-DSA-87 verify | 370865 cycles | 370840 cycles | 1.00 |
oqs-bot left a comment
Intel Xeon 3rd gen (c6i)
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 61670 cycles | 61568 cycles | 1.00 |
| ML-DSA-44 sign | 189150 cycles | 188965 cycles | 1.00 |
| ML-DSA-44 verify | 66433 cycles | 66389 cycles | 1.00 |
| ML-DSA-65 keypair | 108153 cycles | 108294 cycles | 1.00 |
| ML-DSA-65 sign | 310773 cycles | 312120 cycles | 1.00 |
| ML-DSA-65 verify | 108909 cycles | 109456 cycles | 1.00 |
| ML-DSA-87 keypair | 170970 cycles | 171678 cycles | 1.00 |
| ML-DSA-87 sign | 378610 cycles | 379290 cycles | 1.00 |
| ML-DSA-87 verify | 169701 cycles | 169549 cycles | 1.00 |
oqs-bot left a comment
Graviton2 (no-opt)
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 211514 cycles | 211645 cycles | 1.00 |
| ML-DSA-44 sign | 759499 cycles | 759768 cycles | 1.00 |
| ML-DSA-44 verify | 229074 cycles | 229224 cycles | 1.00 |
| ML-DSA-65 keypair | 377633 cycles | 377412 cycles | 1.00 |
| ML-DSA-65 sign | 1247628 cycles | 1247442 cycles | 1.00 |
| ML-DSA-65 verify | 373031 cycles | 371695 cycles | 1.00 |
| ML-DSA-87 keypair | 600257 cycles | 601065 cycles | 1.00 |
| ML-DSA-87 sign | 1585029 cycles | 1584744 cycles | 1.00 |
| ML-DSA-87 verify | 616424 cycles | 616529 cycles | 1.00 |
oqs-bot left a comment
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 118967 cycles | 118751 cycles | 1.00 |
| ML-DSA-44 sign | 458258 cycles | 458504 cycles | 1.00 |
| ML-DSA-44 verify | 130577 cycles | 130683 cycles | 1.00 |
| ML-DSA-65 keypair | 201105 cycles | 201607 cycles | 1.00 |
| ML-DSA-65 sign | 745177 cycles | 743209 cycles | 1.00 |
| ML-DSA-65 verify | 209237 cycles | 209226 cycles | 1.00 |
| ML-DSA-87 keypair | 330130 cycles | 330027 cycles | 1.00 |
| ML-DSA-87 sign | 937192 cycles | 939348 cycles | 1.00 |
| ML-DSA-87 verify | 343610 cycles | 343318 cycles | 1.00 |
oqs-bot left a comment
Intel Xeon 3rd gen (c6i) (no-opt)
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 153955 cycles | 153922 cycles | 1.00 |
| ML-DSA-44 sign | 586125 cycles | 586291 cycles | 1.00 |
| ML-DSA-44 verify | 168817 cycles | 168698 cycles | 1.00 |
| ML-DSA-65 keypair | 262391 cycles | 261670 cycles | 1.00 |
| ML-DSA-65 sign | 966005 cycles | 961560 cycles | 1.00 |
| ML-DSA-65 verify | 272367 cycles | 271431 cycles | 1.00 |
| ML-DSA-87 keypair | 431520 cycles | 432139 cycles | 1.00 |
| ML-DSA-87 sign | 1214437 cycles | 1208816 cycles | 1.00 |
| ML-DSA-87 verify | 447189 cycles | 446664 cycles | 1.00 |
oqs-bot left a comment
Graviton3
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 71506 cycles | 71555 cycles | 1.00 |
| ML-DSA-44 sign | 209012 cycles | 209006 cycles | 1.00 |
| ML-DSA-44 verify | 74740 cycles | 74747 cycles | 1.00 |
| ML-DSA-65 keypair | 125928 cycles | 125942 cycles | 1.00 |
| ML-DSA-65 sign | 345460 cycles | 345438 cycles | 1.00 |
| ML-DSA-65 verify | 123996 cycles | 124189 cycles | 1.00 |
| ML-DSA-87 keypair | 206638 cycles | 206597 cycles | 1.00 |
| ML-DSA-87 sign | 439693 cycles | 439813 cycles | 1.00 |
| ML-DSA-87 verify | 204443 cycles | 204460 cycles | 1.00 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 112122 cycles | 112118 cycles | 1.00 |
| ML-DSA-44 sign | 353881 cycles | 353767 cycles | 1.00 |
| ML-DSA-44 verify | 117195 cycles | 117191 cycles | 1.00 |
| ML-DSA-65 keypair | 194355 cycles | 194374 cycles | 1.00 |
| ML-DSA-65 sign | 583731 cycles | 583730 cycles | 1.00 |
| ML-DSA-65 verify | 193133 cycles | 193093 cycles | 1.00 |
| ML-DSA-87 keypair | 320002 cycles | 320119 cycles | 1.00 |
| ML-DSA-87 sign | 747427 cycles | 747165 cycles | 1.00 |
| ML-DSA-87 verify | 317882 cycles | 318002 cycles | 1.00 |
oqs-bot left a comment
Graviton3 (no-opt)
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 138025 cycles | 138072 cycles | 1.00 |
| ML-DSA-44 sign | 485995 cycles | 486271 cycles | 1.00 |
| ML-DSA-44 verify | 149089 cycles | 149116 cycles | 1.00 |
| ML-DSA-65 keypair | 241809 cycles | 241864 cycles | 1.00 |
| ML-DSA-65 sign | 791628 cycles | 791723 cycles | 1.00 |
| ML-DSA-65 verify | 241527 cycles | 241299 cycles | 1.00 |
| ML-DSA-87 keypair | 396331 cycles | 396324 cycles | 1.00 |
| ML-DSA-87 sign | 1013226 cycles | 1019414 cycles | 0.99 |
| ML-DSA-87 verify | 403783 cycles | 403735 cycles | 1.00 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 211752 cycles | 211724 cycles | 1.00 |
| ML-DSA-44 sign | 760132 cycles | 760180 cycles | 1.00 |
| ML-DSA-44 verify | 229569 cycles | 229569 cycles | 1 |
| ML-DSA-65 keypair | 378185 cycles | 378138 cycles | 1.00 |
| ML-DSA-65 sign | 1247309 cycles | 1247214 cycles | 1.00 |
| ML-DSA-65 verify | 372234 cycles | 371998 cycles | 1.00 |
| ML-DSA-87 keypair | 601566 cycles | 601823 cycles | 1.00 |
| ML-DSA-87 sign | 1582270 cycles | 1582328 cycles | 1.00 |
| ML-DSA-87 verify | 617531 cycles | 617758 cycles | 1.00 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 271669 cycles | 268067 cycles | 1.01 |
| ML-DSA-44 sign | 810367 cycles | 814806 cycles | 0.99 |
| ML-DSA-44 verify | 273132 cycles | 271238 cycles | 1.01 |
| ML-DSA-65 keypair | 466736 cycles | 462202 cycles | 1.01 |
| ML-DSA-65 sign | 1356642 cycles | 1331273 cycles | 1.02 |
| ML-DSA-65 verify | 455030 cycles | 448838 cycles | 1.01 |
| ML-DSA-87 keypair | 799623 cycles | 792168 cycles | 1.01 |
| ML-DSA-87 sign | 1848570 cycles | 1802141 cycles | 1.03 |
| ML-DSA-87 verify | 776020 cycles | 771477 cycles | 1.01 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 465420 cycles | 465524 cycles | 1.00 |
| ML-DSA-44 sign | 2136883 cycles | 2145613 cycles | 1.00 |
| ML-DSA-44 verify | 557374 cycles | 557314 cycles | 1.00 |
| ML-DSA-65 keypair | 784496 cycles | 785558 cycles | 1.00 |
| ML-DSA-65 sign | 3494793 cycles | 3501121 cycles | 1.00 |
| ML-DSA-65 verify | 868115 cycles | 871709 cycles | 1.00 |
| ML-DSA-87 keypair | 1273557 cycles | 1268515 cycles | 1.00 |
| ML-DSA-87 sign | 4353626 cycles | 4307540 cycles | 1.01 |
| ML-DSA-87 verify | 1395248 cycles | 1395476 cycles | 1.00 |
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)
| Benchmark suite | Current: 7c13c63 | Previous: 70fdba0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 760220 cycles | 759843 cycles | 1.00 |
| ML-DSA-44 sign | 3140735 cycles | 3139003 cycles | 1.00 |
| ML-DSA-44 verify | 859523 cycles | 859050 cycles | 1.00 |
| ML-DSA-65 keypair | 1285550 cycles | 1286052 cycles | 1.00 |
| ML-DSA-65 sign | 5074741 cycles | 5077105 cycles | 1.00 |
| ML-DSA-65 verify | 1363668 cycles | 1364403 cycles | 1.00 |
| ML-DSA-87 keypair | 2111679 cycles | 2110480 cycles | 1.00 |
| ML-DSA-87 sign | 6348351 cycles | 6365186 cycles | 1.00 |
| ML-DSA-87 verify | 2227517 cycles | 2229232 cycles | 1.00 |
Summary
Draft: opened while the HOL-Light proofs are still in progress.
Replaces the AVX2 intrinsics implementations of `rej_uniform_eta2` and `rej_uniform_eta4` with hand-written x86_64 assembly, following the same approach as #1014:
- The table is passed as a parameter and all constants are built from immediates (no `.rodata`), enabling future HOL-Light formal verification.
- `__contract__` annotations on the asm entry points (CBMC), to be kept in sync with the HOL-Light specs.
- `meta.h` wires both eta2 and eta4 to the new asm, so the functional test suite exercises both paths.
- `scripts/autogen` and the x86_64 HOL-Light `Makefile` register the eta2/eta4 bytecode dump targets.
- Adds a `poly_uniform_eta_4x` component benchmark.
Scope of this draft
This draft contains assembly + build/bytecode infrastructure only. It intentionally excludes the HOL-Light `.ml` proofs, which are still being developed. The proofs will be added before this is marked ready for review.
TODO before ready-for-review