x86_64 + HOL-Light: Replace polyz_unpack AVX2 intrinsics with hand-written assembly and HOL-Light proofs#1182
Conversation
923c582 to
bbb1114
Compare
6db58be to
5fa4415
Compare
CBMC Results (ML-DSA-65, REDUCE-RAM)Full Results (205 proofs)
|
CBMC Results (ML-DSA-44, REDUCE-RAM)Full Results (205 proofs)
|
CBMC Results (ML-DSA-87, REDUCE-RAM)Full Results (205 proofs)
|
CBMC Results (ML-DSA-44)Full Results (205 proofs)
|
CBMC Results (ML-DSA-65)Full Results (205 proofs)
|
CBMC Results (ML-DSA-87)Full Results (205 proofs)
|
Replace the AVX2 C intrinsics for polyz_unpack with fully-unrolled hand-written assembly, mirroring the existing AArch64 conversion, and add HOL-Light functional-correctness and memory-safety proofs together with CBMC contracts. Both variants are covered: - polyz_unpack_17 (GAMMA1 = 2^17, 18-bit packed, ML-DSA-44) - polyz_unpack_19 (GAMMA1 = 2^19, 20-bit packed, ML-DSA-65/87) Each routine unpacks 8 coefficients per block (VPSHUFB/VPSRLVD/VPAND/ VPSUBD) and builds the shuffle, shift, mask and gamma1 constants inline, so it takes no table argument. Resolves #925. Resolves #915. Signed-off-by: Jake Massimo <jakemas@amazon.com>
5fa4415 to
59d66bc
Compare
There was a problem hiding this comment.
Mac Mini (M1, 2020) benchmarks (opt)
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
46538 cycles |
46504 cycles |
1.00 |
ML-DSA-44 sign |
131056 cycles |
131079 cycles |
1.00 |
ML-DSA-44 verify |
47344 cycles |
47310 cycles |
1.00 |
ML-DSA-65 keypair |
81686 cycles |
81680 cycles |
1.00 |
ML-DSA-65 sign |
215338 cycles |
215311 cycles |
1.00 |
ML-DSA-65 verify |
79299 cycles |
79299 cycles |
1 |
ML-DSA-87 keypair |
132402 cycles |
132405 cycles |
1.00 |
ML-DSA-87 sign |
277420 cycles |
277328 cycles |
1.00 |
ML-DSA-87 verify |
134053 cycles |
134051 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Mac Mini (M1, 2020) benchmarks (no-opt)
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
113005 cycles |
112758 cycles |
1.00 |
ML-DSA-44 sign |
401889 cycles |
400845 cycles |
1.00 |
ML-DSA-44 verify |
119695 cycles |
119413 cycles |
1.00 |
ML-DSA-65 keypair |
193028 cycles |
192933 cycles |
1.00 |
ML-DSA-65 sign |
650224 cycles |
649924 cycles |
1.00 |
ML-DSA-65 verify |
192946 cycles |
192850 cycles |
1.00 |
ML-DSA-87 keypair |
318796 cycles |
318753 cycles |
1.00 |
ML-DSA-87 sign |
828689 cycles |
828716 cycles |
1.00 |
ML-DSA-87 verify |
326818 cycles |
326677 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
112162 cycles |
112128 cycles |
1.00 |
ML-DSA-44 sign |
353476 cycles |
353788 cycles |
1.00 |
ML-DSA-44 verify |
117003 cycles |
117189 cycles |
1.00 |
ML-DSA-65 keypair |
194780 cycles |
194358 cycles |
1.00 |
ML-DSA-65 sign |
583867 cycles |
583733 cycles |
1.00 |
ML-DSA-65 verify |
192717 cycles |
193111 cycles |
1.00 |
ML-DSA-87 keypair |
320898 cycles |
320083 cycles |
1.00 |
ML-DSA-87 sign |
747284 cycles |
747201 cycles |
1.00 |
ML-DSA-87 verify |
318777 cycles |
317895 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
211481 cycles |
211527 cycles |
1.00 |
ML-DSA-44 sign |
758522 cycles |
759858 cycles |
1.00 |
ML-DSA-44 verify |
228897 cycles |
229351 cycles |
1.00 |
ML-DSA-65 keypair |
377221 cycles |
378548 cycles |
1.00 |
ML-DSA-65 sign |
1247439 cycles |
1247648 cycles |
1.00 |
ML-DSA-65 verify |
371392 cycles |
372375 cycles |
1.00 |
ML-DSA-87 keypair |
600368 cycles |
601887 cycles |
1.00 |
ML-DSA-87 sign |
1582562 cycles |
1582472 cycles |
1.00 |
ML-DSA-87 verify |
616028 cycles |
617767 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
269223 cycles |
266639 cycles |
1.01 |
ML-DSA-44 sign |
810461 cycles |
808808 cycles |
1.00 |
ML-DSA-44 verify |
269978 cycles |
270153 cycles |
1.00 |
ML-DSA-65 keypair |
462109 cycles |
461300 cycles |
1.00 |
ML-DSA-65 sign |
1320714 cycles |
1323365 cycles |
1.00 |
ML-DSA-65 verify |
446492 cycles |
447313 cycles |
1.00 |
ML-DSA-87 keypair |
788628 cycles |
789934 cycles |
1.00 |
ML-DSA-87 sign |
1804002 cycles |
1827662 cycles |
0.99 |
ML-DSA-87 verify |
769951 cycles |
769879 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Intel Xeon 4th gen (c7i)
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
43380 cycles |
43415 cycles |
1.00 |
ML-DSA-44 sign |
130483 cycles |
130832 cycles |
1.00 |
ML-DSA-44 verify |
45056 cycles |
45316 cycles |
0.99 |
ML-DSA-65 keypair |
75565 cycles |
75430 cycles |
1.00 |
ML-DSA-65 sign |
214566 cycles |
215011 cycles |
1.00 |
ML-DSA-65 verify |
74423 cycles |
74350 cycles |
1.00 |
ML-DSA-87 keypair |
123348 cycles |
123299 cycles |
1.00 |
ML-DSA-87 sign |
271867 cycles |
271370 cycles |
1.00 |
ML-DSA-87 verify |
120820 cycles |
120802 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Intel Xeon 4th gen (c7i) (no-opt)
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
91440 cycles |
91498 cycles |
1.00 |
ML-DSA-44 sign |
351947 cycles |
352373 cycles |
1.00 |
ML-DSA-44 verify |
99663 cycles |
99791 cycles |
1.00 |
ML-DSA-65 keypair |
153951 cycles |
153828 cycles |
1.00 |
ML-DSA-65 sign |
571505 cycles |
571106 cycles |
1.00 |
ML-DSA-65 verify |
159919 cycles |
159761 cycles |
1.00 |
ML-DSA-87 keypair |
255278 cycles |
255664 cycles |
1.00 |
ML-DSA-87 sign |
726078 cycles |
726163 cycles |
1.00 |
ML-DSA-87 verify |
264097 cycles |
263879 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
AMD EPYC 3rd gen (c6a)
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
55459 cycles |
55320 cycles |
1.00 |
ML-DSA-44 sign |
158876 cycles |
159183 cycles |
1.00 |
ML-DSA-44 verify |
57447 cycles |
57873 cycles |
0.99 |
ML-DSA-65 keypair |
96676 cycles |
95830 cycles |
1.01 |
ML-DSA-65 sign |
261462 cycles |
263504 cycles |
0.99 |
ML-DSA-65 verify |
96066 cycles |
96058 cycles |
1.00 |
ML-DSA-87 keypair |
154936 cycles |
154555 cycles |
1.00 |
ML-DSA-87 sign |
322625 cycles |
322765 cycles |
1.00 |
ML-DSA-87 verify |
151396 cycles |
150846 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
AMD EPYC 3rd gen (c6a) (no-opt)
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
133301 cycles |
133213 cycles |
1.00 |
ML-DSA-44 sign |
518520 cycles |
518708 cycles |
1.00 |
ML-DSA-44 verify |
146594 cycles |
146419 cycles |
1.00 |
ML-DSA-65 keypair |
224120 cycles |
225658 cycles |
0.99 |
ML-DSA-65 sign |
843541 cycles |
848883 cycles |
0.99 |
ML-DSA-65 verify |
234447 cycles |
236008 cycles |
0.99 |
ML-DSA-87 keypair |
370009 cycles |
366727 cycles |
1.01 |
ML-DSA-87 sign |
1065537 cycles |
1060028 cycles |
1.01 |
ML-DSA-87 verify |
383425 cycles |
380316 cycles |
1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
462528 cycles |
462947 cycles |
1.00 |
ML-DSA-44 sign |
2133159 cycles |
2132913 cycles |
1.00 |
ML-DSA-44 verify |
554944 cycles |
555065 cycles |
1.00 |
ML-DSA-65 keypair |
781336 cycles |
781178 cycles |
1.00 |
ML-DSA-65 sign |
3478730 cycles |
3495803 cycles |
1.00 |
ML-DSA-65 verify |
864668 cycles |
863872 cycles |
1.00 |
ML-DSA-87 keypair |
1261932 cycles |
1265586 cycles |
1.00 |
ML-DSA-87 sign |
4306755 cycles |
4308959 cycles |
1.00 |
ML-DSA-87 verify |
1387384 cycles |
1390453 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Graviton4
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
67305 cycles |
67251 cycles |
1.00 |
ML-DSA-44 sign |
198288 cycles |
198318 cycles |
1.00 |
ML-DSA-44 verify |
70209 cycles |
70240 cycles |
1.00 |
ML-DSA-65 keypair |
119599 cycles |
119362 cycles |
1.00 |
ML-DSA-65 sign |
326513 cycles |
325982 cycles |
1.00 |
ML-DSA-65 verify |
116914 cycles |
116934 cycles |
1.00 |
ML-DSA-87 keypair |
196427 cycles |
196595 cycles |
1.00 |
ML-DSA-87 sign |
421396 cycles |
421892 cycles |
1.00 |
ML-DSA-87 verify |
193328 cycles |
193379 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
AMD EPYC 4th gen (c7a)
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
46677 cycles |
46860 cycles |
1.00 |
ML-DSA-44 sign |
138850 cycles |
139486 cycles |
1.00 |
ML-DSA-44 verify |
49381 cycles |
49318 cycles |
1.00 |
ML-DSA-65 keypair |
82168 cycles |
81920 cycles |
1.00 |
ML-DSA-65 sign |
227285 cycles |
227729 cycles |
1.00 |
ML-DSA-65 verify |
82051 cycles |
81877 cycles |
1.00 |
ML-DSA-87 keypair |
129777 cycles |
131127 cycles |
0.99 |
ML-DSA-87 sign |
280956 cycles |
281288 cycles |
1.00 |
ML-DSA-87 verify |
128748 cycles |
129015 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Intel Xeon 3rd gen (c6i)
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
61569 cycles |
61685 cycles |
1.00 |
ML-DSA-44 sign |
188918 cycles |
188995 cycles |
1.00 |
ML-DSA-44 verify |
66333 cycles |
66273 cycles |
1.00 |
ML-DSA-65 keypair |
108150 cycles |
111473 cycles |
0.97 |
ML-DSA-65 sign |
311564 cycles |
312617 cycles |
1.00 |
ML-DSA-65 verify |
108710 cycles |
110558 cycles |
0.98 |
ML-DSA-87 keypair |
170916 cycles |
171460 cycles |
1.00 |
ML-DSA-87 sign |
378529 cycles |
377513 cycles |
1.00 |
ML-DSA-87 verify |
169949 cycles |
169102 cycles |
1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Graviton4 (no-opt)
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
127582 cycles |
127647 cycles |
1.00 |
ML-DSA-44 sign |
441135 cycles |
441052 cycles |
1.00 |
ML-DSA-44 verify |
136422 cycles |
136340 cycles |
1.00 |
ML-DSA-65 keypair |
220535 cycles |
220707 cycles |
1.00 |
ML-DSA-65 sign |
714612 cycles |
713810 cycles |
1.00 |
ML-DSA-65 verify |
220992 cycles |
220735 cycles |
1.00 |
ML-DSA-87 keypair |
364622 cycles |
365112 cycles |
1.00 |
ML-DSA-87 sign |
915759 cycles |
921310 cycles |
0.99 |
ML-DSA-87 verify |
370898 cycles |
370798 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
AMD EPYC 4th gen (c7a) (no-opt)
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
118937 cycles |
118568 cycles |
1.00 |
ML-DSA-44 sign |
458463 cycles |
458777 cycles |
1.00 |
ML-DSA-44 verify |
130553 cycles |
131112 cycles |
1.00 |
ML-DSA-65 keypair |
201113 cycles |
200686 cycles |
1.00 |
ML-DSA-65 sign |
747063 cycles |
743736 cycles |
1.00 |
ML-DSA-65 verify |
209161 cycles |
209264 cycles |
1.00 |
ML-DSA-87 keypair |
330712 cycles |
330171 cycles |
1.00 |
ML-DSA-87 sign |
940688 cycles |
935697 cycles |
1.01 |
ML-DSA-87 verify |
343632 cycles |
343489 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Intel Xeon 3rd gen (c6i) (no-opt)
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
154262 cycles |
154753 cycles |
1.00 |
ML-DSA-44 sign |
587223 cycles |
591158 cycles |
0.99 |
ML-DSA-44 verify |
169382 cycles |
169947 cycles |
1.00 |
ML-DSA-65 keypair |
261747 cycles |
261726 cycles |
1.00 |
ML-DSA-65 sign |
962117 cycles |
963022 cycles |
1.00 |
ML-DSA-65 verify |
271696 cycles |
271733 cycles |
1.00 |
ML-DSA-87 keypair |
431494 cycles |
431775 cycles |
1.00 |
ML-DSA-87 sign |
1210792 cycles |
1213443 cycles |
1.00 |
ML-DSA-87 verify |
446908 cycles |
447885 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Graviton3
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
71378 cycles |
71561 cycles |
1.00 |
ML-DSA-44 sign |
208977 cycles |
209019 cycles |
1.00 |
ML-DSA-44 verify |
74785 cycles |
74745 cycles |
1.00 |
ML-DSA-65 keypair |
125939 cycles |
125918 cycles |
1.00 |
ML-DSA-65 sign |
345606 cycles |
345393 cycles |
1.00 |
ML-DSA-65 verify |
124103 cycles |
124184 cycles |
1.00 |
ML-DSA-87 keypair |
207057 cycles |
206649 cycles |
1.00 |
ML-DSA-87 sign |
444011 cycles |
439798 cycles |
1.01 |
ML-DSA-87 verify |
204091 cycles |
204451 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Graviton3 (no-opt)
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
137909 cycles |
138038 cycles |
1.00 |
ML-DSA-44 sign |
485980 cycles |
486082 cycles |
1.00 |
ML-DSA-44 verify |
149036 cycles |
149093 cycles |
1.00 |
ML-DSA-65 keypair |
241505 cycles |
241929 cycles |
1.00 |
ML-DSA-65 sign |
791913 cycles |
791534 cycles |
1.00 |
ML-DSA-65 verify |
242154 cycles |
241280 cycles |
1.00 |
ML-DSA-87 keypair |
395744 cycles |
396309 cycles |
1.00 |
ML-DSA-87 sign |
1013505 cycles |
1019346 cycles |
0.99 |
ML-DSA-87 verify |
403614 cycles |
403741 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Graviton2
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
112384 cycles |
112529 cycles |
1.00 |
ML-DSA-44 sign |
354309 cycles |
354040 cycles |
1.00 |
ML-DSA-44 verify |
117483 cycles |
117389 cycles |
1.00 |
ML-DSA-65 keypair |
194487 cycles |
194738 cycles |
1.00 |
ML-DSA-65 sign |
584490 cycles |
584565 cycles |
1.00 |
ML-DSA-65 verify |
193511 cycles |
193297 cycles |
1.00 |
ML-DSA-87 keypair |
320904 cycles |
321000 cycles |
1.00 |
ML-DSA-87 sign |
747799 cycles |
746870 cycles |
1.00 |
ML-DSA-87 verify |
318013 cycles |
318737 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
224455 cycles |
214312 cycles |
1.05 |
ML-DSA-44 sign |
634826 cycles |
605496 cycles |
1.05 |
ML-DSA-44 verify |
231700 cycles |
222932 cycles |
1.04 |
ML-DSA-65 keypair |
393071 cycles |
390142 cycles |
1.01 |
ML-DSA-65 sign |
1021382 cycles |
1004679 cycles |
1.02 |
ML-DSA-65 verify |
371512 cycles |
374059 cycles |
0.99 |
ML-DSA-87 keypair |
668383 cycles |
652450 cycles |
1.02 |
ML-DSA-87 sign |
1390293 cycles |
1336385 cycles |
1.04 |
ML-DSA-87 verify |
648013 cycles |
633167 cycles |
1.02 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
224455 cycles |
214312 cycles |
1.05 |
ML-DSA-44 sign |
634826 cycles |
605496 cycles |
1.05 |
ML-DSA-44 verify |
231700 cycles |
222932 cycles |
1.04 |
ML-DSA-87 sign |
1390293 cycles |
1336385 cycles |
1.04 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Graviton2 (no-opt)
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
212278 cycles |
211802 cycles |
1.00 |
ML-DSA-44 sign |
761115 cycles |
759783 cycles |
1.00 |
ML-DSA-44 verify |
230009 cycles |
229307 cycles |
1.00 |
ML-DSA-65 keypair |
378733 cycles |
377288 cycles |
1.00 |
ML-DSA-65 sign |
1248066 cycles |
1247192 cycles |
1.00 |
ML-DSA-65 verify |
373449 cycles |
371605 cycles |
1.00 |
ML-DSA-87 keypair |
602576 cycles |
601230 cycles |
1.00 |
ML-DSA-87 sign |
1584726 cycles |
1584827 cycles |
1.00 |
ML-DSA-87 verify |
618395 cycles |
616717 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
308287 cycles |
312767 cycles |
0.99 |
ML-DSA-44 sign |
1183932 cycles |
1207539 cycles |
0.98 |
ML-DSA-44 verify |
345058 cycles |
342752 cycles |
1.01 |
ML-DSA-65 keypair |
574462 cycles |
567893 cycles |
1.01 |
ML-DSA-65 sign |
1953652 cycles |
1961247 cycles |
1.00 |
ML-DSA-65 verify |
551106 cycles |
560470 cycles |
0.98 |
ML-DSA-87 keypair |
880489 cycles |
849145 cycles |
1.04 |
ML-DSA-87 sign |
2480605 cycles |
2376735 cycles |
1.04 |
ML-DSA-87 verify |
925298 cycles |
890655 cycles |
1.04 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-87 keypair |
880489 cycles |
849145 cycles |
1.04 |
ML-DSA-87 sign |
2480605 cycles |
2376735 cycles |
1.04 |
ML-DSA-87 verify |
925298 cycles |
890655 cycles |
1.04 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)
Details
| Benchmark suite | Current: 59d66bc | Previous: 9c5eceb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
759819 cycles |
759567 cycles |
1.00 |
ML-DSA-44 sign |
3138270 cycles |
3137937 cycles |
1.00 |
ML-DSA-44 verify |
858910 cycles |
858870 cycles |
1.00 |
ML-DSA-65 keypair |
1285508 cycles |
1285661 cycles |
1.00 |
ML-DSA-65 sign |
5072844 cycles |
5075651 cycles |
1.00 |
ML-DSA-65 verify |
1363435 cycles |
1364449 cycles |
1.00 |
ML-DSA-87 keypair |
2112235 cycles |
2111749 cycles |
1.00 |
ML-DSA-87 sign |
6345694 cycles |
6355811 cycles |
1.00 |
ML-DSA-87 verify |
2227450 cycles |
2228276 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
mkannwischer
left a comment
There was a problem hiding this comment.
Thanks @jakemas! This looks great!
@hanno-becker, could take a quick look, too?
hanno-becker
left a comment
There was a problem hiding this comment.
This is looking great, thank you @jakemas!
Resolves #925
Resolves #915
Replace the AVX2 C intrinsics for
polyz_unpackwith fully-unrolled hand-written assembly, mirroring the existing AArch64 conversion, and add HOL-Light functional-correctness and memory-safety proofs together with CBMC contracts. Both variants are covered:polyz_unpack_17(GAMMA1 = 2^17, 18-bit packed, ML-DSA-44)polyz_unpack_19(GAMMA1 = 2^19, 20-bit packed, ML-DSA-65/87)Performance
polyz_unpackcomponent benchmark, median cycles on AMD EPYC (c6a),OPT=1 CYCLES=PMU:main)polyz_unpack ML-DSA-44polyz_unpack ML-DSA-65/87With this, just 2:
use_hint(conversion in PR),uniform_rej_eta{2/4}(I've done the conversion, now working on proof) remain as x86 intrinsics.