Simplify decompose C reference to a single high multiplication #1177
mkannwischer wants to merge 1 commit into
CBMC Results (ML-DSA-65, REDUCE-RAM): Full Results (204 proofs)
CBMC Results (ML-DSA-44, REDUCE-RAM): Full Results (204 proofs)
CBMC Results (ML-DSA-87, REDUCE-RAM): Full Results (204 proofs)
CBMC Results (ML-DSA-44): Full Results (204 proofs)
CBMC Results (ML-DSA-65): Full Results (204 proofs)
CBMC Results (ML-DSA-87): Full Results (204 proofs)
c5826af to 9bb8f8d
Mac Mini (M1, 2020) benchmarks (opt)

| Benchmark suite | Current: 9bb8f8d | Previous: ec15c59 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 46485 cycles | 46485 cycles | 1 |
| ML-DSA-44 sign | 131073 cycles | 131058 cycles | 1.00 |
| ML-DSA-44 verify | 47306 cycles | 47305 cycles | 1.00 |
| ML-DSA-65 keypair | 81684 cycles | 81689 cycles | 1.00 |
| ML-DSA-65 sign | 215307 cycles | 215323 cycles | 1.00 |
| ML-DSA-65 verify | 79296 cycles | 79302 cycles | 1.00 |
| ML-DSA-87 keypair | 132405 cycles | 132467 cycles | 1.00 |
| ML-DSA-87 sign | 277644 cycles | 277549 cycles | 1.00 |
| ML-DSA-87 verify | 134050 cycles | 134119 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
Mac Mini (M1, 2020) benchmarks (no-opt)

| Benchmark suite | Current: 9bb8f8d | Previous: ec15c59 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 113139 cycles | 112736 cycles | 1.00 |
| ML-DSA-44 sign | 399645 cycles | 400878 cycles | 1.00 |
| ML-DSA-44 verify | 119028 cycles | 119432 cycles | 1.00 |
| ML-DSA-65 keypair | 193658 cycles | 192975 cycles | 1.00 |
| ML-DSA-65 sign | 644542 cycles | 649981 cycles | 0.99 |
| ML-DSA-65 verify | 191939 cycles | 192870 cycles | 1.00 |
| ML-DSA-87 keypair | 318942 cycles | 318797 cycles | 1.00 |
| ML-DSA-87 sign | 822796 cycles | 828723 cycles | 0.99 |
| ML-DSA-87 verify | 325445 cycles | 326796 cycles | 1.00 |

oqs-bot left a comment
Intel Xeon 4th gen (c7i)

| Benchmark suite | Current: 9bb8f8d | Previous: ec15c59 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 43417 cycles | 43553 cycles | 1.00 |
| ML-DSA-44 sign | 130691 cycles | 131114 cycles | 1.00 |
| ML-DSA-44 verify | 45128 cycles | 45587 cycles | 0.99 |
| ML-DSA-65 keypair | 75814 cycles | 75908 cycles | 1.00 |
| ML-DSA-65 sign | 215254 cycles | 215310 cycles | 1.00 |
| ML-DSA-65 verify | 74377 cycles | 74636 cycles | 1.00 |
| ML-DSA-87 keypair | 123621 cycles | 123588 cycles | 1.00 |
| ML-DSA-87 sign | 271463 cycles | 272387 cycles | 1.00 |
| ML-DSA-87 verify | 120973 cycles | 120775 cycles | 1.00 |

oqs-bot left a comment
Intel Xeon 4th gen (c7i) (no-opt)

| Benchmark suite | Current: 9bb8f8d | Previous: ec15c59 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 91331 cycles | 91530 cycles | 1.00 |
| ML-DSA-44 sign | 351183 cycles | 352430 cycles | 1.00 |
| ML-DSA-44 verify | 99090 cycles | 99915 cycles | 0.99 |
| ML-DSA-65 keypair | 154160 cycles | 153947 cycles | 1.00 |
| ML-DSA-65 sign | 570281 cycles | 572172 cycles | 1.00 |
| ML-DSA-65 verify | 159498 cycles | 159760 cycles | 1.00 |
| ML-DSA-87 keypair | 255196 cycles | 254922 cycles | 1.00 |
| ML-DSA-87 sign | 722193 cycles | 726518 cycles | 0.99 |
| ML-DSA-87 verify | 263128 cycles | 263741 cycles | 1.00 |

oqs-bot left a comment
AMD EPYC 3rd gen (c6a)

| Benchmark suite | Current: 9bb8f8d | Previous: ec15c59 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 55431 cycles | 55164 cycles | 1.00 |
| ML-DSA-44 sign | 159071 cycles | 159123 cycles | 1.00 |
| ML-DSA-44 verify | 57563 cycles | 57907 cycles | 0.99 |
| ML-DSA-65 keypair | 96226 cycles | 95536 cycles | 1.01 |
| ML-DSA-65 sign | 263817 cycles | 263540 cycles | 1.00 |
| ML-DSA-65 verify | 96247 cycles | 96187 cycles | 1.00 |
| ML-DSA-87 keypair | 154986 cycles | 154719 cycles | 1.00 |
| ML-DSA-87 sign | 322492 cycles | 322467 cycles | 1.00 |
| ML-DSA-87 verify | 151333 cycles | 150859 cycles | 1.00 |

oqs-bot left a comment
AMD EPYC 3rd gen (c6a) (no-opt)

| Benchmark suite | Current: 9bb8f8d | Previous: ec15c59 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 133161 cycles | 133031 cycles | 1.00 |
| ML-DSA-44 sign | 516727 cycles | 518961 cycles | 1.00 |
| ML-DSA-44 verify | 146251 cycles | 146356 cycles | 1.00 |
| ML-DSA-65 keypair | 224163 cycles | 223780 cycles | 1.00 |
| ML-DSA-65 sign | 844780 cycles | 843408 cycles | 1.00 |
| ML-DSA-65 verify | 233848 cycles | 234043 cycles | 1.00 |
| ML-DSA-87 keypair | 370291 cycles | 367594 cycles | 1.01 |
| ML-DSA-87 sign | 1065740 cycles | 1061107 cycles | 1.00 |
| ML-DSA-87 verify | 382861 cycles | 380814 cycles | 1.01 |

oqs-bot left a comment
Graviton2

| Benchmark suite | Current: 9bb8f8d | Previous: ec15c59 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 112523 cycles | 112521 cycles | 1.00 |
| ML-DSA-44 sign | 354964 cycles | 354071 cycles | 1.00 |
| ML-DSA-44 verify | 117553 cycles | 117392 cycles | 1.00 |
| ML-DSA-65 keypair | 194393 cycles | 194710 cycles | 1.00 |
| ML-DSA-65 sign | 584208 cycles | 584562 cycles | 1.00 |
| ML-DSA-65 verify | 193325 cycles | 193311 cycles | 1.00 |
| ML-DSA-87 keypair | 321448 cycles | 321242 cycles | 1.00 |
| ML-DSA-87 sign | 749057 cycles | 747207 cycles | 1.00 |
| ML-DSA-87 verify | 318617 cycles | 318958 cycles | 1.00 |

oqs-bot left a comment
AMD EPYC 4th gen (c7a)

| Benchmark suite | Current: 9bb8f8d | Previous: ec15c59 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 47023 cycles | 46737 cycles | 1.01 |
| ML-DSA-44 sign | 140001 cycles | 139162 cycles | 1.01 |
| ML-DSA-44 verify | 49133 cycles | 49579 cycles | 0.99 |
| ML-DSA-65 keypair | 82397 cycles | 82474 cycles | 1.00 |
| ML-DSA-65 sign | 228366 cycles | 228196 cycles | 1.00 |
| ML-DSA-65 verify | 82016 cycles | 81813 cycles | 1.00 |
| ML-DSA-87 keypair | 130805 cycles | 129675 cycles | 1.01 |
| ML-DSA-87 sign | 281569 cycles | 279317 cycles | 1.01 |
| ML-DSA-87 verify | 130616 cycles | 128340 cycles | 1.02 |

oqs-bot left a comment
Graviton4

| Benchmark suite | Current: 9bb8f8d | Previous: ec15c59 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 67406 cycles | 67286 cycles | 1.00 |
| ML-DSA-44 sign | 198364 cycles | 198339 cycles | 1.00 |
| ML-DSA-44 verify | 70204 cycles | 70253 cycles | 1.00 |
| ML-DSA-65 keypair | 119285 cycles | 119300 cycles | 1.00 |
| ML-DSA-65 sign | 326202 cycles | 325686 cycles | 1.00 |
| ML-DSA-65 verify | 116868 cycles | 116842 cycles | 1.00 |
| ML-DSA-87 keypair | 196485 cycles | 196569 cycles | 1.00 |
| ML-DSA-87 sign | 421245 cycles | 421883 cycles | 1.00 |
| ML-DSA-87 verify | 193229 cycles | 193336 cycles | 1.00 |

oqs-bot left a comment
AMD EPYC 4th gen (c7a) (no-opt)

| Benchmark suite | Current: 9bb8f8d | Previous: ec15c59 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 118231 cycles | 118077 cycles | 1.00 |
| ML-DSA-44 sign | 455256 cycles | 458854 cycles | 0.99 |
| ML-DSA-44 verify | 130717 cycles | 130666 cycles | 1.00 |
| ML-DSA-65 keypair | 200754 cycles | 201340 cycles | 1.00 |
| ML-DSA-65 sign | 742358 cycles | 741817 cycles | 1.00 |
| ML-DSA-65 verify | 208879 cycles | 209500 cycles | 1.00 |
| ML-DSA-87 keypair | 330201 cycles | 330842 cycles | 1.00 |
| ML-DSA-87 sign | 938760 cycles | 936928 cycles | 1.00 |
| ML-DSA-87 verify | 342173 cycles | 343616 cycles | 1.00 |

oqs-bot left a comment
Graviton2 (no-opt)

| Benchmark suite | Current: 9bb8f8d | Previous: ec15c59 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 212090 cycles | 211907 cycles | 1.00 |
| ML-DSA-44 sign | 753861 cycles | 760155 cycles | 0.99 |
| ML-DSA-44 verify | 228257 cycles | 229379 cycles | 1.00 |
| ML-DSA-65 keypair | 376614 cycles | 378157 cycles | 1.00 |
| ML-DSA-65 sign | 1232908 cycles | 1250904 cycles | 0.99 |
| ML-DSA-65 verify | 370936 cycles | 372722 cycles | 1.00 |
| ML-DSA-87 keypair | 603139 cycles | 600977 cycles | 1.00 |
| ML-DSA-87 sign | 1572054 cycles | 1584763 cycles | 0.99 |
| ML-DSA-87 verify | 616641 cycles | 616480 cycles | 1.00 |

oqs-bot left a comment
Graviton4 (no-opt)

| Benchmark suite | Current: 9bb8f8d | Previous: ec15c59 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 127569 cycles | 127637 cycles | 1.00 |
| ML-DSA-44 sign | 436957 cycles | 441211 cycles | 0.99 |
| ML-DSA-44 verify | 135453 cycles | 136398 cycles | 0.99 |
| ML-DSA-65 keypair | 220890 cycles | 220759 cycles | 1.00 |
| ML-DSA-65 sign | 707720 cycles | 713650 cycles | 0.99 |
| ML-DSA-65 verify | 220274 cycles | 220740 cycles | 1.00 |
| ML-DSA-87 keypair | 364568 cycles | 365093 cycles | 1.00 |
| ML-DSA-87 sign | 908294 cycles | 921276 cycles | 0.99 |
| ML-DSA-87 verify | 369293 cycles | 370783 cycles | 1.00 |

oqs-bot left a comment
Intel Xeon 3rd gen (c6i)

| Benchmark suite | Current: 9bb8f8d | Previous: ec15c59 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 61928 cycles | 61599 cycles | 1.01 |
| ML-DSA-44 sign | 189843 cycles | 189177 cycles | 1.00 |
| ML-DSA-44 verify | 66558 cycles | 66462 cycles | 1.00 |
| ML-DSA-65 keypair | 111312 cycles | 110803 cycles | 1.00 |
| ML-DSA-65 sign | 316194 cycles | 315558 cycles | 1.00 |
| ML-DSA-65 verify | 110980 cycles | 111490 cycles | 1.00 |
| ML-DSA-87 keypair | 170913 cycles | 171547 cycles | 1.00 |
| ML-DSA-87 sign | 378782 cycles | 378591 cycles | 1.00 |
| ML-DSA-87 verify | 169723 cycles | 169379 cycles | 1.00 |

oqs-bot left a comment
Intel Xeon 3rd gen (c6i) (no-opt)

| Benchmark suite | Current: 9bb8f8d | Previous: ec15c59 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 154749 cycles | 154402 cycles | 1.00 |
| ML-DSA-44 sign | 587983 cycles | 589538 cycles | 1.00 |
| ML-DSA-44 verify | 168810 cycles | 169828 cycles | 0.99 |
| ML-DSA-65 keypair | 262287 cycles | 263339 cycles | 1.00 |
| ML-DSA-65 sign | 957036 cycles | 966082 cycles | 0.99 |
| ML-DSA-65 verify | 271611 cycles | 272946 cycles | 1.00 |
| ML-DSA-87 keypair | 432882 cycles | 432716 cycles | 1.00 |
| ML-DSA-87 sign | 1209381 cycles | 1216186 cycles | 0.99 |
| ML-DSA-87 verify | 447652 cycles | 447711 cycles | 1.00 |

oqs-bot left a comment
Graviton3

| Benchmark suite | Current: 9bb8f8d | Previous: ec15c59 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 71369 cycles | 71552 cycles | 1.00 |
| ML-DSA-44 sign | 208953 cycles | 208994 cycles | 1.00 |
| ML-DSA-44 verify | 74780 cycles | 74736 cycles | 1.00 |
| ML-DSA-65 keypair | 125947 cycles | 125905 cycles | 1.00 |
| ML-DSA-65 sign | 345638 cycles | 345393 cycles | 1.00 |
| ML-DSA-65 verify | 124113 cycles | 124212 cycles | 1.00 |
| ML-DSA-87 keypair | 207065 cycles | 206527 cycles | 1.00 |
| ML-DSA-87 sign | 444028 cycles | 439858 cycles | 1.01 |
| ML-DSA-87 verify | 204117 cycles | 204429 cycles | 1.00 |

oqs-bot left a comment
Graviton3 (no-opt)

| Benchmark suite | Current: 9bb8f8d | Previous: ec15c59 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 137911 cycles | 138034 cycles | 1.00 |
| ML-DSA-44 sign | 482215 cycles | 486060 cycles | 0.99 |
| ML-DSA-44 verify | 148228 cycles | 149072 cycles | 0.99 |
| ML-DSA-65 keypair | 240961 cycles | 241789 cycles | 1.00 |
| ML-DSA-65 sign | 784814 cycles | 791605 cycles | 0.99 |
| ML-DSA-65 verify | 240936 cycles | 241325 cycles | 1.00 |
| ML-DSA-87 keypair | 395872 cycles | 396314 cycles | 1.00 |
| ML-DSA-87 sign | 1006622 cycles | 1019348 cycles | 0.99 |
| ML-DSA-87 verify | 402277 cycles | 403755 cycles | 1.00 |

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

| Benchmark suite | Current: 9bb8f8d | Previous: ec15c59 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 112170 cycles | 112136 cycles | 1.00 |
| ML-DSA-44 sign | 353500 cycles | 353794 cycles | 1.00 |
| ML-DSA-44 verify | 117008 cycles | 117198 cycles | 1.00 |
| ML-DSA-65 keypair | 194777 cycles | 194374 cycles | 1.00 |
| ML-DSA-65 sign | 583911 cycles | 583728 cycles | 1.00 |
| ML-DSA-65 verify | 192722 cycles | 193104 cycles | 1.00 |
| ML-DSA-87 keypair | 320921 cycles | 320068 cycles | 1.00 |
| ML-DSA-87 sign | 747350 cycles | 747194 cycles | 1.00 |
| ML-DSA-87 verify | 318786 cycles | 317899 cycles | 1.00 |

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

| Benchmark suite | Current: 9bb8f8d | Previous: ec15c59 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 211603 cycles | 211623 cycles | 1.00 |
| ML-DSA-44 sign | 751832 cycles | 760075 cycles | 0.99 |
| ML-DSA-44 verify | 227491 cycles | 229458 cycles | 0.99 |
| ML-DSA-65 keypair | 375371 cycles | 378162 cycles | 0.99 |
| ML-DSA-65 sign | 1232745 cycles | 1247139 cycles | 0.99 |
| ML-DSA-65 verify | 369289 cycles | 371939 cycles | 0.99 |
| ML-DSA-87 keypair | 600293 cycles | 601521 cycles | 1.00 |
| ML-DSA-87 sign | 1568166 cycles | 1582092 cycles | 0.99 |
| ML-DSA-87 verify | 613423 cycles | 617517 cycles | 0.99 |

hanno-becker left a comment
This looks right to me.
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)

| Benchmark suite | Current: 9bb8f8d | Previous: 0f9bd57 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 268375 cycles | 267651 cycles | 1.00 |
| ML-DSA-44 sign | 809831 cycles | 811144 cycles | 1.00 |
| ML-DSA-44 verify | 269969 cycles | 270460 cycles | 1.00 |
| ML-DSA-65 keypair | 459702 cycles | 459568 cycles | 1.00 |
| ML-DSA-65 sign | 1315458 cycles | 1315519 cycles | 1.00 |
| ML-DSA-65 verify | 445789 cycles | 445438 cycles | 1.00 |
| ML-DSA-87 keypair | 796724 cycles | 786888 cycles | 1.01 |
| ML-DSA-87 sign | 1805086 cycles | 1790709 cycles | 1.01 |
| ML-DSA-87 verify | 777883 cycles | 767214 cycles | 1.01 |

Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

| Benchmark suite | Current: 9bb8f8d | Previous: 0f9bd57 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 463089 cycles | 462914 cycles | 1.00 |
| ML-DSA-44 sign | 2118538 cycles | 2132089 cycles | 0.99 |
| ML-DSA-44 verify | 550906 cycles | 554284 cycles | 0.99 |
| ML-DSA-65 keypair | 781134 cycles | 780479 cycles | 1.00 |
| ML-DSA-65 sign | 3454267 cycles | 3479571 cycles | 0.99 |
| ML-DSA-65 verify | 860012 cycles | 864154 cycles | 1.00 |
| ML-DSA-87 keypair | 1261699 cycles | 1265342 cycles | 1.00 |
| ML-DSA-87 sign | 4274189 cycles | 4320477 cycles | 0.99 |
| ML-DSA-87 verify | 1380932 cycles | 1388614 cycles | 0.99 |

SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)

| Benchmark suite | Current: 9bb8f8d | Previous: ec15c59 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 760536 cycles | 760114 cycles | 1.00 |
| ML-DSA-44 sign | 3115896 cycles | 3141235 cycles | 0.99 |
| ML-DSA-44 verify | 855663 cycles | 859284 cycles | 1.00 |
| ML-DSA-65 keypair | 1285920 cycles | 1285249 cycles | 1.00 |
| ML-DSA-65 sign | 5085189 cycles | 5074895 cycles | 1.00 |
| ML-DSA-65 verify | 1364204 cycles | 1363443 cycles | 1.00 |
| ML-DSA-87 keypair | 2113049 cycles | 2114103 cycles | 1.00 |
| ML-DSA-87 sign | 6359782 cycles | 6362879 cycles | 1.00 |
| ML-DSA-87 verify | 2229223 cycles | 2230277 cycles | 1.00 |

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

| Benchmark suite | Current: 9bb8f8d | Previous: ec15c59 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 223726 cycles | 221766 cycles | 1.01 |
| ML-DSA-44 sign | 613726 cycles | 614083 cycles | 1.00 |
| ML-DSA-44 verify | 218800 cycles | 233383 cycles | 0.94 |
| ML-DSA-65 keypair | 384926 cycles | 393803 cycles | 0.98 |
| ML-DSA-65 sign | 998355 cycles | 1023539 cycles | 0.98 |
| ML-DSA-65 verify | 368420 cycles | 377414 cycles | 0.98 |
| ML-DSA-87 keypair | 663434 cycles | 669482 cycles | 0.99 |
| ML-DSA-87 sign | 1356972 cycles | 1392850 cycles | 0.97 |
| ML-DSA-87 verify | 644004 cycles | 643114 cycles | 1.00 |

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

| Benchmark suite | Current: 9bb8f8d | Previous: 0f9bd57 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 302274 cycles | 301193 cycles | 1.00 |
| ML-DSA-44 sign | 1139196 cycles | 1138793 cycles | 1.00 |
| ML-DSA-44 verify | 322598 cycles | 328764 cycles | 0.98 |
| ML-DSA-65 keypair | 558400 cycles | 549022 cycles | 1.02 |
| ML-DSA-65 sign | 1865495 cycles | 1904832 cycles | 0.98 |
| ML-DSA-65 verify | 547034 cycles | 529988 cycles | 1.03 |
| ML-DSA-87 keypair | 890507 cycles | 850322 cycles | 1.05 |
| ML-DSA-87 sign | 2420330 cycles | 2373299 cycles | 1.02 |
| ML-DSA-87 verify | 899250 cycles | 877505 cycles | 1.02 |

⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

| Benchmark suite | Current: 9bb8f8d | Previous: 0f9bd57 | Ratio |
|---|---|---|---|
| ML-DSA-65 verify | 547034 cycles | 529988 cycles | 1.03 |
| ML-DSA-87 keypair | 890507 cycles | 850322 cycles | 1.05 |

Replace the two-step Barrett division (ceil(a/128) then Barrett-divide by 2*GAMMA2/128) with a direct high multiplication by floor(2^N / (2*GAMMA2)), mirroring the AArch64 backend. For ML-DSA-44 this is `(a * 1477838209 + 2^47) >> 48`; for ML-DSA-65/87 it is `(a * 1074791425 + 2^48) >> 49`. Both constants, divided by 2^N, strictly under-approximate 1/(2*GAMMA2), so half-points round down, matching the original round-half-down semantics, and the result is exact for all 0 <= a < Q.

This collapses a five-op dependency chain (add, asr, mul, add, asr) into a single signed multiply-add and one shift. On Graviton2 the scalar a1 step is ~31% faster (1.39 ns -> 0.96 ns per call) for both parameter sets; the loop is scalar in practice because the ct value-barriers in mld_decompose block auto-vectorization. AArch64 and x86_64 builds use their native asm backends and are unaffected.

Update the Isabelle attribution in compress/ML-DSA_Compress.thy and neon_ntt/Barrett_Division_Even.thy to group the C reference with the AArch64 backend (direct Barrett division) rather than with AVX2 (divide-by-128 first). The corollary names keep their `aarch64` suffix for stability; a note in the prose explains they cover the C reference as well.

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
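For readers following the rounding argument, here is a short sketch of why an under-approximating constant gives round-half-down. The symbols D, M, and r are shorthand introduced here for illustration and are not names from the patch: D = 2*GAMMA2, M = floor(2^N / D) is the multiplier, and 2^N = M*D + r with 0 < r < D (r is nonzero because D has an odd factor in both parameter sets).

$$
\frac{aM + 2^{N-1}}{2^{N}} \;=\; \frac{a + \mathrm{GAMMA2}}{D} \;-\; \frac{a\,r}{D\,2^{N}},
\qquad
0 \;\le\; \frac{a\,r}{D\,2^{N}} \;\le\; \frac{a}{2^{N}} \;<\; 2^{-25} \;<\; \frac{1}{D}
\quad (0 \le a < Q < 2^{23},\; N \ge 48,\; D < 2^{19}).
$$

The fractional part of (a + GAMMA2)/D is a multiple of 1/D, so subtracting a correction smaller than 1/D changes the floor only when a + GAMMA2 is an exact multiple of D, that is, at a half-point. There a > 0, the correction is strictly positive, and the floor drops by one, which is exactly rounding the half-point down.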
9bb8f8d to e5044f8
Replace the two-step Barrett division (ceil(a/128) then Barrett-divide by 2*GAMMA2/128) with a direct high multiplication by floor(2^N / (2*GAMMA2)), mirroring the AArch64 backend. For ML-DSA-44 this is `(a * 1477838209 + 2^47) >> 48`; for ML-DSA-65/87 it is `(a * 1074791425 + 2^48) >> 49`. Both constants strictly under-approximate 1/(2*GAMMA2), so half-points round down, matching the original round-half-down semantics, and the result is exact for all 0 <= a < Q.

Update the Isabelle attribution in compress/ML-DSA_Compress.thy and neon_ntt/Barrett_Division_Even.thy.
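
The exactness claim can be spot-checked outside the proof tooling with a small standalone program. The following is a minimal sketch, not the code from this PR: the function names (`a1_highmul_44`, `a1_highmul_65_87`, `round_half_down`, `count_mismatches`) are hypothetical, and the oracle is plain integer division rather than the previous two-step routine. It covers only the rounding division; the later wrap-around of a1 and the computation of a0 in decompose are not shown.

```c
#include <assert.h>
#include <stdint.h>

#define MLDSA_Q 8380417

/* 2*GAMMA2 as defined in FIPS 204: (Q-1)/44 for ML-DSA-44 and
 * (Q-1)/16 for ML-DSA-65/87. */
#define TWO_GAMMA2_44 ((MLDSA_Q - 1) / 44)    /* 190464 */
#define TWO_GAMMA2_65_87 ((MLDSA_Q - 1) / 16) /* 523776 */

/* Direct high multiplication for ML-DSA-44, using the constant and
 * shift quoted in the PR description. The name is hypothetical. */
int32_t a1_highmul_44(int32_t a)
{
  assert(a >= 0 && a < MLDSA_Q);
  return (int32_t)(((int64_t)a * 1477838209 + ((int64_t)1 << 47)) >> 48);
}

/* Direct high multiplication for ML-DSA-65/87. The name is hypothetical. */
int32_t a1_highmul_65_87(int32_t a)
{
  assert(a >= 0 && a < MLDSA_Q);
  return (int32_t)(((int64_t)a * 1074791425 + ((int64_t)1 << 48)) >> 49);
}

/* Oracle: a / d rounded to nearest with ties rounded down, for a >= 0
 * and even d > 0, computed with exact integer division. */
int32_t round_half_down(int32_t a, int32_t d)
{
  return (a + d / 2 - 1) / d;
}

/* Counts the inputs in [0, Q) on which the high multiplication and the
 * oracle disagree. use_44 != 0 selects the ML-DSA-44 constants. */
long count_mismatches(int use_44)
{
  long bad = 0;
  for (int32_t a = 0; a < MLDSA_Q; a++)
  {
    int32_t got = use_44 ? a1_highmul_44(a) : a1_highmul_65_87(a);
    int32_t want =
        round_half_down(a, use_44 ? TWO_GAMMA2_44 : TWO_GAMMA2_65_87);
    bad += (got != want);
  }
  return bad;
}
```

Calling `count_mismatches(1)` and `count_mismatches(0)` from a test harness walks all Q inputs for each parameter set; a nonzero return would indicate a disagreement with round-half-down division. The first half-point is a convenient manual check: `a = GAMMA2` should give 0 and `a = GAMMA2 + 1` should give 1.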
decomposeC implementation #652