x86_64: Replace rej_uniform_eta2/eta4 intrinsics with hand-written assembly #1188

Draft
jakemas wants to merge 1 commit into main from jakemas/rej-uniform-eta-asm

jakemas (Contributor) commented Jun 16, 2026

Summary

Draft — opened while the HOL-Light proofs are still in progress.

Replaces the AVX2 intrinsics implementations of rej_uniform_eta2 and rej_uniform_eta4 with hand-written x86_64 assembly, following the same approach as #1014:

  • Lookup table passed as a parameter (consistent with the aarch64 approach), avoiding external symbol references for simpasm compatibility.
  • All constants constructed from immediates (no .rodata), enabling future HOL-Light formal verification.
  • __contract__ annotations on the asm entry points (CBMC), to be kept in sync with the HOL-Light specs.
  • meta.h wires both eta2 and eta4 to the new asm, so the functional test suite exercises both paths.
  • scripts/autogen and the x86_64 HOL-Light Makefile register the eta2/eta4 bytecode dump targets.
  • Adds a poly_uniform_eta_4x component benchmark.
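For readers less familiar with what these routines compute, here is a minimal C sketch. It is not the code in this PR, and every name in it (`rej_eta_ref`, `rej_eta_table`, `build_compaction_table`, `N_COEFFS`) is hypothetical. The reference function follows the FIPS 204 rule for eta-bounded sampling: each byte yields two nibbles, low nibble first; for eta = 2 a nibble b is accepted if b < 15 and mapped to 2 - (b mod 5), and for eta = 4 it is accepted if b < 9 and mapped to 4 - b. The second function emulates, in scalar code, a table-driven approach of the kind vectorized samplers use: an 8-bit acceptance mask indexes a 256 x 8 table of accepted-lane positions, and the table is passed in as an argument rather than referenced as a global symbol. The table layout here is an assumption and need not match the one the assembly uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant for illustration: ML-DSA polynomials have 256 coefficients. */
#define N_COEFFS 256

/* FIPS 204 CoeffFromHalfByte for an already-accepted nibble b:
 * eta = 2 maps b in [0, 14] to 2 - (b mod 5); eta = 4 maps b in [0, 8] to 4 - b. */
static int32_t nibble_to_coeff(unsigned eta, uint8_t b) {
  return (eta == 2) ? 2 - (int32_t)(b % 5) : 4 - (int32_t)b;
}

/* Byte-at-a-time reference sampler: low nibble first, then high nibble. */
static unsigned rej_eta_ref(int32_t *r, unsigned len, const uint8_t *buf,
                            size_t buflen, unsigned eta) {
  const uint8_t bound = (eta == 2) ? 15 : 9; /* accept b < 15 or b < 9 */
  unsigned ctr = 0;
  for (size_t pos = 0; ctr < len && pos < buflen; pos++) {
    uint8_t t0 = buf[pos] & 0x0F;
    uint8_t t1 = buf[pos] >> 4;
    if (t0 < bound) r[ctr++] = nibble_to_coeff(eta, t0);
    if (t1 < bound && ctr < len) r[ctr++] = nibble_to_coeff(eta, t1);
  }
  return ctr;
}

/* Fill a 256 x 8 table (row-major): row m lists, in ascending order, the
 * lane indices of the set bits of the 8-bit mask m; unused slots are 0. */
static void build_compaction_table(uint8_t *table) {
  for (unsigned m = 0; m < 256; m++) {
    unsigned k = 0;
    for (unsigned i = 0; i < 8; i++) table[m * 8 + i] = 0;
    for (unsigned i = 0; i < 8; i++)
      if (m & (1u << i)) table[m * 8 + k++] = (uint8_t)i;
  }
}

/* Scalar emulation of a table-driven sampler: 4 bytes -> 8 nibble lanes,
 * an 8-bit acceptance mask, then a table lookup that compacts accepted lanes.
 * The table is a parameter, not a global symbol. A trailing group of fewer
 * than 4 bytes is ignored. */
static unsigned rej_eta_table(int32_t *r, unsigned len, const uint8_t *buf,
                              size_t buflen, unsigned eta,
                              const uint8_t *table) {
  const uint8_t bound = (eta == 2) ? 15 : 9;
  unsigned ctr = 0;
  assert(eta == 2 || eta == 4);
  for (size_t pos = 0; ctr < len && pos + 4 <= buflen; pos += 4) {
    uint8_t lanes[8];
    unsigned mask = 0, count = 0;
    for (unsigned i = 0; i < 4; i++) {
      lanes[2 * i] = buf[pos + i] & 0x0F;   /* low nibble first */
      lanes[2 * i + 1] = buf[pos + i] >> 4; /* then high nibble */
    }
    for (unsigned i = 0; i < 8; i++) {
      if (lanes[i] < bound) {
        mask |= 1u << i;
        count++;
      }
    }
    /* Accepted lanes, in order, are lanes[table[mask * 8 + 0 .. count - 1]]. */
    for (unsigned k = 0; k < count && ctr < len; k++)
      r[ctr++] = nibble_to_coeff(eta, lanes[table[mask * 8 + k]]);
  }
  return ctr;
}
```

Running both functions on the same buffer gives a simple differential check. The two agree only when the buffer length is a multiple of 4, because the table variant skips a trailing partial group.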

Scope of this draft

This draft contains assembly + build/bytecode infrastructure only. It intentionally excludes the HOL-Light .ml proofs, which are still being developed. The proofs will be added before this is marked ready for review.

TODO before ready-for-review

  • Add HOL-Light CORRECT (and MEMSAFE) proofs for eta2/eta4.
  • Add performance numbers.

jakemas force-pushed the jakemas/rej-uniform-eta-asm branch 5 times, most recently from f789e8d to 1e78719 (June 16, 2026 23:56)

oqs-bot (Contributor) commented Jun 17, 2026

CBMC Results (ML-DSA-87, REDUCE-RAM)

Full Results (205 proofs)
Proof Current Previous Change
TOTAL 1755s 1573s +11.6%
mld_invntt_layer 196s 161s +22%
polyvec_matrix_pointwise_montgomery_yvec 177s 146s +21%
poly_pointwise_montgomery_c 156s 130s +20%
rej_uniform_native 143s 121s +18%
mld_ct_memcmp 78s 62s +26%
fqmul 49s 38s +29%
mld_ntt_layer 48s 41s +17%
mld_attempt_signature_generation 35s 32s +9%
sign_verify_internal 25s 23s +9%
keccakf1600x4_permute_native 24s 23s +4%
mld_ntt_butterfly_block 23s 18s +28%
poly_chknorm_c 21s 17s +24%
polyt0_unpack 17s 16s +6%
polyveck_decompose 17s 19s -11%
rej_uniform_c 17s 14s +21%
poly_uniform_eta_4x 16s 12s +33%
polyeta_unpack 15s 14s +7%
compute_pack_t0_t1 14s 10s +40%
polyvecl_chknorm 12s 11s +9%
mld_check_pct 11s 10s +10%
poly_add 11s 11s +0%
keccak_absorb_once_x4 9s 10s -10%
poly_invntt_tomont_c 9s 10s -10%
poly_power2round 9s 5s +80%
mld_sample_s1_s2_serial 8s 6s +33%
polyveck_caddq 8s 5s +60%
polyveck_invntt_tomont 8s 7s +14%
polyvecl_ntt 8s 7s +14%
sign 8s 7s +14%
sign_keypair_internal 8s 3s +167%
pointwise_acc_native_aarch64 7s 5s +40%
pointwise_acc_native_x86_64 7s 7s +0%
polyveck_reduce 7s 5s +40%
keccak_absorb 6s 7s -14%
keccak_squeezeblocks_x4 6s 4s +50%
mld_compute_pack_z 6s 3s +100%
mld_keccakf1600_permute_c 6s 7s -14%
mld_keccakf1600x4_extract_bytes_c 6s 3s +100%
mld_sample_s1_s2 6s 5s +20%
poly_ntt_native 6s 6s +0%
poly_shiftl 6s 5s +20%
polyvec_matrix_pointwise_montgomery_row 6s 7s -14%
polyz_unpack_19_native_aarch64 6s 2s +200%
polyz_unpack_c 6s 5s +20%
rej_uniform 6s 7s -14%
sign_open 6s 6s +0%
sign_signature_internal 6s 5s +20%
keccakf1600_permute 5s 2s +150%
keccakf1600x4_extract_bytes_native 5s 1s +400%
mld_h 5s 3s +67%
mld_polymat_expand_entry 5s 3s +67%
ntt_native_x86_64 5s 3s +67%
poly_chknorm_native_aarch64 5s 4s +25%
poly_invntt_tomont_native 5s 2s +150%
poly_sub 5s 5s +0%
poly_uniform_eta 5s 4s +25%
polyveck_chknorm 5s 5s +0%
rej_uniform_eta_native_aarch64 5s 4s +25%
sign_pk_from_sk 5s 8s -38%
sign_verify_extmu 5s 6s -17%
sign_verify_pre_hash_shake256 5s 4s +25%
sk_t0hat_get_poly 5s 3s +67%
decompose 4s 3s +33%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 4s 3s +33%
keccak_finalize 4s 3s +33%
keccak_squeeze 4s 3s +33%
keccakf1600x4_extract_bytes 4s 2s +100%
mld_ct_abs_i32 4s 3s +33%
mld_keccakf1600x4_xor_bytes_c 4s 1s +300%
mld_prepare_domain_separation_prefix 4s 5s -20%
ntt_native_aarch64 4s 4s +0%
nttunpack_native_x86_64 4s 2s +100%
pack_sig_h 4s 6s -33%
poly_caddq_c 4s 4s +0%
poly_caddq_native 4s 4s +0%
poly_caddq_native_x86_64 4s 3s +33%
poly_challenge 4s 5s -20%
poly_decompose 4s 2s +100%
poly_decompose_c 4s 4s +0%
poly_reduce 4s 3s +33%
poly_uniform_4x 4s 2s +100%
polyveck_ntt 4s 5s -20%
polyveck_pack_w1 4s 2s +100%
polyvecl_uniform_gamma1 4s 2s +100%
polyz_pack 4s 4s +0%
rej_eta_c 4s 4s +0%
shake128_absorb 4s 3s +33%
shake128x4_absorb_once 4s 4s +0%
sign_keypair 4s 6s -33%
sign_signature_extmu 4s 3s +33%
sign_signature_pre_hash_shake256 4s 3s +33%
sign_verify 4s 4s +0%
sign_verify_pre_hash_internal 4s 5s -20%
sk_s1hat_get_poly 4s 3s +33%
unpack_pk_t1 4s 3s +33%
yvec_get_poly 4s 3s +33%
caddq 3s 2s +50%
intt_native_x86_64 3s 3s +0%
keccak_f1600_x4_native_aarch64_v84a 3s 1s +200%
keccakf1600_extract_bytes (big endian) 3s 2s +50%
keccakf1600_permute_native 3s 3s +0%
keccakf1600_xor_bytes (big endian) 3s 2s +50%
keccakf1600x4_permute 3s 1s +200%
keccakf1600x4_xor_bytes 3s 3s +0%
mld_ct_cmask_nonzero_u32 3s 4s -25%
mld_ct_cmask_nonzero_u8 3s 2s +50%
mld_keccakf1600_extract_bytes 3s 1s +200%
pack_sig_z 3s 2s +50%
pack_sk_rho_key_tr_s2 3s 2s +50%
pointwise_native_x86_64 3s 3s +0%
poly_caddq 3s 3s +0%
poly_caddq_native_aarch64 3s 2s +50%
poly_chknorm 3s 2s +50%
poly_decompose_32_native_aarch64 3s 3s +0%
poly_decompose_88_native_aarch64 3s 5s -40%
poly_invntt_tomont 3s 3s +0%
poly_ntt_c 3s 1s +200%
poly_uniform 3s 5s -40%
poly_uniform_gamma1_4x 3s 2s +50%
poly_use_hint 3s 3s +0%
poly_use_hint_c 3s 3s +0%
poly_use_hint_native 3s 3s +0%
polyt1_pack 3s 6s -50%
polyvec_matrix_expand 3s 2s +50%
polyveck_pack_eta 3s 1s +200%
polyvecl_pack_eta 3s 2s +50%
polyvecl_pointwise_acc_montgomery_c 3s 3s +0%
polyvecl_uniform_gamma1_serial 3s 3s +0%
polyvecl_unpack_eta 3s 3s +0%
polyw1_pack_88 3s 4s -25%
polyz_unpack_native 3s 5s -40%
polyz_unpack_native_x86_64 3s 3s +0%
reduce32 3s 3s +0%
rej_eta 3s 3s +0%
rej_eta_native 3s 3s +0%
shake128_init 3s 4s -25%
shake128_release 3s 3s +0%
shake128x4_squeezeblocks 3s 2s +50%
shake256 3s 3s +0%
shake256_release 3s 1s +200%
shake256x4_absorb_once 3s 4s -25%
sig_unpack_hints 3s 2s +50%
sign_signature 3s 7s -57%
sign_signature_pre_hash_internal 3s 3s +0%
unpack_sk 3s 4s -25%
unpack_sk_s1hat 3s 1s +200%
unpack_sk_s2hat 3s 4s -25%
unpack_sk_t0hat 3s 4s -25%
use_hint 3s 2s +50%
yvec_init 3s 2s +50%
fqscale 2s 4s -50%
intt_native_aarch64 2s 4s -50%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 3s -33%
keccak_f1600_x4_native_avx2 2s 3s -33%
keccakf1600_xor_bytes 2s 1s +100%
make_hint 2s 2s +0%
mld_ct_get_optblocker_i64 2s 2s +0%
mld_ct_get_optblocker_u8 2s 3s -33%
mld_ct_sel_int32 2s 2s +0%
mld_value_barrier_u32 2s 6s -67%
mld_value_barrier_u8 2s 2s +0%
montgomery_reduce 2s 3s -33%
pack_sig_c 2s 3s -33%
pack_sk_s1 2s 4s -50%
pointwise_native_aarch64 2s 3s -33%
poly_chknorm_native 2s 2s +0%
poly_chknorm_native_x86_64 2s 3s -33%
poly_decompose_native 2s 3s -33%
poly_permute_bitrev_to_custom_optional 2s 5s -60%
poly_pointwise_montgomery_native 2s 3s -33%
poly_uniform_gamma1 2s 3s -33%
poly_use_hint_native_aarch64 2s 5s -60%
polyeta_pack 2s 5s -60%
polyt0_pack 2s 4s -50%
polyt1_unpack 2s 2s +0%
polyvec_matrix_expand_serial 2s 2s +0%
polyveck_unpack_eta 2s 2s +0%
polyvecl_pointwise_acc_montgomery_native 2s 4s -50%
polyvecl_unpack_z 2s 5s -60%
polyw1_pack 2s 3s -33%
polyz_unpack 2s 3s -33%
polyz_unpack_17_native_aarch64 2s 3s -33%
power2round 2s 3s -33%
rej_uniform_native_aarch64 2s 3s -33%
shake128_finalize 2s 3s -33%
shake128_squeeze 2s 2s +0%
shake256_absorb 2s 5s -60%
shake256_finalize 2s 4s -50%
shake256_init 2s 2s +0%
shake256_squeeze 2s 3s -33%
shake256x4_squeezeblocks 2s 2s +0%
sk_s2hat_get_poly 2s 1s +100%
keccak_f1600_x1_native_aarch64 1s 3s -67%
keccak_f1600_x1_native_aarch64_v84a 1s 3s -67%
keccak_init 1s 2s -50%
keccakf1600x4_xor_bytes_native 1s 4s -75%
mld_ct_cmask_neg_i32 1s 2s -50%
mld_ct_get_optblocker_u32 1s 1s +0%
mld_value_barrier_i64 1s 2s -50%
poly_ntt 1s 3s -67%
poly_permute_bitrev_to_custom_optional_native 1s 2s -50%
poly_pointwise_montgomery 1s 4s -75%
polyvecl_pointwise_acc_montgomery 1s 6s -83%
polyw1_pack_32 1s 3s -67%
sys_check_capability 1s 4s -75%

oqs-bot (Contributor) commented Jun 17, 2026

CBMC Results (ML-DSA-65, REDUCE-RAM)

Full Results (205 proofs)
Proof Current Previous Change
TOTAL 1535s 1546s -0.7%
mld_invntt_layer 172s 170s +1%
poly_pointwise_montgomery_c 139s 131s +6%
rej_uniform_native 129s 122s +6%
polyvec_matrix_pointwise_montgomery_yvec 83s 79s +5%
mld_ct_memcmp 70s 68s +3%
mld_ntt_layer 46s 44s +5%
fqmul 40s 42s -5%
polyveck_chknorm 37s 39s -5%
mld_attempt_signature_generation 24s 24s +0%
keccakf1600x4_permute_native 21s 22s -5%
mld_ntt_butterfly_block 21s 24s -12%
mld_check_pct 18s 16s +12%
poly_chknorm_c 18s 21s -14%
polyt0_unpack 16s 16s +0%
polyveck_decompose 16s 18s -11%
polyvecl_chknorm 15s 14s +7%
rej_uniform_c 15s 17s -12%
sign_verify_internal 15s 14s +7%
keccak_absorb_once_x4 12s 12s +0%
poly_uniform_eta_4x 12s 12s +0%
poly_add 11s 10s +10%
polyvec_matrix_pointwise_montgomery_row 9s 8s +12%
polyveck_invntt_tomont 9s 6s +50%
mld_keccakf1600_permute_c 8s 7s +14%
rej_uniform 8s 8s +0%
sign_pk_from_sk 8s 4s +100%
mld_sample_s1_s2 7s 5s +40%
poly_invntt_tomont_c 7s 9s -22%
polyvecl_ntt 7s 5s +40%
sign 7s 6s +17%
compute_pack_t0_t1 6s 7s -14%
polyeta_unpack 6s 6s +0%
polyveck_caddq 6s 6s +0%
keccak_absorb 5s 6s -17%
keccak_squeezeblocks_x4 5s 3s +67%
mld_compute_pack_z 5s 6s -17%
mld_keccakf1600x4_xor_bytes_c 5s 2s +150%
pointwise_acc_native_aarch64 5s 5s +0%
pointwise_acc_native_x86_64 5s 6s -17%
poly_caddq_native_aarch64 5s 2s +150%
poly_reduce 5s 5s +0%
poly_uniform 5s 3s +67%
poly_uniform_gamma1 5s 2s +150%
poly_use_hint_c 5s 4s +25%
poly_use_hint_native_aarch64 5s 3s +67%
polyveck_pack_eta 5s 3s +67%
polyz_unpack_c 5s 5s +0%
rej_eta_native 5s 6s -17%
sign_signature_pre_hash_internal 5s 4s +25%
sign_signature_pre_hash_shake256 5s 4s +25%
intt_native_x86_64 4s 4s +0%
keccak_init 4s 3s +33%
keccakf1600_xor_bytes (big endian) 4s 3s +33%
mld_ct_cmask_nonzero_u32 4s 3s +33%
mld_ct_cmask_nonzero_u8 4s 6s -33%
mld_h 4s 5s -20%
mld_sample_s1_s2_serial 4s 4s +0%
mld_value_barrier_u8 4s 2s +100%
montgomery_reduce 4s 4s +0%
ntt_native_aarch64 4s 4s +0%
pack_sk_s1 4s 2s +100%
poly_caddq_c 4s 4s +0%
poly_chknorm_native 4s 2s +100%
poly_chknorm_native_x86_64 4s 3s +33%
poly_decompose_32_native_aarch64 4s 3s +33%
poly_pointwise_montgomery 4s 3s +33%
poly_pointwise_montgomery_native 4s 3s +33%
poly_power2round 4s 6s -33%
poly_shiftl 4s 4s +0%
poly_uniform_eta 4s 4s +0%
polyt1_pack 4s 3s +33%
polyveck_reduce 4s 4s +0%
polyvecl_pack_eta 4s 3s +33%
polyvecl_uniform_gamma1_serial 4s 3s +33%
polyvecl_unpack_z 4s 4s +0%
polyw1_pack_32 4s 3s +33%
polyz_pack 4s 5s -20%
polyz_unpack_17_native_aarch64 4s 2s +100%
rej_eta_c 4s 4s +0%
rej_uniform_eta_native_aarch64 4s 4s +0%
rej_uniform_native_aarch64 4s 2s +100%
shake128x4_squeezeblocks 4s 3s +33%
sign_keypair 4s 6s -33%
sign_keypair_internal 4s 3s +33%
sign_open 4s 6s -33%
sign_signature_internal 4s 5s -20%
sign_verify_pre_hash_internal 4s 3s +33%
sk_s2hat_get_poly 4s 5s -20%
unpack_sk 4s 2s +100%
unpack_sk_s1hat 4s 3s +33%
decompose 3s 3s +0%
fqscale 3s 3s +0%
keccak_f1600_x1_native_aarch64 3s 2s +50%
keccak_f1600_x4_native_aarch64_v84a 3s 1s +200%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 3s 2s +50%
keccak_squeeze 3s 2s +50%
keccakf1600_extract_bytes (big endian) 3s 2s +50%
keccakf1600_xor_bytes 3s 2s +50%
keccakf1600x4_extract_bytes 3s 2s +50%
keccakf1600x4_extract_bytes_native 3s 5s -40%
keccakf1600x4_xor_bytes_native 3s 2s +50%
mld_ct_get_optblocker_u32 3s 5s -40%
mld_ct_get_optblocker_u8 3s 4s -25%
mld_keccakf1600_extract_bytes 3s 2s +50%
mld_polymat_expand_entry 3s 4s -25%
nttunpack_native_x86_64 3s 3s +0%
pack_sig_h 3s 2s +50%
pack_sk_rho_key_tr_s2 3s 4s -25%
pointwise_native_x86_64 3s 3s +0%
poly_caddq_native 3s 3s +0%
poly_caddq_native_x86_64 3s 3s +0%
poly_challenge 3s 4s -25%
poly_chknorm_native_aarch64 3s 2s +50%
poly_decompose 3s 2s +50%
poly_decompose_native 3s 2s +50%
poly_invntt_tomont 3s 3s +0%
poly_invntt_tomont_native 3s 4s -25%
poly_ntt 3s 2s +50%
poly_permute_bitrev_to_custom_optional 3s 3s +0%
poly_sub 3s 4s -25%
poly_uniform_4x 3s 3s +0%
poly_uniform_gamma1_4x 3s 2s +50%
poly_use_hint_native 3s 3s +0%
polyt0_pack 3s 3s +0%
polyveck_pack_w1 3s 2s +50%
polyvecl_pointwise_acc_montgomery_native 3s 3s +0%
polyvecl_unpack_eta 3s 5s -40%
polyz_unpack 3s 3s +0%
polyz_unpack_native 3s 3s +0%
polyz_unpack_native_x86_64 3s 2s +50%
power2round 3s 4s -25%
rej_eta 3s 2s +50%
shake128_finalize 3s 3s +0%
shake128_squeeze 3s 2s +50%
shake128x4_absorb_once 3s 2s +50%
shake256_absorb 3s 2s +50%
shake256_finalize 3s 2s +50%
shake256_release 3s 2s +50%
shake256x4_absorb_once 3s 2s +50%
shake256x4_squeezeblocks 3s 4s -25%
sign_signature 3s 5s -40%
sign_verify 3s 6s -50%
sign_verify_pre_hash_shake256 3s 5s -40%
sys_check_capability 3s 2s +50%
unpack_pk_t1 3s 4s -25%
use_hint 3s 3s +0%
yvec_get_poly 3s 4s -25%
caddq 2s 4s -50%
intt_native_aarch64 2s 3s -33%
keccak_f1600_x1_native_aarch64_v84a 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 2s +0%
keccak_f1600_x4_native_avx2 2s 4s -50%
keccak_finalize 2s 5s -60%
keccakf1600_permute 2s 1s +100%
keccakf1600_permute_native 2s 4s -50%
keccakf1600x4_permute 2s 1s +100%
keccakf1600x4_xor_bytes 2s 2s +0%
make_hint 2s 1s +100%
mld_ct_abs_i32 2s 4s -50%
mld_ct_cmask_neg_i32 2s 1s +100%
mld_prepare_domain_separation_prefix 2s 5s -60%
pack_sig_z 2s 2s +0%
pointwise_native_aarch64 2s 4s -50%
poly_chknorm 2s 3s -33%
poly_decompose_c 2s 5s -60%
poly_ntt_c 2s 3s -33%
poly_ntt_native 2s 3s -33%
poly_permute_bitrev_to_custom_optional_native 2s 4s -50%
polyeta_pack 2s 3s -33%
polyt1_unpack 2s 5s -60%
polyveck_unpack_eta 2s 4s -50%
polyvecl_pointwise_acc_montgomery 2s 4s -50%
polyvecl_pointwise_acc_montgomery_c 2s 8s -75%
polyvecl_uniform_gamma1 2s 3s -33%
polyw1_pack 2s 2s +0%
polyw1_pack_88 2s 1s +100%
polyz_unpack_19_native_aarch64 2s 4s -50%
reduce32 2s 2s +0%
shake128_absorb 2s 1s +100%
shake128_init 2s 2s +0%
shake128_release 2s 1s +100%
shake256_init 2s 3s -33%
shake256_squeeze 2s 3s -33%
sig_unpack_hints 2s 3s -33%
sign_signature_extmu 2s 4s -50%
sign_verify_extmu 2s 2s +0%
sk_t0hat_get_poly 2s 2s +0%
unpack_sk_s2hat 2s 2s +0%
unpack_sk_t0hat 2s 3s -33%
yvec_init 2s 2s +0%
mld_ct_get_optblocker_i64 1s 4s -75%
mld_ct_sel_int32 1s 2s -50%
mld_keccakf1600x4_extract_bytes_c 1s 3s -67%
mld_value_barrier_i64 1s 2s -50%
mld_value_barrier_u32 1s 1s +0%
ntt_native_x86_64 1s 4s -75%
pack_sig_c 1s 2s -50%
poly_caddq 1s 3s -67%
poly_decompose_88_native_aarch64 1s 2s -50%
poly_use_hint 1s 4s -75%
polyvec_matrix_expand 1s 4s -75%
polyvec_matrix_expand_serial 1s 3s -67%
polyveck_ntt 1s 4s -75%
shake256 1s 2s -50%
sk_s1hat_get_poly 1s 5s -80%

oqs-bot (Contributor) commented Jun 17, 2026

CBMC Results (ML-DSA-44, REDUCE-RAM)

Full Results (205 proofs)
Proof Current Previous Change
TOTAL 1503s 1535s -2.1%
mld_invntt_layer 167s 165s +1%
poly_pointwise_montgomery_c 128s 129s -1%
rej_uniform_native 124s 123s +1%
polyvec_matrix_pointwise_montgomery_yvec 114s 116s -2%
mld_ct_memcmp 68s 68s +0%
mld_ntt_layer 42s 44s -5%
fqmul 41s 43s -5%
mld_attempt_signature_generation 28s 25s +12%
keccakf1600x4_permute_native 21s 24s -12%
sign_verify_internal 21s 23s -9%
mld_ntt_butterfly_block 20s 24s -17%
poly_chknorm_c 18s 18s +0%
polyeta_unpack 15s 17s -12%
poly_uniform_eta_4x 14s 13s +8%
polyt0_unpack 14s 17s -18%
rej_uniform_c 14s 15s -7%
mld_check_pct 12s 13s -8%
polyz_unpack_c 12s 11s +9%
poly_add 10s 11s -9%
compute_pack_t0_t1 9s 11s -18%
keccak_absorb_once_x4 9s 12s -25%
keccak_absorb 8s 6s +33%
mld_keccakf1600_permute_c 8s 5s +60%
poly_invntt_tomont_c 8s 10s -20%
polyvec_matrix_pointwise_montgomery_row 8s 9s -11%
polyveck_decompose 8s 5s +60%
rej_uniform 8s 7s +14%
mld_compute_pack_z 7s 5s +40%
polyveck_chknorm 7s 8s -12%
pack_sig_h 6s 4s +50%
pointwise_acc_native_x86_64 6s 5s +20%
poly_decompose_c 6s 7s -14%
poly_permute_bitrev_to_custom_optional 6s 3s +100%
polyveck_invntt_tomont 6s 4s +50%
polyvecl_chknorm 6s 6s +0%
polyvecl_pointwise_acc_montgomery_native 6s 5s +20%
sign 6s 7s -14%
intt_native_aarch64 5s 2s +150%
keccakf1600_xor_bytes (big endian) 5s 4s +25%
mld_ct_get_optblocker_i64 5s 2s +150%
nttunpack_native_x86_64 5s 3s +67%
pointwise_acc_native_aarch64 5s 6s -17%
poly_challenge 5s 5s +0%
poly_ntt_c 5s 5s +0%
poly_power2round 5s 5s +0%
poly_reduce 5s 5s +0%
polyt0_pack 5s 4s +25%
polyt1_unpack 5s 2s +150%
polyveck_reduce 5s 4s +25%
polyveck_unpack_eta 5s 2s +150%
polyw1_pack_88 5s 2s +150%
shake128x4_squeezeblocks 5s 1s +400%
sig_unpack_hints 5s 3s +67%
sign_keypair_internal 5s 6s -17%
sign_pk_from_sk 5s 3s +67%
sign_signature_extmu 5s 5s +0%
sign_signature_internal 5s 7s -29%
sign_signature_pre_hash_shake256 5s 7s -29%
decompose 4s 3s +33%
fqscale 4s 4s +0%
keccak_f1600_x1_native_aarch64 4s 1s +300%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 4s 4s +0%
keccak_f1600_x4_native_avx2 4s 1s +300%
mld_ct_get_optblocker_u8 4s 4s +0%
mld_prepare_domain_separation_prefix 4s 4s +0%
montgomery_reduce 4s 3s +33%
pointwise_native_x86_64 4s 3s +33%
poly_caddq_native_aarch64 4s 2s +100%
poly_chknorm 4s 3s +33%
poly_decompose_32_native_aarch64 4s 4s +0%
poly_ntt_native 4s 4s +0%
polyvec_matrix_expand_serial 4s 3s +33%
polyveck_caddq 4s 3s +33%
polyveck_ntt 4s 1s +300%
polyveck_pack_w1 4s 3s +33%
polyvecl_ntt 4s 7s -43%
polyvecl_uniform_gamma1_serial 4s 3s +33%
reduce32 4s 6s -33%
shake256_squeeze 4s 2s +100%
shake256x4_squeezeblocks 4s 3s +33%
sign_keypair 4s 5s -20%
sign_open 4s 4s +0%
sign_signature 4s 2s +100%
sign_verify 4s 3s +33%
sign_verify_extmu 4s 3s +33%
sign_verify_pre_hash_internal 4s 6s -33%
sys_check_capability 4s 5s -20%
yvec_init 4s 4s +0%
intt_native_x86_64 3s 3s +0%
keccak_f1600_x1_native_aarch64_v84a 3s 3s +0%
keccakf1600_xor_bytes 3s 3s +0%
keccakf1600x4_extract_bytes_native 3s 4s -25%
keccakf1600x4_permute 3s 2s +50%
keccakf1600x4_xor_bytes 3s 4s -25%
keccakf1600x4_xor_bytes_native 3s 3s +0%
make_hint 3s 4s -25%
mld_ct_cmask_neg_i32 3s 1s +200%
mld_ct_cmask_nonzero_u32 3s 5s -40%
mld_h 3s 3s +0%
mld_keccakf1600_extract_bytes 3s 2s +50%
mld_keccakf1600x4_extract_bytes_c 3s 1s +200%
mld_value_barrier_i64 3s 1s +200%
mld_value_barrier_u32 3s 3s +0%
mld_value_barrier_u8 3s 1s +200%
ntt_native_aarch64 3s 2s +50%
ntt_native_x86_64 3s 3s +0%
pack_sig_z 3s 4s -25%
poly_caddq 3s 1s +200%
poly_caddq_c 3s 4s -25%
poly_caddq_native 3s 2s +50%
poly_caddq_native_x86_64 3s 2s +50%
poly_decompose 3s 3s +0%
poly_invntt_tomont 3s 4s -25%
poly_invntt_tomont_native 3s 2s +50%
poly_permute_bitrev_to_custom_optional_native 3s 2s +50%
poly_pointwise_montgomery_native 3s 6s -50%
poly_shiftl 3s 4s -25%
poly_sub 3s 3s +0%
poly_uniform_4x 3s 2s +50%
poly_uniform_eta 3s 4s -25%
poly_use_hint 3s 5s -40%
poly_use_hint_native 3s 4s -25%
poly_use_hint_native_aarch64 3s 3s +0%
polyeta_pack 3s 2s +50%
polyt1_pack 3s 2s +50%
polyvec_matrix_expand 3s 1s +200%
polyveck_pack_eta 3s 3s +0%
polyvecl_pointwise_acc_montgomery_c 3s 2s +50%
polyvecl_uniform_gamma1 3s 4s -25%
polyvecl_unpack_eta 3s 2s +50%
polyvecl_unpack_z 3s 2s +50%
polyz_unpack_17_native_aarch64 3s 4s -25%
polyz_unpack_19_native_aarch64 3s 4s -25%
rej_eta 3s 2s +50%
rej_eta_c 3s 4s -25%
rej_uniform_eta_native_aarch64 3s 4s -25%
shake256_absorb 3s 2s +50%
sign_verify_pre_hash_shake256 3s 3s +0%
sk_t0hat_get_poly 3s 3s +0%
unpack_sk_s2hat 3s 3s +0%
unpack_sk_t0hat 3s 3s +0%
caddq 2s 3s -33%
keccak_finalize 2s 2s +0%
keccak_init 2s 3s -33%
keccak_squeezeblocks_x4 2s 3s -33%
keccakf1600_extract_bytes (big endian) 2s 3s -33%
keccakf1600_permute 2s 2s +0%
keccakf1600_permute_native 2s 3s -33%
mld_ct_get_optblocker_u32 2s 1s +100%
mld_ct_sel_int32 2s 1s +100%
mld_sample_s1_s2 2s 4s -50%
mld_sample_s1_s2_serial 2s 5s -60%
pack_sig_c 2s 2s +0%
pack_sk_s1 2s 2s +0%
pointwise_native_aarch64 2s 7s -71%
poly_chknorm_native 2s 2s +0%
poly_chknorm_native_x86_64 2s 3s -33%
poly_decompose_88_native_aarch64 2s 4s -50%
poly_decompose_native 2s 4s -50%
poly_ntt 2s 3s -33%
poly_uniform_gamma1_4x 2s 3s -33%
poly_use_hint_c 2s 4s -50%
polyvecl_pack_eta 2s 1s +100%
polyw1_pack 2s 2s +0%
polyw1_pack_32 2s 4s -50%
polyz_pack 2s 3s -33%
polyz_unpack 2s 1s +100%
polyz_unpack_native 2s 3s -33%
polyz_unpack_native_x86_64 2s 4s -50%
power2round 2s 2s +0%
rej_eta_native 2s 3s -33%
rej_uniform_native_aarch64 2s 4s -50%
shake128_finalize 2s 3s -33%
shake128_release 2s 1s +100%
shake128_squeeze 2s 2s +0%
shake256 2s 2s +0%
shake256_finalize 2s 1s +100%
shake256_init 2s 2s +0%
shake256x4_absorb_once 2s 1s +100%
sign_signature_pre_hash_internal 2s 3s -33%
sk_s2hat_get_poly 2s 2s +0%
unpack_pk_t1 2s 3s -33%
unpack_sk_s1hat 2s 3s -33%
use_hint 2s 7s -71%
yvec_get_poly 2s 2s +0%
keccak_f1600_x4_native_aarch64_v84a 1s 2s -50%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 1s 2s -50%
keccak_squeeze 1s 2s -50%
keccakf1600x4_extract_bytes 1s 2s -50%
mld_ct_abs_i32 1s 4s -75%
mld_ct_cmask_nonzero_u8 1s 2s -50%
mld_keccakf1600x4_xor_bytes_c 1s 2s -50%
mld_polymat_expand_entry 1s 5s -80%
pack_sk_rho_key_tr_s2 1s 3s -67%
poly_chknorm_native_aarch64 1s 2s -50%
poly_pointwise_montgomery 1s 3s -67%
poly_uniform 1s 3s -67%
poly_uniform_gamma1 1s 4s -75%
polyvecl_pointwise_acc_montgomery 1s 2s -50%
shake128_absorb 1s 4s -75%
shake128_init 1s 3s -67%
shake128x4_absorb_once 1s 2s -50%
shake256_release 1s 4s -75%
sk_s1hat_get_poly 1s 2s -50%
unpack_sk 1s 2s -50%

oqs-bot (Contributor) commented Jun 17, 2026

CBMC Results (ML-DSA-65)

Full Results (205 proofs)
Proof Current Previous Change
TOTAL 2134s 2103s +1.5%
mld_invntt_layer 304s 302s +1%
polyvecl_pointwise_acc_montgomery_c 218s 204s +7%
rej_uniform_native 152s 145s +5%
polyvec_matrix_expand 130s 132s -2%
poly_pointwise_montgomery_c 102s 97s +5%
mld_ct_memcmp 72s 64s +12%
mld_attempt_signature_generation 68s 67s +1%
sign_verify_internal 67s 68s -1%
sign_signature_internal 47s 47s +0%
mld_ntt_layer 45s 44s +2%
fqmul 43s 42s +2%
polyvec_matrix_expand_serial 27s 25s +8%
mld_ntt_butterfly_block 23s 23s +0%
keccakf1600x4_permute_native 21s 22s -5%
poly_chknorm_c 21s 20s +5%
rej_uniform 20s 22s -9%
polyt0_unpack 18s 15s +20%
polyvecl_chknorm 18s 19s -5%
mld_check_pct 16s 16s +0%
compute_pack_t0_t1 15s 16s -6%
polyveck_decompose 14s 16s -12%
rej_uniform_c 14s 13s +8%
poly_uniform_eta_4x 13s 14s -7%
poly_uniform_4x 12s 13s -8%
poly_add 11s 12s -8%
polyvec_matrix_pointwise_montgomery_yvec 11s 10s +10%
mld_compute_pack_z 10s 8s +25%
pointwise_acc_native_x86_64 10s 5s +100%
polyveck_chknorm 10s 10s +0%
polyveck_invntt_tomont 10s 8s +25%
keccak_absorb_once_x4 9s 8s +12%
polyveck_ntt 9s 9s +0%
poly_invntt_tomont_c 8s 9s -11%
mld_keccakf1600_permute_c 7s 8s -12%
poly_challenge 7s 3s +133%
sign 7s 7s +0%
keccak_absorb 6s 7s -14%
keccakf1600x4_permute 6s 2s +200%
poly_caddq 6s 3s +100%
poly_uniform 6s 5s +20%
poly_uniform_gamma1_4x 6s 3s +100%
polyveck_pack_w1 6s 2s +200%
sign_keypair 6s 3s +100%
sign_signature_pre_hash_shake256 6s 2s +200%
keccak_squeezeblocks_x4 5s 6s -17%
pointwise_acc_native_aarch64 5s 4s +25%
poly_caddq_c 5s 4s +25%
poly_caddq_native_aarch64 5s 2s +150%
poly_chknorm_native_aarch64 5s 2s +150%
poly_decompose_native 5s 3s +67%
poly_uniform_gamma1 5s 2s +150%
polyveck_caddq 5s 8s -38%
polyvecl_ntt 5s 8s -38%
polyz_unpack_19_native_aarch64 5s 4s +25%
polyz_unpack_c 5s 7s -29%
polyz_unpack_native_x86_64 5s 2s +150%
rej_eta 5s 1s +400%
shake256x4_squeezeblocks 5s 3s +67%
sign_pk_from_sk 5s 6s -17%
sign_verify 5s 5s +0%
sign_verify_pre_hash_shake256 5s 3s +67%
unpack_sk_t0hat 5s 8s -38%
yvec_get_poly 5s 2s +150%
yvec_init 5s 4s +25%
keccakf1600_xor_bytes 4s 3s +33%
keccakf1600_xor_bytes (big endian) 4s 3s +33%
keccakf1600x4_extract_bytes 4s 2s +100%
keccakf1600x4_extract_bytes_native 4s 2s +100%
keccakf1600x4_xor_bytes_native 4s 3s +33%
mld_ct_cmask_neg_i32 4s 4s +0%
mld_ct_sel_int32 4s 3s +33%
mld_sample_s1_s2 4s 4s +0%
mld_sample_s1_s2_serial 4s 3s +33%
ntt_native_x86_64 4s 2s +100%
pack_sk_s1 4s 3s +33%
pointwise_native_aarch64 4s 2s +100%
poly_chknorm_native 4s 2s +100%
poly_decompose_c 4s 2s +100%
poly_invntt_tomont_native 4s 4s +0%
poly_ntt_c 4s 1s +300%
poly_uniform_eta 4s 3s +33%
poly_use_hint_native 4s 4s +0%
polyt1_unpack 4s 5s -20%
polyveck_reduce 4s 2s +100%
polyvecl_pointwise_acc_montgomery_native 4s 2s +100%
polyw1_pack_32 4s 5s -20%
polyw1_pack_88 4s 3s +33%
rej_uniform_native_aarch64 4s 4s +0%
shake256_finalize 4s 3s +33%
sig_unpack_hints 4s 3s +33%
sign_signature 4s 3s +33%
sign_verify_extmu 4s 4s +0%
sign_verify_pre_hash_internal 4s 4s +0%
sk_s2hat_get_poly 4s 3s +33%
sys_check_capability 4s 4s +0%
unpack_sk 4s 3s +33%
unpack_sk_s1hat 4s 3s +33%
caddq 3s 4s -25%
decompose 3s 2s +50%
fqscale 3s 7s -57%
intt_native_x86_64 3s 3s +0%
keccak_f1600_x1_native_aarch64 3s 3s +0%
keccak_f1600_x1_native_aarch64_v84a 3s 4s -25%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 3s 4s -25%
make_hint 3s 4s -25%
mld_ct_abs_i32 3s 2s +50%
mld_ct_cmask_nonzero_u32 3s 1s +200%
mld_ct_get_optblocker_i64 3s 3s +0%
mld_ct_get_optblocker_u8 3s 2s +50%
mld_keccakf1600x4_extract_bytes_c 3s 3s +0%
mld_prepare_domain_separation_prefix 3s 5s -40%
ntt_native_aarch64 3s 3s +0%
nttunpack_native_x86_64 3s 2s +50%
pack_sig_c 3s 2s +50%
pack_sig_h 3s 4s -25%
pack_sig_z 3s 4s -25%
pointwise_native_x86_64 3s 5s -40%
poly_chknorm 3s 2s +50%
poly_chknorm_native_x86_64 3s 3s +0%
poly_decompose_32_native_aarch64 3s 2s +50%
poly_ntt 3s 6s -50%
poly_permute_bitrev_to_custom_optional 3s 3s +0%
poly_pointwise_montgomery_native 3s 3s +0%
poly_power2round 3s 6s -50%
poly_reduce 3s 7s -57%
poly_use_hint_c 3s 2s +50%
polyeta_unpack 3s 3s +0%
polyt0_pack 3s 6s -50%
polyvec_matrix_pointwise_montgomery_row 3s 3s +0%
polyveck_unpack_eta 3s 3s +0%
polyvecl_pack_eta 3s 4s -25%
polyvecl_uniform_gamma1 3s 4s -25%
polyw1_pack 3s 4s -25%
polyz_unpack_17_native_aarch64 3s 5s -40%
polyz_unpack_native 3s 3s +0%
power2round 3s 4s -25%
reduce32 3s 3s +0%
rej_uniform_eta_native_aarch64 3s 5s -40%
shake128_init 3s 3s +0%
shake128_squeeze 3s 2s +50%
shake256_absorb 3s 3s +0%
shake256_release 3s 3s +0%
shake256x4_absorb_once 3s 2s +50%
sign_keypair_internal 3s 3s +0%
sign_open 3s 5s -40%
sign_signature_extmu 3s 7s -57%
sign_signature_pre_hash_internal 3s 3s +0%
sk_s1hat_get_poly 3s 3s +0%
sk_t0hat_get_poly 3s 2s +50%
unpack_sk_s2hat 3s 4s -25%
use_hint 3s 4s -25%
intt_native_aarch64 2s 4s -50%
keccak_f1600_x4_native_aarch64_v84a 2s 5s -60%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 3s -33%
keccak_f1600_x4_native_avx2 2s 1s +100%
keccak_finalize 2s 1s +100%
keccak_init 2s 2s +0%
keccak_squeeze 2s 4s -50%
keccakf1600_extract_bytes (big endian) 2s 3s -33%
keccakf1600_permute 2s 3s -33%
keccakf1600_permute_native 2s 6s -67%
keccakf1600x4_xor_bytes 2s 3s -33%
mld_ct_cmask_nonzero_u8 2s 3s -33%
mld_h 2s 3s -33%
mld_keccakf1600_extract_bytes 2s 3s -33%
mld_keccakf1600x4_xor_bytes_c 2s 3s -33%
mld_value_barrier_i64 2s 2s +0%
mld_value_barrier_u32 2s 1s +100%
mld_value_barrier_u8 2s 2s +0%
montgomery_reduce 2s 3s -33%
pack_sk_rho_key_tr_s2 2s 4s -50%
poly_caddq_native 2s 3s -33%
poly_caddq_native_x86_64 2s 3s -33%
poly_decompose 2s 4s -50%
poly_decompose_88_native_aarch64 2s 3s -33%
poly_invntt_tomont 2s 1s +100%
poly_ntt_native 2s 3s -33%
poly_permute_bitrev_to_custom_optional_native 2s 2s +0%
poly_pointwise_montgomery 2s 4s -50%
poly_shiftl 2s 2s +0%
poly_sub 2s 4s -50%
poly_use_hint 2s 3s -33%
poly_use_hint_native_aarch64 2s 4s -50%
polyeta_pack 2s 2s +0%
polyt1_pack 2s 3s -33%
polyveck_pack_eta 2s 4s -50%
polyvecl_pointwise_acc_montgomery 2s 5s -60%
polyvecl_unpack_eta 2s 2s +0%
polyvecl_unpack_z 2s 2s +0%
polyz_pack 2s 4s -50%
polyz_unpack 2s 4s -50%
rej_eta_c 2s 3s -33%
rej_eta_native 2s 3s -33%
shake128_absorb 2s 2s +0%
shake128_finalize 2s 3s -33%
shake128_release 2s 3s -33%
shake128x4_absorb_once 2s 1s +100%
shake128x4_squeezeblocks 2s 3s -33%
shake256_init 2s 3s -33%
unpack_pk_t1 2s 2s +0%
mld_ct_get_optblocker_u32 1s 2s -50%
mld_polymat_expand_entry 1s 2s -50%
polyvecl_uniform_gamma1_serial 1s 3s -67%
shake256 1s 2s -50%
shake256_squeeze 1s 3s -67%

oqs-bot (Contributor) commented Jun 17, 2026

CBMC Results (ML-DSA-44)

Full Results (205 proofs)
Proof Current Previous Change
TOTAL 1877s 1754s +7.0%
mld_invntt_layer 305s 287s +6%
rej_uniform_native 150s 144s +4%
polyvecl_pointwise_acc_montgomery_c 125s 111s +13%
poly_pointwise_montgomery_c 107s 94s +14%
mld_ct_memcmp 72s 67s +7%
mld_attempt_signature_generation 64s 62s +3%
mld_ntt_layer 45s 42s +7%
fqmul 42s 42s +0%
polyvec_matrix_expand 28s 27s +4%
mld_ntt_butterfly_block 25s 21s +19%
sign_verify_internal 25s 28s -11%
keccakf1600x4_permute_native 24s 23s +4%
rej_uniform 24s 20s +20%
sign_signature_internal 20s 21s -5%
poly_chknorm_c 19s 20s -5%
polyt0_unpack 19s 17s +12%
mld_check_pct 16s 15s +7%
polyeta_unpack 16s 15s +7%
rej_uniform_c 16s 13s +23%
compute_pack_t0_t1 13s 15s -13%
polyveck_chknorm 13s 10s +30%
polyvec_matrix_pointwise_montgomery_yvec 12s 11s +9%
polyz_unpack_c 12s 12s +0%
poly_add 11s 10s +10%
poly_uniform_4x 11s 13s -15%
poly_uniform_eta_4x 11s 11s +0%
polyveck_decompose 11s 8s +38%
keccak_absorb_once_x4 9s 8s +12%
mld_compute_pack_z 9s 8s +12%
poly_invntt_tomont_c 8s 10s -20%
mld_keccakf1600_permute_c 7s 7s +0%
poly_uniform_gamma1_4x 7s 3s +133%
poly_use_hint_c 7s 5s +40%
polyvec_matrix_expand_serial 7s 7s +0%
polyveck_ntt 7s 4s +75%
sign 7s 7s +0%
keccak_squeezeblocks_x4 6s 2s +200%
pointwise_acc_native_x86_64 6s 6s +0%
poly_caddq_c 6s 8s -25%
poly_pointwise_montgomery_native 6s 2s +200%
poly_use_hint_native 6s 3s +100%
polyvecl_ntt 6s 7s -14%
shake128_finalize 6s 2s +200%
sign_pk_from_sk 6s 6s +0%
sign_signature 6s 3s +100%
yvec_get_poly 6s 2s +200%
keccak_absorb 5s 8s -38%
mld_keccakf1600x4_extract_bytes_c 5s 2s +150%
ntt_native_x86_64 5s 3s +67%
pack_sig_h 5s 2s +150%
pack_sig_z 5s 3s +67%
pointwise_acc_native_aarch64 5s 7s -29%
poly_caddq 5s 2s +150%
poly_caddq_native 5s 3s +67%
poly_caddq_native_aarch64 5s 3s +67%
poly_caddq_native_x86_64 5s 4s +25%
poly_challenge 5s 5s +0%
poly_ntt 5s 3s +67%
poly_shiftl 5s 3s +67%
poly_uniform_gamma1 5s 4s +25%
polyt0_pack 5s 4s +25%
polyvecl_unpack_z 5s 2s +150%
polyw1_pack_32 5s 2s +150%
polyz_unpack_native 5s 3s +67%
rej_eta_c 5s 1s +400%
rej_uniform_eta_native_aarch64 5s 5s +0%
sign_keypair 5s 5s +0%
sign_verify_extmu 5s 8s -38%
unpack_sk_s1hat 5s 2s +150%
yvec_init 5s 2s +150%
caddq 4s 2s +100%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 4s 4s +0%
keccakf1600x4_xor_bytes 4s 2s +100%
keccakf1600x4_xor_bytes_native 4s 4s +0%
mld_ct_cmask_nonzero_u32 4s 4s +0%
mld_h 4s 4s +0%
mld_keccakf1600x4_xor_bytes_c 4s 2s +100%
mld_polymat_expand_entry 4s 3s +33%
mld_prepare_domain_separation_prefix 4s 4s +0%
mld_sample_s1_s2_serial 4s 4s +0%
montgomery_reduce 4s 3s +33%
poly_chknorm 4s 2s +100%
poly_chknorm_native_x86_64 4s 1s +300%
poly_decompose 4s 4s +0%
poly_permute_bitrev_to_custom_optional 4s 3s +33%
poly_power2round 4s 4s +0%
poly_sub 4s 4s +0%
poly_uniform 4s 4s +0%
poly_uniform_eta 4s 5s -20%
poly_use_hint 4s 3s +33%
polyvec_matrix_pointwise_montgomery_row 4s 3s +33%
polyveck_caddq 4s 4s +0%
polyveck_invntt_tomont 4s 6s -33%
polyveck_pack_eta 4s 4s +0%
polyveck_pack_w1 4s 3s +33%
polyvecl_chknorm 4s 2s +100%
polyvecl_pointwise_acc_montgomery_native 4s 2s +100%
power2round 4s 4s +0%
rej_eta_native 4s 4s +0%
shake128_squeeze 4s 2s +100%
shake128x4_absorb_once 4s 2s +100%
shake256_absorb 4s 1s +300%
sign_open 4s 5s -20%
sign_signature_pre_hash_shake256 4s 5s -20%
sys_check_capability 4s 2s +100%
unpack_pk_t1 4s 2s +100%
fqscale 3s 4s -25%
intt_native_aarch64 3s 4s -25%
intt_native_x86_64 3s 3s +0%
keccak_f1600_x1_native_aarch64 3s 2s +50%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 2s +50%
keccak_finalize 3s 1s +200%
keccak_squeeze 3s 3s +0%
keccakf1600_permute 3s 1s +200%
keccakf1600_xor_bytes (big endian) 3s 1s +200%
keccakf1600x4_extract_bytes 3s 2s +50%
keccakf1600x4_extract_bytes_native 3s 2s +50%
make_hint 3s 1s +200%
mld_ct_abs_i32 3s 3s +0%
mld_ct_cmask_neg_i32 3s 4s -25%
mld_ct_cmask_nonzero_u8 3s 2s +50%
mld_ct_get_optblocker_i64 3s 1s +200%
mld_ct_get_optblocker_u8 3s 2s +50%
mld_ct_sel_int32 3s 1s +200%
mld_sample_s1_s2 3s 4s -25%
nttunpack_native_x86_64 3s 4s -25%
pack_sig_c 3s 3s +0%
pack_sk_s1 3s 3s +0%
pointwise_native_x86_64 3s 4s -25%
poly_chknorm_native 3s 4s -25%
poly_decompose_88_native_aarch64 3s 2s +50%
poly_decompose_native 3s 3s +0%
poly_ntt_native 3s 2s +50%
poly_pointwise_montgomery 3s 4s -25%
poly_reduce 3s 2s +50%
poly_use_hint_native_aarch64 3s 2s +50%
polyeta_pack 3s 3s +0%
polyt1_pack 3s 4s -25%
polyveck_unpack_eta 3s 3s +0%
polyvecl_uniform_gamma1_serial 3s 2s +50%
polyvecl_unpack_eta 3s 3s +0%
polyw1_pack 3s 6s -50%
polyw1_pack_88 3s 3s +0%
polyz_pack 3s 2s +50%
polyz_unpack 3s 4s -25%
polyz_unpack_17_native_aarch64 3s 3s +0%
reduce32 3s 3s +0%
rej_uniform_native_aarch64 3s 4s -25%
shake128_release 3s 3s +0%
shake128x4_squeezeblocks 3s 2s +50%
shake256_finalize 3s 3s +0%
shake256_init 3s 5s -40%
sig_unpack_hints 3s 5s -40%
sign_keypair_internal 3s 5s -40%
sign_signature_extmu 3s 5s -40%
sign_signature_pre_hash_internal 3s 3s +0%
sign_verify 3s 4s -25%
sign_verify_pre_hash_internal 3s 3s +0%
sk_s1hat_get_poly 3s 4s -25%
unpack_sk 3s 3s +0%
use_hint 3s 4s -25%
decompose 2s 6s -67%
keccak_f1600_x1_native_aarch64_v84a 2s 1s +100%
keccak_f1600_x4_native_aarch64_v84a 2s 3s -33%
keccak_init 2s 2s +0%
keccakf1600_extract_bytes (big endian) 2s 2s +0%
keccakf1600_permute_native 2s 2s +0%
keccakf1600x4_permute 2s 2s +0%
mld_ct_get_optblocker_u32 2s 2s +0%
mld_keccakf1600_extract_bytes 2s 3s -33%
mld_value_barrier_i64 2s 2s +0%
mld_value_barrier_u8 2s 2s +0%
ntt_native_aarch64 2s 3s -33%
pointwise_native_aarch64 2s 4s -50%
poly_chknorm_native_aarch64 2s 2s +0%
poly_decompose_32_native_aarch64 2s 3s -33%
poly_decompose_c 2s 2s +0%
poly_invntt_tomont 2s 2s +0%
poly_invntt_tomont_native 2s 3s -33%
poly_ntt_c 2s 2s +0%
poly_permute_bitrev_to_custom_optional_native 2s 4s -50%
polyt1_unpack 2s 2s +0%
polyveck_reduce 2s 4s -50%
polyvecl_pack_eta 2s 3s -33%
polyvecl_pointwise_acc_montgomery 2s 4s -50%
polyvecl_uniform_gamma1 2s 3s -33%
polyz_unpack_19_native_aarch64 2s 4s -50%
polyz_unpack_native_x86_64 2s 4s -50%
rej_eta 2s 2s +0%
shake128_absorb 2s 3s -33%
shake128_init 2s 2s +0%
shake256 2s 3s -33%
shake256_release 2s 3s -33%
shake256_squeeze 2s 3s -33%
shake256x4_absorb_once 2s 3s -33%
shake256x4_squeezeblocks 2s 2s +0%
sign_verify_pre_hash_shake256 2s 7s -71%
sk_s2hat_get_poly 2s 3s -33%
sk_t0hat_get_poly 2s 2s +0%
unpack_sk_s2hat 2s 4s -50%
unpack_sk_t0hat 2s 3s -33%
keccak_f1600_x4_native_avx2 1s 3s -67%
keccakf1600_xor_bytes 1s 3s -67%
mld_value_barrier_u32 1s 3s -67%
pack_sk_rho_key_tr_s2 1s 2s -50%

@oqs-bot

oqs-bot commented Jun 17, 2026

CBMC Results (ML-DSA-87)

Full Results (205 proofs)
Proof Status Current Previous Change
**TOTAL** 2490s 2450s +1.6%
polyvecl_pointwise_acc_montgomery_c 337s 323s +4%
mld_invntt_layer 308s 307s +0%
polyvec_matrix_expand 234s 228s +3%
rej_uniform_native 151s 151s +0%
mld_attempt_signature_generation 111s 105s +6%
poly_pointwise_montgomery_c 95s 104s -9%
mld_ct_memcmp 70s 71s -1%
sign_signature_internal 70s 66s +6%
sign_verify_internal 60s 60s +0%
polyvec_matrix_expand_serial 50s 48s +4%
mld_ntt_layer 45s 45s +0%
fqmul 42s 43s -2%
polyvec_matrix_pointwise_montgomery_yvec 32s 30s +7%
compute_pack_t0_t1 31s 37s -16%
mld_ntt_butterfly_block 23s 24s -4%
keccakf1600x4_permute_native 22s 21s +5%
rej_uniform 22s 22s +0%
poly_chknorm_c 21s 23s -9%
polyeta_unpack 18s 16s +12%
polyt0_unpack 17s 18s -6%
rej_uniform_c 16s 14s +14%
mld_check_pct 15s 14s +7%
poly_add 11s 11s +0%
poly_uniform_4x 11s 11s +0%
poly_uniform_eta_4x 11s 15s -27%
poly_decompose_c 10s 5s +100%
polyveck_decompose 10s 10s +0%
keccak_absorb_once_x4 9s 9s +0%
polyvecl_ntt 9s 9s +0%
mld_compute_pack_z 8s 8s +0%
mld_keccakf1600_permute_c 8s 6s +33%
mld_sample_s1_s2 8s 7s +14%
pointwise_acc_native_x86_64 8s 8s +0%
sign 8s 10s -20%
sign_pk_from_sk 8s 7s +14%
keccak_absorb 7s 8s -12%
poly_caddq_c 7s 7s +0%
poly_invntt_tomont_c 7s 11s -36%
polyveck_invntt_tomont 7s 6s +17%
polyveck_ntt 7s 6s +17%
keccak_squeezeblocks_x4 6s 5s +20%
keccakf1600x4_permute 6s 4s +50%
mld_sample_s1_s2_serial 6s 8s -25%
pointwise_acc_native_aarch64 6s 6s +0%
poly_caddq_native_x86_64 6s 5s +20%
poly_decompose 6s 4s +50%
poly_invntt_tomont 6s 3s +100%
poly_power2round 6s 3s +100%
polyveck_caddq 6s 6s +0%
polyveck_chknorm 6s 6s +0%
polyvecl_chknorm 6s 6s +0%
sign_signature 6s 3s +100%
sign_verify_pre_hash_shake256 6s 3s +100%
unpack_sk_t0hat 6s 7s -14%
decompose 5s 4s +25%
mld_ct_cmask_nonzero_u8 5s 3s +67%
ntt_native_x86_64 5s 4s +25%
poly_ntt_c 5s 2s +150%
poly_reduce 5s 3s +67%
polyt0_pack 5s 4s +25%
polyvecl_unpack_z 5s 2s +150%
polyz_unpack 5s 4s +25%
polyz_unpack_17_native_aarch64 5s 3s +67%
polyz_unpack_19_native_aarch64 5s 4s +25%
polyz_unpack_c 5s 7s -29%
rej_eta 5s 1s +400%
rej_uniform_eta_native_aarch64 5s 3s +67%
sign_open 5s 2s +150%
sign_signature_extmu 5s 2s +150%
sign_signature_pre_hash_shake256 5s 6s -17%
keccak_f1600_x1_native_aarch64 4s 3s +33%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 4s 3s +33%
keccak_finalize 4s 2s +100%
keccak_squeeze 4s 2s +100%
keccakf1600x4_xor_bytes 4s 3s +33%
make_hint 4s 4s +0%
mld_h 4s 3s +33%
montgomery_reduce 4s 2s +100%
nttunpack_native_x86_64 4s 5s -20%
pack_sig_c 4s 2s +100%
pack_sig_z 4s 4s +0%
pack_sk_rho_key_tr_s2 4s 2s +100%
pack_sk_s1 4s 2s +100%
pointwise_native_aarch64 4s 3s +33%
pointwise_native_x86_64 4s 5s -20%
poly_caddq_native 4s 2s +100%
poly_challenge 4s 5s -20%
poly_chknorm 4s 2s +100%
poly_chknorm_native 4s 3s +33%
poly_decompose_32_native_aarch64 4s 3s +33%
poly_decompose_88_native_aarch64 4s 2s +100%
poly_decompose_native 4s 5s -20%
poly_uniform 4s 3s +33%
poly_use_hint_c 4s 2s +100%
polyeta_pack 4s 3s +33%
polyt1_unpack 4s 4s +0%
polyveck_pack_eta 4s 4s +0%
polyvecl_uniform_gamma1 4s 4s +0%
polyw1_pack_88 4s 3s +33%
polyz_unpack_native_x86_64 4s 3s +33%
rej_uniform_native_aarch64 4s 3s +33%
shake128_finalize 4s 2s +100%
shake128x4_absorb_once 4s 4s +0%
shake256 4s 3s +33%
sign_keypair_internal 4s 5s -20%
sign_signature_pre_hash_internal 4s 2s +100%
sk_t0hat_get_poly 4s 5s -20%
unpack_sk_s2hat 4s 4s +0%
yvec_get_poly 4s 3s +33%
intt_native_x86_64 3s 4s -25%
keccak_f1600_x4_native_aarch64_v84a 3s 5s -40%
keccak_f1600_x4_native_avx2 3s 2s +50%
keccakf1600_permute 3s 3s +0%
keccakf1600_xor_bytes 3s 1s +200%
keccakf1600x4_extract_bytes_native 3s 2s +50%
keccakf1600x4_xor_bytes_native 3s 2s +50%
mld_ct_cmask_nonzero_u32 3s 5s -40%
mld_ct_get_optblocker_i64 3s 3s +0%
mld_ct_get_optblocker_u8 3s 2s +50%
mld_value_barrier_u32 3s 3s +0%
ntt_native_aarch64 3s 4s -25%
poly_caddq 3s 2s +50%
poly_chknorm_native_aarch64 3s 3s +0%
poly_chknorm_native_x86_64 3s 2s +50%
poly_invntt_tomont_native 3s 3s +0%
poly_permute_bitrev_to_custom_optional 3s 3s +0%
poly_permute_bitrev_to_custom_optional_native 3s 4s -25%
poly_pointwise_montgomery_native 3s 4s -25%
poly_shiftl 3s 3s +0%
poly_sub 3s 3s +0%
poly_uniform_eta 3s 3s +0%
poly_uniform_gamma1 3s 2s +50%
poly_use_hint 3s 3s +0%
poly_use_hint_native_aarch64 3s 3s +0%
polyt1_pack 3s 2s +50%
polyveck_pack_w1 3s 2s +50%
polyvecl_pack_eta 3s 2s +50%
polyvecl_pointwise_acc_montgomery 3s 3s +0%
polyvecl_pointwise_acc_montgomery_native 3s 3s +0%
polyw1_pack 3s 3s +0%
polyw1_pack_32 3s 4s -25%
polyz_unpack_native 3s 3s +0%
power2round 3s 4s -25%
shake128_release 3s 2s +50%
shake256_release 3s 3s +0%
shake256_squeeze 3s 1s +200%
shake256x4_squeezeblocks 3s 2s +50%
sign_keypair 3s 6s -50%
sign_verify_extmu 3s 4s -25%
unpack_sk 3s 6s -50%
unpack_sk_s1hat 3s 2s +50%
yvec_init 3s 3s +0%
caddq 2s 4s -50%
fqscale 2s 1s +100%
intt_native_aarch64 2s 2s +0%
keccak_f1600_x1_native_aarch64_v84a 2s 1s +100%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 3s -33%
keccak_init 2s 3s -33%
keccakf1600_extract_bytes (big endian) 2s 4s -50%
keccakf1600_permute_native 2s 3s -33%
keccakf1600x4_extract_bytes 2s 2s +0%
mld_ct_abs_i32 2s 3s -33%
mld_ct_cmask_neg_i32 2s 2s +0%
mld_ct_get_optblocker_u32 2s 2s +0%
mld_ct_sel_int32 2s 1s +100%
mld_keccakf1600_extract_bytes 2s 3s -33%
mld_keccakf1600x4_extract_bytes_c 2s 2s +0%
mld_keccakf1600x4_xor_bytes_c 2s 2s +0%
mld_polymat_expand_entry 2s 2s +0%
mld_prepare_domain_separation_prefix 2s 3s -33%
mld_value_barrier_u8 2s 3s -33%
poly_caddq_native_aarch64 2s 4s -50%
poly_ntt 2s 3s -33%
poly_ntt_native 2s 3s -33%
poly_pointwise_montgomery 2s 5s -60%
poly_use_hint_native 2s 4s -50%
polyvec_matrix_pointwise_montgomery_row 2s 6s -67%
polyveck_unpack_eta 2s 2s +0%
polyvecl_uniform_gamma1_serial 2s 4s -50%
polyvecl_unpack_eta 2s 2s +0%
polyz_pack 2s 4s -50%
rej_eta_c 2s 4s -50%
rej_eta_native 2s 4s -50%
shake128_absorb 2s 1s +100%
shake128_init 2s 2s +0%
shake128_squeeze 2s 1s +100%
shake128x4_squeezeblocks 2s 2s +0%
shake256_finalize 2s 1s +100%
shake256_init 2s 2s +0%
shake256x4_absorb_once 2s 4s -50%
sign_verify 2s 4s -50%
sign_verify_pre_hash_internal 2s 2s +0%
sk_s2hat_get_poly 2s 2s +0%
sys_check_capability 2s 3s -33%
unpack_pk_t1 2s 3s -33%
use_hint 2s 5s -60%
keccakf1600_xor_bytes (big endian) 1s 2s -50%
mld_value_barrier_i64 1s 4s -75%
pack_sig_h 1s 3s -67%
poly_uniform_gamma1_4x 1s 2s -50%
polyveck_reduce 1s 4s -75%
reduce32 1s 1s +0%
shake256_absorb 1s 5s -80%
sig_unpack_hints 1s 3s -67%
sk_s1hat_get_poly 1s 3s -67%

…sembly

Add hand-written x86_64 AVX2 assembly for rej_uniform_eta2 and
rej_uniform_eta4 and remove the AVX2 intrinsics implementations they
replace, following the rej_uniform approach in #1014: the table is
passed as a parameter and all constants are built from immediates (no
.rodata), enabling future HOL-Light verification. Both eta2 and eta4 are
wired to the new asm in meta.h, with contracts in arith_native_x86_64.h,
bytecode dump targets in autogen and the Makefile, and a
poly_uniform_eta_4x component benchmark.
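The table passed as a parameter is presumably the mask-indexed compaction (shuffle) table that AVX2 rejection samplers use to pack accepted lanes to the front of a register. The sketch below models one 8-lane step in scalar C under that assumption. The table layout and every name in it (`compaction_table`, `build_compaction_table`, `rej_eta_step`) are hypothetical. It illustrates the ML-DSA accept/map rule (accept t < 15 and emit 2 - (t mod 5) for eta = 2; accept t < 9 and emit 4 - t for eta = 4), not the PR's assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mask-indexed compaction table: row `mask` lists the lane
 * indices whose bit is set in `mask`, packed to the front. */
static uint8_t compaction_table[256 * 8];

static void build_compaction_table(uint8_t *table)
{
  for (unsigned mask = 0; mask < 256; mask++)
  {
    unsigned n = 0;
    for (unsigned i = 0; i < 8; i++)
    {
      if (mask & (1u << i))
      {
        table[mask * 8 + n++] = (uint8_t)i;
      }
    }
    while (n < 8)
    {
      table[mask * 8 + n++] = 0; /* padding, never read */
    }
  }
}

/* One 8-lane step of eta rejection sampling, modelled in scalar C.
 * Accepts nibbles t < 15 (eta = 2) or t < 9 (eta = 4) and emits eta - t,
 * with t first reduced mod 5 for eta = 2. The table is an explicit
 * parameter rather than an external symbol. Returns the number accepted. */
static size_t rej_eta_step(int32_t out[8], const uint8_t nibbles[8],
                           unsigned eta, const uint8_t *table)
{
  assert(eta == 2 || eta == 4);
  const unsigned bound = (eta == 2) ? 15u : 9u;
  unsigned mask = 0;
  size_t accepted = 0;

  /* Compare every lane against the bound and collect an 8-bit acceptance
   * mask (vector code would typically use a compare plus a movemask). */
  for (unsigned i = 0; i < 8; i++)
  {
    if (nibbles[i] < bound)
    {
      mask |= 1u << i;
      accepted++;
    }
  }

  /* Treat table row `mask` as a shuffle that moves accepted lanes to the
   * front; only the first `accepted` outputs are kept. */
  for (size_t k = 0; k < accepted; k++)
  {
    uint32_t t = nibbles[table[mask * 8 + k]];
    if (eta == 2)
    {
      t = t - ((205 * t) >> 10) * 5; /* t mod 5, valid for t < 15 */
    }
    out[k] = (int32_t)eta - (int32_t)t;
  }
  return accepted;
}
```

For the nibbles {0, 15, 3, 14, 9, 4, 2, 10}, the eta = 2 step accepts seven lanes and the eta = 4 step accepts four. Because the table arrives as an argument, the routine needs no reference to an external data symbol, which is the property the PR cites for simpasm compatibility.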

The asm entry points are declared MLD_SYSV_ABI (like the other x86_64 asm
routines) so they are called with the System V register convention on all
platforms, including Windows/MinGW. The endbr64 is emitted via
MLD_ASM_FN_SYMBOL (CET-gated) rather than a raw mnemonic, so older
assemblers (e.g. clang-6) build cleanly.

The eta2 vector path applies the centered mod-5 reduction to (2 - nibble)
directly (matching the reference), rather than reducing the raw nibble and
subtracting afterwards. The two are not equivalent because vpmulhrsw rounds
to nearest: the reduction yields a centered residue r in [-2, 2], so
computing 2 - r afterwards can land outside [-2, 2]. Verified against the
ACVP keyGen vectors for all parameter sets.
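The non-equivalence can be checked exhaustively over the 15 accepted eta2 nibbles. The sketch below models a single vpmulhrsw lane in scalar C and compares both orders of operations against the reference mapping 2 - (t mod 5). The multiplier -6560 (roughly -2^15/5) and all function names are assumptions for illustration; the PR's assembly may use a different constant or instruction sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of vpmulhrsw (_mm256_mulhrs_epi16):
 * a * b / 2^15 rounded to nearest (ties up), i.e.
 * floor((a * b + 2^14) / 2^15). */
static int16_t mulhrs16(int16_t a, int16_t b)
{
  int32_t p = (int32_t)a * b + (1 << 14);
  int32_t q = p / 32768; /* C division truncates toward zero ... */
  if (p % 32768 != 0 && p < 0)
  {
    q -= 1; /* ... so adjust to floor for negative p */
  }
  return (int16_t)q;
}

/* Centered reduction mod 5: x - 5 * round(x / 5), using -6560 ~ -2^15 / 5
 * (an illustrative constant, accurate for the small inputs used here). */
static int16_t centered_mod5(int16_t x)
{
  return (int16_t)(x + 5 * mulhrs16(x, -6560));
}

/* Reference mapping for an accepted eta = 2 nibble t < 15: 2 - (t mod 5). */
static int32_t eta2_reference(uint32_t t)
{
  assert(t < 15);
  t = t - ((205 * t) >> 10) * 5;
  return 2 - (int32_t)t;
}

/* Order described in the commit message: reduce (2 - t) directly. */
static int32_t eta2_subtract_then_reduce(uint32_t t)
{
  return centered_mod5((int16_t)(2 - (int32_t)t));
}

/* Alternative order (not used): reduce the raw nibble, subtract after. */
static int32_t eta2_reduce_then_subtract(uint32_t t)
{
  return 2 - centered_mod5((int16_t)t);
}

/* Count accepted nibbles for which `f` disagrees with the reference. */
static unsigned count_mismatches(int32_t (*f)(uint32_t))
{
  unsigned bad = 0;
  for (uint32_t t = 0; t < 15; t++)
  {
    if (f(t) != eta2_reference(t))
    {
      bad++;
    }
  }
  return bad;
}
```

Subtracting first matches the reference for every nibble. Reducing first disagrees for the six nibbles with t mod 5 in {3, 4}: their centered residue is -2 or -1, so 2 - r becomes 4 or 3.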

Signed-off-by: jake massimo <jakemas@amazon.com>
@jakemas jakemas force-pushed the jakemas/rej-uniform-eta-asm branch from 1e78719 to 7c13c63 on June 17, 2026 01:36

Contributor Author

@hanno-becker / @mkannwischer, would it be possible to get a preliminary review of dev/x86_64/src/rej_uniform_eta{2,4}_avx2_asm.S before I start the HOL-Light proofs? This should help prevent too much proof churn should any large changes be requested in review. Ideally, the asm would be finalized in review first, and then I'll mark the PR ready once the proofs are complete.

Contributor

Good idea. Thanks! Will look later today.

Contributor Author

Thank you very much!

@github-actions github-actions Bot left a comment

Mac Mini (M1, 2020) benchmarks (opt)

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 46503 cycles 46534 cycles 1.00
ML-DSA-44 sign 131074 cycles 131065 cycles 1.00
ML-DSA-44 verify 47313 cycles 47344 cycles 1.00
ML-DSA-65 keypair 81687 cycles 81689 cycles 1.00
ML-DSA-65 sign 215293 cycles 215304 cycles 1.00
ML-DSA-65 verify 79298 cycles 79301 cycles 1.00
ML-DSA-87 keypair 132402 cycles 132414 cycles 1.00
ML-DSA-87 sign 277403 cycles 277299 cycles 1.00
ML-DSA-87 verify 134049 cycles 134048 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Mac Mini (M1, 2020) benchmarks (no-opt)

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 112805 cycles 112744 cycles 1.00
ML-DSA-44 sign 401076 cycles 400857 cycles 1.00
ML-DSA-44 verify 119486 cycles 119422 cycles 1.00
ML-DSA-65 keypair 193004 cycles 192931 cycles 1.00
ML-DSA-65 sign 650034 cycles 649964 cycles 1.00
ML-DSA-65 verify 192887 cycles 192858 cycles 1.00
ML-DSA-87 keypair 318732 cycles 318783 cycles 1.00
ML-DSA-87 sign 828739 cycles 828685 cycles 1.00
ML-DSA-87 verify 326671 cycles 326704 cycles 1.00

@github-actions github-actions Bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 227496 cycles 219616 cycles 1.04
ML-DSA-44 sign 621038 cycles 608061 cycles 1.02
ML-DSA-44 verify 235419 cycles 214195 cycles 1.10
ML-DSA-65 keypair 388555 cycles 394028 cycles 0.99
ML-DSA-65 sign 1008940 cycles 1005239 cycles 1.00
ML-DSA-65 verify 369157 cycles 378176 cycles 0.98
ML-DSA-87 keypair 661599 cycles 635525 cycles 1.04
ML-DSA-87 sign 1369715 cycles 1306530 cycles 1.05
ML-DSA-87 verify 645122 cycles 616298 cycles 1.05

@github-actions github-actions Bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 227496 cycles 219616 cycles 1.04
ML-DSA-44 verify 235419 cycles 214195 cycles 1.10
ML-DSA-87 keypair 661599 cycles 635525 cycles 1.04
ML-DSA-87 sign 1369715 cycles 1306530 cycles 1.05
ML-DSA-87 verify 645122 cycles 616298 cycles 1.05

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 43382 cycles 43353 cycles 1.00
ML-DSA-44 sign 130806 cycles 130679 cycles 1.00
ML-DSA-44 verify 45264 cycles 45263 cycles 1.00
ML-DSA-65 keypair 75789 cycles 75765 cycles 1.00
ML-DSA-65 sign 214746 cycles 214616 cycles 1.00
ML-DSA-65 verify 74389 cycles 74462 cycles 1.00
ML-DSA-87 keypair 123059 cycles 123089 cycles 1.00
ML-DSA-87 sign 271462 cycles 270931 cycles 1.00
ML-DSA-87 verify 120590 cycles 120473 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 91550 cycles 91474 cycles 1.00
ML-DSA-44 sign 352057 cycles 352402 cycles 1.00
ML-DSA-44 verify 99895 cycles 99810 cycles 1.00
ML-DSA-65 keypair 153878 cycles 153977 cycles 1.00
ML-DSA-65 sign 571455 cycles 571817 cycles 1.00
ML-DSA-65 verify 159829 cycles 159916 cycles 1.00
ML-DSA-87 keypair 255292 cycles 255126 cycles 1.00
ML-DSA-87 sign 726015 cycles 725242 cycles 1.00
ML-DSA-87 verify 263880 cycles 263738 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 55826 cycles 55161 cycles 1.01
ML-DSA-44 sign 159302 cycles 159029 cycles 1.00
ML-DSA-44 verify 57714 cycles 57785 cycles 1.00
ML-DSA-65 keypair 96776 cycles 95694 cycles 1.01
ML-DSA-65 sign 261086 cycles 260702 cycles 1.00
ML-DSA-65 verify 96269 cycles 95964 cycles 1.00
ML-DSA-87 keypair 155272 cycles 154489 cycles 1.01
ML-DSA-87 sign 323671 cycles 322750 cycles 1.00
ML-DSA-87 verify 152704 cycles 151028 cycles 1.01

@github-actions github-actions Bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 312772 cycles 294913 cycles 1.06
ML-DSA-44 sign 1193382 cycles 1147432 cycles 1.04
ML-DSA-44 verify 332634 cycles 329463 cycles 1.01
ML-DSA-65 keypair 560095 cycles 553429 cycles 1.01
ML-DSA-65 sign 1913706 cycles 1874143 cycles 1.02
ML-DSA-65 verify 536753 cycles 533010 cycles 1.01
ML-DSA-87 keypair 904673 cycles 847930 cycles 1.07
ML-DSA-87 sign 2468732 cycles 2393137 cycles 1.03
ML-DSA-87 verify 906571 cycles 874118 cycles 1.04

@github-actions github-actions Bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 312772 cycles 294913 cycles 1.06
ML-DSA-44 sign 1193382 cycles 1147432 cycles 1.04
ML-DSA-87 keypair 904673 cycles 847930 cycles 1.07
ML-DSA-87 sign 2468732 cycles 2393137 cycles 1.03
ML-DSA-87 verify 906571 cycles 874118 cycles 1.04

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 133408 cycles 133059 cycles 1.00
ML-DSA-44 sign 520236 cycles 518626 cycles 1.00
ML-DSA-44 verify 147060 cycles 146355 cycles 1.00
ML-DSA-65 keypair 225079 cycles 225679 cycles 1.00
ML-DSA-65 sign 847135 cycles 848010 cycles 1.00
ML-DSA-65 verify 235456 cycles 235882 cycles 1.00
ML-DSA-87 keypair 370475 cycles 367342 cycles 1.01
ML-DSA-87 sign 1069404 cycles 1059451 cycles 1.01
ML-DSA-87 verify 384222 cycles 381076 cycles 1.01

@oqs-bot oqs-bot left a comment

Graviton2

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 112426 cycles 112391 cycles 1.00
ML-DSA-44 sign 353829 cycles 353871 cycles 1.00
ML-DSA-44 verify 117232 cycles 117222 cycles 1.00
ML-DSA-65 keypair 194641 cycles 194637 cycles 1.00
ML-DSA-65 sign 584354 cycles 584454 cycles 1.00
ML-DSA-65 verify 193206 cycles 193179 cycles 1.00
ML-DSA-87 keypair 320931 cycles 320866 cycles 1.00
ML-DSA-87 sign 746952 cycles 746603 cycles 1.00
ML-DSA-87 verify 318576 cycles 318613 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton4

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 67239 cycles 67494 cycles 1.00
ML-DSA-44 sign 198398 cycles 198222 cycles 1.00
ML-DSA-44 verify 70268 cycles 70154 cycles 1.00
ML-DSA-65 keypair 119318 cycles 119283 cycles 1.00
ML-DSA-65 sign 325812 cycles 326057 cycles 1.00
ML-DSA-65 verify 116825 cycles 116834 cycles 1.00
ML-DSA-87 keypair 196716 cycles 196570 cycles 1.00
ML-DSA-87 sign 422203 cycles 421915 cycles 1.00
ML-DSA-87 verify 193443 cycles 193341 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 46849 cycles 46887 cycles 1.00
ML-DSA-44 sign 139191 cycles 138924 cycles 1.00
ML-DSA-44 verify 49276 cycles 49405 cycles 1.00
ML-DSA-65 keypair 82925 cycles 83264 cycles 1.00
ML-DSA-65 sign 227212 cycles 227398 cycles 1.00
ML-DSA-65 verify 82484 cycles 82718 cycles 1.00
ML-DSA-87 keypair 129318 cycles 129835 cycles 1.00
ML-DSA-87 sign 281021 cycles 284510 cycles 0.99
ML-DSA-87 verify 128778 cycles 130913 cycles 0.98

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 127640 cycles 127662 cycles 1.00
ML-DSA-44 sign 441226 cycles 441153 cycles 1.00
ML-DSA-44 verify 136368 cycles 136392 cycles 1.00
ML-DSA-65 keypair 220745 cycles 220751 cycles 1.00
ML-DSA-65 sign 713952 cycles 713856 cycles 1.00
ML-DSA-65 verify 220714 cycles 220762 cycles 1.00
ML-DSA-87 keypair 365220 cycles 365145 cycles 1.00
ML-DSA-87 sign 915561 cycles 921328 cycles 0.99
ML-DSA-87 verify 370865 cycles 370840 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 61670 cycles 61568 cycles 1.00
ML-DSA-44 sign 189150 cycles 188965 cycles 1.00
ML-DSA-44 verify 66433 cycles 66389 cycles 1.00
ML-DSA-65 keypair 108153 cycles 108294 cycles 1.00
ML-DSA-65 sign 310773 cycles 312120 cycles 1.00
ML-DSA-65 verify 108909 cycles 109456 cycles 1.00
ML-DSA-87 keypair 170970 cycles 171678 cycles 1.00
ML-DSA-87 sign 378610 cycles 379290 cycles 1.00
ML-DSA-87 verify 169701 cycles 169549 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 211514 cycles 211645 cycles 1.00
ML-DSA-44 sign 759499 cycles 759768 cycles 1.00
ML-DSA-44 verify 229074 cycles 229224 cycles 1.00
ML-DSA-65 keypair 377633 cycles 377412 cycles 1.00
ML-DSA-65 sign 1247628 cycles 1247442 cycles 1.00
ML-DSA-65 verify 373031 cycles 371695 cycles 1.00
ML-DSA-87 keypair 600257 cycles 601065 cycles 1.00
ML-DSA-87 sign 1585029 cycles 1584744 cycles 1.00
ML-DSA-87 verify 616424 cycles 616529 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 118967 cycles 118751 cycles 1.00
ML-DSA-44 sign 458258 cycles 458504 cycles 1.00
ML-DSA-44 verify 130577 cycles 130683 cycles 1.00
ML-DSA-65 keypair 201105 cycles 201607 cycles 1.00
ML-DSA-65 sign 745177 cycles 743209 cycles 1.00
ML-DSA-65 verify 209237 cycles 209226 cycles 1.00
ML-DSA-87 keypair 330130 cycles 330027 cycles 1.00
ML-DSA-87 sign 937192 cycles 939348 cycles 1.00
ML-DSA-87 verify 343610 cycles 343318 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 153955 cycles 153922 cycles 1.00
ML-DSA-44 sign 586125 cycles 586291 cycles 1.00
ML-DSA-44 verify 168817 cycles 168698 cycles 1.00
ML-DSA-65 keypair 262391 cycles 261670 cycles 1.00
ML-DSA-65 sign 966005 cycles 961560 cycles 1.00
ML-DSA-65 verify 272367 cycles 271431 cycles 1.00
ML-DSA-87 keypair 431520 cycles 432139 cycles 1.00
ML-DSA-87 sign 1214437 cycles 1208816 cycles 1.00
ML-DSA-87 verify 447189 cycles 446664 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton3

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 71506 cycles 71555 cycles 1.00
ML-DSA-44 sign 209012 cycles 209006 cycles 1.00
ML-DSA-44 verify 74740 cycles 74747 cycles 1.00
ML-DSA-65 keypair 125928 cycles 125942 cycles 1.00
ML-DSA-65 sign 345460 cycles 345438 cycles 1.00
ML-DSA-65 verify 123996 cycles 124189 cycles 1.00
ML-DSA-87 keypair 206638 cycles 206597 cycles 1.00
ML-DSA-87 sign 439693 cycles 439813 cycles 1.00
ML-DSA-87 verify 204443 cycles 204460 cycles 1.00

@github-actions github-actions Bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 112122 cycles 112118 cycles 1.00
ML-DSA-44 sign 353881 cycles 353767 cycles 1.00
ML-DSA-44 verify 117195 cycles 117191 cycles 1.00
ML-DSA-65 keypair 194355 cycles 194374 cycles 1.00
ML-DSA-65 sign 583731 cycles 583730 cycles 1.00
ML-DSA-65 verify 193133 cycles 193093 cycles 1.00
ML-DSA-87 keypair 320002 cycles 320119 cycles 1.00
ML-DSA-87 sign 747427 cycles 747165 cycles 1.00
ML-DSA-87 verify 317882 cycles 318002 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 138025 cycles 138072 cycles 1.00
ML-DSA-44 sign 485995 cycles 486271 cycles 1.00
ML-DSA-44 verify 149089 cycles 149116 cycles 1.00
ML-DSA-65 keypair 241809 cycles 241864 cycles 1.00
ML-DSA-65 sign 791628 cycles 791723 cycles 1.00
ML-DSA-65 verify 241527 cycles 241299 cycles 1.00
ML-DSA-87 keypair 396331 cycles 396324 cycles 1.00
ML-DSA-87 sign 1013226 cycles 1019414 cycles 0.99
ML-DSA-87 verify 403783 cycles 403735 cycles 1.00

@github-actions github-actions Bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 211752 cycles 211724 cycles 1.00
ML-DSA-44 sign 760132 cycles 760180 cycles 1.00
ML-DSA-44 verify 229569 cycles 229569 cycles 1.00
ML-DSA-65 keypair 378185 cycles 378138 cycles 1.00
ML-DSA-65 sign 1247309 cycles 1247214 cycles 1.00
ML-DSA-65 verify 372234 cycles 371998 cycles 1.00
ML-DSA-87 keypair 601566 cycles 601823 cycles 1.00
ML-DSA-87 sign 1582270 cycles 1582328 cycles 1.00
ML-DSA-87 verify 617531 cycles 617758 cycles 1.00

@github-actions github-actions Bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 271669 cycles 268067 cycles 1.01
ML-DSA-44 sign 810367 cycles 814806 cycles 0.99
ML-DSA-44 verify 273132 cycles 271238 cycles 1.01
ML-DSA-65 keypair 466736 cycles 462202 cycles 1.01
ML-DSA-65 sign 1356642 cycles 1331273 cycles 1.02
ML-DSA-65 verify 455030 cycles 448838 cycles 1.01
ML-DSA-87 keypair 799623 cycles 792168 cycles 1.01
ML-DSA-87 sign 1848570 cycles 1802141 cycles 1.03
ML-DSA-87 verify 776020 cycles 771477 cycles 1.01

@github-actions github-actions Bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 465420 cycles 465524 cycles 1.00
ML-DSA-44 sign 2136883 cycles 2145613 cycles 1.00
ML-DSA-44 verify 557374 cycles 557314 cycles 1.00
ML-DSA-65 keypair 784496 cycles 785558 cycles 1.00
ML-DSA-65 sign 3494793 cycles 3501121 cycles 1.00
ML-DSA-65 verify 868115 cycles 871709 cycles 1.00
ML-DSA-87 keypair 1273557 cycles 1268515 cycles 1.00
ML-DSA-87 sign 4353626 cycles 4307540 cycles 1.01
ML-DSA-87 verify 1395248 cycles 1395476 cycles 1.00

@github-actions github-actions Bot left a comment

SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)

Benchmark suite Current: 7c13c63 Previous: 70fdba0 Ratio
ML-DSA-44 keypair 760220 cycles 759843 cycles 1.00
ML-DSA-44 sign 3140735 cycles 3139003 cycles 1.00
ML-DSA-44 verify 859523 cycles 859050 cycles 1.00
ML-DSA-65 keypair 1285550 cycles 1286052 cycles 1.00
ML-DSA-65 sign 5074741 cycles 5077105 cycles 1.00
ML-DSA-65 verify 1363668 cycles 1364403 cycles 1.00
ML-DSA-87 keypair 2111679 cycles 2110480 cycles 1.00
ML-DSA-87 sign 6348351 cycles 6365186 cycles 1.00
ML-DSA-87 verify 2227517 cycles 2229232 cycles 1.00
