Skip to content

x86_64 + HOL-Light: Replace polyz_unpack AVX2 intrinsics with hand-written assembly and HOL-Light proofs#1182

Merged
hanno-becker merged 1 commit into
mainfrom
jakemas/polyz-unpack-asm
Jun 16, 2026
Merged

x86_64 + HOL-Light: Replace polyz_unpack AVX2 intrinsics with hand-written assembly and HOL-Light proofs#1182
hanno-becker merged 1 commit into
mainfrom
jakemas/polyz-unpack-asm

Conversation

@jakemas

@jakemas jakemas commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Resolves #925
Resolves #915

Replace the AVX2 C intrinsics for polyz_unpack with fully-unrolled hand-written assembly, mirroring the existing AArch64 conversion, and add HOL-Light functional-correctness and memory-safety proofs together with CBMC contracts. Both variants are covered:

  • polyz_unpack_17 (GAMMA1 = 2^17, 18-bit packed, ML-DSA-44)
  • polyz_unpack_19 (GAMMA1 = 2^19, 20-bit packed, ML-DSA-65/87)

Performance

polyz_unpack component benchmark, median cycles on AMD EPYC (c6a), OPT=1 CYCLES=PMU:

Variant AVX2 intrinsics (baseline, main) Hand-written AVX2 asm (this PR)
polyz_unpack ML-DSA-44 ~816 ~816
polyz_unpack ML-DSA-65/87 ~824 ~824

With this, just 2: use_hint (conversion in PR), uniform_rej_eta{2/4} (I've done the conversion, now working on proof) remain as x86 intrinsics.

@jakemas jakemas force-pushed the jakemas/polyz-unpack-asm branch from 923c582 to bbb1114 Compare June 15, 2026 20:17
@jakemas jakemas changed the base branch from jakemas/poly-decompose-asm to main June 15, 2026 20:26
@jakemas jakemas marked this pull request as ready for review June 15, 2026 20:27
@jakemas jakemas requested a review from a team as a code owner June 15, 2026 20:27
@jakemas jakemas marked this pull request as draft June 15, 2026 20:27
@jakemas jakemas force-pushed the jakemas/polyz-unpack-asm branch 2 times, most recently from 6db58be to 5fa4415 Compare June 15, 2026 20:47
@oqs-bot

oqs-bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

CBMC Results (ML-DSA-65, REDUCE-RAM)

Full Results (205 proofs)
Proof Status Current Previous Change
**TOTAL** 1486s 1504s -1.2%
mld_invntt_layer 158s 163s -3%
poly_pointwise_montgomery_c 123s 126s -2%
rej_uniform_native 115s 119s -3%
polyvec_matrix_pointwise_montgomery_yvec 80s 81s -1%
mld_ct_memcmp 68s 62s +10%
mld_ntt_layer 42s 42s +0%
fqmul 40s 37s +8%
polyveck_chknorm 37s 39s -5%
mld_attempt_signature_generation 24s 24s +0%
keccakf1600x4_permute_native 22s 22s +0%
mld_ntt_butterfly_block 21s 21s +0%
polyveck_decompose 17s 20s -15%
rej_uniform_c 17s 17s +0%
poly_chknorm_c 16s 21s -24%
mld_check_pct 15s 13s +15%
sign_verify_internal 15s 13s +15%
polyt0_unpack 14s 17s -18%
poly_uniform_eta_4x 13s 11s +18%
polyvecl_chknorm 12s 13s -8%
poly_add 11s 12s -8%
keccak_absorb_once_x4 10s 10s +0%
polyz_unpack_c 9s 6s +50%
mld_keccakf1600_permute_c 8s 7s +14%
poly_invntt_tomont_c 8s 6s +33%
rej_uniform 8s 8s +0%
sign 8s 8s +0%
polyvec_matrix_pointwise_montgomery_row 7s 7s +0%
polyvecl_ntt 7s 7s +0%
compute_pack_t0_t1 6s 8s -25%
keccak_absorb 6s 7s -14%
mld_compute_pack_z 6s 6s +0%
pack_sig_h 6s 5s +20%
pointwise_acc_native_aarch64 6s 6s +0%
poly_challenge 6s 3s +100%
polyveck_caddq 6s 9s -33%
polyveck_invntt_tomont 6s 6s +0%
polyveck_pack_w1 6s 6s +0%
polyveck_reduce 6s 4s +50%
polyw1_pack_32 6s 4s +50%
sign_pk_from_sk 6s 6s +0%
fqscale 5s 2s +150%
mld_ct_get_optblocker_u32 5s 3s +67%
mld_h 5s 4s +25%
ntt_native_aarch64 5s 3s +67%
pointwise_acc_native_x86_64 5s 6s -17%
poly_caddq_native_aarch64 5s 2s +150%
poly_decompose_native 5s 4s +25%
poly_pointwise_montgomery 5s 3s +67%
poly_shiftl 5s 6s -17%
polyeta_unpack 5s 4s +25%
polyt1_pack 5s 2s +150%
polyvec_matrix_expand 5s 4s +25%
polyz_unpack_19_native_aarch64 5s 2s +150%
power2round 5s 3s +67%
reduce32 5s 4s +25%
sign_verify_pre_hash_shake256 5s 5s +0%
sk_t0hat_get_poly 5s 2s +150%
keccak_squeezeblocks_x4 4s 5s -20%
keccakf1600_permute 4s 2s +100%
make_hint 4s 2s +100%
mld_ct_abs_i32 4s 3s +33%
mld_ct_get_optblocker_i64 4s 2s +100%
mld_sample_s1_s2_serial 4s 2s +100%
mld_value_barrier_u32 4s 2s +100%
nttunpack_native_x86_64 4s 2s +100%
pointwise_native_x86_64 4s 3s +33%
poly_caddq_c 4s 4s +0%
poly_caddq_native_x86_64 4s 2s +100%
poly_chknorm_native 4s 3s +33%
poly_chknorm_native_aarch64 4s 1s +300%
poly_decompose_c 4s 2s +100%
poly_ntt_native 4s 3s +33%
poly_permute_bitrev_to_custom_optional 4s 2s +100%
poly_permute_bitrev_to_custom_optional_native 4s 2s +100%
poly_pointwise_montgomery_native 4s 4s +0%
poly_power2round 4s 7s -43%
poly_uniform_eta 4s 3s +33%
poly_use_hint_native 4s 2s +100%
poly_use_hint_native_aarch64 4s 2s +100%
polyt0_pack 4s 4s +0%
polyvecl_pack_eta 4s 4s +0%
polyz_unpack_17_native_aarch64 4s 1s +300%
shake128_absorb 4s 4s +0%
shake128x4_absorb_once 4s 6s -33%
shake256 4s 3s +33%
sign_signature_internal 4s 3s +33%
sign_signature_pre_hash_shake256 4s 6s -33%
sign_verify_pre_hash_internal 4s 5s -20%
unpack_sk 4s 2s +100%
caddq 3s 4s -25%
intt_native_aarch64 3s 2s +50%
keccak_f1600_x4_native_aarch64_v84a 3s 3s +0%
keccak_init 3s 3s +0%
keccakf1600_xor_bytes 3s 1s +200%
keccakf1600_xor_bytes (big endian) 3s 2s +50%
keccakf1600x4_xor_bytes 3s 3s +0%
keccakf1600x4_xor_bytes_native 3s 2s +50%
mld_ct_cmask_nonzero_u8 3s 5s -40%
mld_keccakf1600_extract_bytes 3s 5s -40%
mld_polymat_expand_entry 3s 3s +0%
mld_prepare_domain_separation_prefix 3s 3s +0%
mld_sample_s1_s2 3s 4s -25%
mld_value_barrier_i64 3s 4s -25%
mld_value_barrier_u8 3s 2s +50%
ntt_native_x86_64 3s 3s +0%
pack_sk_s1 3s 4s -25%
pointwise_native_aarch64 3s 4s -25%
poly_caddq 3s 2s +50%
poly_caddq_native 3s 3s +0%
poly_chknorm_native_x86_64 3s 2s +50%
poly_decompose_32_native_aarch64 3s 3s +0%
poly_decompose_88_native_aarch64 3s 1s +200%
poly_invntt_tomont 3s 2s +50%
poly_invntt_tomont_native 3s 4s -25%
poly_uniform_4x 3s 2s +50%
poly_use_hint 3s 2s +50%
polyvec_matrix_expand_serial 3s 2s +50%
polyveck_ntt 3s 2s +50%
polyveck_pack_eta 3s 3s +0%
polyvecl_pointwise_acc_montgomery_c 3s 4s -25%
polyvecl_uniform_gamma1_serial 3s 3s +0%
polyvecl_unpack_eta 3s 2s +50%
polyz_pack 3s 3s +0%
rej_eta_c 3s 2s +50%
rej_eta_native 3s 2s +50%
rej_uniform_eta_native_aarch64 3s 4s -25%
shake128_init 3s 4s -25%
shake128_release 3s 1s +200%
shake256_absorb 3s 5s -40%
shake256_finalize 3s 2s +50%
shake256x4_absorb_once 3s 4s -25%
sig_unpack_hints 3s 3s +0%
sign_keypair 3s 4s -25%
sign_open 3s 2s +50%
sign_signature 3s 4s -25%
sign_signature_extmu 3s 3s +0%
sign_signature_pre_hash_internal 3s 4s -25%
sign_verify 3s 3s +0%
sign_verify_extmu 3s 3s +0%
sk_s1hat_get_poly 3s 1s +200%
unpack_pk_t1 3s 4s -25%
unpack_sk_s2hat 3s 3s +0%
unpack_sk_t0hat 3s 3s +0%
use_hint 3s 3s +0%
yvec_init 3s 3s +0%
decompose 2s 2s +0%
intt_native_x86_64 2s 4s -50%
keccak_f1600_x1_native_aarch64 2s 3s -33%
keccak_f1600_x1_native_aarch64_v84a 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 2s +0%
keccak_f1600_x4_native_avx2 2s 2s +0%
keccak_finalize 2s 5s -60%
keccakf1600_extract_bytes (big endian) 2s 3s -33%
keccakf1600_permute_native 2s 1s +100%
keccakf1600x4_extract_bytes 2s 5s -60%
keccakf1600x4_extract_bytes_native 2s 3s -33%
mld_ct_cmask_neg_i32 2s 2s +0%
mld_ct_sel_int32 2s 1s +100%
mld_keccakf1600x4_extract_bytes_c 2s 2s +0%
mld_keccakf1600x4_xor_bytes_c 2s 3s -33%
montgomery_reduce 2s 3s -33%
pack_sig_c 2s 5s -60%
pack_sig_z 2s 5s -60%
poly_decompose 2s 3s -33%
poly_ntt 2s 2s +0%
poly_ntt_c 2s 4s -50%
poly_reduce 2s 5s -60%
poly_sub 2s 3s -33%
poly_uniform 2s 5s -60%
poly_uniform_gamma1 2s 2s +0%
poly_uniform_gamma1_4x 2s 2s +0%
poly_use_hint_c 2s 5s -60%
polyeta_pack 2s 6s -67%
polyt1_unpack 2s 4s -50%
polyveck_unpack_eta 2s 4s -50%
polyvecl_unpack_z 2s 2s +0%
polyw1_pack 2s 7s -71%
polyw1_pack_88 2s 4s -50%
polyz_unpack_native 2s 2s +0%
polyz_unpack_native_x86_64 2s - new
rej_eta 2s 3s -33%
rej_uniform_native_aarch64 2s 2s +0%
shake128_finalize 2s 2s +0%
shake128_squeeze 2s 4s -50%
shake128x4_squeezeblocks 2s 3s -33%
shake256_release 2s 3s -33%
shake256x4_squeezeblocks 2s 3s -33%
sign_keypair_internal 2s 3s -33%
sk_s2hat_get_poly 2s 3s -33%
sys_check_capability 2s 4s -50%
unpack_sk_s1hat 2s 3s -33%
yvec_get_poly 2s 1s +100%
keccak_squeeze 1s 2s -50%
keccakf1600x4_permute 1s 2s -50%
mld_ct_cmask_nonzero_u32 1s 3s -67%
mld_ct_get_optblocker_u8 1s 3s -67%
pack_sk_rho_key_tr_s2 1s 4s -75%
poly_chknorm 1s 1s +0%
polyvecl_pointwise_acc_montgomery 1s 2s -50%
polyvecl_pointwise_acc_montgomery_native 1s 3s -67%
polyvecl_uniform_gamma1 1s 2s -50%
polyz_unpack 1s 3s -67%
shake256_init 1s 2s -50%
shake256_squeeze 1s 3s -67%

@oqs-bot

oqs-bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

CBMC Results (ML-DSA-44, REDUCE-RAM)

Full Results (205 proofs)
Proof Status Current Previous Change
**TOTAL** 1558s 1453s +7.2%
mld_invntt_layer 174s 147s +18%
poly_pointwise_montgomery_c 140s 125s +12%
rej_uniform_native 127s 119s +7%
polyvec_matrix_pointwise_montgomery_yvec 115s 112s +3%
mld_ct_memcmp 66s 65s +2%
mld_ntt_layer 45s 42s +7%
fqmul 42s 40s +5%
mld_attempt_signature_generation 26s 25s +4%
mld_ntt_butterfly_block 25s 21s +19%
keccakf1600x4_permute_native 23s 23s +0%
sign_verify_internal 23s 20s +15%
poly_chknorm_c 17s 18s -6%
rej_uniform_c 17s 15s +13%
polyt0_unpack 15s 16s -6%
poly_uniform_eta_4x 14s 14s +0%
polyeta_unpack 14s 16s -12%
mld_check_pct 13s 12s +8%
poly_add 11s 9s +22%
polyz_unpack_c 11s 11s +0%
compute_pack_t0_t1 10s 9s +11%
keccak_absorb_once_x4 9s 10s -10%
mld_compute_pack_z 9s 5s +80%
keccak_absorb 8s 6s +33%
keccak_squeezeblocks_x4 8s 4s +100%
polyveck_chknorm 8s 6s +33%
polyveck_decompose 8s 6s +33%
rej_uniform 8s 10s -20%
sign 8s 7s +14%
sign_pk_from_sk 8s 7s +14%
mld_keccakf1600_permute_c 7s 8s -12%
poly_decompose_c 7s 7s +0%
poly_invntt_tomont_c 7s 8s -12%
poly_power2round 7s 6s +17%
poly_uniform_eta 7s 5s +40%
polyvec_matrix_pointwise_montgomery_row 7s 8s -12%
polyvecl_chknorm 7s 9s -22%
intt_native_x86_64 6s 2s +200%
polyveck_pack_eta 6s 3s +100%
montgomery_reduce 5s 2s +150%
pointwise_acc_native_x86_64 5s 5s +0%
poly_challenge 5s 5s +0%
poly_uniform 5s 5s +0%
poly_uniform_4x 5s 3s +67%
polyveck_invntt_tomont 5s 3s +67%
polyveck_reduce 5s 6s -17%
polyvecl_ntt 5s 3s +67%
polyvecl_unpack_z 5s 2s +150%
polyz_unpack_native 5s 3s +67%
sign_keypair_internal 5s 3s +67%
sign_open 5s 5s +0%
sign_signature_internal 5s 4s +25%
sign_verify_pre_hash_shake256 5s 4s +25%
unpack_sk_t0hat 5s 1s +400%
decompose 4s 1s +300%
intt_native_aarch64 4s 2s +100%
keccakf1600_xor_bytes 4s 3s +33%
keccakf1600_xor_bytes (big endian) 4s 2s +100%
mld_ct_get_optblocker_u32 4s 2s +100%
mld_keccakf1600_extract_bytes 4s 2s +100%
mld_keccakf1600x4_extract_bytes_c 4s 1s +300%
mld_prepare_domain_separation_prefix 4s 4s +0%
mld_sample_s1_s2_serial 4s 3s +33%
mld_value_barrier_u8 4s 2s +100%
pack_sig_c 4s 3s +33%
pointwise_acc_native_aarch64 4s 4s +0%
pointwise_native_x86_64 4s 3s +33%
poly_caddq_native_x86_64 4s 1s +300%
poly_chknorm_native_x86_64 4s 4s +0%
poly_ntt 4s 3s +33%
poly_shiftl 4s 4s +0%
poly_sub 4s 4s +0%
poly_use_hint_c 4s 3s +33%
polyeta_pack 4s 2s +100%
polyveck_caddq 4s 6s -33%
polyvecl_pointwise_acc_montgomery 4s 3s +33%
polyw1_pack_88 4s 3s +33%
polyz_unpack 4s 3s +33%
polyz_unpack_19_native_aarch64 4s 2s +100%
shake128_init 4s 6s -33%
shake256 4s 2s +100%
shake256_release 4s 1s +300%
sign_keypair 4s 4s +0%
sign_signature 4s 3s +33%
sign_signature_pre_hash_internal 4s 4s +0%
sign_verify 4s 3s +33%
sign_verify_pre_hash_internal 4s 3s +33%
sys_check_capability 4s 3s +33%
yvec_get_poly 4s 1s +300%
caddq 3s 1s +200%
fqscale 3s 2s +50%
keccak_f1600_x1_native_aarch64_v84a 3s 3s +0%
keccak_f1600_x4_native_aarch64_v84a 3s 2s +50%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 3s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 3s 1s +200%
keccakf1600x4_permute 3s 2s +50%
keccakf1600x4_xor_bytes_native 3s 3s +0%
make_hint 3s 4s -25%
mld_ct_cmask_neg_i32 3s 1s +200%
mld_ct_cmask_nonzero_u32 3s 1s +200%
mld_ct_cmask_nonzero_u8 3s 1s +200%
mld_ct_get_optblocker_i64 3s 2s +50%
mld_value_barrier_u32 3s 1s +200%
ntt_native_x86_64 3s 1s +200%
pack_sig_h 3s 5s -40%
pack_sig_z 3s 3s +0%
pack_sk_rho_key_tr_s2 3s 4s -25%
pointwise_native_aarch64 3s 3s +0%
poly_caddq_c 3s 4s -25%
poly_caddq_native 3s 2s +50%
poly_chknorm_native 3s 4s -25%
poly_chknorm_native_aarch64 3s 5s -40%
poly_decompose 3s 3s +0%
poly_decompose_native 3s 5s -40%
poly_ntt_c 3s 3s +0%
poly_permute_bitrev_to_custom_optional_native 3s 5s -40%
poly_pointwise_montgomery 3s 3s +0%
poly_reduce 3s 4s -25%
poly_use_hint 3s 4s -25%
poly_use_hint_native 3s 4s -25%
poly_use_hint_native_aarch64 3s 4s -25%
polyt0_pack 3s 4s -25%
polyt1_pack 3s 5s -40%
polyt1_unpack 3s 2s +50%
polyvec_matrix_expand_serial 3s 5s -40%
polyveck_pack_w1 3s 2s +50%
polyvecl_pointwise_acc_montgomery_native 3s 1s +200%
polyvecl_uniform_gamma1 3s 2s +50%
polyz_unpack_17_native_aarch64 3s 2s +50%
polyz_unpack_native_x86_64 3s - new
power2round 3s 3s +0%
reduce32 3s 3s +0%
rej_eta_c 3s 3s +0%
rej_eta_native 3s 3s +0%
rej_uniform_eta_native_aarch64 3s 2s +50%
shake128_absorb 3s 3s +0%
shake128_release 3s 2s +50%
shake256_finalize 3s 2s +50%
sig_unpack_hints 3s 2s +50%
sign_signature_pre_hash_shake256 3s 3s +0%
sign_verify_extmu 3s 2s +50%
sk_s1hat_get_poly 3s 3s +0%
sk_s2hat_get_poly 3s 4s -25%
unpack_sk 3s 2s +50%
unpack_sk_s2hat 3s 3s +0%
use_hint 3s 1s +200%
keccak_f1600_x1_native_aarch64 2s 3s -33%
keccak_f1600_x4_native_avx2 2s 4s -50%
keccak_finalize 2s 3s -33%
keccak_init 2s 4s -50%
keccak_squeeze 2s 1s +100%
keccakf1600_permute 2s 4s -50%
keccakf1600_permute_native 2s 1s +100%
keccakf1600x4_extract_bytes 2s 4s -50%
keccakf1600x4_extract_bytes_native 2s 1s +100%
keccakf1600x4_xor_bytes 2s 2s +0%
mld_ct_abs_i32 2s 2s +0%
mld_ct_get_optblocker_u8 2s 3s -33%
mld_ct_sel_int32 2s 3s -33%
mld_h 2s 3s -33%
mld_keccakf1600x4_xor_bytes_c 2s 1s +100%
mld_polymat_expand_entry 2s 2s +0%
mld_sample_s1_s2 2s 8s -75%
ntt_native_aarch64 2s 3s -33%
pack_sk_s1 2s 3s -33%
poly_caddq_native_aarch64 2s 1s +100%
poly_chknorm 2s 2s +0%
poly_decompose_32_native_aarch64 2s 1s +100%
poly_decompose_88_native_aarch64 2s 4s -50%
poly_invntt_tomont 2s 3s -33%
poly_invntt_tomont_native 2s 2s +0%
poly_ntt_native 2s 2s +0%
poly_uniform_gamma1 2s 3s -33%
poly_uniform_gamma1_4x 2s 3s -33%
polyvec_matrix_expand 2s 4s -50%
polyveck_ntt 2s 4s -50%
polyveck_unpack_eta 2s 2s +0%
polyvecl_pack_eta 2s 2s +0%
polyvecl_pointwise_acc_montgomery_c 2s 3s -33%
polyvecl_uniform_gamma1_serial 2s 4s -50%
polyvecl_unpack_eta 2s 2s +0%
polyw1_pack 2s 1s +100%
polyw1_pack_32 2s 3s -33%
polyz_pack 2s 2s +0%
rej_eta 2s 4s -50%
rej_uniform_native_aarch64 2s 5s -60%
shake128_finalize 2s 2s +0%
shake128_squeeze 2s 2s +0%
shake128x4_squeezeblocks 2s 3s -33%
shake256x4_absorb_once 2s 2s +0%
shake256x4_squeezeblocks 2s 1s +100%
sign_signature_extmu 2s 4s -50%
sk_t0hat_get_poly 2s 2s +0%
unpack_pk_t1 2s 2s +0%
yvec_init 2s 4s -50%
keccakf1600_extract_bytes (big endian) 1s 4s -75%
mld_value_barrier_i64 1s 2s -50%
nttunpack_native_x86_64 1s 2s -50%
poly_caddq 1s 3s -67%
poly_permute_bitrev_to_custom_optional 1s 2s -50%
poly_pointwise_montgomery_native 1s 2s -50%
shake128x4_absorb_once 1s 3s -67%
shake256_absorb 1s 3s -67%
shake256_init 1s 5s -80%
shake256_squeeze 1s 4s -75%
unpack_sk_s1hat 1s 2s -50%

@oqs-bot

oqs-bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

CBMC Results (ML-DSA-87, REDUCE-RAM)

Full Results (205 proofs)
Proof Status Current Previous Change
**TOTAL** 1690s 1624s +4.1%
mld_invntt_layer 187s 172s +9%
polyvec_matrix_pointwise_montgomery_yvec 167s 155s +8%
poly_pointwise_montgomery_c 143s 136s +5%
rej_uniform_native 134s 125s +7%
mld_ct_memcmp 69s 65s +6%
mld_ntt_layer 44s 43s +2%
fqmul 41s 43s -5%
mld_attempt_signature_generation 36s 37s -3%
keccakf1600x4_permute_native 24s 26s -8%
mld_ntt_butterfly_block 24s 23s +4%
sign_verify_internal 24s 22s +9%
poly_chknorm_c 21s 18s +17%
polyveck_decompose 19s 18s +6%
mld_check_pct 18s 13s +38%
polyt0_unpack 18s 17s +6%
rej_uniform_c 17s 16s +6%
polyeta_unpack 16s 14s +14%
compute_pack_t0_t1 14s 11s +27%
poly_uniform_eta_4x 14s 13s +8%
polyvecl_chknorm 12s 10s +20%
keccak_absorb_once_x4 10s 10s +0%
poly_add 10s 11s -9%
poly_invntt_tomont_c 9s 9s +0%
polyz_unpack_c 9s 6s +50%
rej_uniform 9s 8s +12%
mld_keccakf1600_permute_c 8s 6s +33%
mld_sample_s1_s2_serial 8s 5s +60%
pointwise_acc_native_aarch64 8s 8s +0%
pointwise_acc_native_x86_64 8s 6s +33%
polyveck_invntt_tomont 8s 7s +14%
polyvecl_ntt 8s 9s -11%
sign 8s 8s +0%
keccak_absorb 6s 7s -14%
mld_compute_pack_z 6s 3s +100%
poly_caddq_c 6s 3s +100%
poly_challenge 6s 6s +0%
poly_decompose_c 6s 5s +20%
poly_power2round 6s 5s +20%
polyvec_matrix_pointwise_montgomery_row 6s 8s -25%
sign_verify_pre_hash_shake256 6s 3s +100%
mld_sample_s1_s2 5s 4s +25%
pack_sig_h 5s 3s +67%
pointwise_native_aarch64 5s 3s +67%
poly_caddq 5s 4s +25%
poly_chknorm_native_x86_64 5s 4s +25%
poly_ntt_native 5s 4s +25%
poly_pointwise_montgomery_native 5s 5s +0%
poly_uniform 5s 4s +25%
poly_use_hint_native_aarch64 5s 2s +150%
polyt0_pack 5s 4s +25%
polyveck_caddq 5s 5s +0%
polyvecl_pack_eta 5s 2s +150%
reduce32 5s 5s +0%
shake128x4_absorb_once 5s 2s +150%
sign_signature_extmu 5s 4s +25%
sign_signature_pre_hash_shake256 5s 5s +0%
sign_verify 5s 6s -17%
keccak_f1600_x4_native_avx2 4s 4s +0%
keccakf1600x4_xor_bytes_native 4s 1s +300%
mld_ct_cmask_nonzero_u32 4s 5s -20%
mld_prepare_domain_separation_prefix 4s 3s +33%
nttunpack_native_x86_64 4s 4s +0%
pack_sig_c 4s 3s +33%
pack_sig_z 4s 3s +33%
poly_chknorm 4s 2s +100%
poly_chknorm_native 4s 4s +0%
poly_invntt_tomont 4s 6s -33%
poly_invntt_tomont_native 4s 2s +100%
poly_permute_bitrev_to_custom_optional 4s 2s +100%
poly_use_hint_c 4s 2s +100%
polyt1_unpack 4s 3s +33%
polyveck_chknorm 4s 3s +33%
polyveck_ntt 4s 2s +100%
polyveck_pack_w1 4s 3s +33%
polyveck_reduce 4s 6s -33%
polyvecl_unpack_eta 4s 2s +100%
polyz_unpack_17_native_aarch64 4s 2s +100%
power2round 4s 3s +33%
rej_eta_native 4s 3s +33%
sign_keypair 4s 3s +33%
sign_keypair_internal 4s 7s -43%
sign_open 4s 5s -20%
sign_pk_from_sk 4s 6s -33%
sign_signature_internal 4s 5s -20%
sign_verify_extmu 4s 3s +33%
unpack_pk_t1 4s 3s +33%
unpack_sk 4s 4s +0%
unpack_sk_s1hat 4s 3s +33%
use_hint 4s 4s +0%
yvec_get_poly 4s 2s +100%
caddq 3s 2s +50%
decompose 3s 3s +0%
fqscale 3s 3s +0%
intt_native_aarch64 3s 3s +0%
intt_native_x86_64 3s 4s -25%
keccak_f1600_x1_native_aarch64 3s 2s +50%
keccak_squeezeblocks_x4 3s 4s -25%
keccakf1600_extract_bytes (big endian) 3s 4s -25%
keccakf1600_permute_native 3s 3s +0%
keccakf1600_xor_bytes (big endian) 3s 2s +50%
keccakf1600x4_extract_bytes 3s 2s +50%
keccakf1600x4_xor_bytes 3s 2s +50%
mld_ct_abs_i32 3s 3s +0%
mld_ct_cmask_neg_i32 3s 2s +50%
mld_ct_get_optblocker_i64 3s 1s +200%
mld_h 3s 4s -25%
mld_keccakf1600x4_xor_bytes_c 3s 3s +0%
mld_polymat_expand_entry 3s 5s -40%
mld_value_barrier_u8 3s 2s +50%
ntt_native_aarch64 3s 8s -62%
ntt_native_x86_64 3s 6s -50%
pack_sk_rho_key_tr_s2 3s 3s +0%
pointwise_native_x86_64 3s 4s -25%
poly_caddq_native 3s 4s -25%
poly_caddq_native_aarch64 3s 3s +0%
poly_chknorm_native_aarch64 3s 4s -25%
poly_decompose 3s 2s +50%
poly_decompose_32_native_aarch64 3s 2s +50%
poly_decompose_88_native_aarch64 3s 4s -25%
poly_decompose_native 3s 2s +50%
poly_ntt 3s 4s -25%
poly_reduce 3s 4s -25%
poly_shiftl 3s 3s +0%
poly_uniform_4x 3s 3s +0%
poly_uniform_gamma1 3s 3s +0%
poly_uniform_gamma1_4x 3s 5s -40%
poly_use_hint 3s 3s +0%
poly_use_hint_native 3s 3s +0%
polyt1_pack 3s 4s -25%
polyvec_matrix_expand 3s 2s +50%
polyveck_pack_eta 3s 2s +50%
polyveck_unpack_eta 3s 4s -25%
polyvecl_pointwise_acc_montgomery 3s 4s -25%
polyvecl_pointwise_acc_montgomery_native 3s 3s +0%
polyz_pack 3s 1s +200%
polyz_unpack_19_native_aarch64 3s 3s +0%
rej_eta_c 3s 4s -25%
rej_uniform_native_aarch64 3s 4s -25%
shake128_init 3s 2s +50%
shake128_squeeze 3s 1s +200%
shake128x4_squeezeblocks 3s 2s +50%
shake256_absorb 3s 2s +50%
shake256_finalize 3s 1s +200%
shake256_squeeze 3s 3s +0%
shake256x4_squeezeblocks 3s 3s +0%
sig_unpack_hints 3s 3s +0%
sign_signature 3s 3s +0%
sign_signature_pre_hash_internal 3s 3s +0%
sign_verify_pre_hash_internal 3s 4s -25%
unpack_sk_s2hat 3s 3s +0%
unpack_sk_t0hat 3s 7s -57%
yvec_init 3s 2s +50%
keccak_f1600_x4_native_aarch64_v84a 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 3s -33%
keccak_finalize 2s 2s +0%
keccak_init 2s 3s -33%
keccak_squeeze 2s 3s -33%
keccakf1600_permute 2s 2s +0%
keccakf1600x4_extract_bytes_native 2s 2s +0%
keccakf1600x4_permute 2s 2s +0%
mld_ct_cmask_nonzero_u8 2s 3s -33%
mld_ct_get_optblocker_u32 2s 2s +0%
mld_keccakf1600x4_extract_bytes_c 2s 2s +0%
mld_value_barrier_i64 2s 3s -33%
mld_value_barrier_u32 2s 2s +0%
montgomery_reduce 2s 2s +0%
poly_ntt_c 2s 3s -33%
poly_permute_bitrev_to_custom_optional_native 2s 4s -50%
poly_pointwise_montgomery 2s 3s -33%
poly_uniform_eta 2s 4s -50%
polyeta_pack 2s 4s -50%
polyvec_matrix_expand_serial 2s 4s -50%
polyvecl_pointwise_acc_montgomery_c 2s 3s -33%
polyvecl_uniform_gamma1_serial 2s 5s -60%
polyvecl_unpack_z 2s 3s -33%
polyw1_pack 2s 2s +0%
polyw1_pack_32 2s 3s -33%
polyw1_pack_88 2s 2s +0%
polyz_unpack 2s 3s -33%
polyz_unpack_native 2s 3s -33%
polyz_unpack_native_x86_64 2s - new
rej_eta 2s 3s -33%
rej_uniform_eta_native_aarch64 2s 5s -60%
shake128_absorb 2s 2s +0%
shake128_release 2s 2s +0%
shake256 2s 4s -50%
shake256_init 2s 4s -50%
shake256_release 2s 1s +100%
shake256x4_absorb_once 2s 2s +0%
sk_s1hat_get_poly 2s 2s +0%
sk_t0hat_get_poly 2s 2s +0%
sys_check_capability 2s 4s -50%
keccak_f1600_x1_native_aarch64_v84a 1s 3s -67%
keccakf1600_xor_bytes 1s 3s -67%
make_hint 1s 2s -50%
mld_ct_get_optblocker_u8 1s 2s -50%
mld_ct_sel_int32 1s 2s -50%
mld_keccakf1600_extract_bytes 1s 3s -67%
pack_sk_s1 1s 2s -50%
poly_caddq_native_x86_64 1s 3s -67%
poly_sub 1s 1s +0%
polyvecl_uniform_gamma1 1s 3s -67%
shake128_finalize 1s 2s -50%
sk_s2hat_get_poly 1s 3s -67%

@oqs-bot

oqs-bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

CBMC Results (ML-DSA-44)

Full Results (205 proofs)
Proof Status Current Previous Change
**TOTAL** 1819s 1752s +3.8%
mld_invntt_layer 294s 284s +4%
rej_uniform_native 155s 143s +8%
polyvecl_pointwise_acc_montgomery_c 125s 123s +2%
poly_pointwise_montgomery_c 100s 88s +14%
mld_ct_memcmp 71s 67s +6%
mld_attempt_signature_generation 63s 62s +2%
mld_ntt_layer 46s 47s -2%
fqmul 44s 43s +2%
polyvec_matrix_expand 27s 26s +4%
sign_verify_internal 26s 27s -4%
rej_uniform 25s 23s +9%
keccakf1600x4_permute_native 24s 24s +0%
mld_ntt_butterfly_block 22s 22s +0%
sign_signature_internal 22s 18s +22%
poly_chknorm_c 19s 19s +0%
compute_pack_t0_t1 16s 15s +7%
polyt0_unpack 16s 15s +7%
polyeta_unpack 15s 15s +0%
poly_uniform_eta_4x 14s 12s +17%
rej_uniform_c 14s 12s +17%
mld_check_pct 13s 14s -7%
poly_uniform_4x 13s 10s +30%
polyveck_chknorm 13s 11s +18%
polyz_unpack_c 12s 10s +20%
mld_compute_pack_z 11s 11s +0%
keccak_absorb_once_x4 10s 14s -29%
polyvec_matrix_pointwise_montgomery_yvec 10s 10s +0%
poly_add 9s 10s -10%
keccak_absorb 8s 6s +33%
poly_invntt_tomont_c 8s 7s +14%
poly_power2round 7s 5s +40%
polyvec_matrix_expand_serial 7s 6s +17%
polyveck_decompose 7s 10s -30%
sig_unpack_hints 7s 3s +133%
sign_keypair_internal 7s 4s +75%
mld_keccakf1600_permute_c 6s 8s -25%
pack_sk_rho_key_tr_s2 6s 2s +200%
pointwise_acc_native_aarch64 6s 8s -25%
pointwise_acc_native_x86_64 6s 4s +50%
poly_caddq_c 6s 4s +50%
poly_invntt_tomont_native 6s 5s +20%
poly_use_hint_c 6s 6s +0%
rej_eta_native 6s 4s +50%
sign 6s 7s -14%
sign_signature_pre_hash_shake256 6s 5s +20%
decompose 5s 3s +67%
fqscale 5s 3s +67%
mld_ct_cmask_nonzero_u8 5s 2s +150%
pack_sig_z 5s 3s +67%
poly_challenge 5s 4s +25%
poly_ntt 5s 4s +25%
polyt0_pack 5s 6s -17%
polyveck_invntt_tomont 5s 6s -17%
polyvecl_pointwise_acc_montgomery 5s 3s +67%
rej_eta 5s 1s +400%
shake256x4_absorb_once 5s 4s +25%
sign_open 5s 6s -17%
sign_verify 5s 6s -17%
keccak_squeeze 4s 2s +100%
keccakf1600_extract_bytes (big endian) 4s 2s +100%
keccakf1600_xor_bytes 4s 1s +300%
keccakf1600x4_extract_bytes 4s 4s +0%
mld_ct_cmask_nonzero_u32 4s 2s +100%
mld_h 4s 6s -33%
mld_keccakf1600_extract_bytes 4s 2s +100%
mld_polymat_expand_entry 4s 2s +100%
mld_sample_s1_s2_serial 4s 3s +33%
ntt_native_aarch64 4s 4s +0%
ntt_native_x86_64 4s 3s +33%
pack_sig_c 4s 2s +100%
pack_sk_s1 4s 2s +100%
poly_caddq_native 4s 4s +0%
poly_chknorm_native_x86_64 4s 3s +33%
poly_decompose_32_native_aarch64 4s 1s +300%
poly_decompose_88_native_aarch64 4s 2s +100%
poly_decompose_c 4s 4s +0%
poly_ntt_c 4s 3s +33%
poly_reduce 4s 4s +0%
poly_uniform_gamma1_4x 4s 4s +0%
poly_use_hint 4s 3s +33%
poly_use_hint_native 4s 2s +100%
polyvec_matrix_pointwise_montgomery_row 4s 3s +33%
polyveck_caddq 4s 2s +100%
polyveck_ntt 4s 3s +33%
polyveck_pack_eta 4s 2s +100%
polyvecl_chknorm 4s 5s -20%
polyvecl_ntt 4s 5s -20%
polyvecl_pointwise_acc_montgomery_native 4s 2s +100%
polyvecl_uniform_gamma1 4s 3s +33%
polyvecl_uniform_gamma1_serial 4s 3s +33%
polyw1_pack_32 4s 4s +0%
polyz_unpack_19_native_aarch64 4s 4s +0%
power2round 4s 2s +100%
reduce32 4s 1s +300%
shake256_absorb 4s 2s +100%
shake256_release 4s 5s -20%
sign_keypair 4s 4s +0%
sign_pk_from_sk 4s 6s -33%
sign_signature_pre_hash_internal 4s 4s +0%
sign_verify_extmu 4s 4s +0%
sign_verify_pre_hash_shake256 4s 4s +0%
sk_s2hat_get_poly 4s 3s +33%
yvec_get_poly 4s 2s +100%
caddq 3s 2s +50%
intt_native_x86_64 3s 4s -25%
keccak_f1600_x1_native_aarch64 3s 3s +0%
keccak_squeezeblocks_x4 3s 5s -40%
keccakf1600x4_extract_bytes_native 3s 3s +0%
mld_ct_abs_i32 3s 2s +50%
mld_ct_get_optblocker_u8 3s 2s +50%
mld_keccakf1600x4_extract_bytes_c 3s 2s +50%
mld_prepare_domain_separation_prefix 3s 4s -25%
mld_sample_s1_s2 3s 3s +0%
mld_value_barrier_u32 3s 3s +0%
montgomery_reduce 3s 2s +50%
nttunpack_native_x86_64 3s 5s -40%
pack_sig_h 3s 1s +200%
pointwise_native_aarch64 3s 3s +0%
poly_caddq 3s 3s +0%
poly_caddq_native_x86_64 3s 4s -25%
poly_chknorm_native_aarch64 3s 4s -25%
poly_pointwise_montgomery 3s 2s +50%
poly_pointwise_montgomery_native 3s 1s +200%
poly_shiftl 3s 2s +50%
poly_sub 3s 2s +50%
poly_uniform 3s 7s -57%
poly_uniform_gamma1 3s 4s -25%
polyt1_unpack 3s 2s +50%
polyveck_pack_w1 3s 4s -25%
polyveck_unpack_eta 3s 2s +50%
polyvecl_pack_eta 3s 3s +0%
polyw1_pack 3s 4s -25%
polyz_unpack_17_native_aarch64 3s 2s +50%
polyz_unpack_native 3s 4s -25%
rej_eta_c 3s 5s -40%
rej_uniform_eta_native_aarch64 3s 2s +50%
shake128_init 3s 3s +0%
shake128_squeeze 3s 2s +50%
shake128x4_squeezeblocks 3s 3s +0%
shake256_init 3s 1s +200%
sk_s1hat_get_poly 3s 1s +200%
sys_check_capability 3s 2s +50%
unpack_sk 3s 3s +0%
unpack_sk_s1hat 3s 3s +0%
unpack_sk_s2hat 3s 1s +200%
intt_native_aarch64 2s 3s -33%
keccak_f1600_x1_native_aarch64_v84a 2s 2s +0%
keccak_f1600_x4_native_aarch64_v84a 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 2s +0%
keccak_f1600_x4_native_avx2 2s 2s +0%
keccakf1600_permute 2s 1s +100%
keccakf1600_permute_native 2s 6s -67%
keccakf1600x4_permute 2s 3s -33%
make_hint 2s 4s -50%
mld_ct_cmask_neg_i32 2s 4s -50%
mld_ct_get_optblocker_i64 2s 2s +0%
mld_ct_sel_int32 2s 2s +0%
mld_keccakf1600x4_xor_bytes_c 2s 4s -50%
mld_value_barrier_i64 2s 1s +100%
pointwise_native_x86_64 2s 2s +0%
poly_caddq_native_aarch64 2s 2s +0%
poly_chknorm_native 2s 2s +0%
poly_decompose_native 2s 2s +0%
poly_ntt_native 2s 1s +100%
poly_permute_bitrev_to_custom_optional 2s 6s -67%
poly_permute_bitrev_to_custom_optional_native 2s 6s -67%
poly_uniform_eta 2s 3s -33%
poly_use_hint_native_aarch64 2s 4s -50%
polyeta_pack 2s 4s -50%
polyt1_pack 2s 4s -50%
polyveck_reduce 2s 4s -50%
polyvecl_unpack_eta 2s 4s -50%
polyvecl_unpack_z 2s 3s -33%
polyw1_pack_88 2s 4s -50%
polyz_pack 2s 4s -50%
polyz_unpack 2s 2s +0%
polyz_unpack_native_x86_64 2s - new
rej_uniform_native_aarch64 2s 2s +0%
shake128_absorb 2s 2s +0%
shake128_finalize 2s 2s +0%
shake256 2s 1s +100%
shake256_finalize 2s 3s -33%
shake256_squeeze 2s 1s +100%
shake256x4_squeezeblocks 2s 1s +100%
sign_signature 2s 4s -50%
sign_signature_extmu 2s 3s -33%
sign_verify_pre_hash_internal 2s 6s -67%
sk_t0hat_get_poly 2s 4s -50%
unpack_pk_t1 2s 2s +0%
unpack_sk_t0hat 2s 4s -50%
use_hint 2s 2s +0%
yvec_init 2s 3s -33%
keccak_finalize 1s 5s -80%
keccak_init 1s 1s +0%
keccakf1600_xor_bytes (big endian) 1s 3s -67%
keccakf1600x4_xor_bytes 1s 2s -50%
keccakf1600x4_xor_bytes_native 1s 2s -50%
mld_ct_get_optblocker_u32 1s 2s -50%
mld_value_barrier_u8 1s 4s -75%
poly_chknorm 1s 4s -75%
poly_decompose 1s 4s -75%
poly_invntt_tomont 1s 3s -67%
shake128_release 1s 3s -67%
shake128x4_absorb_once 1s 3s -67%

@oqs-bot

oqs-bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

CBMC Results (ML-DSA-65)

Full Results (205 proofs)
Proof Status Current Previous Change
**TOTAL** 2080s 2040s +2.0%
mld_invntt_layer 291s 286s +2%
polyvecl_pointwise_acc_montgomery_c 208s 193s +8%
rej_uniform_native 151s 149s +1%
polyvec_matrix_expand 134s 130s +3%
poly_pointwise_montgomery_c 102s 99s +3%
mld_attempt_signature_generation 71s 66s +8%
mld_ct_memcmp 66s 67s -1%
sign_verify_internal 63s 64s -2%
sign_signature_internal 47s 48s -2%
mld_ntt_layer 46s 43s +7%
fqmul 45s 43s +5%
polyvec_matrix_expand_serial 25s 25s +0%
mld_ntt_butterfly_block 23s 22s +5%
keccakf1600x4_permute_native 21s 23s -9%
rej_uniform 21s 22s -5%
poly_chknorm_c 18s 19s -5%
polyt0_unpack 18s 18s +0%
polyvecl_chknorm 18s 16s +12%
compute_pack_t0_t1 15s 16s -6%
polyveck_decompose 14s 15s -7%
rej_uniform_c 14s 12s +17%
mld_check_pct 11s 13s -15%
poly_add 11s 10s +10%
poly_uniform_eta_4x 11s 12s -8%
polyveck_chknorm 11s 11s +0%
keccak_absorb_once_x4 10s 12s -17%
poly_uniform_4x 10s 11s -9%
polyveck_invntt_tomont 9s 8s +12%
mld_compute_pack_z 8s 10s -20%
pointwise_acc_native_x86_64 8s 6s +33%
pointwise_native_x86_64 8s 3s +167%
poly_invntt_tomont_c 8s 8s +0%
sign 8s 7s +14%
mld_sample_s1_s2 7s 4s +75%
poly_decompose_c 7s 5s +40%
polyvec_matrix_pointwise_montgomery_yvec 7s 10s -30%
polyveck_ntt 7s 9s -22%
polyz_unpack_c 7s 7s +0%
keccak_absorb 6s 5s +20%
keccak_f1600_x1_native_aarch64 6s 1s +500%
mld_keccakf1600_permute_c 6s 7s -14%
poly_caddq_c 6s 7s -14%
poly_chknorm 6s 2s +200%
poly_power2round 6s 4s +50%
poly_uniform_eta 6s 3s +100%
poly_uniform_gamma1_4x 6s 3s +100%
poly_use_hint 6s 3s +100%
polyvecl_ntt 6s 6s +0%
sign_keypair_internal 6s 2s +200%
sign_pk_from_sk 6s 7s -14%
sign_signature_pre_hash_internal 6s 7s -14%
unpack_sk 6s 1s +500%
unpack_sk_t0hat 6s 6s +0%
mld_sample_s1_s2_serial 5s 9s -44%
polyt0_pack 5s 6s -17%
polyveck_caddq 5s 7s -29%
polyveck_reduce 5s 3s +67%
polyz_unpack_native_x86_64 5s - new
shake256x4_absorb_once 5s 2s +150%
sign_verify_pre_hash_shake256 5s 4s +25%
use_hint 5s 2s +150%
fqscale 4s 5s -20%
intt_native_aarch64 4s 2s +100%
keccak_squeezeblocks_x4 4s 5s -20%
keccakf1600_extract_bytes (big endian) 4s 2s +100%
keccakf1600_xor_bytes (big endian) 4s 3s +33%
keccakf1600x4_xor_bytes_native 4s 2s +100%
mld_h 4s 3s +33%
mld_prepare_domain_separation_prefix 4s 3s +33%
mld_value_barrier_u8 4s 2s +100%
ntt_native_aarch64 4s 5s -20%
pack_sig_c 4s 4s +0%
pack_sig_h 4s 4s +0%
pack_sig_z 4s 2s +100%
pack_sk_rho_key_tr_s2 4s 5s -20%
pack_sk_s1 4s 3s +33%
pointwise_acc_native_aarch64 4s 6s -33%
poly_challenge 4s 6s -33%
poly_chknorm_native 4s 4s +0%
poly_decompose_88_native_aarch64 4s 2s +100%
poly_invntt_tomont_native 4s 2s +100%
poly_ntt_native 4s 2s +100%
poly_reduce 4s 2s +100%
poly_uniform 4s 3s +33%
poly_use_hint_c 4s 3s +33%
poly_use_hint_native 4s 3s +33%
polyt1_pack 4s 2s +100%
polyvecl_pack_eta 4s 5s -20%
polyvecl_uniform_gamma1_serial 4s 3s +33%
polyw1_pack_32 4s 3s +33%
polyw1_pack_88 4s 3s +33%
polyz_unpack_17_native_aarch64 4s 3s +33%
rej_eta_c 4s 2s +100%
rej_eta_native 4s 4s +0%
shake128_squeeze 4s 1s +300%
shake256_init 4s 1s +300%
shake256x4_squeezeblocks 4s 3s +33%
sign_keypair 4s 3s +33%
sign_signature 4s 3s +33%
sign_signature_extmu 4s 2s +100%
sign_verify 4s 5s -20%
sk_s2hat_get_poly 4s 2s +100%
sk_t0hat_get_poly 4s 3s +33%
sys_check_capability 4s 2s +100%
caddq 3s 6s -50%
keccak_f1600_x4_native_aarch64_v84a 3s 2s +50%
keccak_f1600_x4_native_avx2 3s 2s +50%
keccakf1600_permute 3s 3s +0%
keccakf1600x4_permute 3s 2s +50%
make_hint 3s 3s +0%
mld_ct_abs_i32 3s 2s +50%
mld_ct_cmask_nonzero_u32 3s 4s -25%
mld_ct_get_optblocker_u8 3s 1s +200%
nttunpack_native_x86_64 3s 6s -50%
poly_caddq 3s 2s +50%
poly_caddq_native_aarch64 3s 2s +50%
poly_caddq_native_x86_64 3s 2s +50%
poly_invntt_tomont 3s 4s -25%
poly_ntt_c 3s 2s +50%
poly_permute_bitrev_to_custom_optional_native 3s 3s +0%
poly_pointwise_montgomery 3s 2s +50%
polyeta_pack 3s 3s +0%
polyeta_unpack 3s 1s +200%
polyt1_unpack 3s 4s -25%
polyveck_pack_eta 3s 1s +200%
polyvecl_pointwise_acc_montgomery_native 3s 3s +0%
polyvecl_unpack_z 3s 3s +0%
polyz_unpack_native 3s 2s +50%
rej_eta 3s 3s +0%
rej_uniform_eta_native_aarch64 3s 4s -25%
shake128_release 3s 2s +50%
shake128x4_squeezeblocks 3s 2s +50%
shake256_absorb 3s 2s +50%
shake256_squeeze 3s 3s +0%
sign_open 3s 3s +0%
sign_signature_pre_hash_shake256 3s 4s -25%
sign_verify_extmu 3s 4s -25%
unpack_pk_t1 3s 3s +0%
unpack_sk_s1hat 3s 3s +0%
yvec_get_poly 3s 4s -25%
decompose 2s 4s -50%
intt_native_x86_64 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 1s +100%
keccak_finalize 2s 2s +0%
keccak_init 2s 3s -33%
keccak_squeeze 2s 1s +100%
keccakf1600_xor_bytes 2s 2s +0%
keccakf1600x4_extract_bytes_native 2s 7s -71%
keccakf1600x4_xor_bytes 2s 3s -33%
mld_ct_cmask_nonzero_u8 2s 2s +0%
mld_ct_get_optblocker_u32 2s 4s -50%
mld_ct_sel_int32 2s 1s +100%
mld_keccakf1600_extract_bytes 2s 2s +0%
mld_keccakf1600x4_extract_bytes_c 2s 2s +0%
mld_polymat_expand_entry 2s 2s +0%
mld_value_barrier_i64 2s 3s -33%
mld_value_barrier_u32 2s 2s +0%
ntt_native_x86_64 2s 3s -33%
pointwise_native_aarch64 2s 3s -33%
poly_caddq_native 2s 3s -33%
poly_chknorm_native_aarch64 2s 4s -50%
poly_chknorm_native_x86_64 2s 6s -67%
poly_decompose_32_native_aarch64 2s 3s -33%
poly_decompose_native 2s 2s +0%
poly_pointwise_montgomery_native 2s 3s -33%
poly_shiftl 2s 7s -71%
poly_sub 2s 3s -33%
poly_uniform_gamma1 2s 2s +0%
poly_use_hint_native_aarch64 2s 2s +0%
polyvec_matrix_pointwise_montgomery_row 2s 2s +0%
polyveck_unpack_eta 2s 4s -50%
polyvecl_pointwise_acc_montgomery 2s 2s +0%
polyvecl_uniform_gamma1 2s 4s -50%
polyz_pack 2s 4s -50%
polyz_unpack 2s 2s +0%
polyz_unpack_19_native_aarch64 2s 3s -33%
reduce32 2s 3s -33%
rej_uniform_native_aarch64 2s 4s -50%
shake128_absorb 2s 3s -33%
shake128_finalize 2s 3s -33%
shake128_init 2s 5s -60%
shake128x4_absorb_once 2s 2s +0%
shake256_finalize 2s 3s -33%
sig_unpack_hints 2s 3s -33%
sign_verify_pre_hash_internal 2s 5s -60%
sk_s1hat_get_poly 2s 2s +0%
keccak_f1600_x1_native_aarch64_v84a 1s 2s -50%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 1s 2s -50%
keccakf1600_permute_native 1s 3s -67%
keccakf1600x4_extract_bytes 1s 2s -50%
mld_ct_cmask_neg_i32 1s 3s -67%
mld_ct_get_optblocker_i64 1s 1s +0%
mld_keccakf1600x4_xor_bytes_c 1s 3s -67%
montgomery_reduce 1s 2s -50%
poly_decompose 1s 2s -50%
poly_ntt 1s 3s -67%
poly_permute_bitrev_to_custom_optional 1s 3s -67%
polyveck_pack_w1 1s 3s -67%
polyvecl_unpack_eta 1s 3s -67%
polyw1_pack 1s 3s -67%
power2round 1s 2s -50%
shake256 1s 3s -67%
shake256_release 1s 2s -50%
unpack_sk_s2hat 1s 4s -75%
yvec_init 1s 1s +0%

@oqs-bot

oqs-bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

CBMC Results (ML-DSA-87)

Full Results (205 proofs)
Proof Status Current Previous Change
**TOTAL** 2517s 2406s +4.6%
polyvecl_pointwise_acc_montgomery_c 342s 319s +7%
mld_invntt_layer 313s 291s +8%
polyvec_matrix_expand 236s 223s +6%
rej_uniform_native 157s 153s +3%
mld_attempt_signature_generation 106s 105s +1%
poly_pointwise_montgomery_c 106s 94s +13%
mld_ct_memcmp 72s 64s +12%
sign_signature_internal 69s 68s +1%
sign_verify_internal 61s 61s +0%
polyvec_matrix_expand_serial 49s 48s +2%
mld_ntt_layer 46s 44s +5%
fqmul 42s 40s +5%
compute_pack_t0_t1 35s 30s +17%
polyvec_matrix_pointwise_montgomery_yvec 30s 30s +0%
mld_ntt_butterfly_block 24s 22s +9%
keccakf1600x4_permute_native 22s 25s -12%
rej_uniform 22s 23s -4%
poly_chknorm_c 21s 17s +24%
polyt0_unpack 18s 19s -5%
mld_check_pct 16s 15s +7%
polyeta_unpack 16s 14s +14%
poly_add 14s 11s +27%
poly_uniform_eta_4x 14s 12s +17%
rej_uniform_c 14s 13s +8%
poly_uniform_4x 12s 12s +0%
polyveck_invntt_tomont 12s 9s +33%
polyveck_decompose 11s 8s +38%
mld_compute_pack_z 10s 7s +43%
polyvecl_ntt 9s 8s +12%
keccak_absorb_once_x4 8s 10s -20%
mld_sample_s1_s2 8s 8s +0%
poly_invntt_tomont_c 8s 8s +0%
sign 8s 11s -27%
mld_keccakf1600_permute_c 7s 7s +0%
pointwise_acc_native_x86_64 7s 8s -12%
poly_caddq_c 7s 7s +0%
poly_decompose_c 7s 8s -12%
polyveck_ntt 7s 7s +0%
keccak_absorb 6s 6s +0%
mld_sample_s1_s2_serial 6s 5s +20%
pointwise_acc_native_aarch64 6s 7s -14%
poly_use_hint 6s 2s +200%
polyveck_caddq 6s 8s -25%
polyveck_pack_w1 6s 3s +100%
sign_pk_from_sk 6s 5s +20%
sign_verify_extmu 6s 5s +20%
sign_verify_pre_hash_internal 6s 5s +20%
sys_check_capability 6s 3s +100%
unpack_sk_t0hat 6s 5s +20%
keccak_finalize 5s 1s +400%
keccak_squeezeblocks_x4 5s 5s +0%
keccakf1600x4_permute 5s 3s +67%
ntt_native_aarch64 5s 3s +67%
poly_chknorm_native 5s 4s +25%
poly_pointwise_montgomery_native 5s 3s +67%
poly_sub 5s 4s +25%
poly_uniform 5s 4s +25%
poly_use_hint_native_aarch64 5s 2s +150%
polyvecl_pointwise_acc_montgomery_native 5s 4s +25%
polyvecl_uniform_gamma1 5s 3s +67%
polyw1_pack_32 5s 3s +67%
polyz_unpack_c 5s 4s +25%
rej_eta 5s 3s +67%
rej_eta_native 5s 5s +0%
shake128_squeeze 5s 3s +67%
shake128x4_absorb_once 5s 1s +400%
shake128x4_squeezeblocks 5s 3s +67%
sign_keypair 5s 5s +0%
sign_keypair_internal 5s 5s +0%
yvec_init 5s 3s +67%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 4s 1s +300%
keccak_init 4s 2s +100%
keccakf1600x4_extract_bytes 4s 2s +100%
make_hint 4s 4s +0%
mld_h 4s 2s +100%
mld_keccakf1600x4_extract_bytes_c 4s 2s +100%
mld_polymat_expand_entry 4s 2s +100%
mld_prepare_domain_separation_prefix 4s 4s +0%
poly_caddq 4s 2s +100%
poly_chknorm_native_x86_64 4s 3s +33%
poly_decompose_32_native_aarch64 4s 6s -33%
poly_invntt_tomont 4s 3s +33%
poly_invntt_tomont_native 4s 4s +0%
poly_permute_bitrev_to_custom_optional 4s 2s +100%
poly_power2round 4s 4s +0%
poly_uniform_gamma1_4x 4s 4s +0%
polyt0_pack 4s 3s +33%
polyt1_unpack 4s 2s +100%
polyvec_matrix_pointwise_montgomery_row 4s 7s -43%
polyveck_chknorm 4s 5s -20%
polyvecl_chknorm 4s 4s +0%
polyvecl_uniform_gamma1_serial 4s 2s +100%
polyvecl_unpack_z 4s 4s +0%
polyw1_pack 4s 1s +300%
polyw1_pack_88 4s 3s +33%
polyz_pack 4s 3s +33%
polyz_unpack_native_x86_64 4s - new
power2round 4s 2s +100%
shake256_finalize 4s 2s +100%
sig_unpack_hints 4s 3s +33%
sign_open 4s 3s +33%
sign_signature 4s 4s +0%
sign_signature_pre_hash_internal 4s 5s -20%
sign_signature_pre_hash_shake256 4s 7s -43%
sign_verify 4s 3s +33%
sign_verify_pre_hash_shake256 4s 4s +0%
sk_t0hat_get_poly 4s 4s +0%
use_hint 4s 2s +100%
fqscale 3s 3s +0%
intt_native_x86_64 3s 4s -25%
keccak_f1600_x1_native_aarch64_v84a 3s 3s +0%
keccakf1600_extract_bytes (big endian) 3s 2s +50%
keccakf1600_permute_native 3s 5s -40%
keccakf1600_xor_bytes 3s 3s +0%
keccakf1600x4_xor_bytes_native 3s 3s +0%
mld_ct_abs_i32 3s 3s +0%
mld_ct_cmask_nonzero_u32 3s 4s -25%
mld_ct_get_optblocker_u8 3s 3s +0%
mld_value_barrier_i64 3s 4s -25%
mld_value_barrier_u32 3s 3s +0%
ntt_native_x86_64 3s 3s +0%
nttunpack_native_x86_64 3s 5s -40%
pack_sig_c 3s 5s -40%
pack_sig_h 3s 3s +0%
pack_sk_rho_key_tr_s2 3s 1s +200%
pack_sk_s1 3s 3s +0%
pointwise_native_aarch64 3s 4s -25%
pointwise_native_x86_64 3s 3s +0%
poly_caddq_native_aarch64 3s 4s -25%
poly_challenge 3s 5s -40%
poly_chknorm_native_aarch64 3s 4s -25%
poly_decompose_native 3s 5s -40%
poly_ntt 3s 5s -40%
poly_ntt_native 3s 3s +0%
poly_permute_bitrev_to_custom_optional_native 3s 4s -25%
poly_pointwise_montgomery 3s 2s +50%
poly_shiftl 3s 3s +0%
poly_use_hint_c 3s 3s +0%
poly_use_hint_native 3s 2s +50%
polyt1_pack 3s 3s +0%
polyvecl_pointwise_acc_montgomery 3s 3s +0%
polyvecl_unpack_eta 3s 3s +0%
polyz_unpack_17_native_aarch64 3s 1s +200%
rej_eta_c 3s 2s +50%
rej_uniform_eta_native_aarch64 3s 2s +50%
rej_uniform_native_aarch64 3s 3s +0%
shake128_absorb 3s 3s +0%
shake128_finalize 3s 4s -25%
shake128_init 3s 3s +0%
shake256_squeeze 3s 4s -25%
sign_signature_extmu 3s 4s -25%
sk_s1hat_get_poly 3s 3s +0%
sk_s2hat_get_poly 3s 4s -25%
unpack_sk 3s 4s -25%
unpack_sk_s2hat 3s 3s +0%
caddq 2s 3s -33%
decompose 2s 2s +0%
keccak_f1600_x1_native_aarch64 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 1s +100%
keccak_squeeze 2s 5s -60%
keccakf1600_permute 2s 2s +0%
keccakf1600_xor_bytes (big endian) 2s 3s -33%
keccakf1600x4_extract_bytes_native 2s 2s +0%
keccakf1600x4_xor_bytes 2s 2s +0%
mld_ct_cmask_nonzero_u8 2s 2s +0%
mld_ct_get_optblocker_i64 2s 3s -33%
mld_keccakf1600_extract_bytes 2s 5s -60%
mld_value_barrier_u8 2s 2s +0%
montgomery_reduce 2s 2s +0%
poly_caddq_native_x86_64 2s 2s +0%
poly_chknorm 2s 3s -33%
poly_decompose 2s 3s -33%
poly_decompose_88_native_aarch64 2s 4s -50%
poly_ntt_c 2s 2s +0%
poly_uniform_eta 2s 4s -50%
poly_uniform_gamma1 2s 4s -50%
polyeta_pack 2s 3s -33%
polyveck_pack_eta 2s 2s +0%
polyveck_reduce 2s 4s -50%
polyveck_unpack_eta 2s 5s -60%
polyvecl_pack_eta 2s 3s -33%
polyz_unpack 2s 3s -33%
polyz_unpack_19_native_aarch64 2s 3s -33%
polyz_unpack_native 2s 2s +0%
reduce32 2s 3s -33%
shake128_release 2s 3s -33%
shake256 2s 2s +0%
shake256_release 2s 2s +0%
shake256x4_absorb_once 2s 2s +0%
shake256x4_squeezeblocks 2s 3s -33%
unpack_pk_t1 2s 3s -33%
yvec_get_poly 2s 4s -50%
intt_native_aarch64 1s 3s -67%
keccak_f1600_x4_native_aarch64_v84a 1s 3s -67%
keccak_f1600_x4_native_avx2 1s 3s -67%
mld_ct_cmask_neg_i32 1s 3s -67%
mld_ct_get_optblocker_u32 1s 1s +0%
mld_ct_sel_int32 1s 5s -80%
mld_keccakf1600x4_xor_bytes_c 1s 3s -67%
pack_sig_z 1s 2s -50%
poly_caddq_native 1s 5s -80%
poly_reduce 1s 3s -67%
shake256_absorb 1s 3s -67%
shake256_init 1s 3s -67%
unpack_sk_s1hat 1s 3s -67%

Replace the AVX2 C intrinsics for polyz_unpack with fully-unrolled
hand-written assembly, mirroring the existing AArch64 conversion, and
add HOL-Light functional-correctness and memory-safety proofs together
with CBMC contracts.

Both variants are covered:
- polyz_unpack_17 (GAMMA1 = 2^17, 18-bit packed, ML-DSA-44)
- polyz_unpack_19 (GAMMA1 = 2^19, 20-bit packed, ML-DSA-65/87)

Each routine unpacks 8 coefficients per block (VPSHUFB/VPSRLVD/VPAND/
VPSUBD) and builds the shuffle, shift, mask and gamma1 constants inline,
so it takes no table argument.

Resolves #925.
Resolves #915.

Signed-off-by: Jake Massimo <jakemas@amazon.com>
@jakemas jakemas force-pushed the jakemas/polyz-unpack-asm branch from 5fa4415 to 59d66bc Compare June 15, 2026 21:33
@jakemas jakemas marked this pull request as ready for review June 15, 2026 22:08

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mac Mini (M1, 2020) benchmarks (opt)

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 46538 cycles 46504 cycles 1.00
ML-DSA-44 sign 131056 cycles 131079 cycles 1.00
ML-DSA-44 verify 47344 cycles 47310 cycles 1.00
ML-DSA-65 keypair 81686 cycles 81680 cycles 1.00
ML-DSA-65 sign 215338 cycles 215311 cycles 1.00
ML-DSA-65 verify 79299 cycles 79299 cycles 1
ML-DSA-87 keypair 132402 cycles 132405 cycles 1.00
ML-DSA-87 sign 277420 cycles 277328 cycles 1.00
ML-DSA-87 verify 134053 cycles 134051 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mac Mini (M1, 2020) benchmarks (no-opt)

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 113005 cycles 112758 cycles 1.00
ML-DSA-44 sign 401889 cycles 400845 cycles 1.00
ML-DSA-44 verify 119695 cycles 119413 cycles 1.00
ML-DSA-65 keypair 193028 cycles 192933 cycles 1.00
ML-DSA-65 sign 650224 cycles 649924 cycles 1.00
ML-DSA-65 verify 192946 cycles 192850 cycles 1.00
ML-DSA-87 keypair 318796 cycles 318753 cycles 1.00
ML-DSA-87 sign 828689 cycles 828716 cycles 1.00
ML-DSA-87 verify 326818 cycles 326677 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 112162 cycles 112128 cycles 1.00
ML-DSA-44 sign 353476 cycles 353788 cycles 1.00
ML-DSA-44 verify 117003 cycles 117189 cycles 1.00
ML-DSA-65 keypair 194780 cycles 194358 cycles 1.00
ML-DSA-65 sign 583867 cycles 583733 cycles 1.00
ML-DSA-65 verify 192717 cycles 193111 cycles 1.00
ML-DSA-87 keypair 320898 cycles 320083 cycles 1.00
ML-DSA-87 sign 747284 cycles 747201 cycles 1.00
ML-DSA-87 verify 318777 cycles 317895 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 211481 cycles 211527 cycles 1.00
ML-DSA-44 sign 758522 cycles 759858 cycles 1.00
ML-DSA-44 verify 228897 cycles 229351 cycles 1.00
ML-DSA-65 keypair 377221 cycles 378548 cycles 1.00
ML-DSA-65 sign 1247439 cycles 1247648 cycles 1.00
ML-DSA-65 verify 371392 cycles 372375 cycles 1.00
ML-DSA-87 keypair 600368 cycles 601887 cycles 1.00
ML-DSA-87 sign 1582562 cycles 1582472 cycles 1.00
ML-DSA-87 verify 616028 cycles 617767 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 269223 cycles 266639 cycles 1.01
ML-DSA-44 sign 810461 cycles 808808 cycles 1.00
ML-DSA-44 verify 269978 cycles 270153 cycles 1.00
ML-DSA-65 keypair 462109 cycles 461300 cycles 1.00
ML-DSA-65 sign 1320714 cycles 1323365 cycles 1.00
ML-DSA-65 verify 446492 cycles 447313 cycles 1.00
ML-DSA-87 keypair 788628 cycles 789934 cycles 1.00
ML-DSA-87 sign 1804002 cycles 1827662 cycles 0.99
ML-DSA-87 verify 769951 cycles 769879 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 4th gen (c7i)

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 43380 cycles 43415 cycles 1.00
ML-DSA-44 sign 130483 cycles 130832 cycles 1.00
ML-DSA-44 verify 45056 cycles 45316 cycles 0.99
ML-DSA-65 keypair 75565 cycles 75430 cycles 1.00
ML-DSA-65 sign 214566 cycles 215011 cycles 1.00
ML-DSA-65 verify 74423 cycles 74350 cycles 1.00
ML-DSA-87 keypair 123348 cycles 123299 cycles 1.00
ML-DSA-87 sign 271867 cycles 271370 cycles 1.00
ML-DSA-87 verify 120820 cycles 120802 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 4th gen (c7i) (no-opt)

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 91440 cycles 91498 cycles 1.00
ML-DSA-44 sign 351947 cycles 352373 cycles 1.00
ML-DSA-44 verify 99663 cycles 99791 cycles 1.00
ML-DSA-65 keypair 153951 cycles 153828 cycles 1.00
ML-DSA-65 sign 571505 cycles 571106 cycles 1.00
ML-DSA-65 verify 159919 cycles 159761 cycles 1.00
ML-DSA-87 keypair 255278 cycles 255664 cycles 1.00
ML-DSA-87 sign 726078 cycles 726163 cycles 1.00
ML-DSA-87 verify 264097 cycles 263879 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 3rd gen (c6a)

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 55459 cycles 55320 cycles 1.00
ML-DSA-44 sign 158876 cycles 159183 cycles 1.00
ML-DSA-44 verify 57447 cycles 57873 cycles 0.99
ML-DSA-65 keypair 96676 cycles 95830 cycles 1.01
ML-DSA-65 sign 261462 cycles 263504 cycles 0.99
ML-DSA-65 verify 96066 cycles 96058 cycles 1.00
ML-DSA-87 keypair 154936 cycles 154555 cycles 1.00
ML-DSA-87 sign 322625 cycles 322765 cycles 1.00
ML-DSA-87 verify 151396 cycles 150846 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 3rd gen (c6a) (no-opt)

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 133301 cycles 133213 cycles 1.00
ML-DSA-44 sign 518520 cycles 518708 cycles 1.00
ML-DSA-44 verify 146594 cycles 146419 cycles 1.00
ML-DSA-65 keypair 224120 cycles 225658 cycles 0.99
ML-DSA-65 sign 843541 cycles 848883 cycles 0.99
ML-DSA-65 verify 234447 cycles 236008 cycles 0.99
ML-DSA-87 keypair 370009 cycles 366727 cycles 1.01
ML-DSA-87 sign 1065537 cycles 1060028 cycles 1.01
ML-DSA-87 verify 383425 cycles 380316 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 462528 cycles 462947 cycles 1.00
ML-DSA-44 sign 2133159 cycles 2132913 cycles 1.00
ML-DSA-44 verify 554944 cycles 555065 cycles 1.00
ML-DSA-65 keypair 781336 cycles 781178 cycles 1.00
ML-DSA-65 sign 3478730 cycles 3495803 cycles 1.00
ML-DSA-65 verify 864668 cycles 863872 cycles 1.00
ML-DSA-87 keypair 1261932 cycles 1265586 cycles 1.00
ML-DSA-87 sign 4306755 cycles 4308959 cycles 1.00
ML-DSA-87 verify 1387384 cycles 1390453 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton4

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 67305 cycles 67251 cycles 1.00
ML-DSA-44 sign 198288 cycles 198318 cycles 1.00
ML-DSA-44 verify 70209 cycles 70240 cycles 1.00
ML-DSA-65 keypair 119599 cycles 119362 cycles 1.00
ML-DSA-65 sign 326513 cycles 325982 cycles 1.00
ML-DSA-65 verify 116914 cycles 116934 cycles 1.00
ML-DSA-87 keypair 196427 cycles 196595 cycles 1.00
ML-DSA-87 sign 421396 cycles 421892 cycles 1.00
ML-DSA-87 verify 193328 cycles 193379 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 4th gen (c7a)

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 46677 cycles 46860 cycles 1.00
ML-DSA-44 sign 138850 cycles 139486 cycles 1.00
ML-DSA-44 verify 49381 cycles 49318 cycles 1.00
ML-DSA-65 keypair 82168 cycles 81920 cycles 1.00
ML-DSA-65 sign 227285 cycles 227729 cycles 1.00
ML-DSA-65 verify 82051 cycles 81877 cycles 1.00
ML-DSA-87 keypair 129777 cycles 131127 cycles 0.99
ML-DSA-87 sign 280956 cycles 281288 cycles 1.00
ML-DSA-87 verify 128748 cycles 129015 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 3rd gen (c6i)

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 61569 cycles 61685 cycles 1.00
ML-DSA-44 sign 188918 cycles 188995 cycles 1.00
ML-DSA-44 verify 66333 cycles 66273 cycles 1.00
ML-DSA-65 keypair 108150 cycles 111473 cycles 0.97
ML-DSA-65 sign 311564 cycles 312617 cycles 1.00
ML-DSA-65 verify 108710 cycles 110558 cycles 0.98
ML-DSA-87 keypair 170916 cycles 171460 cycles 1.00
ML-DSA-87 sign 378529 cycles 377513 cycles 1.00
ML-DSA-87 verify 169949 cycles 169102 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton4 (no-opt)

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 127582 cycles 127647 cycles 1.00
ML-DSA-44 sign 441135 cycles 441052 cycles 1.00
ML-DSA-44 verify 136422 cycles 136340 cycles 1.00
ML-DSA-65 keypair 220535 cycles 220707 cycles 1.00
ML-DSA-65 sign 714612 cycles 713810 cycles 1.00
ML-DSA-65 verify 220992 cycles 220735 cycles 1.00
ML-DSA-87 keypair 364622 cycles 365112 cycles 1.00
ML-DSA-87 sign 915759 cycles 921310 cycles 0.99
ML-DSA-87 verify 370898 cycles 370798 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 4th gen (c7a) (no-opt)

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 118937 cycles 118568 cycles 1.00
ML-DSA-44 sign 458463 cycles 458777 cycles 1.00
ML-DSA-44 verify 130553 cycles 131112 cycles 1.00
ML-DSA-65 keypair 201113 cycles 200686 cycles 1.00
ML-DSA-65 sign 747063 cycles 743736 cycles 1.00
ML-DSA-65 verify 209161 cycles 209264 cycles 1.00
ML-DSA-87 keypair 330712 cycles 330171 cycles 1.00
ML-DSA-87 sign 940688 cycles 935697 cycles 1.01
ML-DSA-87 verify 343632 cycles 343489 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 3rd gen (c6i) (no-opt)

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 154262 cycles 154753 cycles 1.00
ML-DSA-44 sign 587223 cycles 591158 cycles 0.99
ML-DSA-44 verify 169382 cycles 169947 cycles 1.00
ML-DSA-65 keypair 261747 cycles 261726 cycles 1.00
ML-DSA-65 sign 962117 cycles 963022 cycles 1.00
ML-DSA-65 verify 271696 cycles 271733 cycles 1.00
ML-DSA-87 keypair 431494 cycles 431775 cycles 1.00
ML-DSA-87 sign 1210792 cycles 1213443 cycles 1.00
ML-DSA-87 verify 446908 cycles 447885 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton3

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 71378 cycles 71561 cycles 1.00
ML-DSA-44 sign 208977 cycles 209019 cycles 1.00
ML-DSA-44 verify 74785 cycles 74745 cycles 1.00
ML-DSA-65 keypair 125939 cycles 125918 cycles 1.00
ML-DSA-65 sign 345606 cycles 345393 cycles 1.00
ML-DSA-65 verify 124103 cycles 124184 cycles 1.00
ML-DSA-87 keypair 207057 cycles 206649 cycles 1.00
ML-DSA-87 sign 444011 cycles 439798 cycles 1.01
ML-DSA-87 verify 204091 cycles 204451 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton3 (no-opt)

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 137909 cycles 138038 cycles 1.00
ML-DSA-44 sign 485980 cycles 486082 cycles 1.00
ML-DSA-44 verify 149036 cycles 149093 cycles 1.00
ML-DSA-65 keypair 241505 cycles 241929 cycles 1.00
ML-DSA-65 sign 791913 cycles 791534 cycles 1.00
ML-DSA-65 verify 242154 cycles 241280 cycles 1.00
ML-DSA-87 keypair 395744 cycles 396309 cycles 1.00
ML-DSA-87 sign 1013505 cycles 1019346 cycles 0.99
ML-DSA-87 verify 403614 cycles 403741 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton2

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 112384 cycles 112529 cycles 1.00
ML-DSA-44 sign 354309 cycles 354040 cycles 1.00
ML-DSA-44 verify 117483 cycles 117389 cycles 1.00
ML-DSA-65 keypair 194487 cycles 194738 cycles 1.00
ML-DSA-65 sign 584490 cycles 584565 cycles 1.00
ML-DSA-65 verify 193511 cycles 193297 cycles 1.00
ML-DSA-87 keypair 320904 cycles 321000 cycles 1.00
ML-DSA-87 sign 747799 cycles 746870 cycles 1.00
ML-DSA-87 verify 318013 cycles 318737 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 224455 cycles 214312 cycles 1.05
ML-DSA-44 sign 634826 cycles 605496 cycles 1.05
ML-DSA-44 verify 231700 cycles 222932 cycles 1.04
ML-DSA-65 keypair 393071 cycles 390142 cycles 1.01
ML-DSA-65 sign 1021382 cycles 1004679 cycles 1.02
ML-DSA-65 verify 371512 cycles 374059 cycles 0.99
ML-DSA-87 keypair 668383 cycles 652450 cycles 1.02
ML-DSA-87 sign 1390293 cycles 1336385 cycles 1.04
ML-DSA-87 verify 648013 cycles 633167 cycles 1.02

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 224455 cycles 214312 cycles 1.05
ML-DSA-44 sign 634826 cycles 605496 cycles 1.05
ML-DSA-44 verify 231700 cycles 222932 cycles 1.04
ML-DSA-87 sign 1390293 cycles 1336385 cycles 1.04

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton2 (no-opt)

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 212278 cycles 211802 cycles 1.00
ML-DSA-44 sign 761115 cycles 759783 cycles 1.00
ML-DSA-44 verify 230009 cycles 229307 cycles 1.00
ML-DSA-65 keypair 378733 cycles 377288 cycles 1.00
ML-DSA-65 sign 1248066 cycles 1247192 cycles 1.00
ML-DSA-65 verify 373449 cycles 371605 cycles 1.00
ML-DSA-87 keypair 602576 cycles 601230 cycles 1.00
ML-DSA-87 sign 1584726 cycles 1584827 cycles 1.00
ML-DSA-87 verify 618395 cycles 616717 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 308287 cycles 312767 cycles 0.99
ML-DSA-44 sign 1183932 cycles 1207539 cycles 0.98
ML-DSA-44 verify 345058 cycles 342752 cycles 1.01
ML-DSA-65 keypair 574462 cycles 567893 cycles 1.01
ML-DSA-65 sign 1953652 cycles 1961247 cycles 1.00
ML-DSA-65 verify 551106 cycles 560470 cycles 0.98
ML-DSA-87 keypair 880489 cycles 849145 cycles 1.04
ML-DSA-87 sign 2480605 cycles 2376735 cycles 1.04
ML-DSA-87 verify 925298 cycles 890655 cycles 1.04

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-87 keypair 880489 cycles 849145 cycles 1.04
ML-DSA-87 sign 2480605 cycles 2376735 cycles 1.04
ML-DSA-87 verify 925298 cycles 890655 cycles 1.04

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)

Details
Benchmark suite Current: 59d66bc Previous: 9c5eceb Ratio
ML-DSA-44 keypair 759819 cycles 759567 cycles 1.00
ML-DSA-44 sign 3138270 cycles 3137937 cycles 1.00
ML-DSA-44 verify 858910 cycles 858870 cycles 1.00
ML-DSA-65 keypair 1285508 cycles 1285661 cycles 1.00
ML-DSA-65 sign 5072844 cycles 5075651 cycles 1.00
ML-DSA-65 verify 1363435 cycles 1364449 cycles 1.00
ML-DSA-87 keypair 2112235 cycles 2111749 cycles 1.00
ML-DSA-87 sign 6345694 cycles 6355811 cycles 1.00
ML-DSA-87 verify 2227450 cycles 2228276 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@mkannwischer mkannwischer left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jakemas! This looks great!

@hanno-becker, could take a quick look, too?

@hanno-becker hanno-becker left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great, thank you @jakemas!

@hanno-becker hanno-becker merged commit c787849 into main Jun 16, 2026
442 checks passed
@hanno-becker hanno-becker deleted the jakemas/polyz-unpack-asm branch June 16, 2026 04:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AVX2: Replace intrinsics implementation of polyz_unpack with assembly HOL-Light: Prove AVX2 polyz_unpack

4 participants