x86_64 + HOL-Light: Replace poly_decompose AVX2 intrinsics with hand-written assembly and HOL-Light proofs #1163

Closed
jakemas wants to merge 1 commit into main from jakemas/poly-decompose-asm

jakemas commented Jun 10, 2026

Contributor

Resolves #420
Resolves #914

Performance

poly_decompose component benchmark, median cycles on AMD EPYC (c6a), OPT=1 CYCLES=PMU:

| Variant | AVX2 intrinsics (baseline, main) | Hand-written AVX2 asm (this PR) |
| --- | --- | --- |
| decompose_32 (ML-DSA-65/87) | ~1606 | ~1607 |
| decompose_88 (ML-DSA-44) | ~1869 | ~1870 |
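
For readers unfamiliar with the routine being benchmarked, here is a minimal scalar sketch of the per-coefficient decomposition that poly_decompose applies to each coefficient. It follows the publicly known ML-DSA/Dilithium reference arithmetic, not the assembly in this PR. The names `decompose_sketch`, `check_decompose` and `MLDSA_Q` are illustrative and are not identifiers from this repository. The sketch assumes inputs in [0, q) and that right shift of a negative `int32_t` is arithmetic, which is implementation-defined in C but holds on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

#define MLDSA_Q 8380417 /* ML-DSA modulus q */

/* Split a in [0, q) into a1 and a0 with a = a1 * 2*gamma2 + a0 (mod q).
 * gamma2 is (q-1)/32 (ML-DSA-65/87) or (q-1)/88 (ML-DSA-44).
 * Assumption: >> on negative int32_t is an arithmetic shift. */
static int32_t decompose_sketch(int32_t *a0, int32_t a, int32_t gamma2)
{
  int32_t a1 = (a + 127) >> 7;

  if (gamma2 == (MLDSA_Q - 1) / 32) {
    a1 = (a1 * 1025 + (1 << 21)) >> 22;
    a1 &= 15; /* wrap a1 == 16 to 0 */
  } else {
    /* gamma2 == (q-1)/88 */
    a1 = (a1 * 11275 + (1 << 23)) >> 24;
    a1 ^= ((43 - a1) >> 31) & a1; /* wrap a1 == 44 to 0 */
  }

  *a0 = a - a1 * 2 * gamma2;
  /* Map a0 into the centered range by subtracting q if a0 > (q-1)/2. */
  *a0 -= (((MLDSA_Q - 1) / 2 - *a0) >> 31) & MLDSA_Q;
  return a1;
}

/* Sanity helper: checks that a1 * 2*gamma2 + a0 is congruent to a mod q. */
static void check_decompose(int32_t a, int32_t gamma2)
{
  int32_t a0;
  int32_t a1 = decompose_sketch(&a0, a, gamma2);
  int32_t r = (a1 * 2 * gamma2 + a0) % MLDSA_Q;
  if (r < 0) {
    r += MLDSA_Q;
  }
  assert(r == a);
}
```

The two branches correspond to the decompose_32 and decompose_88 variants in the table above; a vectorized implementation would perform the same steps on eight 32-bit lanes at a time.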

jakemas force-pushed the jakemas/poly-decompose-asm branch 4 times, most recently from 753befe to 460553d, on June 10, 2026 05:28
oqs-bot commented Jun 10, 2026

Contributor

CBMC Results (ML-DSA-44, REDUCE-RAM)

Full Results (205 proofs)
Proof Status Current Previous Change
**TOTAL** 1570s 1507s +4.2%
mld_invntt_layer 171s 157s +9%
poly_pointwise_montgomery_c 131s 125s +5%
rej_uniform_native 128s 123s +4%
polyvec_matrix_pointwise_montgomery_yvec 120s 112s +7%
mld_ct_memcmp 65s 63s +3%
mld_ntt_layer 45s 44s +2%
fqmul 44s 40s +10%
sign_verify_internal 25s 22s +14%
mld_attempt_signature_generation 24s 23s +4%
mld_ntt_butterfly_block 24s 20s +20%
keccakf1600x4_permute_native 23s 21s +10%
poly_chknorm_c 19s 19s +0%
polyt0_unpack 18s 15s +20%
rej_uniform_c 17s 14s +21%
mld_check_pct 14s 13s +8%
polyeta_unpack 14s 14s +0%
poly_uniform_eta_4x 13s 15s -13%
poly_add 11s 13s -15%
polyz_unpack_c 11s 10s +10%
compute_pack_t0_t1 10s 9s +11%
keccak_absorb_once_x4 10s 10s +0%
polyveck_chknorm 10s 11s -9%
polyvec_matrix_pointwise_montgomery_row 9s 6s +50%
polyveck_decompose 8s 7s +14%
mld_keccakf1600_permute_c 7s 6s +17%
pointwise_acc_native_x86_64 7s 4s +75%
poly_invntt_tomont_c 7s 9s -22%
rej_uniform 7s 9s -22%
sign 7s 6s +17%
keccak_absorb 6s 5s +20%
keccakf1600_xor_bytes (big endian) 6s 5s +20%
mld_compute_pack_z 6s 7s -14%
mld_sample_s1_s2 6s 2s +200%
poly_challenge 6s 2s +200%
poly_decompose_c 6s 7s -14%
poly_power2round 6s 6s +0%
poly_uniform_eta 6s 3s +100%
polyveck_invntt_tomont 6s 5s +20%
polyvecl_chknorm 6s 5s +20%
keccakf1600x4_permute 5s 3s +67%
keccakf1600x4_xor_bytes_native 5s 3s +67%
mld_h 5s 3s +67%
ntt_native_x86_64 5s 4s +25%
pack_sig_z 5s 2s +150%
pointwise_acc_native_aarch64 5s 4s +25%
poly_permute_bitrev_to_custom_optional 5s 2s +150%
poly_reduce 5s 6s -17%
polyveck_caddq 5s 5s +0%
polyvecl_pack_eta 5s 3s +67%
polyvecl_pointwise_acc_montgomery_native 5s 4s +25%
polyz_pack 5s 3s +67%
rej_eta_c 5s 3s +67%
shake256_absorb 5s 2s +150%
sign_open 5s 7s -29%
sign_signature_pre_hash_internal 5s 4s +25%
decompose 4s 3s +33%
intt_native_aarch64 4s 3s +33%
keccak_f1600_x1_native_aarch64 4s 1s +300%
keccak_f1600_x1_native_aarch64_v84a 4s 3s +33%
keccak_f1600_x4_native_aarch64_v84a 4s 4s +0%
keccak_squeezeblocks_x4 4s 5s -20%
mld_prepare_domain_separation_prefix 4s 3s +33%
mld_sample_s1_s2_serial 4s 3s +33%
mld_value_barrier_u32 4s 2s +100%
pointwise_native_aarch64 4s 3s +33%
poly_caddq_c 4s 5s -20%
poly_caddq_native_aarch64 4s 3s +33%
poly_chknorm_native 4s 6s -33%
poly_chknorm_native_aarch64 4s 1s +300%
poly_decompose 4s 4s +0%
poly_invntt_tomont_native 4s 2s +100%
poly_ntt_c 4s 3s +33%
poly_ntt_native 4s 4s +0%
poly_permute_bitrev_to_custom_optional_native 4s 5s -20%
poly_pointwise_montgomery 4s 5s -20%
poly_pointwise_montgomery_native 4s 3s +33%
poly_uniform 4s 4s +0%
poly_uniform_4x 4s 3s +33%
poly_use_hint 4s 2s +100%
poly_use_hint_native 4s 4s +0%
poly_use_hint_native_aarch64 4s 3s +33%
polyt1_pack 4s 2s +100%
polyt1_unpack 4s 4s +0%
polyvec_matrix_expand 4s 3s +33%
polyvecl_pointwise_acc_montgomery 4s 2s +100%
polyvecl_uniform_gamma1 4s 2s +100%
polyvecl_unpack_eta 4s 4s +0%
polyz_unpack_17_native_aarch64 4s 2s +100%
rej_eta 4s 4s +0%
shake128x4_squeezeblocks 4s 3s +33%
sign_keypair_internal 4s 4s +0%
sign_pk_from_sk 4s 6s -33%
sign_signature_internal 4s 5s -20%
sign_signature_pre_hash_shake256 4s 2s +100%
sign_verify_extmu 4s 3s +33%
sign_verify_pre_hash_internal 4s 3s +33%
sign_verify_pre_hash_shake256 4s 4s +0%
sk_s1hat_get_poly 4s 2s +100%
sk_s2hat_get_poly 4s 4s +0%
sys_check_capability 4s 2s +100%
unpack_sk_s1hat 4s 3s +33%
unpack_sk_s2hat 4s 4s +0%
yvec_get_poly 4s 5s -20%
fqscale 3s 5s -40%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 2s +50%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 3s 2s +50%
keccak_f1600_x4_native_avx2 3s 2s +50%
keccak_finalize 3s 4s -25%
keccak_init 3s 2s +50%
keccakf1600_permute_native 3s 2s +50%
keccakf1600x4_extract_bytes_native 3s 2s +50%
mld_ct_cmask_nonzero_u32 3s 3s +0%
mld_polymat_expand_entry 3s 3s +0%
nttunpack_native_x86_64 3s 3s +0%
pack_sig_c 3s 3s +0%
pack_sig_h 3s 2s +50%
pack_sk_rho_key_tr_s2 3s 3s +0%
poly_caddq 3s 3s +0%
poly_caddq_native 3s 6s -50%
poly_caddq_native_x86_64 3s 4s -25%
poly_decompose_32_native_aarch64 3s 2s +50%
poly_decompose_native 3s 4s -25%
poly_ntt 3s 3s +0%
poly_shiftl 3s 5s -40%
poly_sub 3s 3s +0%
poly_uniform_gamma1 3s 2s +50%
poly_uniform_gamma1_4x 3s 5s -40%
poly_use_hint_c 3s 2s +50%
polyeta_pack 3s 2s +50%
polyt0_pack 3s 5s -40%
polyveck_pack_eta 3s 2s +50%
polyveck_reduce 3s 3s +0%
polyvecl_ntt 3s 5s -40%
polyvecl_unpack_z 3s 4s -25%
polyw1_pack 3s 3s +0%
power2round 3s 3s +0%
reduce32 3s 3s +0%
rej_eta_native 3s 4s -25%
rej_uniform_eta_native_aarch64 3s 3s +0%
shake128_absorb 3s 4s -25%
shake128_finalize 3s 3s +0%
shake256_init 3s 1s +200%
shake256_squeeze 3s 3s +0%
shake256x4_absorb_once 3s 2s +50%
sign_keypair 3s 6s -50%
sign_signature 3s 4s -25%
sign_signature_extmu 3s 4s -25%
unpack_pk_t1 3s 1s +200%
yvec_init 3s 3s +0%
caddq 2s 4s -50%
intt_native_x86_64 2s 3s -33%
keccak_squeeze 2s 3s -33%
keccakf1600_extract_bytes (big endian) 2s 5s -60%
keccakf1600_permute 2s 4s -50%
keccakf1600_xor_bytes 2s 4s -50%
keccakf1600x4_extract_bytes 2s 1s +100%
keccakf1600x4_xor_bytes 2s 2s +0%
make_hint 2s 2s +0%
mld_ct_abs_i32 2s 2s +0%
mld_ct_cmask_neg_i32 2s 2s +0%
mld_ct_cmask_nonzero_u8 2s 4s -50%
mld_ct_get_optblocker_u32 2s 1s +100%
mld_ct_sel_int32 2s 1s +100%
mld_keccakf1600_extract_bytes 2s 2s +0%
mld_keccakf1600x4_extract_bytes_c 2s 1s +100%
mld_keccakf1600x4_xor_bytes_c 2s 3s -33%
mld_value_barrier_i64 2s 1s +100%
mld_value_barrier_u8 2s 2s +0%
ntt_native_aarch64 2s 3s -33%
pack_sk_s1 2s 2s +0%
pointwise_native_x86_64 2s 6s -67%
poly_chknorm 2s 3s -33%
poly_chknorm_native_x86_64 2s 3s -33%
poly_decompose_88_native_aarch64 2s 3s -33%
poly_decompose_native_x86_64 2s - new
poly_invntt_tomont 2s 3s -33%
polyvec_matrix_expand_serial 2s 3s -33%
polyveck_ntt 2s 1s +100%
polyveck_pack_w1 2s 2s +0%
polyvecl_uniform_gamma1_serial 2s 3s -33%
polyw1_pack_32 2s 3s -33%
polyz_unpack 2s 2s +0%
polyz_unpack_19_native_aarch64 2s 3s -33%
polyz_unpack_native 2s 2s +0%
rej_uniform_native_aarch64 2s 4s -50%
shake128_squeeze 2s 2s +0%
shake128x4_absorb_once 2s 3s -33%
shake256 2s 4s -50%
shake256_release 2s 1s +100%
sig_unpack_hints 2s 3s -33%
sign_verify 2s 7s -71%
sk_t0hat_get_poly 2s 3s -33%
unpack_sk_t0hat 2s 4s -50%
use_hint 2s 4s -50%
mld_ct_get_optblocker_i64 1s 2s -50%
mld_ct_get_optblocker_u8 1s 3s -67%
montgomery_reduce 1s 4s -75%
polyveck_unpack_eta 1s 4s -75%
polyvecl_pointwise_acc_montgomery_c 1s 5s -80%
polyw1_pack_88 1s 4s -75%
shake128_init 1s 3s -67%
shake128_release 1s 2s -50%
shake256_finalize 1s 2s -50%
shake256x4_squeezeblocks 1s 2s -50%
unpack_sk 1s 2s -50%
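
The Change column in these tables appears to be the relative difference between the Current and Previous proof runtimes, rounded to the displayed precision. This is an inference from the numbers, not documented behaviour of the bot. As a worked check on the TOTAL row above:

$$
\text{Change} = \frac{\text{Current} - \text{Previous}}{\text{Previous}} \times 100\%
= \frac{1570 - 1507}{1507} \times 100\% \approx +4.2\%
$$

Under this reading, rows with runtimes of only a few seconds can swing by tens or hundreds of percent from a one- or two-second difference, so the large percentages on short proofs are likely timing noise rather than a real effect.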

oqs-bot commented Jun 10, 2026

Contributor

CBMC Results (ML-DSA-87, REDUCE-RAM)

Full Results (205 proofs)
Proof Status Current Previous Change
**TOTAL** 1643s 1654s -0.7%
mld_invntt_layer 173s 179s -3%
polyvec_matrix_pointwise_montgomery_yvec 157s 163s -4%
poly_pointwise_montgomery_c 133s 140s -5%
rej_uniform_native 128s 131s -2%
mld_ct_memcmp 68s 74s -8%
mld_ntt_layer 45s 44s +2%
fqmul 42s 43s -2%
mld_attempt_signature_generation 36s 35s +3%
mld_ntt_butterfly_block 23s 26s -12%
keccakf1600x4_permute_native 22s 22s +0%
sign_verify_internal 22s 22s +0%
poly_chknorm_c 20s 20s +0%
poly_uniform_eta_4x 18s 15s +20%
polyt0_unpack 18s 18s +0%
polyveck_decompose 16s 16s +0%
rej_uniform_c 16s 17s -6%
polyeta_unpack 15s 14s +7%
mld_check_pct 14s 16s -12%
compute_pack_t0_t1 12s 9s +33%
polyvecl_chknorm 11s 10s +10%
poly_add 10s 12s -17%
poly_invntt_tomont_c 10s 8s +25%
keccak_absorb_once_x4 9s 10s -10%
mld_prepare_domain_separation_prefix 9s 5s +80%
sign 9s 8s +12%
polyvecl_ntt 8s 8s +0%
keccak_absorb 7s 8s -12%
keccak_squeeze 7s 3s +133%
poly_caddq_c 7s 3s +133%
poly_power2round 7s 9s -22%
polyveck_caddq 7s 8s -12%
polyveck_invntt_tomont 7s 8s -12%
mld_keccakf1600_permute_c 6s 7s -14%
mld_sample_s1_s2 6s 5s +20%
pointwise_acc_native_aarch64 6s 7s -14%
pointwise_acc_native_x86_64 6s 8s -25%
poly_shiftl 6s 5s +20%
polyveck_chknorm 6s 5s +20%
polyz_unpack_c 6s 8s -25%
rej_uniform 6s 6s +0%
sign_keypair_internal 6s 3s +100%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 5s 2s +150%
keccak_f1600_x4_native_avx2 5s 3s +67%
keccak_finalize 5s 3s +67%
keccak_squeezeblocks_x4 5s 5s +0%
mld_compute_pack_z 5s 5s +0%
mld_h 5s 6s -17%
poly_chknorm_native_aarch64 5s 4s +25%
poly_reduce 5s 5s +0%
poly_uniform_eta 5s 5s +0%
polyvec_matrix_expand_serial 5s 4s +25%
polyvec_matrix_pointwise_montgomery_row 5s 6s -17%
polyveck_reduce 5s 5s +0%
polyw1_pack 5s 4s +25%
power2round 5s 2s +150%
reduce32 5s 3s +67%
rej_eta_c 5s 2s +150%
sign_open 5s 3s +67%
sign_pk_from_sk 5s 4s +25%
sign_signature_internal 5s 7s -29%
yvec_get_poly 5s 4s +25%
keccak_f1600_x1_native_aarch64_v84a 4s 2s +100%
keccak_init 4s 4s +0%
keccakf1600_xor_bytes 4s 3s +33%
mld_ct_cmask_nonzero_u32 4s 3s +33%
mld_keccakf1600x4_xor_bytes_c 4s 1s +300%
mld_polymat_expand_entry 4s 3s +33%
mld_sample_s1_s2_serial 4s 6s -33%
ntt_native_aarch64 4s 4s +0%
ntt_native_x86_64 4s 4s +0%
poly_caddq_native_aarch64 4s 3s +33%
poly_challenge 4s 4s +0%
poly_chknorm_native 4s 3s +33%
poly_decompose_32_native_aarch64 4s 1s +300%
poly_decompose_88_native_aarch64 4s 4s +0%
poly_decompose_native 4s 3s +33%
poly_invntt_tomont 4s 2s +100%
poly_uniform_4x 4s 2s +100%
poly_use_hint_c 4s 4s +0%
polyveck_pack_eta 4s 4s +0%
polyveck_pack_w1 4s 3s +33%
polyvecl_pointwise_acc_montgomery 4s 4s +0%
polyvecl_pointwise_acc_montgomery_c 4s 1s +300%
polyvecl_pointwise_acc_montgomery_native 4s 3s +33%
polyvecl_uniform_gamma1_serial 4s 2s +100%
polyvecl_unpack_z 4s 2s +100%
polyw1_pack_32 4s 1s +300%
polyz_pack 4s 3s +33%
polyz_unpack_native 4s 2s +100%
rej_eta 4s 3s +33%
shake256x4_squeezeblocks 4s 2s +100%
sign_keypair 4s 2s +100%
sign_signature_pre_hash_shake256 4s 5s -20%
sign_verify 4s 4s +0%
sign_verify_pre_hash_internal 4s 3s +33%
sign_verify_pre_hash_shake256 4s 2s +100%
sys_check_capability 4s 5s -20%
unpack_sk 4s 2s +100%
unpack_sk_s2hat 4s 3s +33%
decompose 3s 1s +200%
intt_native_aarch64 3s 2s +50%
intt_native_x86_64 3s 3s +0%
keccak_f1600_x1_native_aarch64 3s 2s +50%
keccakf1600x4_extract_bytes 3s 3s +0%
keccakf1600x4_extract_bytes_native 3s 1s +200%
keccakf1600x4_xor_bytes 3s 4s -25%
keccakf1600x4_xor_bytes_native 3s 2s +50%
mld_ct_cmask_nonzero_u8 3s 3s +0%
mld_ct_get_optblocker_i64 3s 4s -25%
mld_ct_get_optblocker_u8 3s 3s +0%
mld_keccakf1600_extract_bytes 3s 2s +50%
mld_value_barrier_u32 3s 2s +50%
mld_value_barrier_u8 3s 2s +50%
pack_sig_c 3s 5s -40%
pack_sig_h 3s 4s -25%
pack_sk_rho_key_tr_s2 3s 1s +200%
pack_sk_s1 3s 3s +0%
pointwise_native_x86_64 3s 4s -25%
poly_caddq_native 3s 2s +50%
poly_chknorm 3s 5s -40%
poly_decompose 3s 2s +50%
poly_decompose_native_x86_64 3s - new
poly_invntt_tomont_native 3s 2s +50%
poly_ntt 3s 3s +0%
poly_permute_bitrev_to_custom_optional_native 3s 2s +50%
poly_pointwise_montgomery_native 3s 3s +0%
poly_sub 3s 2s +50%
poly_uniform_gamma1 3s 2s +50%
poly_use_hint_native 3s 5s -40%
polyt0_pack 3s 4s -25%
polyveck_ntt 3s 5s -40%
polyveck_unpack_eta 3s 4s -25%
polyvecl_uniform_gamma1 3s 4s -25%
polyw1_pack_88 3s 2s +50%
polyz_unpack 3s 2s +50%
polyz_unpack_17_native_aarch64 3s 3s +0%
rej_eta_native 3s 4s -25%
rej_uniform_eta_native_aarch64 3s 5s -40%
rej_uniform_native_aarch64 3s 3s +0%
shake256_absorb 3s 3s +0%
sign_signature_extmu 3s 4s -25%
sign_signature_pre_hash_internal 3s 6s -50%
sign_verify_extmu 3s 5s -40%
sk_t0hat_get_poly 3s 1s +200%
unpack_pk_t1 3s 4s -25%
unpack_sk_s1hat 3s 5s -40%
unpack_sk_t0hat 3s 2s +50%
use_hint 3s 1s +200%
yvec_init 3s 2s +50%
caddq 2s 4s -50%
fqscale 2s 2s +0%
keccak_f1600_x4_native_aarch64_v84a 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 1s +100%
keccakf1600_permute 2s 1s +100%
keccakf1600_permute_native 2s 3s -33%
keccakf1600x4_permute 2s 2s +0%
make_hint 2s 2s +0%
mld_ct_abs_i32 2s 2s +0%
mld_ct_cmask_neg_i32 2s 2s +0%
mld_ct_get_optblocker_u32 2s 2s +0%
mld_ct_sel_int32 2s 1s +100%
mld_keccakf1600x4_extract_bytes_c 2s 2s +0%
mld_value_barrier_i64 2s 4s -50%
montgomery_reduce 2s 3s -33%
pack_sig_z 2s 4s -50%
pointwise_native_aarch64 2s 3s -33%
poly_caddq 2s 3s -33%
poly_caddq_native_x86_64 2s 3s -33%
poly_chknorm_native_x86_64 2s 3s -33%
poly_decompose_c 2s 6s -67%
poly_ntt_c 2s 4s -50%
poly_ntt_native 2s 3s -33%
poly_permute_bitrev_to_custom_optional 2s 4s -50%
poly_pointwise_montgomery 2s 4s -50%
poly_uniform 2s 4s -50%
poly_use_hint 2s 4s -50%
poly_use_hint_native_aarch64 2s 3s -33%
polyt1_pack 2s 3s -33%
polyt1_unpack 2s 2s +0%
polyvec_matrix_expand 2s 3s -33%
polyvecl_pack_eta 2s 3s -33%
polyz_unpack_19_native_aarch64 2s 2s +0%
shake128_squeeze 2s 1s +100%
shake128x4_squeezeblocks 2s 1s +100%
shake256 2s 4s -50%
shake256_finalize 2s 4s -50%
shake256_init 2s 2s +0%
shake256_release 2s 4s -50%
shake256_squeeze 2s 3s -33%
shake256x4_absorb_once 2s 2s +0%
sig_unpack_hints 2s 4s -50%
sign_signature 2s 3s -33%
sk_s1hat_get_poly 2s 2s +0%
sk_s2hat_get_poly 2s 2s +0%
keccakf1600_extract_bytes (big endian) 1s 2s -50%
keccakf1600_xor_bytes (big endian) 1s 3s -67%
nttunpack_native_x86_64 1s 3s -67%
poly_uniform_gamma1_4x 1s 4s -75%
polyeta_pack 1s 3s -67%
polyvecl_unpack_eta 1s 3s -67%
shake128_absorb 1s 3s -67%
shake128_finalize 1s 2s -50%
shake128_init 1s 2s -50%
shake128_release 1s 2s -50%
shake128x4_absorb_once 1s 1s +0%

oqs-bot commented Jun 10, 2026

Contributor

CBMC Results (ML-DSA-65, REDUCE-RAM)

Full Results (205 proofs)
Proof Status Current Previous Change
**TOTAL** 1503s 1601s -6.1%
mld_invntt_layer 156s 181s -14%
poly_pointwise_montgomery_c 131s 140s -6%
rej_uniform_native 122s 133s -8%
polyvec_matrix_pointwise_montgomery_yvec 81s 85s -5%
mld_ct_memcmp 63s 73s -14%
mld_ntt_layer 45s 44s +2%
fqmul 40s 44s -9%
polyveck_chknorm 38s 40s -5%
keccakf1600x4_permute_native 23s 23s +0%
mld_attempt_signature_generation 22s 25s -12%
mld_ntt_butterfly_block 21s 25s -16%
poly_chknorm_c 16s 21s -24%
polyvecl_chknorm 15s 16s -6%
sign_verify_internal 15s 15s +0%
mld_check_pct 14s 14s +0%
polyt0_unpack 14s 16s -12%
polyveck_decompose 14s 16s -12%
rej_uniform_c 14s 19s -26%
poly_add 11s 11s +0%
poly_uniform_eta_4x 11s 14s -21%
polyvec_matrix_pointwise_montgomery_row 10s 8s +25%
pointwise_acc_native_x86_64 8s 7s +14%
polyeta_unpack 8s 7s +14%
polyvecl_ntt 8s 9s -11%
rej_uniform 8s 9s -11%
keccak_absorb 7s 7s +0%
keccak_absorb_once_x4 7s 9s -22%
mld_h 7s 4s +75%
mld_keccakf1600_permute_c 7s 7s +0%
poly_invntt_tomont_c 7s 11s -36%
sign 7s 7s +0%
sign_pk_from_sk 7s 5s +40%
sign_signature_pre_hash_internal 7s 5s +40%
compute_pack_t0_t1 6s 6s +0%
mld_sample_s1_s2_serial 6s 4s +50%
pointwise_acc_native_aarch64 6s 8s -25%
poly_reduce 6s 3s +100%
poly_uniform 6s 3s +100%
poly_use_hint_c 6s 3s +100%
polyveck_reduce 6s 4s +50%
polyvecl_pointwise_acc_montgomery_native 6s 3s +100%
polyz_unpack_c 6s 5s +20%
sign_keypair_internal 6s 2s +200%
sign_open 6s 5s +20%
mld_compute_pack_z 5s 6s -17%
mld_sample_s1_s2 5s 6s -17%
ntt_native_x86_64 5s 6s -17%
pointwise_native_x86_64 5s 2s +150%
poly_chknorm 5s 3s +67%
poly_power2round 5s 7s -29%
poly_uniform_gamma1 5s 3s +67%
polyveck_caddq 5s 7s -29%
polyveck_unpack_eta 5s 2s +150%
polyvecl_uniform_gamma1_serial 5s 1s +400%
polyz_unpack_19_native_aarch64 5s 2s +150%
rej_eta_c 5s 3s +67%
rej_uniform_eta_native_aarch64 5s 3s +67%
shake128_finalize 5s 3s +67%
sign_verify 5s 3s +67%
caddq 4s 4s +0%
intt_native_aarch64 4s 2s +100%
intt_native_x86_64 4s 3s +33%
keccak_f1600_x1_native_aarch64_v84a 4s 2s +100%
keccak_finalize 4s 2s +100%
keccakf1600_permute_native 4s 1s +300%
keccakf1600x4_xor_bytes_native 4s 3s +33%
mld_polymat_expand_entry 4s 3s +33%
montgomery_reduce 4s 3s +33%
poly_decompose 4s 2s +100%
poly_decompose_32_native_aarch64 4s 3s +33%
poly_decompose_88_native_aarch64 4s 1s +300%
poly_decompose_c 4s 6s -33%
poly_decompose_native_x86_64 4s - new
poly_ntt 4s 4s +0%
poly_ntt_c 4s 3s +33%
poly_pointwise_montgomery_native 4s 5s -20%
polyveck_invntt_tomont 4s 8s -50%
polyvecl_uniform_gamma1 4s 3s +33%
polyw1_pack_32 4s 1s +300%
polyw1_pack_88 4s 3s +33%
rej_eta_native 4s 5s -20%
rej_uniform_native_aarch64 4s 2s +100%
shake256 4s 2s +100%
sig_unpack_hints 4s 3s +33%
sign_signature_extmu 4s 3s +33%
sign_signature_internal 4s 5s -20%
sign_signature_pre_hash_shake256 4s 3s +33%
sign_verify_extmu 4s 4s +0%
sign_verify_pre_hash_internal 4s 5s -20%
sk_s2hat_get_poly 4s 4s +0%
unpack_sk_s2hat 4s 2s +100%
use_hint 4s 3s +33%
yvec_init 4s 1s +300%
decompose 3s 1s +200%
keccak_f1600_x1_native_aarch64 3s 2s +50%
keccak_f1600_x4_native_aarch64_v84a 3s 2s +50%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 3s 5s -40%
keccak_f1600_x4_native_avx2 3s 2s +50%
keccak_init 3s 3s +0%
keccak_squeezeblocks_x4 3s 5s -40%
keccakf1600_extract_bytes (big endian) 3s 4s -25%
keccakf1600_xor_bytes (big endian) 3s 4s -25%
keccakf1600x4_permute 3s 3s +0%
mld_ct_abs_i32 3s 3s +0%
mld_ct_get_optblocker_i64 3s 3s +0%
mld_ct_sel_int32 3s 1s +200%
mld_prepare_domain_separation_prefix 3s 3s +0%
ntt_native_aarch64 3s 3s +0%
pack_sig_c 3s 4s -25%
pack_sig_h 3s 5s -40%
pack_sk_rho_key_tr_s2 3s 2s +50%
poly_caddq 3s 4s -25%
poly_caddq_c 3s 6s -50%
poly_caddq_native 3s 6s -50%
poly_caddq_native_aarch64 3s 3s +0%
poly_challenge 3s 3s +0%
poly_chknorm_native 3s 4s -25%
poly_chknorm_native_aarch64 3s 3s +0%
poly_chknorm_native_x86_64 3s 3s +0%
poly_decompose_native 3s 4s -25%
poly_invntt_tomont_native 3s 3s +0%
poly_permute_bitrev_to_custom_optional 3s 3s +0%
poly_permute_bitrev_to_custom_optional_native 3s 4s -25%
poly_pointwise_montgomery 3s 2s +50%
poly_shiftl 3s 5s -40%
poly_sub 3s 6s -50%
poly_uniform_4x 3s 3s +0%
poly_uniform_eta 3s 4s -25%
poly_use_hint_native 3s 3s +0%
polyt1_pack 3s 3s +0%
polyveck_ntt 3s 2s +50%
polyvecl_pointwise_acc_montgomery 3s 4s -25%
polyvecl_pointwise_acc_montgomery_c 3s 3s +0%
polyvecl_unpack_eta 3s 2s +50%
polyvecl_unpack_z 3s 3s +0%
shake128_absorb 3s 3s +0%
shake128_release 3s 3s +0%
shake128x4_absorb_once 3s 1s +200%
shake256_absorb 3s 2s +50%
shake256_init 3s 4s -25%
sign_keypair 3s 10s -70%
sign_signature 3s 2s +50%
sk_s1hat_get_poly 3s 3s +0%
sk_t0hat_get_poly 3s 2s +50%
unpack_pk_t1 3s 4s -25%
unpack_sk 3s 2s +50%
unpack_sk_s1hat 3s 3s +0%
yvec_get_poly 3s 3s +0%
fqscale 2s 4s -50%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 2s +0%
keccakf1600_permute 2s 2s +0%
keccakf1600x4_extract_bytes_native 2s 3s -33%
keccakf1600x4_xor_bytes 2s 3s -33%
mld_ct_cmask_nonzero_u8 2s 2s +0%
mld_ct_get_optblocker_u32 2s 2s +0%
mld_ct_get_optblocker_u8 2s 1s +100%
mld_keccakf1600_extract_bytes 2s 4s -50%
mld_value_barrier_i64 2s 1s +100%
mld_value_barrier_u32 2s 3s -33%
mld_value_barrier_u8 2s 4s -50%
nttunpack_native_x86_64 2s 2s +0%
pack_sig_z 2s 4s -50%
pack_sk_s1 2s 2s +0%
pointwise_native_aarch64 2s 3s -33%
poly_caddq_native_x86_64 2s 4s -50%
poly_uniform_gamma1_4x 2s 2s +0%
poly_use_hint 2s 4s -50%
poly_use_hint_native_aarch64 2s 3s -33%
polyeta_pack 2s 3s -33%
polyt0_pack 2s 4s -50%
polyvec_matrix_expand_serial 2s 1s +100%
polyveck_pack_eta 2s 4s -50%
polyveck_pack_w1 2s 2s +0%
polyvecl_pack_eta 2s 1s +100%
polyw1_pack 2s 3s -33%
polyz_pack 2s 3s -33%
polyz_unpack_17_native_aarch64 2s 4s -50%
power2round 2s 3s -33%
reduce32 2s 3s -33%
rej_eta 2s 3s -33%
shake128_squeeze 2s 2s +0%
shake128x4_squeezeblocks 2s 3s -33%
shake256_release 2s 3s -33%
shake256x4_absorb_once 2s 1s +100%
shake256x4_squeezeblocks 2s 3s -33%
sign_verify_pre_hash_shake256 2s 8s -75%
sys_check_capability 2s 4s -50%
unpack_sk_t0hat 2s 3s -33%
keccak_squeeze 1s 1s +0%
keccakf1600_xor_bytes 1s 3s -67%
keccakf1600x4_extract_bytes 1s 2s -50%
make_hint 1s 2s -50%
mld_ct_cmask_neg_i32 1s 1s +0%
mld_ct_cmask_nonzero_u32 1s 3s -67%
mld_keccakf1600x4_extract_bytes_c 1s 5s -80%
mld_keccakf1600x4_xor_bytes_c 1s 2s -50%
poly_invntt_tomont 1s 5s -80%
poly_ntt_native 1s 2s -50%
polyt1_unpack 1s 3s -67%
polyvec_matrix_expand 1s 5s -80%
polyz_unpack 1s 3s -67%
polyz_unpack_native 1s 2s -50%
shake128_init 1s 2s -50%
shake256_finalize 1s 2s -50%
shake256_squeeze 1s 2s -50%

oqs-bot commented Jun 10, 2026

Contributor

CBMC Results (ML-DSA-65)

Full Results (205 proofs)
Proof Status Current Previous Change
**TOTAL** 2039s 2067s -1.4%
mld_invntt_layer 270s 296s -9%
polyvecl_pointwise_acc_montgomery_c 187s 205s -9%
rej_uniform_native 147s 145s +1%
polyvec_matrix_expand 130s 127s +2%
poly_pointwise_montgomery_c 95s 100s -5%
mld_ct_memcmp 67s 67s +0%
mld_attempt_signature_generation 63s 66s -5%
sign_verify_internal 63s 61s +3%
sign_signature_internal 46s 45s +2%
mld_ntt_layer 45s 44s +2%
fqmul 42s 41s +2%
polyvec_matrix_expand_serial 26s 26s +0%
keccakf1600x4_permute_native 24s 21s +14%
mld_ntt_butterfly_block 23s 20s +15%
rej_uniform 21s 21s +0%
poly_chknorm_c 18s 20s -10%
polyt0_unpack 16s 15s +7%
polyveck_decompose 16s 15s +7%
polyvecl_chknorm 15s 18s -17%
compute_pack_t0_t1 14s 15s -7%
poly_uniform_4x 13s 10s +30%
rej_uniform_c 13s 14s -7%
mld_check_pct 11s 11s +0%
poly_add 11s 10s +10%
poly_uniform_eta_4x 11s 11s +0%
polyvec_matrix_pointwise_montgomery_yvec 11s 8s +38%
mld_compute_pack_z 9s 12s -25%
polyveck_chknorm 9s 10s -10%
polyveck_ntt 9s 8s +12%
polyvecl_ntt 9s 6s +50%
keccak_absorb_once_x4 8s 7s +14%
poly_invntt_tomont_c 8s 7s +14%
polyveck_invntt_tomont 8s 7s +14%
polyvecl_pointwise_acc_montgomery_native 8s 4s +100%
keccak_squeezeblocks_x4 7s 3s +133%
mld_h 7s 2s +250%
mld_keccakf1600_permute_c 7s 5s +40%
polyz_unpack_c 7s 5s +40%
sign_open 7s 3s +133%
sign_verify_pre_hash_shake256 7s 3s +133%
keccak_absorb 6s 5s +20%
mld_prepare_domain_separation_prefix 6s 5s +20%
pointwise_acc_native_aarch64 6s 6s +0%
pointwise_acc_native_x86_64 6s 6s +0%
poly_caddq_native_aarch64 6s 2s +200%
poly_challenge 6s 4s +50%
polyveck_caddq 6s 4s +50%
unpack_sk_t0hat 6s 4s +50%
yvec_get_poly 6s 4s +50%
intt_native_x86_64 5s 2s +150%
keccakf1600_permute 5s 2s +150%
mld_sample_s1_s2_serial 5s 3s +67%
poly_chknorm_native 5s 2s +150%
poly_ntt_c 5s 3s +67%
poly_permute_bitrev_to_custom_optional_native 5s 2s +150%
poly_uniform_eta 5s 4s +25%
poly_uniform_gamma1 5s 4s +25%
poly_uniform_gamma1_4x 5s 4s +25%
polyt0_pack 5s 3s +67%
polyt1_pack 5s 3s +67%
polyvecl_pointwise_acc_montgomery 5s 2s +150%
rej_eta 5s 3s +67%
rej_eta_native 5s 5s +0%
shake128_absorb 5s 3s +67%
shake256_init 5s 1s +400%
sign 5s 10s -50%
sign_keypair 5s 4s +25%
sign_pk_from_sk 5s 7s -29%
sign_verify 5s 4s +25%
sign_verify_pre_hash_internal 5s 4s +25%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 4s 2s +100%
keccak_init 4s 3s +33%
keccakf1600_extract_bytes (big endian) 4s 4s +0%
keccakf1600_xor_bytes 4s 2s +100%
keccakf1600_xor_bytes (big endian) 4s 3s +33%
keccakf1600x4_extract_bytes_native 4s 4s +0%
keccakf1600x4_permute 4s 3s +33%
mld_sample_s1_s2 4s 7s -43%
montgomery_reduce 4s 3s +33%
ntt_native_aarch64 4s 3s +33%
pack_sig_h 4s 4s +0%
poly_chknorm_native_x86_64 4s 2s +100%
poly_decompose 4s 2s +100%
poly_decompose_32_native_aarch64 4s 6s -33%
poly_permute_bitrev_to_custom_optional 4s 3s +33%
poly_pointwise_montgomery 4s 2s +100%
poly_power2round 4s 3s +33%
poly_reduce 4s 2s +100%
poly_use_hint_native 4s 2s +100%
polyeta_unpack 4s 3s +33%
polyveck_unpack_eta 4s 3s +33%
polyvecl_uniform_gamma1_serial 4s 4s +0%
polyw1_pack 4s 3s +33%
polyz_unpack_17_native_aarch64 4s 3s +33%
polyz_unpack_native 4s 2s +100%
rej_uniform_native_aarch64 4s 5s -20%
shake128_release 4s 4s +0%
shake128x4_absorb_once 4s 5s -20%
sign_signature 4s 7s -43%
sign_signature_extmu 4s 2s +100%
decompose 3s 4s -25%
keccak_f1600_x1_native_aarch64 3s 3s +0%
keccak_f1600_x1_native_aarch64_v84a 3s 3s +0%
keccak_f1600_x4_native_avx2 3s 5s -40%
keccakf1600_permute_native 3s 3s +0%
mld_ct_cmask_neg_i32 3s 2s +50%
mld_ct_cmask_nonzero_u32 3s 5s -40%
mld_ct_cmask_nonzero_u8 3s 4s -25%
mld_ct_sel_int32 3s 2s +50%
mld_keccakf1600_extract_bytes 3s 2s +50%
mld_polymat_expand_entry 3s 3s +0%
pack_sig_z 3s 3s +0%
pack_sk_rho_key_tr_s2 3s 2s +50%
pack_sk_s1 3s 3s +0%
poly_caddq 3s 4s -25%
poly_caddq_c 3s 8s -62%
poly_caddq_native 3s 2s +50%
poly_caddq_native_x86_64 3s 3s +0%
poly_chknorm 3s 4s -25%
poly_chknorm_native_aarch64 3s 5s -40%
poly_decompose_88_native_aarch64 3s 2s +50%
poly_decompose_c 3s 5s -40%
poly_decompose_native 3s 2s +50%
poly_invntt_tomont_native 3s 6s -50%
poly_ntt 3s 3s +0%
poly_shiftl 3s 3s +0%
polyt1_unpack 3s 2s +50%
polyvec_matrix_pointwise_montgomery_row 3s 2s +50%
polyveck_pack_eta 3s 4s -25%
polyveck_reduce 3s 2s +50%
polyw1_pack_32 3s 4s -25%
polyw1_pack_88 3s 4s -25%
polyz_pack 3s 3s +0%
polyz_unpack 3s 2s +50%
polyz_unpack_19_native_aarch64 3s 3s +0%
power2round 3s 4s -25%
reduce32 3s 4s -25%
rej_eta_c 3s 4s -25%
rej_uniform_eta_native_aarch64 3s 5s -40%
shake128_finalize 3s 5s -40%
shake256 3s 2s +50%
shake256_absorb 3s 2s +50%
shake256_squeeze 3s 3s +0%
sign_keypair_internal 3s 5s -40%
sign_signature_pre_hash_shake256 3s 2s +50%
sign_verify_extmu 3s 4s -25%
sk_s1hat_get_poly 3s 2s +50%
sk_t0hat_get_poly 3s 6s -50%
sys_check_capability 3s 3s +0%
unpack_pk_t1 3s 2s +50%
unpack_sk 3s 3s +0%
use_hint 3s 2s +50%
yvec_init 3s 5s -40%
caddq 2s 2s +0%
fqscale 2s 3s -33%
keccak_f1600_x4_native_aarch64_v84a 2s 3s -33%
keccak_finalize 2s 5s -60%
keccak_squeeze 2s 3s -33%
keccakf1600x4_xor_bytes 2s 5s -60%
keccakf1600x4_xor_bytes_native 2s 2s +0%
make_hint 2s 3s -33%
mld_ct_get_optblocker_i64 2s 2s +0%
mld_ct_get_optblocker_u8 2s 3s -33%
mld_keccakf1600x4_extract_bytes_c 2s 2s +0%
mld_keccakf1600x4_xor_bytes_c 2s 3s -33%
mld_value_barrier_i64 2s 2s +0%
mld_value_barrier_u32 2s 2s +0%
mld_value_barrier_u8 2s 3s -33%
ntt_native_x86_64 2s 3s -33%
nttunpack_native_x86_64 2s 4s -50%
pack_sig_c 2s 2s +0%
pointwise_native_aarch64 2s 3s -33%
pointwise_native_x86_64 2s 2s +0%
poly_decompose_native_x86_64 2s - new
poly_invntt_tomont 2s 4s -50%
poly_ntt_native 2s 3s -33%
poly_pointwise_montgomery_native 2s 4s -50%
poly_sub 2s 3s -33%
poly_uniform 2s 3s -33%
poly_use_hint 2s 4s -50%
poly_use_hint_c 2s 3s -33%
poly_use_hint_native_aarch64 2s 4s -50%
polyeta_pack 2s 2s +0%
polyvecl_pack_eta 2s 2s +0%
polyvecl_uniform_gamma1 2s 6s -67%
polyvecl_unpack_eta 2s 4s -50%
shake128_init 2s 2s +0%
shake128_squeeze 2s 4s -50%
shake256_release 2s 3s -33%
shake256x4_squeezeblocks 2s 2s +0%
sign_signature_pre_hash_internal 2s 5s -60%
sk_s2hat_get_poly 2s 3s -33%
unpack_sk_s1hat 2s 2s +0%
unpack_sk_s2hat 2s 3s -33%
intt_native_aarch64 1s 5s -80%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 1s 2s -50%
keccakf1600x4_extract_bytes 1s 2s -50%
mld_ct_abs_i32 1s 4s -75%
mld_ct_get_optblocker_u32 1s 1s +0%
polyveck_pack_w1 1s 3s -67%
polyvecl_unpack_z 1s 5s -80%
shake128x4_squeezeblocks 1s 2s -50%
shake256_finalize 1s 2s -50%
shake256x4_absorb_once 1s 4s -75%
sig_unpack_hints 1s 4s -75%

oqs-bot commented Jun 10, 2026

Contributor

CBMC Results (ML-DSA-87)

Full Results (205 proofs)
Proof Status Current Previous Change
**TOTAL** 2641s 2351s +12.3%
polyvecl_pointwise_acc_montgomery_c 377s 321s +17%
mld_invntt_layer 328s 276s +19%
polyvec_matrix_expand 251s 218s +15%
rej_uniform_native 161s 152s +6%
poly_pointwise_montgomery_c 114s 86s +33%
mld_attempt_signature_generation 109s 105s +4%
mld_ct_memcmp 78s 66s +18%
sign_signature_internal 68s 63s +8%
sign_verify_internal 62s 59s +5%
polyvec_matrix_expand_serial 50s 47s +6%
fqmul 47s 40s +18%
mld_ntt_layer 46s 42s +10%
polyvec_matrix_pointwise_montgomery_yvec 33s 32s +3%
compute_pack_t0_t1 32s 34s -6%
rej_uniform 27s 22s +23%
poly_chknorm_c 26s 19s +37%
mld_ntt_butterfly_block 24s 21s +14%
keccakf1600x4_permute_native 23s 23s +0%
polyt0_unpack 18s 16s +12%
polyeta_unpack 17s 14s +21%
mld_check_pct 16s 13s +23%
rej_uniform_c 16s 11s +45%
poly_uniform_eta_4x 13s 11s +18%
poly_add 12s 12s +0%
poly_uniform_4x 11s 14s -21%
polyveck_invntt_tomont 11s 9s +22%
polyvecl_ntt 11s 10s +10%
poly_invntt_tomont_c 10s 8s +25%
polyveck_decompose 10s 11s -9%
keccak_absorb_once_x4 9s 10s -10%
sign 9s 8s +12%
mld_compute_pack_z 8s 8s +0%
mld_sample_s1_s2 8s 3s +167%
mld_sample_s1_s2_serial 8s 6s +33%
poly_decompose_c 8s 5s +60%
polyveck_caddq 8s 8s +0%
keccak_absorb 7s 6s +17%
mld_keccakf1600_permute_c 7s 8s -12%
mld_value_barrier_i64 7s 1s +600%
pointwise_acc_native_aarch64 7s 7s +0%
pointwise_acc_native_x86_64 7s 7s +0%
sign_pk_from_sk 7s 4s +75%
sign_verify_extmu 7s 3s +133%
sign_verify_pre_hash_internal 7s 7s +0%
sign_verify_pre_hash_shake256 7s 3s +133%
keccak_squeezeblocks_x4 6s 4s +50%
poly_decompose 6s 3s +100%
poly_pointwise_montgomery 6s 4s +50%
polyveck_chknorm 6s 5s +20%
polyveck_ntt 6s 9s -33%
polyw1_pack_32 6s 3s +100%
polyz_unpack_c 6s 6s +0%
keccakf1600x4_permute 5s 4s +25%
mld_ct_cmask_nonzero_u32 5s 5s +0%
ntt_native_x86_64 5s 3s +67%
pack_sig_c 5s 2s +150%
poly_caddq_c 5s 5s +0%
poly_shiftl 5s 2s +150%
poly_use_hint_c 5s 2s +150%
polyt1_unpack 5s 1s +400%
polyveck_pack_eta 5s 2s +150%
polyveck_pack_w1 5s 3s +67%
polyvecl_pointwise_acc_montgomery_native 5s 5s +0%
rej_eta_native 5s 3s +67%
rej_uniform_native_aarch64 5s 2s +150%
shake256 5s 2s +150%
shake256_init 5s 2s +150%
sign_keypair_internal 5s 4s +25%
sign_open 5s 5s +0%
sign_signature_extmu 5s 5s +0%
unpack_sk 5s 3s +67%
decompose 4s 1s +300%
make_hint 4s 3s +33%
mld_ct_cmask_nonzero_u8 4s 1s +300%
mld_h 4s 3s +33%
mld_keccakf1600x4_extract_bytes_c 4s 2s +100%
pack_sk_rho_key_tr_s2 4s 1s +300%
pointwise_native_x86_64 4s 3s +33%
poly_caddq 4s 4s +0%
poly_caddq_native 4s 2s +100%
poly_caddq_native_aarch64 4s 3s +33%
poly_caddq_native_x86_64 4s 3s +33%
poly_challenge 4s 6s -33%
poly_chknorm 4s 3s +33%
poly_decompose_88_native_aarch64 4s 2s +100%
poly_invntt_tomont 4s 1s +300%
poly_invntt_tomont_native 4s 4s +0%
poly_ntt 4s 6s -33%
poly_sub 4s 1s +300%
poly_uniform 4s 5s -20%
poly_uniform_gamma1_4x 4s 4s +0%
polyeta_pack 4s 2s +100%
polyt0_pack 4s 2s +100%
polyt1_pack 4s 2s +100%
polyvecl_pointwise_acc_montgomery 4s 5s -20%
polyvecl_uniform_gamma1_serial 4s 2s +100%
polyvecl_unpack_z 4s 2s +100%
polyz_unpack 4s 4s +0%
rej_eta 4s 2s +100%
shake256_absorb 4s 4s +0%
sign_keypair 4s 3s +33%
sk_s1hat_get_poly 4s 3s +33%
sk_s2hat_get_poly 4s 2s +100%
sk_t0hat_get_poly 4s 4s +0%
unpack_pk_t1 4s 3s +33%
unpack_sk_s1hat 4s 5s -20%
yvec_init 4s 2s +100%
fqscale 3s 2s +50%
intt_native_x86_64 3s 6s -50%
keccak_f1600_x1_native_aarch64 3s 2s +50%
keccak_f1600_x1_native_aarch64_v84a 3s 3s +0%
keccak_f1600_x4_native_avx2 3s 3s +0%
keccak_finalize 3s 3s +0%
keccakf1600_extract_bytes (big endian) 3s 4s -25%
keccakf1600_permute_native 3s 1s +200%
keccakf1600_xor_bytes (big endian) 3s 3s +0%
keccakf1600x4_extract_bytes_native 3s 2s +50%
keccakf1600x4_xor_bytes 3s 1s +200%
mld_ct_abs_i32 3s 3s +0%
mld_ct_cmask_neg_i32 3s 3s +0%
mld_ct_get_optblocker_u8 3s 2s +50%
mld_keccakf1600_extract_bytes 3s 2s +50%
mld_prepare_domain_separation_prefix 3s 3s +0%
mld_value_barrier_u32 3s 2s +50%
montgomery_reduce 3s 4s -25%
ntt_native_aarch64 3s 1s +200%
pack_sig_z 3s 3s +0%
pack_sk_s1 3s 2s +50%
poly_decompose_native_x86_64 3s - new
poly_ntt_c 3s 3s +0%
poly_permute_bitrev_to_custom_optional_native 3s 5s -40%
poly_power2round 3s 5s -40%
poly_reduce 3s 5s -40%
poly_use_hint_native 3s 4s -25%
poly_use_hint_native_aarch64 3s 3s +0%
polyveck_unpack_eta 3s 3s +0%
polyvecl_chknorm 3s 3s +0%
polyvecl_pack_eta 3s 2s +50%
polyvecl_uniform_gamma1 3s 2s +50%
polyvecl_unpack_eta 3s 5s -40%
polyw1_pack 3s 3s +0%
polyw1_pack_88 3s 5s -40%
power2round 3s 3s +0%
reduce32 3s 3s +0%
rej_eta_c 3s 2s +50%
rej_uniform_eta_native_aarch64 3s 3s +0%
shake128_finalize 3s 2s +50%
shake128_init 3s 2s +50%
shake128_release 3s 3s +0%
shake256_finalize 3s 2s +50%
shake256_release 3s 2s +50%
shake256_squeeze 3s 5s -40%
shake256x4_absorb_once 3s 2s +50%
sig_unpack_hints 3s 4s -25%
sign_signature 3s 4s -25%
sign_signature_pre_hash_shake256 3s 4s -25%
sys_check_capability 3s 2s +50%
unpack_sk_s2hat 3s 3s +0%
unpack_sk_t0hat 3s 8s -62%
use_hint 3s 3s +0%
caddq 2s 3s -33%
intt_native_aarch64 2s 6s -67%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 4s -50%
keccak_init 2s 2s +0%
keccakf1600_permute 2s 2s +0%
keccakf1600_xor_bytes 2s 3s -33%
keccakf1600x4_xor_bytes_native 2s 4s -50%
mld_ct_get_optblocker_i64 2s 1s +100%
mld_ct_get_optblocker_u32 2s 3s -33%
mld_ct_sel_int32 2s 3s -33%
mld_keccakf1600x4_xor_bytes_c 2s 3s -33%
mld_polymat_expand_entry 2s 3s -33%
mld_value_barrier_u8 2s 2s +0%
nttunpack_native_x86_64 2s 4s -50%
pack_sig_h 2s 5s -60%
pointwise_native_aarch64 2s 3s -33%
poly_chknorm_native 2s 4s -50%
poly_chknorm_native_aarch64 2s 3s -33%
poly_chknorm_native_x86_64 2s 3s -33%
poly_decompose_32_native_aarch64 2s 4s -50%
poly_decompose_native 2s 3s -33%
poly_ntt_native 2s 3s -33%
poly_pointwise_montgomery_native 2s 4s -50%
poly_uniform_eta 2s 4s -50%
poly_uniform_gamma1 2s 2s +0%
poly_use_hint 2s 2s +0%
polyveck_reduce 2s 2s +0%
polyz_unpack_17_native_aarch64 2s 3s -33%
polyz_unpack_19_native_aarch64 2s 4s -50%
polyz_unpack_native 2s 2s +0%
shake128x4_absorb_once 2s 2s +0%
shake256x4_squeezeblocks 2s 2s +0%
sign_signature_pre_hash_internal 2s 4s -50%
sign_verify 2s 5s -60%
keccak_f1600_x4_native_aarch64_v84a 1s 2s -50%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 1s 3s -67%
keccak_squeeze 1s 3s -67%
keccakf1600x4_extract_bytes 1s 3s -67%
poly_permute_bitrev_to_custom_optional 1s 4s -75%
polyvec_matrix_pointwise_montgomery_row 1s 4s -75%
polyz_pack 1s 6s -83%
shake128_absorb 1s 1s +0%
shake128_squeeze 1s 3s -67%
shake128x4_squeezeblocks 1s 3s -67%
yvec_get_poly 1s 4s -75%

oqs-bot commented Jun 10, 2026

CBMC Results (ML-DSA-44)

Full Results (205 proofs)
Proof Status Current Previous Change
**TOTAL** 1885s 1769s +6.6%
mld_invntt_layer 321s 281s +14%
rej_uniform_native 161s 144s +12%
polyvecl_pointwise_acc_montgomery_c 135s 119s +13%
poly_pointwise_montgomery_c 109s 92s +18%
mld_ct_memcmp 75s 63s +19%
mld_attempt_signature_generation 66s 63s +5%
mld_ntt_layer 46s 43s +7%
fqmul 44s 40s +10%
polyvec_matrix_expand 27s 25s +8%
sign_verify_internal 27s 26s +4%
mld_ntt_butterfly_block 24s 20s +20%
keccakf1600x4_permute_native 23s 22s +5%
rej_uniform 23s 23s +0%
poly_chknorm_c 21s 19s +11%
polyeta_unpack 18s 16s +12%
polyt0_unpack 18s 19s -5%
sign_signature_internal 18s 20s -10%
compute_pack_t0_t1 13s 14s -7%
mld_check_pct 13s 11s +18%
poly_uniform_eta_4x 13s 13s +0%
polyz_unpack_c 13s 13s +0%
rej_uniform_c 13s 14s -7%
poly_add 12s 10s +20%
polyveck_chknorm 12s 11s +9%
poly_use_hint_c 11s 6s +83%
poly_uniform_4x 10s 12s -17%
keccak_absorb_once_x4 9s 9s +0%
polyvec_matrix_pointwise_montgomery_yvec 9s 10s -10%
mld_compute_pack_z 8s 9s -11%
poly_invntt_tomont_c 8s 8s +0%
polyvec_matrix_expand_serial 8s 8s +0%
polyveck_decompose 8s 8s +0%
keccak_absorb 7s 7s +0%
poly_caddq_c 7s 7s +0%
poly_chknorm_native 7s 5s +40%
keccak_squeeze 6s 4s +50%
mld_keccakf1600_permute_c 6s 7s -14%
pointwise_acc_native_aarch64 6s 6s +0%
poly_decompose 6s 3s +100%
poly_uniform_gamma1_4x 6s 5s +20%
polyvecl_chknorm 6s 3s +100%
rej_eta_c 6s 3s +100%
sign 6s 6s +0%
sign_open 6s 4s +50%
sign_pk_from_sk 6s 7s -14%
sign_signature_pre_hash_shake256 6s 6s +0%
unpack_pk_t1 6s 3s +100%
mld_ct_cmask_nonzero_u8 5s 1s +400%
mld_sample_s1_s2 5s 4s +25%
pack_sk_s1 5s 2s +150%
pointwise_acc_native_x86_64 5s 5s +0%
poly_decompose_c 5s 5s +0%
poly_decompose_native 5s 5s +0%
poly_ntt 5s 4s +25%
poly_power2round 5s 6s -17%
poly_uniform 5s 3s +67%
poly_uniform_eta 5s 3s +67%
poly_use_hint 5s 2s +150%
poly_use_hint_native 5s 5s +0%
polyeta_pack 5s 4s +25%
polyvec_matrix_pointwise_montgomery_row 5s 3s +67%
polyveck_invntt_tomont 5s 4s +25%
polyz_pack 5s 5s +0%
rej_eta_native 5s 10s -50%
shake128_squeeze 5s 3s +67%
sign_signature_pre_hash_internal 5s 5s +0%
caddq 4s 3s +33%
decompose 4s 3s +33%
intt_native_aarch64 4s 4s +0%
keccak_init 4s 2s +100%
keccakf1600_permute_native 4s 2s +100%
keccakf1600x4_extract_bytes_native 4s 6s -33%
mld_h 4s 4s +0%
ntt_native_aarch64 4s 3s +33%
pack_sig_c 4s 3s +33%
pack_sig_z 4s 4s +0%
poly_challenge 4s 5s -20%
poly_chknorm_native_aarch64 4s 3s +33%
poly_ntt_c 4s 4s +0%
poly_permute_bitrev_to_custom_optional 4s 3s +33%
poly_pointwise_montgomery_native 4s 4s +0%
poly_reduce 4s 3s +33%
poly_shiftl 4s 6s -33%
polyt0_pack 4s 4s +0%
polyveck_caddq 4s 3s +33%
polyveck_ntt 4s 3s +33%
polyvecl_ntt 4s 6s -33%
polyvecl_pointwise_acc_montgomery 4s 2s +100%
polyvecl_unpack_z 4s 4s +0%
power2round 4s 3s +33%
rej_eta 4s 5s -20%
rej_uniform_native_aarch64 4s 4s +0%
shake256 4s 2s +100%
shake256_init 4s 2s +100%
sign_keypair 4s 4s +0%
sign_keypair_internal 4s 5s -20%
sign_signature_extmu 4s 4s +0%
sign_verify_pre_hash_shake256 4s 5s -20%
sys_check_capability 4s 1s +300%
use_hint 4s 2s +100%
yvec_init 4s 2s +100%
intt_native_x86_64 3s 3s +0%
keccak_f1600_x1_native_aarch64 3s 2s +50%
keccak_f1600_x4_native_aarch64_v84a 3s 3s +0%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 3s +0%
keccak_squeezeblocks_x4 3s 4s -25%
keccakf1600_extract_bytes (big endian) 3s 2s +50%
keccakf1600_permute 3s 2s +50%
keccakf1600_xor_bytes 3s 3s +0%
keccakf1600_xor_bytes (big endian) 3s 3s +0%
keccakf1600x4_extract_bytes 3s 3s +0%
mld_ct_sel_int32 3s 2s +50%
mld_polymat_expand_entry 3s 3s +0%
mld_sample_s1_s2_serial 3s 2s +50%
mld_value_barrier_u8 3s 1s +200%
montgomery_reduce 3s 3s +0%
ntt_native_x86_64 3s 4s -25%
pack_sig_h 3s 3s +0%
pack_sk_rho_key_tr_s2 3s 2s +50%
pointwise_native_aarch64 3s 3s +0%
pointwise_native_x86_64 3s 2s +50%
poly_caddq 3s 3s +0%
poly_caddq_native 3s 4s -25%
poly_caddq_native_x86_64 3s 2s +50%
poly_chknorm_native_x86_64 3s 4s -25%
poly_decompose_32_native_aarch64 3s 7s -57%
poly_decompose_88_native_aarch64 3s 2s +50%
poly_decompose_native_x86_64 3s - new
poly_invntt_tomont 3s 3s +0%
poly_invntt_tomont_native 3s 4s -25%
poly_use_hint_native_aarch64 3s 1s +200%
polyveck_unpack_eta 3s 3s +0%
polyvecl_pointwise_acc_montgomery_native 3s 2s +50%
polyvecl_uniform_gamma1_serial 3s 5s -40%
polyvecl_unpack_eta 3s 3s +0%
polyw1_pack 3s 4s -25%
shake128_release 3s 2s +50%
shake128x4_absorb_once 3s 1s +200%
shake256_absorb 3s 2s +50%
shake256_release 3s 3s +0%
shake256_squeeze 3s 1s +200%
sign_signature 3s 4s -25%
sign_verify_extmu 3s 3s +0%
sk_s2hat_get_poly 3s 3s +0%
unpack_sk 3s 4s -25%
unpack_sk_s1hat 3s 2s +50%
fqscale 2s 3s -33%
keccak_f1600_x1_native_aarch64_v84a 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 2s +0%
keccak_f1600_x4_native_avx2 2s 3s -33%
keccakf1600x4_permute 2s 2s +0%
keccakf1600x4_xor_bytes 2s 3s -33%
keccakf1600x4_xor_bytes_native 2s 4s -50%
make_hint 2s 3s -33%
mld_ct_get_optblocker_u32 2s 4s -50%
mld_keccakf1600_extract_bytes 2s 3s -33%
mld_keccakf1600x4_extract_bytes_c 2s 4s -50%
mld_keccakf1600x4_xor_bytes_c 2s 3s -33%
mld_prepare_domain_separation_prefix 2s 2s +0%
mld_value_barrier_i64 2s 1s +100%
mld_value_barrier_u32 2s 2s +0%
nttunpack_native_x86_64 2s 4s -50%
poly_caddq_native_aarch64 2s 3s -33%
poly_chknorm 2s 4s -50%
poly_ntt_native 2s 3s -33%
poly_permute_bitrev_to_custom_optional_native 2s 3s -33%
poly_pointwise_montgomery 2s 2s +0%
poly_sub 2s 3s -33%
poly_uniform_gamma1 2s 3s -33%
polyt1_pack 2s 5s -60%
polyt1_unpack 2s 2s +0%
polyveck_pack_w1 2s 4s -50%
polyw1_pack_32 2s 2s +0%
polyw1_pack_88 2s 4s -50%
polyz_unpack 2s 4s -50%
polyz_unpack_17_native_aarch64 2s 3s -33%
polyz_unpack_19_native_aarch64 2s 4s -50%
polyz_unpack_native 2s 3s -33%
reduce32 2s 2s +0%
rej_uniform_eta_native_aarch64 2s 5s -60%
shake128_absorb 2s 3s -33%
shake128_finalize 2s 2s +0%
shake128_init 2s 1s +100%
shake256_finalize 2s 2s +0%
shake256x4_squeezeblocks 2s 3s -33%
sig_unpack_hints 2s 2s +0%
sign_verify 2s 5s -60%
sign_verify_pre_hash_internal 2s 3s -33%
sk_s1hat_get_poly 2s 2s +0%
sk_t0hat_get_poly 2s 2s +0%
unpack_sk_s2hat 2s 6s -67%
unpack_sk_t0hat 2s 5s -60%
yvec_get_poly 2s 2s +0%
keccak_finalize 1s 3s -67%
mld_ct_abs_i32 1s 1s +0%
mld_ct_cmask_neg_i32 1s 2s -50%
mld_ct_cmask_nonzero_u32 1s 3s -67%
mld_ct_get_optblocker_i64 1s 3s -67%
mld_ct_get_optblocker_u8 1s 3s -67%
polyveck_pack_eta 1s 3s -67%
polyveck_reduce 1s 4s -75%
polyvecl_pack_eta 1s 4s -75%
polyvecl_uniform_gamma1 1s 2s -50%
shake128x4_squeezeblocks 1s 3s -67%
shake256x4_absorb_once 1s 2s -50%
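
The Change column above appears to be the relative difference between the Current and Previous runtimes. A minimal C sketch, assuming that definition (the helper name `percent_change` is hypothetical and not part of the CI tooling):

```c
/* Relative change of `current` versus `previous`, in percent.
 * Assumption: this mirrors how the Change column is derived; the
 * bot's actual rounding rule is not shown in this thread. */
static double percent_change(double current, double previous)
{
    return (current - previous) / previous * 100.0;
}
```

For the TOTAL row, `percent_change(1885, 1769)` is about 6.56, shown as +6.6%. For proofs that run only a few seconds, a one-second difference already yields swings such as +33% (4s vs 3s) or -50% (2s vs 4s), so the small rows are probably best read with that granularity in mind.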

@github-actions github-actions Bot left a comment

Mac Mini (M1, 2020) benchmarks (opt)

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 46539 cycles 46537 cycles 1.00
ML-DSA-44 sign 131082 cycles 131061 cycles 1.00
ML-DSA-44 verify 47348 cycles 47345 cycles 1.00
ML-DSA-65 keypair 81686 cycles 81683 cycles 1.00
ML-DSA-65 sign 215322 cycles 215331 cycles 1.00
ML-DSA-65 verify 79305 cycles 79302 cycles 1.00
ML-DSA-87 keypair 132401 cycles 132400 cycles 1.00
ML-DSA-87 sign 277532 cycles 277357 cycles 1.00
ML-DSA-87 verify 134051 cycles 134055 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Mac Mini (M1, 2020) benchmarks (no-opt)

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 112752 cycles 112752 cycles 1
ML-DSA-44 sign 400901 cycles 400863 cycles 1.00
ML-DSA-44 verify 119445 cycles 119445 cycles 1
ML-DSA-65 keypair 192978 cycles 192931 cycles 1.00
ML-DSA-65 sign 649977 cycles 649957 cycles 1.00
ML-DSA-65 verify 192863 cycles 192871 cycles 1.00
ML-DSA-87 keypair 318842 cycles 318724 cycles 1.00
ML-DSA-87 sign 828816 cycles 828761 cycles 1.00
ML-DSA-87 verify 326790 cycles 326654 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 231326 cycles 226291 cycles 1.02
ML-DSA-44 sign 636953 cycles 613701 cycles 1.04
ML-DSA-44 verify 218748 cycles 223287 cycles 0.98
ML-DSA-65 keypair 401703 cycles 401301 cycles 1.00
ML-DSA-65 sign 1038618 cycles 1019858 cycles 1.02
ML-DSA-65 verify 382587 cycles 377404 cycles 1.01
ML-DSA-87 keypair 663545 cycles 662182 cycles 1.00
ML-DSA-87 sign 1385036 cycles 1364271 cycles 1.02
ML-DSA-87 verify 639267 cycles 646030 cycles 0.99

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 sign 636953 cycles 613701 cycles 1.04

This comment was automatically generated by workflow using github-action-benchmark.
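
The alert condition can be reproduced by hand. A short C sketch, assuming the Ratio column is Current divided by Previous and that the alert fires when the ratio exceeds the 1.03 threshold quoted above (the function names are hypothetical):

```c
#include <stdbool.h>

/* Ratio as reported in the benchmark tables (assumed: current / previous). */
static double bench_ratio(double current_cycles, double previous_cycles)
{
    return current_cycles / previous_cycles;
}

/* True when the slowdown exceeds the alert threshold (1.03 in this thread). */
static bool is_regression(double current_cycles, double previous_cycles,
                          double threshold)
{
    return bench_ratio(current_cycles, previous_cycles) > threshold;
}
```

For ML-DSA-44 sign, 636953 / 613701 is roughly 1.038 (displayed as 1.04), so `is_regression(636953, 613701, 1.03)` is true. ML-DSA-65 sign on the same board (1038618 / 1019858, about 1.018) stays below the threshold.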

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 43326 cycles 43362 cycles 1.00
ML-DSA-44 sign 130586 cycles 131357 cycles 0.99
ML-DSA-44 verify 45079 cycles 45329 cycles 0.99
ML-DSA-65 keypair 75623 cycles 75529 cycles 1.00
ML-DSA-65 sign 214943 cycles 215494 cycles 1.00
ML-DSA-65 verify 74300 cycles 74395 cycles 1.00
ML-DSA-87 keypair 123196 cycles 123052 cycles 1.00
ML-DSA-87 sign 271568 cycles 270650 cycles 1.00
ML-DSA-87 verify 120595 cycles 120614 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 91451 cycles 91711 cycles 1.00
ML-DSA-44 sign 351826 cycles 353153 cycles 1.00
ML-DSA-44 verify 99751 cycles 100089 cycles 1.00
ML-DSA-65 keypair 153919 cycles 153963 cycles 1.00
ML-DSA-65 sign 571974 cycles 570637 cycles 1.00
ML-DSA-65 verify 160011 cycles 159765 cycles 1.00
ML-DSA-87 keypair 255432 cycles 256233 cycles 1.00
ML-DSA-87 sign 726067 cycles 727004 cycles 1.00
ML-DSA-87 verify 264170 cycles 264081 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 55425 cycles 55232 cycles 1.00
ML-DSA-44 sign 159735 cycles 159604 cycles 1.00
ML-DSA-44 verify 57634 cycles 57852 cycles 1.00
ML-DSA-65 keypair 96240 cycles 95882 cycles 1.00
ML-DSA-65 sign 263039 cycles 264042 cycles 1.00
ML-DSA-65 verify 96049 cycles 96233 cycles 1.00
ML-DSA-87 keypair 154779 cycles 154587 cycles 1.00
ML-DSA-87 sign 320859 cycles 322317 cycles 1.00
ML-DSA-87 verify 151689 cycles 151310 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 133300 cycles 133370 cycles 1.00
ML-DSA-44 sign 519465 cycles 519316 cycles 1.00
ML-DSA-44 verify 146603 cycles 146733 cycles 1.00
ML-DSA-65 keypair 224063 cycles 224213 cycles 1.00
ML-DSA-65 sign 843482 cycles 843252 cycles 1.00
ML-DSA-65 verify 234302 cycles 234223 cycles 1.00
ML-DSA-87 keypair 367801 cycles 367144 cycles 1.00
ML-DSA-87 sign 1060988 cycles 1060336 cycles 1.00
ML-DSA-87 verify 381200 cycles 380930 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton2

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 112335 cycles 112525 cycles 1.00
ML-DSA-44 sign 354217 cycles 354017 cycles 1.00
ML-DSA-44 verify 117414 cycles 117394 cycles 1.00
ML-DSA-65 keypair 194466 cycles 194697 cycles 1.00
ML-DSA-65 sign 584370 cycles 584501 cycles 1.00
ML-DSA-65 verify 193464 cycles 193283 cycles 1.00
ML-DSA-87 keypair 320864 cycles 320987 cycles 1.00
ML-DSA-87 sign 747862 cycles 746658 cycles 1.00
ML-DSA-87 verify 318010 cycles 318698 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 46713 cycles 47001 cycles 0.99
ML-DSA-44 sign 140686 cycles 139110 cycles 1.01
ML-DSA-44 verify 49619 cycles 49262 cycles 1.01
ML-DSA-65 keypair 82371 cycles 82519 cycles 1.00
ML-DSA-65 sign 228022 cycles 227885 cycles 1.00
ML-DSA-65 verify 82195 cycles 82013 cycles 1.00
ML-DSA-87 keypair 130193 cycles 129228 cycles 1.01
ML-DSA-87 sign 280026 cycles 279733 cycles 1.00
ML-DSA-87 verify 128511 cycles 128347 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton4

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 67357 cycles 67336 cycles 1.00
ML-DSA-44 sign 198377 cycles 198343 cycles 1.00
ML-DSA-44 verify 70274 cycles 70240 cycles 1.00
ML-DSA-65 keypair 119485 cycles 119389 cycles 1.00
ML-DSA-65 sign 326268 cycles 325890 cycles 1.00
ML-DSA-65 verify 117012 cycles 116943 cycles 1.00
ML-DSA-87 keypair 196441 cycles 196722 cycles 1.00
ML-DSA-87 sign 421298 cycles 421918 cycles 1.00
ML-DSA-87 verify 193282 cycles 193428 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 299428 cycles 301288 cycles 0.99
ML-DSA-44 sign 1141001 cycles 1145421 cycles 1.00
ML-DSA-44 verify 330622 cycles 332207 cycles 1.00
ML-DSA-65 keypair 563729 cycles 547720 cycles 1.03
ML-DSA-65 sign 1920855 cycles 1895789 cycles 1.01
ML-DSA-65 verify 539918 cycles 531514 cycles 1.02
ML-DSA-87 keypair 861758 cycles 855749 cycles 1.01
ML-DSA-87 sign 2426173 cycles 2378667 cycles 1.02
ML-DSA-87 verify 906246 cycles 883654 cycles 1.03

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 118433 cycles 118101 cycles 1.00
ML-DSA-44 sign 458646 cycles 458153 cycles 1.00
ML-DSA-44 verify 130869 cycles 130875 cycles 1.00
ML-DSA-65 keypair 200886 cycles 200971 cycles 1.00
ML-DSA-65 sign 745083 cycles 742473 cycles 1.00
ML-DSA-65 verify 209164 cycles 209101 cycles 1.00
ML-DSA-87 keypair 331397 cycles 332976 cycles 1.00
ML-DSA-87 sign 938261 cycles 938796 cycles 1.00
ML-DSA-87 verify 343275 cycles 342887 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 127600 cycles 127655 cycles 1.00
ML-DSA-44 sign 441155 cycles 441153 cycles 1.00
ML-DSA-44 verify 136410 cycles 136366 cycles 1.00
ML-DSA-65 keypair 220532 cycles 220720 cycles 1.00
ML-DSA-65 sign 714274 cycles 713831 cycles 1.00
ML-DSA-65 verify 221102 cycles 220771 cycles 1.00
ML-DSA-87 keypair 364562 cycles 365122 cycles 1.00
ML-DSA-87 sign 915619 cycles 921347 cycles 0.99
ML-DSA-87 verify 370883 cycles 370803 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 212223 cycles 211830 cycles 1.00
ML-DSA-44 sign 761057 cycles 759846 cycles 1.00
ML-DSA-44 verify 229916 cycles 229343 cycles 1.00
ML-DSA-65 keypair 378717 cycles 377180 cycles 1.00
ML-DSA-65 sign 1247998 cycles 1247170 cycles 1.00
ML-DSA-65 verify 373347 cycles 371571 cycles 1.00
ML-DSA-87 keypair 602516 cycles 600508 cycles 1.00
ML-DSA-87 sign 1584868 cycles 1584292 cycles 1.00
ML-DSA-87 verify 618340 cycles 616048 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 61876 cycles 61490 cycles 1.01
ML-DSA-44 sign 189410 cycles 189504 cycles 1.00
ML-DSA-44 verify 66354 cycles 66611 cycles 1.00
ML-DSA-65 keypair 112014 cycles 109562 cycles 1.02
ML-DSA-65 sign 320464 cycles 315340 cycles 1.02
ML-DSA-65 verify 109875 cycles 109818 cycles 1.00
ML-DSA-87 keypair 171027 cycles 171377 cycles 1.00
ML-DSA-87 sign 379568 cycles 379373 cycles 1.00
ML-DSA-87 verify 170309 cycles 170479 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 154733 cycles 154452 cycles 1.00
ML-DSA-44 sign 590183 cycles 589881 cycles 1.00
ML-DSA-44 verify 169693 cycles 170191 cycles 1.00
ML-DSA-65 keypair 262845 cycles 263626 cycles 1.00
ML-DSA-65 sign 965740 cycles 966009 cycles 1.00
ML-DSA-65 verify 272539 cycles 273308 cycles 1.00
ML-DSA-87 keypair 432704 cycles 431959 cycles 1.00
ML-DSA-87 sign 1211442 cycles 1210781 cycles 1.00
ML-DSA-87 verify 448136 cycles 447098 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 112168 cycles 112136 cycles 1.00
ML-DSA-44 sign 353497 cycles 353809 cycles 1.00
ML-DSA-44 verify 117008 cycles 117213 cycles 1.00
ML-DSA-65 keypair 194787 cycles 194348 cycles 1.00
ML-DSA-65 sign 583930 cycles 583675 cycles 1.00
ML-DSA-65 verify 192719 cycles 193087 cycles 1.00
ML-DSA-87 keypair 320908 cycles 320087 cycles 1.00
ML-DSA-87 sign 747304 cycles 747202 cycles 1.00
ML-DSA-87 verify 318766 cycles 317903 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton3

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 71374 cycles 71546 cycles 1.00
ML-DSA-44 sign 208951 cycles 208995 cycles 1.00
ML-DSA-44 verify 74784 cycles 74742 cycles 1.00
ML-DSA-65 keypair 125930 cycles 125949 cycles 1.00
ML-DSA-65 sign 345622 cycles 345451 cycles 1.00
ML-DSA-65 verify 124102 cycles 124199 cycles 1.00
ML-DSA-87 keypair 207053 cycles 206632 cycles 1.00
ML-DSA-87 sign 443985 cycles 439852 cycles 1.01
ML-DSA-87 verify 204072 cycles 204472 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 137939 cycles 138023 cycles 1.00
ML-DSA-44 sign 486208 cycles 486039 cycles 1.00
ML-DSA-44 verify 149048 cycles 149068 cycles 1.00
ML-DSA-65 keypair 241520 cycles 241810 cycles 1.00
ML-DSA-65 sign 792069 cycles 791663 cycles 1.00
ML-DSA-65 verify 242200 cycles 241314 cycles 1.00
ML-DSA-87 keypair 395771 cycles 396304 cycles 1.00
ML-DSA-87 sign 1013680 cycles 1019188 cycles 0.99
ML-DSA-87 verify 403660 cycles 403745 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 211517 cycles 211659 cycles 1.00
ML-DSA-44 sign 758603 cycles 760082 cycles 1.00
ML-DSA-44 verify 228966 cycles 229485 cycles 1.00
ML-DSA-65 keypair 377434 cycles 377835 cycles 1.00
ML-DSA-65 sign 1247925 cycles 1246585 cycles 1.00
ML-DSA-65 verify 371551 cycles 371729 cycles 1.00
ML-DSA-87 keypair 600603 cycles 601814 cycles 1.00
ML-DSA-87 sign 1582984 cycles 1582429 cycles 1.00
ML-DSA-87 verify 616265 cycles 617716 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 267515 cycles 268565 cycles 1.00
ML-DSA-44 sign 807339 cycles 806886 cycles 1.00
ML-DSA-44 verify 269280 cycles 269851 cycles 1.00
ML-DSA-65 keypair 461007 cycles 459957 cycles 1.00
ML-DSA-65 sign 1320496 cycles 1317354 cycles 1.00
ML-DSA-65 verify 447023 cycles 445463 cycles 1.00
ML-DSA-87 keypair 788862 cycles 788523 cycles 1.00
ML-DSA-87 sign 1808107 cycles 1804026 cycles 1.00
ML-DSA-87 verify 770607 cycles 772022 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@jakemas jakemas changed the title x86_64: Replace poly_decompose AVX2 intrinsics with hand-written assembly x86_64 + HOL-Light: Replace poly_decompose AVX2 intrinsics with hand-written assembly and HOL-Light proofs Jun 10, 2026

@github-actions github-actions Bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 463001 cycles 462832 cycles 1.00
ML-DSA-44 sign 2131558 cycles 2130663 cycles 1.00
ML-DSA-44 verify 554513 cycles 554950 cycles 1.00
ML-DSA-65 keypair 780676 cycles 781350 cycles 1.00
ML-DSA-65 sign 3483972 cycles 3478846 cycles 1.00
ML-DSA-65 verify 863731 cycles 864624 cycles 1.00
ML-DSA-87 keypair 1265658 cycles 1261131 cycles 1.00
ML-DSA-87 sign 4297254 cycles 4307837 cycles 1.00
ML-DSA-87 verify 1390213 cycles 1384611 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)

Details
Benchmark suite Current: 460553d Previous: 08d40f9 Ratio
ML-DSA-44 keypair 760293 cycles 759832 cycles 1.00
ML-DSA-44 sign 3140737 cycles 3139623 cycles 1.00
ML-DSA-44 verify 859554 cycles 859077 cycles 1.00
ML-DSA-65 keypair 1286158 cycles 1285222 cycles 1.00
ML-DSA-65 sign 5077195 cycles 5072020 cycles 1.00
ML-DSA-65 verify 1364303 cycles 1363676 cycles 1.00
ML-DSA-87 keypair 2112223 cycles 2110495 cycles 1.00
ML-DSA-87 sign 6355356 cycles 6366388 cycles 1.00
ML-DSA-87 verify 2228739 cycles 2230493 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@jakemas jakemas marked this pull request as ready for review June 10, 2026 06:37
@jakemas jakemas requested a review from a team as a code owner June 10, 2026 06:37
@jakemas jakemas marked this pull request as draft June 10, 2026 14:04
@jakemas jakemas force-pushed the jakemas/poly-decompose-asm branch 6 times, most recently from d6ee368 to cd47996 Compare June 11, 2026 02:32
@jakemas jakemas marked this pull request as ready for review June 11, 2026 02:33
@jakemas jakemas force-pushed the jakemas/poly-decompose-asm branch 5 times, most recently from a0d2135 to 211fb52 Compare June 12, 2026 20:37
@hanno-becker hanno-becker self-assigned this Jun 14, 2026
Comment on lines +58 to +59
mldsa/poly_decompose_32_avx2_asm.o \
mldsa/poly_decompose_88_avx2_asm.o \

Please also add them to the README.

…mbly

Mirror the AArch64 conversion in poly_decompose_{32,88}_aarch64_asm:
replace the C intrinsics with fully-unrolled AVX2 routines, add HOL-Light
correctness and memory-safety proofs, and CBMC contracts. Helper lemmas
common to both variants are shared via the x86-only mldsa_utils.ml.

- Resolves #420
- Resolves #914

Signed-off-by: Jake Massimo <jakemas@amazon.com>
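
For readers unfamiliar with the routine being ported, the per-coefficient computation can be sketched in scalar C. This is modelled on the public Dilithium reference code rather than taken from this PR; the names `decompose_32` and `decompose_88` are illustrative, and the code assumes that right-shifting a negative `int32_t` is an arithmetic shift (true on mainstream compilers, but implementation-defined in C):

```c
#include <stdint.h>

#define Q 8380417 /* ML-DSA modulus */

/* Split a in [0, Q) so that a ≡ a1 * 2*gamma2 + a0 (mod Q), with
 * gamma2 = (Q-1)/32 (ML-DSA-65/87). Returns a1 and stores a0 in *a0. */
static int32_t decompose_32(int32_t *a0, int32_t a)
{
    const int32_t gamma2 = (Q - 1) / 32;
    int32_t a1 = (a + 127) >> 7;
    a1 = (a1 * 1025 + (1 << 21)) >> 22;
    a1 &= 15;                                /* wrap a1 == 16 to 0 */
    *a0 = a - a1 * 2 * gamma2;
    *a0 -= (((Q - 1) / 2 - *a0) >> 31) & Q;  /* centre a0 around 0 */
    return a1;
}

/* Same split with gamma2 = (Q-1)/88 (ML-DSA-44). */
static int32_t decompose_88(int32_t *a0, int32_t a)
{
    const int32_t gamma2 = (Q - 1) / 88;
    int32_t a1 = (a + 127) >> 7;
    a1 = (a1 * 11275 + (1 << 23)) >> 24;
    a1 ^= ((43 - a1) >> 31) & a1;            /* wrap a1 == 44 to 0 */
    *a0 = a - a1 * 2 * gamma2;
    *a0 -= (((Q - 1) / 2 - *a0) >> 31) & Q;  /* centre a0 around 0 */
    return a1;
}
```

An AVX2 implementation would typically apply the same arithmetic to eight 32-bit coefficients per 256-bit register; the branch-free masking above is what makes the routine straightforward to vectorise.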
@jakemas jakemas closed this Jun 15, 2026
@jakemas jakemas force-pushed the jakemas/poly-decompose-asm branch from 211fb52 to b138eef Compare June 15, 2026 18:52
jakemas commented Jun 15, 2026

Added to the README.

jakemas commented Jun 15, 2026

This PR's branch was accidentally force-pushed while the PR was closed (I was trying to add the update by amending the commit, but that failed), which left the PR impossible to reopen on GitHub's side. I have reopened the work as #1181 with identical content (rebased on latest main, 31 files). The README addition you asked for (@mkannwischer) is included there. Apologies for the churn.

Development

Successfully merging this pull request may close these issues.

HOL-Light: Prove AVX2 poly_decompose
AVX2: Replace intrinsics implementation of poly_decompose with assembly

4 participants