Skip to content

x86_64: Replace rej_uniform_eta2/eta4 intrinsics with hand-written assembly#1187

Closed
jakemas wants to merge 1 commit into
pq-code-package:mainfrom
jakemas:x86-rej-eta-asm-draft
Closed

x86_64: Replace rej_uniform_eta2/eta4 intrinsics with hand-written assembly#1187
jakemas wants to merge 1 commit into
pq-code-package:mainfrom
jakemas:x86-rej-eta-asm-draft

Conversation

@jakemas

@jakemas jakemas commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Summary

Draft — opened while the HOL-Light proofs are still in progress.

Replaces the AVX2 intrinsics implementations of rej_uniform_eta2 and rej_uniform_eta4 with hand-written x86_64 assembly, following the same approach as #1014:

  • Table passed as a parameter (consistent with the aarch64 approach), avoiding external symbol references for simpasm compatibility.
  • All constants constructed from immediates (no .rodata), enabling future HOL-Light formal verification.
  • __contract__ annotations on the asm entry points (CBMC), to be kept in sync with the HOL-Light specs.
  • scripts/autogen and the x86_64 HOL-Light Makefile register the eta2/eta4 bytecode dump targets.
  • Adds a poly_uniform_eta_4x component benchmark.

Scope of this draft

This draft contains assembly + build/bytecode infrastructure only. It intentionally excludes the HOL-Light .ml proofs, which are still being developed. The proofs will be added before this is marked ready for review.

Notes / TODO before ready-for-review

  • meta.h currently wires only eta4 to the new asm; eta2 asm is in place but mld_rej_uniform_eta2_native still calls the intrinsic — wire it once verified.
  • Add HOL-Light CORRECT (and MEMSAFE) proofs for eta2/eta4.
  • Performance numbers.

…sembly

Add hand-written x86_64 AVX2 assembly for rej_uniform_eta2 and
rej_uniform_eta4, following the rej_uniform approach in pq-code-package#1014: the table
is passed as a parameter and all constants are built from immediates (no
.rodata), enabling future HOL-Light verification. Wire eta4 to the new
asm in meta.h, add the asm entry points and contracts in
arith_native_x86_64.h, register the bytecode dump targets in autogen and
the Makefile, and add a poly_uniform_eta_4x component benchmark.

Signed-off-by: jake massimo <jakemas@amazon.com>
@jakemas

jakemas commented Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

Superseded by a branch pushed directly to the upstream repo so CI runs (fork PRs skip the self-hosted/EC2 test+bench jobs).

@jakemas jakemas closed this Jun 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants