Skip to content

Improve hash functions: FNV-1a for strings, MurmurHash3 for ireps#8976

Open
tautschnig wants to merge 1 commit intodiffblue:developfrom
tautschnig:change-hash-functions
Open

Improve hash functions: FNV-1a for strings, MurmurHash3 for ireps#8976
tautschnig wants to merge 1 commit intodiffblue:developfrom
tautschnig:change-hash-functions

Conversation

@tautschnig
Copy link
Copy Markdown
Collaborator

Replace the djb2-variant string hash (h = h*31 + c) with FNV-1a which has better avalanche properties for strings with common prefixes (typical of CBMC's SSA-renamed symbols). Add a MurmurHash3 finalization mix for additional distribution quality.

Switch the irep hash combiner from BASIC (rotate+xor) to MurmurHash3 which was already implemented in irep_hash.h but not enabled. MurmurHash3 provides better mixing of sub-expression hashes.

Both changes reduce hash collisions in the dstring table (36% -> ~30%) and the prop_conv expression cache, improving hash table lookup performance throughout the pipeline.

Impact on Collections-C monolithic: 51s -> 47s (8% faster)

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

Replace the djb2-variant string hash (h = h*31 + c) with FNV-1a
which has better avalanche properties for strings with common
prefixes (typical of CBMC's SSA-renamed symbols). Add a MurmurHash3
finalization mix for additional distribution quality.

Switch the irep hash combiner from BASIC (rotate+xor) to MurmurHash3
which was already implemented in irep_hash.h but not enabled.
MurmurHash3 provides better mixing of sub-expression hashes.

Both changes reduce hash collisions in the dstring table (36% -> ~30%)
and the prop_conv expression cache, improving hash table lookup
performance throughout the pipeline.

Impact on Collections-C monolithic: 51s -> 47s (8% faster)

Co-authored-by: Kiro <[email protected]>
@tautschnig tautschnig self-assigned this Apr 27, 2026
Copilot AI review requested due to automatic review settings April 27, 2026 10:43
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR improves hash distribution and reduces collisions by switching string hashing to FNV-1a with a MurmurHash3-style finalization mix, and enabling MurmurHash3 for irep hash combining.

Changes:

  • Replace the existing djb2-variant string hash with FNV-1a and add fmix64 finalization.
  • Enable IREP_HASH_MURMURHASH3 instead of the BASIC combiner.
  • Add comments describing the rationale for improved avalanche properties.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
src/util/string_hash.cpp Updates string hashing algorithm and introduces MurmurHash3 finalization mix.
src/util/irep_hash.h Switches irep hashing configuration to MurmurHash3.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/util/string_hash.cpp
Comment on lines +20 to +28
static inline std::size_t fmix64(std::size_t h)
{
h ^= h >> 33;
h *= 0xff51afd7ed558ccdULL;
h ^= h >> 33;
h *= 0xc4ceb9fe1a85ec53ULL;
h ^= h >> 33;
return h;
}
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fmix64 operates on std::size_t but performs h >> 33 and multiplies by 64-bit constants. On 32-bit platforms, shifting by 33 is undefined behavior and the intended mixing can’t work as designed. Make fmix64 take/return std::uint64_t and cast at the boundary, or provide a 32-bit fmix variant selected by sizeof(std::size_t) (or a static_assert(sizeof(std::size_t) == 8) if 32-bit is unsupported).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this comment.

Comment thread src/util/string_hash.cpp
{
size_t h=0;
size_t size=s.size();
// Process 8 bytes at a time, then finalize
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says “Process 8 bytes at a time” but the implementation hashes byte-by-byte. Please update the comment to match the code, or implement the intended 8-byte chunking (being mindful of alignment/endianness and portability).

Suggested change
// Process 8 bytes at a time, then finalize
// Process the string byte-by-byte, then finalize

Copilot uses AI. Check for mistakes.
Comment thread src/util/string_hash.cpp
Comment on lines +49 to +58
std::size_t h = 0xcbf29ce484222325ULL;
std::size_t len = 0;

for(; *s!=0; s++)
h=(h<<5)-h+*s;
for(const char *p = s; *p; ++p)
++len;

return h;
h ^= len;

for(; *s != 0; ++s)
h = (h ^ static_cast<unsigned char>(*s)) * 0x100000001b3ULL;
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This computes the string length in a first pass and then hashes in a second pass. If keeping length-mixing, consider counting length while hashing and XOR’ing len into h just before fmix64, or use std::strlen(s) (often optimized) to avoid a manual first loop. This reduces extra scanning for long C-strings.

Copilot uses AI. Check for mistakes.
Comment thread src/util/irep_hash.h
// #define IREP_HASH_MURMURHASH2A
// #define IREP_HASH_MURMURHASH3
#define IREP_HASH_MURMURHASH3

Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This header relies on manually commenting/uncommenting macros to select exactly one hash strategy. To prevent accidental multiple selections (or none), add preprocessor checks that enforce exactly one of IREP_HASH_BASIC, IREP_HASH_MURMURHASH2A, IREP_HASH_MURMURHASH3 is defined, and emit a clear #error otherwise.

Suggested change
#if !defined(IREP_HASH_BASIC) && !defined(IREP_HASH_MURMURHASH2A) && \
!defined(IREP_HASH_MURMURHASH3)
# error "Exactly one of IREP_HASH_BASIC, IREP_HASH_MURMURHASH2A, or IREP_HASH_MURMURHASH3 must be defined"
#endif
#if (defined(IREP_HASH_BASIC) && defined(IREP_HASH_MURMURHASH2A)) || \
(defined(IREP_HASH_BASIC) && defined(IREP_HASH_MURMURHASH3)) || \
(defined(IREP_HASH_MURMURHASH2A) && defined(IREP_HASH_MURMURHASH3))
# error "Exactly one of IREP_HASH_BASIC, IREP_HASH_MURMURHASH2A, or IREP_HASH_MURMURHASH3 must be defined"
#endif

Copilot uses AI. Check for mistakes.
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 27, 2026

Codecov Report

❌ Patch coverage is 68.18182% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.51%. Comparing base (b32074b) to head (0706180).

Files with missing lines Patch % Lines
src/util/string_hash.cpp 68.18% 7 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8976      +/-   ##
===========================================
+ Coverage    80.50%   80.51%   +0.01%     
===========================================
  Files         1704     1704              
  Lines       188778   188811      +33     
  Branches        73       73              
===========================================
+ Hits        151975   152025      +50     
+ Misses       36803    36786      -17     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants