Improve hash functions: FNV-1a for strings, MurmurHash3 for ireps by tautschnig · Pull Request #8976 · diffblue/cbmc

tautschnig · 2026-04-27T10:43:20Z

Replace the djb2-variant string hash (h = h*31 + c) with FNV-1a, which has better avalanche properties for strings with common prefixes (typical of CBMC's SSA-renamed symbols). Add a MurmurHash3 finalization mix for additional distribution quality.

Switch the irep hash combiner from BASIC (rotate+xor) to MurmurHash3, which was already implemented in irep_hash.h but not enabled. MurmurHash3 provides better mixing of sub-expression hashes.

Both changes reduce hash collisions in the dstring table (36% -> ~30%) and the prop_conv expression cache, improving hash table lookup performance throughout the pipeline.

Impact on Collections-C monolithic: 51s -> 47s (8% faster)
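To make the avalanche argument concrete, here is a minimal sketch that puts the old multiply-by-31 scheme next to FNV-1a followed by the MurmurHash3 64-bit finalizer. The function names `hash31` and `fnv1a_fmix` and the sample symbol names are assumptions for illustration; this is not the code in src/util/string_hash.cpp, and it leaves out the length mixing that the PR performs.

```cpp
#include <cstdint>

// Old scheme as described in the PR: h = h*31 + c.
// Hypothetical stand-in; bytes are treated as unsigned here.
constexpr std::uint64_t hash31(const char *s)
{
  std::uint64_t h = 0;
  for(; *s != 0; ++s)
    h = h * 31 + static_cast<unsigned char>(*s);
  return h;
}

// MurmurHash3 64-bit finalizer (fmix64) on a fixed-width type.
constexpr std::uint64_t fmix64(std::uint64_t h)
{
  h ^= h >> 33;
  h *= 0xff51afd7ed558ccdULL;
  h ^= h >> 33;
  h *= 0xc4ceb9fe1a85ec53ULL;
  h ^= h >> 33;
  return h;
}

// 64-bit FNV-1a (xor the byte in, then multiply by the FNV prime),
// followed by the finalizer. Length mixing is omitted in this sketch.
constexpr std::uint64_t fnv1a_fmix(const char *s)
{
  std::uint64_t h = 0xcbf29ce484222325ULL; // FNV-1a 64-bit offset basis
  for(; *s != 0; ++s)
    h = (h ^ static_cast<unsigned char>(*s)) * 0x100000001b3ULL;
  return fmix64(h);
}
```

With the old scheme, two names that differ only in their last character, such as the illustrative `main::1::x!0@1#1` and `main::1::x!0@1#2`, hash to adjacent values, so they tend to land in neighbouring buckets. The finalizer is a bijection on 64-bit values, so distinct FNV-1a states stay distinct while their differences are spread over more bits.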

Each commit message has a non-empty body, explaining why the change was made.
Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
My commit message includes data points confirming performance improvements (if claimed).
My PR is restricted to a single feature or bugfix.
n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

Co-authored-by: Kiro <[email protected]>

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR improves hash distribution and reduces collisions by switching string hashing to FNV-1a with a MurmurHash3-style finalization mix, and enabling MurmurHash3 for irep hash combining.

Changes:

Replace the existing djb2-variant string hash with FNV-1a and add fmix64 finalization.
Enable IREP_HASH_MURMURHASH3 instead of the BASIC combiner.
Add comments describing the rationale for improved avalanche properties.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
src/util/string_hash.cpp	Updates string hashing algorithm and introduces MurmurHash3 finalization mix.
src/util/irep_hash.h	Switches irep hashing configuration to MurmurHash3.
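The difference between the two combiners can be sketched as follows. This is a hedged illustration on 32-bit values only: the rotation amount of 7 in `combine_basic` is an assumption, the function names are hypothetical, and `combine_murmur3` is the block-mixing step of MurmurHash3_x86_32 rather than a copy of what irep_hash.h contains.

```cpp
#include <cstdint>

// Rotate left on 32 bits; r is assumed to be in the range 1..31.
constexpr std::uint32_t rotl32(std::uint32_t x, int r)
{
  return (x << r) | (x >> (32 - r));
}

// Rotate+xor combiner in the spirit of BASIC (rotation of 7 is assumed).
constexpr std::uint32_t combine_basic(std::uint32_t h1, std::uint32_t h2)
{
  return rotl32(h1, 7) ^ h2;
}

// Block-mixing step of MurmurHash3_x86_32: the incoming hash h2 is
// scrambled by multiply/rotate/multiply before being folded into h1.
constexpr std::uint32_t combine_murmur3(std::uint32_t h1, std::uint32_t h2)
{
  h2 *= 0xcc9e2d51u;
  h2 = rotl32(h2, 15);
  h2 *= 0x1b873593u;
  h1 ^= h2;
  h1 = rotl32(h1, 13);
  return h1 * 5 + 0xe6546b64u;
}
```

Under these assumptions the rotate+xor combiner lets structured inputs cancel out: `combine_basic(1, 128)` and `combine_basic(0, 0)` are both 0. The multiply steps in the MurmurHash3 variant are intended to make this kind of simple cancellation less likely, although any combiner that maps two words to one necessarily has collisions.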


Copilot · 2026-04-27T11:04:06Z

+static inline std::size_t fmix64(std::size_t h)
+{
+  h ^= h >> 33;
+  h *= 0xff51afd7ed558ccdULL;
+  h ^= h >> 33;
+  h *= 0xc4ceb9fe1a85ec53ULL;
+  h ^= h >> 33;
+  return h;
+}


fmix64 operates on std::size_t but performs h >> 33 and multiplies by 64-bit constants. On 32-bit platforms, shifting by 33 is undefined behavior and the intended mixing can’t work as designed. Make fmix64 take/return std::uint64_t and cast at the boundary, or provide a 32-bit fmix variant selected by sizeof(std::size_t) (or a static_assert(sizeof(std::size_t) == 8) if 32-bit is unsupported).

kroening: I agree with this comment.
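One way to address the portability concern is sketched below, under the assumption that 32-bit builds should still get a finalizer: both finalizers work on fixed-width types, and the choice is made at the `std::size_t` boundary. The helper name `finalize` is hypothetical; the constants are those of MurmurHash3's fmix64 and fmix32.

```cpp
#include <cstddef>
#include <cstdint>

// 64-bit finalizer: shifting by 33 is well defined on a 64-bit type.
constexpr std::uint64_t fmix64(std::uint64_t h)
{
  h ^= h >> 33;
  h *= 0xff51afd7ed558ccdULL;
  h ^= h >> 33;
  h *= 0xc4ceb9fe1a85ec53ULL;
  h ^= h >> 33;
  return h;
}

// 32-bit finalizer from MurmurHash3 (fmix32).
constexpr std::uint32_t fmix32(std::uint32_t h)
{
  h ^= h >> 16;
  h *= 0x85ebca6bu;
  h ^= h >> 13;
  h *= 0xc2b2ae35u;
  h ^= h >> 16;
  return h;
}

// Hypothetical boundary helper: picks the finalizer that matches the width
// of std::size_t. Both branches compile on every platform because each
// finalizer only ever shifts its own fixed-width type.
constexpr std::size_t finalize(std::size_t h)
{
  return sizeof(std::size_t) == 8
           ? static_cast<std::size_t>(fmix64(h))
           : static_cast<std::size_t>(fmix32(static_cast<std::uint32_t>(h)));
}
```

If 32-bit targets are not supported at all, a `static_assert(sizeof(std::size_t) == 8, "...")` next to the 64-bit version would be the shorter alternative mentioned in the review comment.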

Copilot · 2026-04-27T11:04:07Z

 {
-  size_t h=0;
-  size_t size=s.size();
+  // Process 8 bytes at a time, then finalize


The comment says “Process 8 bytes at a time” but the implementation hashes byte-by-byte. Please update the comment to match the code, or implement the intended 8-byte chunking (being mindful of alignment/endianness and portability).

Suggested change

-  // Process 8 bytes at a time, then finalize
+  // Process the string byte-by-byte, then finalize

Copilot · 2026-04-27T11:04:07Z

+  std::size_t h = 0xcbf29ce484222325ULL;
+  std::size_t len = 0;

-  for(; *s!=0; s++)
-    h=(h<<5)-h+*s;
+  for(const char *p = s; *p; ++p)
+    ++len;

-  return h;
+  h ^= len;
+
+  for(; *s != 0; ++s)
+    h = (h ^ static_cast<unsigned char>(*s)) * 0x100000001b3ULL;


This computes the string length in a first pass and then hashes in a second pass. If keeping length-mixing, consider counting length while hashing and XOR’ing len into h just before fmix64, or use std::strlen(s) (often optimized) to avoid a manual first loop. This reduces extra scanning for long C-strings.
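The single-pass variant suggested here could look roughly like the sketch below. The function name `hash_c_string` is hypothetical, and the sketch assumes that mixing the length in just before finalization is acceptable; it therefore yields different hash values from the two-pass code in the diff, which XORs the length in before the first byte.

```cpp
#include <cstdint>

// MurmurHash3 64-bit finalizer on a fixed-width type.
constexpr std::uint64_t fmix64(std::uint64_t h)
{
  h ^= h >> 33;
  h *= 0xff51afd7ed558ccdULL;
  h ^= h >> 33;
  h *= 0xc4ceb9fe1a85ec53ULL;
  h ^= h >> 33;
  return h;
}

// Hypothetical single-pass hash for NUL-terminated strings: the length is
// counted in the same loop that feeds the bytes to FNV-1a, then XORed in
// just before the finalizer.
constexpr std::uint64_t hash_c_string(const char *s)
{
  std::uint64_t h = 0xcbf29ce484222325ULL; // FNV-1a 64-bit offset basis
  std::uint64_t len = 0;
  for(; *s != 0; ++s, ++len)
    h = (h ^ static_cast<unsigned char>(*s)) * 0x100000001b3ULL;
  h ^= len;
  return fmix64(h);
}
```

Whether the saved scan matters in practice would need measuring; the point of the sketch is only that length mixing does not require a separate pass.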

Copilot · 2026-04-27T11:04:07Z

 // #define IREP_HASH_MURMURHASH2A
-// #define IREP_HASH_MURMURHASH3
+#define IREP_HASH_MURMURHASH3



This header relies on manually commenting/uncommenting macros to select exactly one hash strategy. To prevent accidental multiple selections (or none), add preprocessor checks that enforce exactly one of IREP_HASH_BASIC, IREP_HASH_MURMURHASH2A, IREP_HASH_MURMURHASH3 is defined, and emit a clear #error otherwise.

Suggested change

+#if !defined(IREP_HASH_BASIC) && !defined(IREP_HASH_MURMURHASH2A) && \
+  !defined(IREP_HASH_MURMURHASH3)
+#  error "Exactly one of IREP_HASH_BASIC, IREP_HASH_MURMURHASH2A, or IREP_HASH_MURMURHASH3 must be defined"
+#endif
+#if (defined(IREP_HASH_BASIC) && defined(IREP_HASH_MURMURHASH2A)) || \
+  (defined(IREP_HASH_BASIC) && defined(IREP_HASH_MURMURHASH3)) || \
+  (defined(IREP_HASH_MURMURHASH2A) && defined(IREP_HASH_MURMURHASH3))
+#  error "Exactly one of IREP_HASH_BASIC, IREP_HASH_MURMURHASH2A, or IREP_HASH_MURMURHASH3 must be defined"
+#endif
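A more compact way to express the same "exactly one" check is sketched below. It relies on `defined(X)` evaluating to 1 or 0 inside `#if`, so the sum counts the selected strategies. The enum `irep_hash_kindt` and the constant `selected_irep_hash` are hypothetical additions used only to make the selection observable; they are not part of irep_hash.h.

```cpp
// Hypothetical stand-in for the selection block in irep_hash.h:
// exactly one of the three macros is meant to be defined.
// #define IREP_HASH_BASIC
// #define IREP_HASH_MURMURHASH2A
#define IREP_HASH_MURMURHASH3

// Count the selected strategies; anything other than one is an error.
#if (defined(IREP_HASH_BASIC) + defined(IREP_HASH_MURMURHASH2A) + \
     defined(IREP_HASH_MURMURHASH3)) != 1
#  error "Exactly one of IREP_HASH_BASIC, IREP_HASH_MURMURHASH2A, or IREP_HASH_MURMURHASH3 must be defined"
#endif

// Illustration only: expose which strategy was picked.
enum class irep_hash_kindt
{
  BASIC,
  MURMURHASH2A,
  MURMURHASH3
};

#if defined(IREP_HASH_BASIC)
constexpr irep_hash_kindt selected_irep_hash = irep_hash_kindt::BASIC;
#elif defined(IREP_HASH_MURMURHASH2A)
constexpr irep_hash_kindt selected_irep_hash = irep_hash_kindt::MURMURHASH2A;
#else
constexpr irep_hash_kindt selected_irep_hash = irep_hash_kindt::MURMURHASH3;
#endif
```

Uncommenting a second `#define`, or commenting out all three, makes the translation unit fail with the `#error` message instead of silently picking a combiner.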

codecov · 2026-04-27T11:53:39Z

Codecov Report

❌ Patch coverage is 68.18182% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.51%. Comparing base (b32074b) to head (0706180).

Files with missing lines	Patch %	Lines
src/util/string_hash.cpp	68.18%	7 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #8976      +/-   ##
===========================================
+ Coverage    80.50%   80.51%   +0.01%     
===========================================
  Files         1704     1704              
  Lines       188778   188811      +33     
  Branches        73       73              
===========================================
+ Hits        151975   152025      +50     
+ Misses       36803    36786      -17


tautschnig self-assigned this Apr 27, 2026

Copilot AI review requested due to automatic review settings April 27, 2026 10:43

tautschnig requested review from kroening and peterschrammel as code owners April 27, 2026 10:43

tautschnig wants to merge 1 commit into diffblue:develop from tautschnig:change-hash-functions
