Issue #739 records a negative finding from PR #731: replacing the current FNV-1a join hash path for kc <= 2 with a splitmix64 finalizer regressed CRDT W=1 by +24%, even though a 16-bit chi-squared distribution test looked acceptable.
The current code still uses the FNV-1a path in wirelog/columnar/ops.c::col_join_hash_rel_keys().
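For orientation, the two hash families being compared can be sketched as follows. This is only an illustration: the function names, the int64_t key type, and the way the splitmix64 finalizer is chained over key columns are assumptions, not the actual body of col_join_hash_rel_keys() or the code from PR #731. The constants are the standard 64-bit FNV-1a and splitmix64 constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a over the bytes of kc key columns (hypothetical stand-in
 * for the engine's current join hash path). */
static uint64_t sketch_fnv1a_keys(const int64_t *keys, size_t kc)
{
    assert(keys != NULL || kc == 0);
    uint64_t h = 0xcbf29ce484222325ULL;        /* FNV offset basis */
    for (size_t i = 0; i < kc; i++) {
        uint64_t v = (uint64_t)keys[i];
        for (int b = 0; b < 8; b++) {
            h ^= (v >> (8 * b)) & 0xffu;       /* fold in one byte */
            h *= 0x100000001b3ULL;             /* FNV prime */
        }
    }
    return h;
}

/* splitmix64 finalizer (the mixing step only, no state update). */
static uint64_t sketch_splitmix64_mix(uint64_t x)
{
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* One assumed way to combine up to kc keys with the finalizer; the
 * combination used in the failed experiment may have been different. */
static uint64_t sketch_splitmix64_keys(const int64_t *keys, size_t kc)
{
    assert(keys != NULL || kc == 0);
    uint64_t h = 0;
    for (size_t i = 0; i < kc; i++)
        h = sketch_splitmix64_mix(h + 0x9e3779b97f4a7c15ULL + (uint64_t)keys[i]);
    return h;
}
```

Both functions return a full 64-bit value; the join table then keeps only the low bits, which is where the table-size dependence discussed below comes from.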
Problem
The failed experiment showed that a synthetic 16-bit bucket-distribution test is not sufficient evidence. Any future join-hash specialization needs to demonstrate good behavior at the actual table sizes and key distributions used by the engine.
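A minimal sketch of why the bit width matters: the same chi-squared statistic can be computed at any table size by masking the hash to the low `bits` bits. The function name, the 24-bit cap, and the assumption that the engine selects buckets by masking low bits are all illustrative, not taken from the engine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Chi-squared statistic of bucket occupancy for n hashes placed into
 * 2^bits buckets (low-bit masking assumed). Returns -1.0 if the bucket
 * array cannot be allocated. */
static double sketch_bucket_chi2(const uint64_t *hashes, size_t n, unsigned bits)
{
    assert(hashes != NULL && n > 0);
    assert(bits >= 1 && bits <= 24);

    size_t nbuckets = (size_t)1 << bits;
    uint64_t mask = (uint64_t)nbuckets - 1;
    uint32_t *counts = calloc(nbuckets, sizeof *counts);
    if (counts == NULL)
        return -1.0;

    for (size_t i = 0; i < n; i++)
        counts[hashes[i] & mask]++;

    /* Under a uniform hash every bucket expects n / nbuckets entries. */
    double expected = (double)n / (double)nbuckets;
    double chi2 = 0.0;
    for (size_t b = 0; b < nbuckets; b++) {
        double d = (double)counts[b] - expected;
        chi2 += d * d / expected;
    }
    free(counts);
    return chi2;
}
```

As a toy case, the hash values 0..65535 give a statistic of exactly 0 at bits = 16 but a large value at bits = 18, because three quarters of the buckets stay empty. A candidate should therefore be scored at the widths the target workload actually uses, not only at 16 bits.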
Proposed direction
Add a validation/benchmark harness for candidate join hash functions before attempting another engine swap:
Measure bucket distribution at actual join table sizes used by target workloads, especially CRDT's 18-19 bit hash tables.
Measure probe-chain length, including mean, p95/p99, and max.
Run on structured workload keys, not only random small-range pairs.
Compare against the current FNV-1a implementation as the baseline.
Only after the harness shows a clear win should an engine hash specialization be attempted.
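The probe-chain measurement from the list above could look roughly like the following. It is a sketch under stated assumptions: an open-addressing table with linear probing and 2^bits slots, a simple index-based percentile, and hypothetical names throughout. The engine's real table layout may differ, in which case the insertion loop should mirror the real one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    double   mean;          /* mean slots inspected per insert */
    uint32_t p95, p99, max; /* percentiles and worst case */
} sketch_probe_stats;

static int sketch_cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Inserts n hashes into a linear-probing table of 2^bits slots and
 * records how many slots each insert inspected (1 = no collision).
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
static int sketch_probe_chain_stats(const uint64_t *hashes, size_t n,
                                    unsigned bits, sketch_probe_stats *out)
{
    assert(bits >= 1 && bits <= 24);
    size_t nslots = (size_t)1 << bits;
    if (hashes == NULL || out == NULL || n == 0 || n > nslots)
        return -1;

    unsigned char *used = calloc(nslots, 1);
    uint32_t *len = malloc(n * sizeof *len);
    if (used == NULL || len == NULL) {
        free(used);
        free(len);
        return -1;
    }

    size_t mask = nslots - 1;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        size_t slot = (size_t)(hashes[i] & mask);
        uint32_t probes = 1;
        while (used[slot]) {            /* n <= nslots, so a free slot exists */
            slot = (slot + 1) & mask;
            probes++;
        }
        used[slot] = 1;
        len[i] = probes;
        sum += probes;
    }

    qsort(len, n, sizeof *len, sketch_cmp_u32);
    out->mean = sum / (double)n;
    out->p95 = len[(n - 1) * 95 / 100];
    out->p99 = len[(n - 1) * 99 / 100];
    out->max = len[n - 1];

    free(used);
    free(len);
    return 0;
}
```

To compare candidates, the same key set (ideally keys captured from the structured workload rather than random pairs) would be hashed once with the FNV-1a baseline and once with the candidate, and both hash arrays passed through this function at the workload's real table width.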
Acceptance criteria
kc <= 2 specialization from do-not-retry: kc=2 splitmix64 join hash specialization (CRDT W=1 +24%) #739 without this proof.
References
wirelog/columnar/ops.c::col_join_hash_rel_keys