June 30, 2022
Selecting wyhash
as the Default Hash Algorithm¶
Overview¶
BDE 3.103.0 updated the default hash algorithm used by BDE (e.g., by bsl::hash) to be wyhash (implemented in the BDE component bslh_wyhashalgorithm).
wyhash
demonstrated the best performance of the hash algorithms tested.
Background¶
BDE’s previously used an implementation of SpookyHash (bslh_spookyhashalgorithm) as its default hash algorithm. This default hash function is intended to be suitable for a general purpose hash table, but is not cryptographically secure. This algorithm was selected in 2014 and no longer compared well against alternative hash functions, particular on x86 hardware.
In order to select a new default hash algorithm we started by evaluating online hash algorithm benchmarks. There are many online benchmarks available whose quality and recency are sometimes hard to judge. The must useful for our purposes though was:
From SMhasher, and other online resources, we collected a set of candidate default hash algorithms.
Benchmark Comparisons¶
The BDE team performed a few rounds of its own testing to confirm and expand on information available online. The primary goal was to find a hash algorithm that performed best on x86 platforms (Linux and Windows), but at the same time did not degrade performance noticeably on legacy platforms like Sun and AIX.
Initial Algorithm Performance Comparison¶
The initial comparison tested a number of candidate algorithms on Linux.
Median Time in Nanoseconds to hash N bytes (N = 4, 8, 19, 69, 300), sorted by results for 19 Bytes.
Algorithm |
4 Bytes |
8 Bytes |
19 Bytes |
69 Bytes |
300 Bytes |
---|---|---|---|---|---|
Basic 32-bit Murmur hash |
1.69 |
1.95 |
5.01 |
12.4 |
34.0 |
std::hash<std::string> |
4.95 |
3.41 |
6.01 |
13.3 |
49.2 |
wyhash |
7.64 |
7.65 |
7.67 |
10.1 |
20.3 |
Fnv64 hash |
11.3 |
13.9 |
14.8 |
38.0 |
61.1 |
Knuth MMIX (based on random number generator) |
10.8 |
13.5 |
14.9 |
35.0 |
63.8 |
Extracted _Hash_bytes (gcc std::hash algorithm) |
18.0 |
24.6 |
22.4 |
50.3 |
118 |
Downloaded MurmurHash64A |
18.1 |
24.2 |
23.1 |
53.6 |
116 |
Downloaded MurmurHash64B |
32.4 |
33.0 |
28.5 |
60.0 |
147 |
Default BDE SpookyHash V2 |
23.3 |
25.5 |
30.2 |
38.3 |
83.5 |
Original SpookyHash V2 (from internet) |
24.3 |
27.2 |
30.6 |
37.4 |
81.2 |
Boost crc32 hash |
23.8 |
27.5 |
32.1 |
88.5 |
302 |
BDE SipHash |
39.0 |
26.9 |
43.5 |
67.6 |
197 |
- Notes:
Tests were performed on linux, x86, gcc, optimized
std::hash<std::string>
was included as a point of reference (several variants of _Hash_bytes, the hashing function used in gcc, were tested).32-bit murmur hash provides a 32-bit resulting hash value, which is not suitable for a default hash algorithm.
wyhash
was the fastest general-purpose hash function over a range of input lengths that we measured.
Avalanche Testing¶
A test for “avalanche effects” was then performed. For this test a single bit of input was modified, and the number of bits changed in the resulting hash code was counted. The expectation is there should is a 50% chance each output bit will changed if a single input bit is changed.
The avalanche tests excluded boost crc32 and Fnv64. As can be observed in the raw results, wyhash
exhibits a
clean bell curve similar to our existing default SpookyHash
implementation with almost the exact same standard deviation.
Final Verification for wyhash¶
wyhash
was selected as the primary candidate for the default hash function. This algorithm was
implemented in a BDE component (bslh_wyhashalgorithm)
and compared against BDE’s previous default hash function SpookyHash
.
The results of comparing BDE’s wyhash
implementation against BDE’s SpookyHash
implementation are below:
- Notes:
On x86 platforms,
wyhash
performed significantly better thanSpookyHash
(as expected).On AIX,
wyhash
was faster thanSpookyHash
for smaller inputs (< ~100 bytes) and slower for larger inputs. We considered this to be a neutral result.
The decline in performance for larger input on AIX was considered acceptable given the benefits in other contexts (especially x86 platforms, where there were substantial improvements).