June 30, 2022

Selecting wyhash as the Default Hash Algorithm

Overview

BDE 3.103.0 updated the default hash algorithm used by BDE (e.g., by bsl::hash) to be wyhash (implemented in the BDE component bslh_wyhashalgorithm).

wyhash demonstrated the best performance of the hash algorithms tested.

Background

BDE’s previously used an implementation of SpookyHash (bslh_spookyhashalgorithm) as its default hash algorithm. This default hash function is intended to be suitable for a general purpose hash table, but is not cryptographically secure. This algorithm was selected in 2014 and no longer compared well against alternative hash functions, particular on x86 hardware.

In order to select a new default hash algorithm we started by evaluating online hash algorithm benchmarks. There are many online benchmarks available whose quality and recency are sometimes hard to judge. The must useful for our purposes though was:

From SMhasher, and other online resources, we collected a set of candidate default hash algorithms.

Benchmark Comparisons

The BDE team performed a few rounds of its own testing to confirm and expand on information available online. The primary goal was to find a hash algorithm that performed best on x86 platforms (Linux and Windows), but at the same time did not degrade performance noticeably on legacy platforms like Sun and AIX.

Initial Algorithm Performance Comparison

The initial comparison tested a number of candidate algorithms on Linux.

  • Median Time in Nanoseconds to hash N bytes (N = 4, 8, 19, 69, 300), sorted by results for 19 Bytes.

Algorithm

4 Bytes

8 Bytes

19 Bytes

69 Bytes

300 Bytes

Basic 32-bit Murmur hash

1.69

1.95

5.01

12.4

34.0

std::hash<std::string>

4.95

3.41

6.01

13.3

49.2

wyhash

7.64

7.65

7.67

10.1

20.3

Fnv64 hash

11.3

13.9

14.8

38.0

61.1

Knuth MMIX (based on random number generator)

10.8

13.5

14.9

35.0

63.8

Extracted _Hash_bytes (gcc std::hash algorithm)

18.0

24.6

22.4

50.3

118

Downloaded MurmurHash64A

18.1

24.2

23.1

53.6

116

Downloaded MurmurHash64B

32.4

33.0

28.5

60.0

147

Default BDE SpookyHash V2

23.3

25.5

30.2

38.3

83.5

Original SpookyHash V2 (from internet)

24.3

27.2

30.6

37.4

81.2

Boost crc32 hash

23.8

27.5

32.1

88.5

302

BDE SipHash

39.0

26.9

43.5

67.6

197

Notes:
  • Tests were performed on linux, x86, gcc, optimized

  • std::hash<std::string> was included as a point of reference (several variants of _Hash_bytes, the hashing function used in gcc, were tested).

  • 32-bit murmur hash provides a 32-bit resulting hash value, which is not suitable for a default hash algorithm.

wyhash was the fastest general-purpose hash function over a range of input lengths that we measured.

Avalanche Testing

A test for “avalanche effects” was then performed. For this test a single bit of input was modified, and the number of bits changed in the resulting hash code was counted. The expectation is there should is a 50% chance each output bit will changed if a single input bit is changed.

The avalanche tests excluded boost crc32 and Fnv64. As can be observed in the raw results, wyhash exhibits a clean bell curve similar to our existing default SpookyHash implementation with almost the exact same standard deviation.

Final Verification for wyhash

wyhash was selected as the primary candidate for the default hash function. This algorithm was implemented in a BDE component (bslh_wyhashalgorithm) and compared against BDE’s previous default hash function SpookyHash.

The results of comparing BDE’s wyhash implementation against BDE’s SpookyHash implementation are below:

LinuxSunAIXWindows
Notes:
  • On x86 platforms, wyhash performed significantly better than SpookyHash (as expected).

  • On AIX, wyhash was faster than SpookyHash for smaller inputs (< ~100 bytes) and slower for larger inputs. We considered this to be a neutral result.

The decline in performance for larger input on AIX was considered acceptable given the benefits in other contexts (especially x86 platforms, where there were substantial improvements).