BDE 4.14.0 Production release
|
Macros | |
#define | BSLH_WYHASHINCREMENTALALGORITHM_WYMUM_XOR 0 |
#define | BSLH_WYHASHINCREMENTALALGORITHM_WYMUM_PSEUDO_MULTIPLY 0 |
Provide an implementation of the WyHash algorithm final v3.
bslh::WyHashIncrementalAlgorithm
implements the WyHash algorithm by Wang Yi et al (see implementation file for full list of authors) with modifications. This algorithm is known to be very fast yet have good avalanche behavior.
The original algorithm was downloaded from https://github.com/wangyi-fudan/wyhash/blob/master/wyhash.h which had been updated on September 14, 2021, last commit 166f352, and modified to conform to BDE coding conventions with no change in the binary results produced.
The modifications are:
WyHash is not a "Cryptographically Secure" hash. It is "Cryptographically
Strong", but not "Cryptographically Secure". In order to be cryptographically secure, an algorithm must, among other things, provide "Collision Resistance", described in https://en.wikipedia.org/wiki/Collision_resistance , meaning that it should be difficult to find two different messages m1
and m2
such that hash(m1) == hash(m2)
. Because of the limited sized output (only 2**64 possibilities) and the fast execution time of the algorithm, it is probable to find two such values searching only about sqrt(2**64) == 2**32
inputs, which wont take long.
WyHash is, however, a cryptographically strong PRF (pseudo-random function). This means, assuming a cryptographically secure random seed is given, the output of this algorithm will be indistinguishable from a uniform random distribution. This property is enough for the algorithm to be able to protect a hash table from malicious Denial of Service (DoS) attacks.
Given a cryptographically secure seed, this algorithm will produce hashes with a distribution that is indistinguishable from random. This distribution means that there is no way for an attacker to predict which keys will cause collisions, meaning that this algorithm can help mitigate Denial of Service (DoS) attacks on a hash table. DoS attacks occur when an attacker deliberately degrades the performance of the hash table by inserting data that will collide to the same bucket, causing an average constant time lookup to become a linear search. This protection is only effective if the seed provided is a cryptographically secure random number that is not available to the attacker.
Output hashes will be well distributed and will avalanche, which means changing one bit of the input will change approximately 50% of the output bits. This will prevent similar values from funneling to the same hash or bucket.
The value obtained by hashing a segment of memory is independent of the alignment of the segment of memory.
Note that this algorithm is subdivision-invariant (see {bslh_hash |Subdivision-Invariance}).
This algorithm is at least 2X faster than Spooky on all sizes of objects on Linux, Windows, and Solaris. On Aix it is about twice as fast as Spooky on small objects and about 50% slower on large objects.
This section illustrates intended usage of this component.
Suppose we have any array of types that define operator==
, and we want a fast way to find out if values are contained in the array. We can create a HashTable
data structure that is capable of looking up values in O(1) time.
Further suppose that we will be storing futures (the financial instruments) in this table. Since futures have standardized names, we don't have to worry about any malicious values causing collisions. We will want to use a general purpose hashing algorithm with a good hash distribution and good speed. This algorithm will need to be in the form of a hash functor – an object that will take objects stored in our array as input, and yield a 64-bit int value. The functor can pass the attributes of the TYPE
that are salient to hashing into the hashing algorithm, and then return the hash that is produced.
We can use the result of the hash function to index into our array of buckets
. Each bucket
is simply a pointer to a value in our original array of TYPE
objects.
First, we define our HashTable
template class, with the two type parameters: TYPE
(the type being referenced) and HASHER
(a functor that produces the hash).
This class template
implements a hash table providing fast lookup of an external, non-owned, array of values of (template parameter) TYPE
.
The (template parameter) TYPE
shall have a transitive, symmetric operator==
function. There is no requirement that it have any kind of creator defined.
The HASHER
template parameter type must be a functor with a method having the following signature:
and HASHER
shall have a publicly accessible default constructor and destructor.
Note that this hash table has numerous simplifications because we know the size of the array and never have to resize the table.
Then, we define a Future
class, which holds a c-string name
, char callMonth
, and short callYear
.
This class
identifies a future contract. It tracks the name, call month and year of the contract it represents, and allows equality comparison.
Next, we need a hash functor for Future
. We are going to use the SpookyHashAlgorithm
because it is a fast, general purpose hashing algorithm that will provide an easy way to combine the attributes of Future
objects that are salient to hashing into one reasonable hash that will distribute the items evenly throughout the hash table.
#define BSLH_WYHASHINCREMENTALALGORITHM_WYMUM_PSEUDO_MULTIPLY 0 |
#define BSLH_WYHASHINCREMENTALALGORITHM_WYMUM_XOR 0 |