BDE 4.14.0 Production release
|
Provide a reasonable seeded hashing algorithm for default use.
bslh::DefaultSeededHashAlgorithm
provides an unspecified default seeded hashing algorithm. The supplied algorithm is suitable for general purpose use in a hash table. The underlying algorithm is subject to change in future releases.
This class satisfies the requirements for seeded bslh
hashing algorithms, defined in bslh_seededhash.h
. More information can be found in the package level documentation for bslh
(internal users can also find information here {TEAM BDE:USING MODULAR HASHING<GO>})
In this context "security" refers to the ability of the algorithm to produce hashes that are not predictable by an attacker. Security is a concern when an attacker may be able to provide malicious input into a hash table, thereby causing hashes to collide to buckets, which degrades performance. There are no security guarantees made by bslh::DefaultHashAlgorithm
, meaning attackers may be able to engineer keys that will cause a Denial of Service (DoS) attack in hash tables using this algorithm. Note that even if an attacker does not know the seed used to initialize this algorithm, they may still be able to produce keys that will cause a DoS attack in hash tables using this algorithm. If security is required, an algorithm that documents better secure properties should be used, such as bslh::SipHashAlgorithm
.
The default hash algorithm will compute a hash on the order of O(n) where n
is the length of the input data. Note that this algorithm will produce hashes fast enough to be used to hash keys in a hash table. The chosen algorithm will be quicker than specialized algorithms such as SipHash, but not as fast as hashing using the identity function.
The default hash algorithm will distribute hashes in a pseudo-random distribution across the key space. The hash function will exhibit avalanche behavior, meaning changing one bit of input will result in a 50% chance of each output bit changing. Avalanche behavior is enough to guarantee good key distribution, even when values are consecutive.
The default hash algorithm guarantees only that hashes will remain consistent within a single process, meaning different hashes may be produced on machines of different endianness or even between runs on the same machine. Therefor it is not recommended to send hashes from bslh::DefaultSeededHashAlgorithm
over a network. It is also not recommended to write hashes from bslh::DefaultSeededHashAlgorithm
to any memory accessible by multiple machines.
This section illustrates intended usage of this component.
Suppose we have any array of types that define operator==
, and we want a fast way to find out if values are contained in the array. We can create a HashTable
data structure that is capable of looking up values in O(1) time.
Further suppose that we will be storing futures (the financial instruments) in this table. Since futures have standardized names, we don't have to worry about any malicious values causing collisions. We will want to use a general purpose hashing algorithm with a good hash distribution and good speed. This algorithm will need to be in the form of a hash functor – an object that will take objects stored in our array as input, and yield an integer value. The functor can pass the attributes of TYPE
that are salient to hashing into the hashing algorithm, and then return the hash that is produced.
We can use the result of the hash function to index into our array of buckets
. Each bucket
is simply a pointer to a value in our original array of TYPE
objects.
First, we define our HashTable
template class, with the two type parameters: TYPE
(the type being referenced) and HASHER
(a functor that produces the hash).
Then, we define a Future
class, which holds a c-string name
, char callMonth
, and short callYear
.
Next, we need a hash functor for Future
. We are going to use the DefaultSeededHashAlgorithm
because it is a fast, general purpose hashing algorithm that will provide an easy way to combine the attributes of Future
objects that are salient to hashing into one reasonable hash that will distribute the items evenly throughout the hash table. Moreover, when a new hashing algorithm is discovered to be a better default, we can be automatically be upgraded to use it as soon as bslh::DefaultSeededHashAlgorithm
is updated.
Then, we use a bslh::SeedGenerator
combined with a RNG (implementation not shown), to generate the seeds for our algorithm.
Next, after seeding our algorithm, we pass data into it and operate on it just as easily as for a non-seeded algorithm
Then, we want to actually use our hash table on Future
objects. We create an array of Future
s based on data that was originally from some external source:
Next, we create our HashTable hashTable
. We pass the functor that we defined above as the second argument:
Now, we verify that each element in our array registers with count:
Finally, we verify that futures not in our original array are correctly identified as not being in the set: