BDE 4.14.0 Production release
|
Provide an implementation of the SipHash algorithm.
bslh::SipHashAlgorithm
implements the SipHash algorithm. SipHash is an algorithm designed for speed and security. A primary use case for this algorithm is to provide an extra line of defense in hash tables (such as the hash table that is used to implement unordered_map
) against malicious input that could cause Denial of Service (DoS) attacks. It is based on one of the finalists for the SHA-3 cryptographic hash standard. Full details of the hash function can be found here: {https://131002.net/siphash/siphash.pdf}
. This particular implementation has been derived from siphash.h
in Howard Hinnant's work here: {https://github.com/HowardHinnant/hash_append}
and as much of the original code as possible, including comment headers, has been preserved.
This class satisfies the requirements for seeded bslh
hashing algorithms, defined in bslh_seededhash.h
. More information can be found in the package level documentation for bslh
(internal users can also find information here {TEAM BDE:USING MODULAR HASHING<GO>})
SipHash is not a cryptographically secure hash. In the paper linked above, the creators of this hash function describe it as "Cryptographically
Strong", but explicitly avoid calling it cryptographically secure. In order to be cryptographically secure, an algorithm must, among other things, provide "collision resistance". "Collision resistance" means that it should be difficult to find two different messages m1 and m2 such that hash(m1) == hash(m2)
. Because of the limited sized output (only 2^64 possibilities) and the fast execution time of the algorithm, it is feasible to find collisions by brute force, making the algorithm not cryptographically secure.
SipHash is, however, a cryptographically strong PRF (pseudo-random function). This means, assuming a cryptographically secure random seed is given, the output of this algorithm will be indistinguishable from a uniform random distribution. This property is enough for the algorithm to be able to protect a hash table from malicious Denial of Service (DoS) attacks.
Given a cryptographically secure seed, this algorithm will produce hashes with a distribution that is indistinguishable from random. This distribution means that there is no way for an attacker to predict which keys will cause collisions, meaning that this algorithm can help mitigate Denial of Service (DoS) attacks on a hash table. DoS attacks occur when an attacker deliberately degrades the performance of the hash table by inserting data that will collide to the same bucket, causing an average constant time lookup to become a linear search. This protection is only effective if the seed provided is a cryptographically secure random number that is not available to the attacker.
This algorithm is designed to be fast in comparison to other algorithms making similar guarantees. It is still slower than other commonly accepted and used hashes such as WyHash or SpookyHash. This algorithm should only be used when protection from malicious input is required. Otherwise, an algorithm that documents better performance properties should be used, such as bslh::WyHashAlgorithm
.
Output hashes will be well distributed and will avalanche, which means changing one bit of the input will change approximately 50% of the output bits. This will prevent similar values from funneling to the same hash or bucket.
This hash algorithm is endian-independent. The hashes produced for a given 16-byte key sequence and given data will be the same on big-endian and little-endian platforms. (In the literature the key is sometimes presented as a large integer or sequence of integers, such as 0xDEADBEEF, 0xCAFEBABE, 0x8BADF00D, 0x1BADB002
, in which case care must be taken to supply the individual key bytes in the same order on both platforms if the same hash results are desired.) However, if the "given data" is not just a character string but has internal structure, such as being integral or floating-point, it is likely ordered in different ways depending on the platform, and thus will not hash to the same value.
This section illustrates intended usage of this component.
Suppose we have any array of types that define operator==
, and we want a fast way to find out if values are contained in the array. We can create a HashTable
data structure that is capable of looking up values in O(1) time.
Further suppose that we will be storing arbitrary user input in our table. It is possible that an attacker with knowledge of the hashing algorithm we are using could specially craft input that will cause collisions in our hash table, degrading performance to O(n). To avoid this we will need to use a secure hash algorithm with a random seed. This algorithm will need to be in the form of a hash functor – an object that will take objects stored in our array as input, and yield an integer value which is hard enough for an outside observer to predict that it appear random. The functor can pass the attributes of the TYPE
that are salient to hashing into the hashing algorithm, and then return the hash that is produced.
We can use the result of the hash function to index into our array of buckets
. Each bucket
is simply a pointer to a value in our original array of TYPE
objects.
First, we define our HashTable
template class, with the two type parameters: TYPE
(the type being referenced) and HASHER
(a functor that produces the hash).
Then, we define a Future
class, which holds a cstring name
, char callMonth
, and short callYear
. This class can be used to store custom futures that the users have uploaded.
Next, we need a hash functor for Future
. We are going to use the SipHashAlgorithm
because, it is a secure hash algorithm that will provide a way to securely combine the attributes of Future
objects that are salient to hashing into one reasonable hash that an malicious user will not be able to predict.
Then, we want to actually use our hash table on Future
objects. We create an array of Future
s based on data that was originally from some external source:
Now, we verify that each element in our array registers with count:
Finally, we verify that futures not in our original array are correctly identified as not being in the set:
The third party code begins with the siphash.h
header below, and continues until the TYPE TRAITS banner below. Changes made to the original code include:
BloombergLP
and bslh
namespacessiphash
to SipHashAlgorithm
noexcept
, std::Uint64_t
, explicit conversion operator, and an = default
constructor.typedef
to replace removed std::Uint64_t
computeHash()
to replace the removed explicit conversionk_SEED_LENGTH
and changed the constructor to accept a const char *
include
guardssize_t
rather than unsigned int
----------------------------— siphash.h --------------------------------—
This software is in the public domain. The only restriction on its use is that no one can remove it from the public domain by claiming ownership of it, including the original authors.
There is no warranty of correctness on the software contained herein. Use at your own risk.
Derived from:
SipHash reference C implementation
Written in 2012 by Jean-Philippe Aumasson jeanp.nosp@m.hili.nosp@m.ppe.a.nosp@m.umas.nosp@m.son@g.nosp@m.mail.nosp@m..com Daniel J. Bernstein djb@c.nosp@m.r.yp.nosp@m..to
To the extent possible under law, the author(s) have dedicated all copyright and related and neighboring rights to this software to the public domain worldwide. This software is distributed without any warranty.
You should have received a copy of the CC0 Public Domain Dedication along with this software. If not, see http://creativecommons.org/publicdomain/zero/1.0/.