BDE 4.14.0 Production release
Loading...
Searching...
No Matches
bslh_wyhashincrementalalgorithm

Macros

#define BSLH_WYHASHINCREMENTALALGORITHM_WYMUM_XOR   0
 
#define BSLH_WYHASHINCREMENTALALGORITHM_WYMUM_PSEUDO_MULTIPLY   0
 

Detailed Description

Outline

Purpose

Provide an implementation of the WyHash algorithm final v3.

Classes

See also
bslh_hash

Description

bslh::WyHashIncrementalAlgorithm implements the WyHash algorithm by Wang Yi et al (see implementation file for full list of authors) with modifications. This algorithm is known to be very fast yet have good avalanche behavior.

The original algorithm was downloaded from https://github.com/wangyi-fudan/wyhash/blob/master/wyhash.h which had been updated on September 14, 2021, last commit 166f352, and modified to conform to BDE coding conventions with no change in the binary results produced.

The modifications are:

Security

WyHash is not a "Cryptographically Secure" hash. It is "Cryptographically Strong", but not "Cryptographically Secure". In order to be cryptographically secure, an algorithm must, among other things, provide "Collision Resistance", described in https://en.wikipedia.org/wiki/Collision_resistance , meaning that it should be difficult to find two different messages m1 and m2 such that hash(m1) == hash(m2). Because of the limited sized output (only 2**64 possibilities) and the fast execution time of the algorithm, it is probable to find two such values searching only about sqrt(2**64) == 2**32 inputs, which wont take long.

WyHash is, however, a cryptographically strong PRF (pseudo-random function). This means, assuming a cryptographically secure random seed is given, the output of this algorithm will be indistinguishable from a uniform random distribution. This property is enough for the algorithm to be able to protect a hash table from malicious Denial of Service (DoS) attacks.

Denial of Service (DoS) Protection

Given a cryptographically secure seed, this algorithm will produce hashes with a distribution that is indistinguishable from random. This distribution means that there is no way for an attacker to predict which keys will cause collisions, meaning that this algorithm can help mitigate Denial of Service (DoS) attacks on a hash table. DoS attacks occur when an attacker deliberately degrades the performance of the hash table by inserting data that will collide to the same bucket, causing an average constant time lookup to become a linear search. This protection is only effective if the seed provided is a cryptographically secure random number that is not available to the attacker.

Hash Distribution

Output hashes will be well distributed and will avalanche, which means changing one bit of the input will change approximately 50% of the output bits. This will prevent similar values from funneling to the same hash or bucket.

Alignment-Independence

The value obtained by hashing a segment of memory is independent of the alignment of the segment of memory.

Subdivision-Invariance

Note that this algorithm is subdivision-invariant (see {bslh_hash |Subdivision-Invariance}).

Speed

This algorithm is at least 2X faster than Spooky on all sizes of objects on Linux, Windows, and Solaris. On Aix it is about twice as fast as Spooky on small objects and about 50% slower on large objects.

Usage

This section illustrates intended usage of this component.

Example: Creating and Using a Hash Table

Suppose we have any array of types that define operator==, and we want a fast way to find out if values are contained in the array. We can create a HashTable data structure that is capable of looking up values in O(1) time.

Further suppose that we will be storing futures (the financial instruments) in this table. Since futures have standardized names, we don't have to worry about any malicious values causing collisions. We will want to use a general purpose hashing algorithm with a good hash distribution and good speed. This algorithm will need to be in the form of a hash functor – an object that will take objects stored in our array as input, and yield a 64-bit int value. The functor can pass the attributes of the TYPE that are salient to hashing into the hashing algorithm, and then return the hash that is produced.

We can use the result of the hash function to index into our array of buckets. Each bucket is simply a pointer to a value in our original array of TYPE objects.

First, we define our HashTable template class, with the two type parameters: TYPE (the type being referenced) and HASHER (a functor that produces the hash).

template <class TYPE, class HASHER>
class HashTable {

This class template implements a hash table providing fast lookup of an external, non-owned, array of values of (template parameter) TYPE.

The (template parameter) TYPE shall have a transitive, symmetric operator== function. There is no requirement that it have any kind of creator defined.

The HASHER template parameter type must be a functor with a method having the following signature:

size_t operator()(TYPE) const;
-OR-
size_t operator()(const TYPE&) const;

and HASHER shall have a publicly accessible default constructor and destructor.

Note that this hash table has numerous simplifications because we know the size of the array and never have to resize the table.

// DATA
const TYPE *d_values; // Array of values table is to
// hold
size_t d_numValues; // Length of 'd_values'.
const TYPE **d_bucketArray; // Contains ptrs into 'd_values'
size_t d_bucketArrayMask; // Will always be '2^N - 1'.
HASHER d_hasher; // User supplied hashing algorithm
private:
// PRIVATE ACCESSORS
bool lookup(size_t *idx,
const TYPE& value,
size_t hashValue) const;
// Look up the specified 'value', having the specified 'hashValue',
// and load its index in 'd_bucketArray' into the specified 'idx'.
// If not found, return the vacant entry in 'd_bucketArray' where
// it should be inserted. Return 'true' if 'value' is found and
// 'false' otherwise.
public:
// CREATORS
HashTable(const TYPE *valuesArray,
size_t numValues);
// Create a hash table referring to the specified 'valuesArray'
// having length of the specified 'numValues'. No value in
// 'valuesArray' shall have the same value as any of the other
// values in 'valuesArray'
~HashTable();
// Free up memory used by this hash table.
// ACCESSORS
bool contains(const TYPE& value) const;
// Return true if the specified 'value' is found in the table and
// false otherwise.
};
// PRIVATE ACCESSORS
template <class TYPE, class HASHER>
bool HashTable<TYPE, HASHER>::lookup(size_t *idx,
const TYPE& value,
size_t hashValue) const
{
const TYPE *ptr;
for (*idx = hashValue & d_bucketArrayMask; (ptr = d_bucketArray[*idx]);
*idx = (*idx + 1) & d_bucketArrayMask) {
if (value == *ptr) {
return true; // RETURN
}
}
// value was not found in table
return false;
}
// CREATORS
template <class TYPE, class HASHER>
HashTable<TYPE, HASHER>::HashTable(const TYPE *valuesArray,
size_t numValues)
: d_values(valuesArray)
, d_numValues(numValues)
, d_hasher()
{
size_t bucketArrayLength = 4;
while (bucketArrayLength < numValues * 4) {
bucketArrayLength *= 2;
}
d_bucketArrayMask = bucketArrayLength - 1;
d_bucketArray = new const TYPE *[bucketArrayLength];
memset(d_bucketArray, 0, bucketArrayLength * sizeof(TYPE *));
for (unsigned i = 0; i < numValues; ++i) {
const TYPE& value = d_values[i];
size_t idx;
const bool found = lookup(&idx, value, d_hasher(value));
BSLS_ASSERT_OPT(!found); (void) found;
d_bucketArray[idx] = &d_values[i];
}
}
template <class TYPE, class HASHER>
HashTable<TYPE, HASHER>::~HashTable()
{
delete [] d_bucketArray;
}
// ACCESSORS
template <class TYPE, class HASHER>
bool HashTable<TYPE, HASHER>::contains(const TYPE& value) const
{
size_t idx;
return lookup(&idx, value, d_hasher(value));
}
#define BSLS_ASSERT_OPT(X)
Definition bsls_assert.h:1856

Then, we define a Future class, which holds a c-string name, char callMonth, and short callYear.

class Future {

This class identifies a future contract. It tracks the name, call month and year of the contract it represents, and allows equality comparison.

// DATA
const char *d_name; // held, not owned
const char d_callMonth;
const short d_callYear;
public:
// CREATORS
Future(const char *name, const char callMonth, const short callYear)
: d_name(name), d_callMonth(callMonth), d_callYear(callYear)
// Create a 'Future' object out of the specified 'name',
// 'callMonth', and 'callYear'.
{}
Future() : d_name(""), d_callMonth('\0'), d_callYear(0)
// Create a 'Future' with default values.
{}
// ACCESSORS
const char * getMonth() const
// Return the month that this future expires.
{
return &d_callMonth;
}
const char * getName() const
// Return the name of this future
{
return d_name;
}
const short * getYear() const
// Return the year that this future expires
{
return &d_callYear;
}
bool operator==(const Future& rhs) const
// Compare this to the specified 'other' object and return true if
// they are equal
{
return (!strcmp(d_name, rhs.d_name)) &&
d_callMonth == rhs.d_callMonth &&
d_callYear == rhs.d_callYear;
}
};
bool operator!=(const Future& lhs, const Future& rhs)
// Compare compare the specified 'lhs' and 'rhs' objects and return
// true if they are not equal
{
return !(lhs == rhs);
}
bool operator!=(const FileCleanerConfiguration &lhs, const FileCleanerConfiguration &rhs)
bool operator==(const FileCleanerConfiguration &lhs, const FileCleanerConfiguration &rhs)

Next, we need a hash functor for Future. We are going to use the SpookyHashAlgorithm because it is a fast, general purpose hashing algorithm that will provide an easy way to combine the attributes of Future objects that are salient to hashing into one reasonable hash that will distribute the items evenly throughout the hash table.

struct HashFuture {
// This struct is a functor that will apply the 'SpookyHashAlgorithm'
// to objects of type 'Future'.
HashFuture()
{
// Generate random bits in 'd_seed' based on the time of day in
// nanoseconds.
const int iterations = static_cast<int>(nano & 7) + 1;
for (int ii = 0; ii < iterations; ++ii) {
totalNanoseconds();
nano += nano >> 32;
}
BSLMF_ASSERT(sizeof(d_seed) <= sizeof(nano));
memcpy(&d_seed, &nano, sizeof(d_seed));
}
// MANIPULATOR
size_t operator()(const Future& future) const
// Return the hash of the of the specified 'future'. Note that
// this uses the 'SpookyHashAlgorithm' to quickly combine the
// attributes of 'Future' objects that are salient to hashing into
// a hash suitable for a hash table.
{
hash(future.getName(), strlen(future.getName()));
hash(future.getMonth(), sizeof(char));
hash(future.getYear(), sizeof(short));
return static_cast<size_t>(hash.computeHash());
}
};
Definition bslh_wyhashincrementalalgorithm.h:447
BSLS_KEYWORD_CONSTEXPR_CPP14 bsls::Types::Int64 totalNanoseconds() const
Definition bsls_timeinterval.h:1409
#define BSLMF_ASSERT(expr)
Definition bslmf_assert.h:229
static TimeInterval nowMonotonicClock()
unsigned long long Uint64
Definition bsls_types.h:137

Macro Definition Documentation

◆ BSLH_WYHASHINCREMENTALALGORITHM_WYMUM_PSEUDO_MULTIPLY

#define BSLH_WYHASHINCREMENTALALGORITHM_WYMUM_PSEUDO_MULTIPLY   0

◆ BSLH_WYHASHINCREMENTALALGORITHM_WYMUM_XOR

#define BSLH_WYHASHINCREMENTALALGORITHM_WYMUM_XOR   0