Quick Links: |
Provide a struct to run bslh
hash algorithms on supported types.
More...
Namespaces | |
namespace | bslh |
bslh
hash algorithms on supported types. bslh::Hash | functor that runs bslh hash algorithms on supported types |
struct
, bslh::Hash
, that defines a hash-functor that can be used with standard containers (a drop in replacement for bsl::hash
), and which applies the supplied (template parameter) HASH_ALGORITHM
to the attributes of the (template parameter) TYPE
which have been identified as salient to hashing. The bslh::Hash
template parameter HASH_ALGORITHM
must be a hashing algorithm that conforms to the requirements outlined below (see Requirements for Regular bslh
Hashing Algorithms). Note that there are several hashing algorithms defined within the bslh
package and some, such as those that require seeds, will not meet these requirements, meaning they cannot be used with bslh::Hash
. A call to bslh::Hash::operator()
for a (template parameter) TYPE
will call the hashAppend
free function for TYPE
and provide hashAppend
an instance of a hash algorithm in the bslh
namespace that will use the (template parameter) HASH_ALGORITHM
to compute hash values. hashAppend
definitions for fundamental types, which are required by algorithms defined in bslh
. Clients are expected to define a free-function hashAppend
for each of the types they wish to be hashable (see hashAppend
below). More information can be found in the package level documentation for bslh
(internal users can also find information here {TEAM BDE:USING MODULAR HASHING<GO>}) bslh::Hash
provides a modular system for hashing. This modularity refers to the decoupling of the various tasks associated with hashing. Using this system, type implementers can identify attributes of their type that are salient to hashing, without having to write a hashing algorithm. Conversely, hashing algorithms can be written independent of types. Attributes that are salient to hashing are called out on a type using hashAppend
. Hashing algorithms are written to operate on the attributes called out by hashAppend
. Some default algorithms have been provided in the bslh
package. This modularity allows type creators to avoid writing hashing algorithms, which can save work and avoid bad hashing algorithm implementations. hashAppend
is the free function that is used to pass attributes that are salient to hashing into a hashing algorithm. A custom type defined in another component must define it's own hashAppend
free function overload that can be discovered through ADL in order to be hashed using this facility. A simple implementation of an overload for hashAppend
might call hashAppend
on each of the type's fields that are salient to hashing. Note that when writing a hashAppend
function that itself calls hashAppend, 'using bslh::hashAppend;
must be included as the first line of code in the function. The using statement ensures that ADL will always be able to find the hashAppend
functions defined in this component for handling fundamental types even when the (template parameter) type HASH_ALGORITHM
is not implemented in bslh
. struct
, class
, or union
by providing an appropriate hashAppend
. In some very rare cases, a client will want to provide special behavior when hashing fundamental types such as integral types, pointers, or enums, and this can be done by providing an appropriate typed operator()
overload of your own hash function. Support for doing this is not provided for ther types, or for bool
. hashAppend
, such as types containing C-strings which are salient to hashing. These C-strings must be passed directly into the (template parameter) type HASH_ALGORITHM
, rather than calling hashAppend
with the pointer as an argument. This special case exists because calling hashAppend
with a pointer will hash the pointer rather than the data that is pointed to. hashAppend
has been implemented for all of the fundamental types, and for arrays. When hashAppend
is reached on a fundamental type, the hashing algorithm is no longer propagated, and instead a pointer to the beginning of the type in memory is passed to the algorithm, along with the length of the type. There are special cases with floating point numbers and boolean values where the data is tweaked before hashing to ensure that values that compare equal will be hashed with the same bit-wise representation. The algorithm will then incorporate the type into its internal state and return a finalized hash when requested. bslh
package that can be passed in and used as template parameters for bslh::Hash
or other struct
s like it. Some of these algorithms, such as bslh::SpookyHashAlgorithm
or bslh::WyHashAlgorithm
, are named for the algorithm they implement. These named algorithms are intended for use by those who want a specific algorithm. There are other algorithms, such as bslh::DefaultHashAlgorithm
, which wrap an unspecified algorithm and describe the properties of the wrapped algorithm. The descriptive algorithms are intended for use by those who need specific properties and want to be updated to a new algorithm when one is published with improvements to the desired properties. bslh::DefaultHashAlgorithm
has the property of being a good default algorithm, specifically for use in a hash table. bslh::Hash
, the user-implemented algorithms must implement the interface shown here: class SomeHashAlgorithm { public: // TYPES typedef Uint64 result_type; // CREATORS SomeHashAlgorithm(); // MANIPULATORS void operator()(const void *data, size_t numBytes); result_type computeHash(); };
result_type
typedef
must define the return type of this particular algorithm. A default constructor (either implicit or explicit) must be supplied that creates an algorithm functor that is in a usable state. An operator()
must be supplied that takes a const void *
to the data to be hashed and a size_t
length of bytes to be hashed. This operator must operate on all data uniformly, meaning that regardless of whether data is passed in all at once, or one byte at a time, the result returned by computeHash()
will be the same. computeHash()
will return the final result of the hashing algorithm, as type result_type
. computeHash()
is allowed to modify the internal state of the algorithm, meaning calling computeHash()
more than once may not return the correct value. operator()
. More precisely, for any subdivision of the message M
of size S
into N
segments M_i
of size S_i
, the following must be true: SomeHashAlgorithm algM; algM(M, S); SomeHashAlgorithm algI; for (int i = 0; i < N; ++i) { algI(M_i, S_i); } assert(algM.computeHash() == algI.computeHash());
Box
, that contains attributes that are salient to hashing as well as attributes that are not salient to hashing. Some of these attributes are themselves user defined types. We want to store objects of type Box
in a hash table, so we need to be able to produce hash values that represent instances of Box
. We don't want to write our own hashing or hash combine algorithm, because we know it is very difficult and labor-intensive to write a proper hashing algorithm. In order to hash this Box
, we will use the modular hashing system supplied in bslh
. Point
, a class that allows us to identify a location on a two dimensional Cartesian plane. class Point { // This class is a value-semantic type that represents a two // dimensional location on a Cartesian plane. private: int d_x; int d_y; double d_distToOrigin; // This value will be accessed frequently, so we // cache it rather than recalculate it every // time. public: Point (int x, int y); // Create a 'Point' having the specified 'x' and 'y' coordinates. double distanceToOrigin() const; // Return the distance from the origin (0, 0) to this point. int getX() const; // Return the x coordinate of this point. int getY() const; // Return the y coordinate of this point. }; inline Point::Point(int x, int y) : d_x(x) , d_y(y) { d_distToOrigin = sqrt(static_cast<double>(d_x * d_x) + static_cast<double>(d_y * d_y)); } inline double Point::distanceToOrigin() const { return d_distToOrigin; } inline int Point::getX() const { return d_x; } inline int Point::getY() const { return d_y; }
operator==
. Notice how it checks only attributes that we would want to incorporate into the hashed value. Note that attributes that are salient to hashing tend to be the same as or a subset of the attributes that are checked in operator==
. bool operator==(const Point& lhs, const Point& rhs) // Return true if the specified 'lhs' and 'rhs' have the same value. // Two 'Point' objects have the same value if they have the same x and // y coordinates. { return (lhs.getX() == rhs.getX()) && (lhs.getY() == rhs.getY()); }
hashAppend
. This function will allow any hashing algorithm that meets the bslh
hashing algorithm requirements to be applied to Point
. This is the full extent of the work that needs to be done by type creators. They do not need to implement any algorithms, they just need to call out the attributes that are salient to hashing by calling hashAppend
on them. template <class HASH_ALGORITHM> void hashAppend(HASH_ALGORITHM& hashAlg, const Point& point) // Apply the specified 'hashAlg' to the specified 'point' { using bslh::hashAppend; hashAppend(hashAlg, point.getX()); hashAppend(hashAlg, point.getY()); }
Box
that will have a Point
as one of its attributes that are salient to hashing. class Box { // This class is a value-semantic type that represents a box drawn on // to a Cartesian plane. private: Point d_position; int d_length; int d_width; public: Box(Point position, int length, int width); // Create a box having the specified 'length' and 'width', with its // upper left corner at the specified 'position' int getLength() const; // Return the length of this box. Point getPosition() const; // Return a 'Point' representing the upper left corner of this box // on a Cartesian plane int getWidth() const; // Return the width of this box. }; inline Box::Box(Point position, int length, int width) : d_position(position) , d_length(length) , d_width(width) { } int Box::getLength() const { return d_length; } Point Box::getPosition() const { return d_position; } int Box::getWidth() const { return d_width; }
operator==
. This time all of the data members are salient to equality. bool operator==(const Box& lhs, const Box& rhs) // Return true if the specified 'lhs' and 'rhs' have the same value. // Two 'Box' objects have the same value if they have the same length, // width, and position. { return (lhs.getPosition() == rhs.getPosition()) && (lhs.getLength() == rhs.getLength()) && (lhs.getWidth() == rhs.getWidth()); }
hashAppend
for Box
. Notice how as well as calling hashAppend
on fundamental types, we can also call it with our user defined type Point
. Calling hashAppend
with Point
will propogate a reference to the hashing algorithm functor hashAlg
down to the fundamental types that make up Point
, and those types will then be passed into the referenced algorithm functor. template <class HASH_ALGORITHM> void hashAppend(HASH_ALGORITHM& hashAlg, const Box& box) // Apply the specified 'hashAlg' to the specified 'box' { using bslh::hashAppend; hashAppend(hashAlg, box.getPosition()); hashAppend(hashAlg, box.getLength()); hashAppend(hashAlg, box.getWidth()); }
TYPE
(the type being referenced) and HASHER
(a functor that produces the hash). HASHER
will default to bslh::Hash<>
. template <class TYPE, class HASHER = bslh::Hash<> > class HashTable { // This class template implements a hash table providing fast lookup of // an external, non-owned, array of values of (template parameter) // 'TYPE'. // // The (template parameter) 'TYPE' shall have a transitive, symmetric // 'operator==' function and it will be hashable using 'bslh::Hash'. // Note that there is no requirement that it have any kind of creator // defined. // // The 'HASHER' template parameter type must be a functor with a method // having the following signature: //.. // size_t operator()(TYPE) const; // -OR- // size_t operator()(const TYPE&) const; //.. // and 'HASHER' shall have a publicly accessible default constructor // and destructor. Here we use 'bslh::Hash' as our default template // argument. This allows us to hash any type for which 'hashAppend' // has been implemented. // // Note that this hash table has numerous simplifications because we // know the size of the array and never have to resize the table. // DATA const TYPE *d_values; // Array of values table is to // hold size_t d_numValues; // Length of 'd_values'. const TYPE **d_bucketArray; // Contains ptrs into // 'd_values' size_t d_bucketArrayMask; // Will always be '2^N - 1'. HASHER d_hasher; private: // PRIVATE ACCESSORS bool lookup(size_t *idx, const TYPE& value, size_t hashValue) const; // Look up the specified 'value', having the specified 'hashValue', // and load its index in 'd_bucketArray' into the specified 'idx'. // If not found, return the vacant entry in 'd_bucketArray' where // it should be inserted. Return 'true' if 'value' is found and // 'false' otherwise. public: // CREATORS HashTable(const TYPE *valuesArray, size_t numValues); // Create a hash table referring to the specified 'valuesArray' // having length of the specified 'numValues'. No value in // 'valuesArray' shall have the same value as any of the other // values in 'valuesArray' ~HashTable(); // Free up memory used by this hash table. // ACCESSORS bool contains(const TYPE& value) const; // Return true if the specified 'value' is found in the table and // false otherwise. };
Box boxes[] = { Box(Point(1, 1), 3, 2), Box(Point(3, 1), 4, 2), Box(Point(1, 2), 3, 3), Box(Point(1, 1), 2, 2), Box(Point(1, 4), 4, 3), Box(Point(2, 1), 4, 2), Box(Point(1, 0), 3, 1)}; enum { NUM_BOXES = sizeof boxes / sizeof *boxes };
hashTable
. We pass we use the default functor which will pick up the hashAppend
function we created: HashTable<Box> hashTable(boxes, NUM_BOXES);
for ( int i = 0; i < 6; ++i) { ASSERT(hashTable.contains(boxes[i])); }
ASSERT(!hashTable.contains(Box(Point(1, 1), 1, 1))); ASSERT(!hashTable.contains(Box(Point(0, 0), 0, 0))); ASSERT(!hashTable.contains(Box(Point(3, 3), 3, 3)));