Quick Links: |
Provide algorithms for implementing a hash table. More...
Namespaces | |
namespace | bslalg |
bslalg::HashTableImpUtil | functions used to implement a hash table |
HashTableAnchor
, a type encapsulating the key data members of a hash table. HashTableImpUtil
operate on a HashTableAnchor
, which encapsulates the key data members of a hash table. A HashTableAnchor
has the address of a single, doubly linked list holding all the elements in the hash table, as well as the address of an array of buckets. Each bucket holds a reference to the first and last element in the linked-list whose adjusted hash value is equal to the index of the bucket. Further, the functions in this component ensure (and require) that all elements that fall within a bucket form a contiguous sequence in the linked list, as can be seen in the diagram below: FIG 1: a hash table holding 5 elements Hash Function: h(n) -> n [identity function] F: First Element L: Last Element 0 1 2 3 4 +-------+-------+-------+-------+-------+-- bucket array | F L | F L | F L | F L | F L | ... +--+-+--+-------+-------+--+-+--+-------+-- | \___ _________/ / \ \ / | V V V V ,-. ,-. ,-. ,-. ,-. doubly |---|0|---|0|---|3|---|3|---|3|--| linked-list `-' `-' `-' `-' `-'
h(k)
returning (integral) values of type size_t
, such that, for two different values of k1
and k2
, the probability that h(k1) == h(k2)
is true should approach 1.0 / numeric_limits<size_t>max()
(see 17.6.3.4 [hash.requirements]). Such a function h(k)
may return values within the entire range of values that can be described using size_t
, [0 .. numeric_limits<size_t>max()], however the array of buckets maintained by a hash table is typically significantly smaller than number_limits<size_t>max()
, therefore a hash-table implementation must adjust the returned hash function so that it falls in the valid range of bucket indices (typically either using an integer division or modulo operation) -- we refer to this as the adjusted hash value. Note that currently HashTableImpUtil
adjusts the value returned by a supplied hash function using operator%
(modulo), which is more resilient to pathological behaviors when used in conjunction with a hash function that may produce contiguous hash values (with the div
method lower order bits do not participate to the final adjusted value); however, the means of adjustment may change in the future. HashTableAnchor
objects, which describe the attributes of a hash table. The HashTableAnchor
objects supplied to HashTableImpUtil
are required to meet a series of constraints that are not enforced by the HashTableAnchor
type itself. A HashTableAnchor
object meeting these requirements is said to be "well-formed" and the method HashTableImpUtil::isWellFormed
returns true
for such an object. A HastTableAnchor
is considered well-formed for a particular key policy, KEY_CONFIG
, and hash functor, HASHER
, if all of the following are true: bslalg_bidirectionallinklistutil
). BidirectionalNode<KEY_CONFIG::ValueType>
[ bucket.first(), bucket.last() ]
contains all nodes in the hash table for which computeBucketIndex(HASHER(extractKey(link)
is the index of the bucket, and no other nodes. HashTableImpUtil
are template functions parameterized on the typename KEY_CONFIG
. KEY_CONFIG
template parameter must provide the following type aliases and functions: typedef <VALUE_TYPE> ValueType; // Alias for the type of the values stored by the 'BidirectionalNode' // elements in the hash table. typedef <KEY_TYPE> KeyType; // Alias for the type of the key value extracted from the 'ValueType' // stored in the 'BidirectionalNode' elements of a hash table. static const KeyType& extractKey(const ValueType& obj); // Return the 'KeyType' information associated with the specified // 'object'.
HashSet
that will provide a hash set for any type that has a copy constructor, a destructor, an equality comparator and a hash function. We inherit from the HashTableAnchor
class use the BidirectionalLinkListUtil
and HashTableImpUtil
classes to facilitate building the table: template <class KEY, class HASHER, class EQUAL> class HashSet : public bslalg::HashTableAnchor { // PRIVATE TYPES typedef bslalg::BidirectionalLink Link; typedef bslalg::BidirectionalNode<KEY> Node; typedef bslalg::HashTableBucket Bucket; typedef bslalg::BidirectionalLinkListUtil ListUtil; typedef bslalg::HashTableImpUtil ImpUtil; typedef std::size_t size_t; struct Policy { typedef KEY KeyType; typedef KEY ValueType; static const KeyType& extractKey(const ValueType& value) { return value; } }; // DATA double d_maxLoadFactor; unsigned d_numNodes; HASHER d_hasher; EQUAL d_equal; bslma::Allocator *d_allocator_p; // PRIVATE MANIPULATORS void grow(); // Roughly double the number of buckets, such that the number of // buckets shall always be '2^N - 1'. // PRIVATE ACCESSORS bool checkInvariants() const; // Perform sanity checks on this table, returning 'true' if all the // tests pass and 'false' otherwise. Note that many of the checks // are done with the 'ASSERTV' macro and will cause messages to be // written to the console. Node* find(const KEY& key, size_t hashCode) const; // Return a pointer to the node containing the specified 'key', and // 0 if no such node is in the table. private: // NOT IMPLEMENTED HashSet(const HashSet&, bslma::Allocator *); HashSet& operator=(const HashSet&); public: // CREATORS explicit HashSet(bslma::Allocator *allocator = 0); // Create a 'HashSet', using the specified 'allocator'. If no // allocator is specified, use the default allocator. ~HashSet(); // Destroy this 'HashSet', freeing all its memory. // MANIPULATORS bool insert(const KEY& key); // If the specfied 'key' is not in this hash table, add it, // returning 'true'. If it is already in the table, return 'false' // with no action taken. bool erase(const KEY& key); // If the specfied 'key' is in this hash table, remove it, // returning 'true'. If it is not found in the table, return // 'false' with no action taken. // ACCESSORS std::size_t count(const KEY& key) const; // Return 1 if the specified 'key' is in this table and 0 // otherwise. std::size_t size() const; // Return the number of discrete keys that are stored in this // table. }; // PRIVATE MANIPULATORS template <class KEY, class HASHER, class EQUAL> void HashSet<KEY, HASHER, EQUAL>::grow() { // 'bucketArraySize' will always be '2^N - 1', so that if hashed values // are aligned by some 2^N they're likely to be relatively prime to the // length of the hash table. d_allocator_p->deallocate(bucketArrayAddress()); size_t newBucketArraySize = bucketArraySize() * 2 + 1; setBucketArrayAddressAndSize((Bucket *) d_allocator_p->allocate( newBucketArraySize * sizeof(Bucket)), newBucketArraySize); ImpUtil::rehash<Policy, HASHER>(this, listRootAddress(), d_hasher); } // PRIVATE ACCESSORS template <class KEY, class HASHER, class EQUAL> bool HashSet<KEY, HASHER, EQUAL>::checkInvariants() const {
HashTableImpUtil
s isWellFormed
will verify that all nodes are in their proper buckets, that there are no buckets containing nodes that are not in the main linked list, and no nodes in the main linked list that are not in buckets. To verify that d_numNodes
is correct we have to traverse the list and count the nodes ourselves. size_t numNodes = 0; for (BidirectionalLink *cursor = listRootAddress; cursor; cursor = cursor->nextLink()) { ++numNodes; } return size() == numNodes && ImpUtil::isWellFormed<Policy, HASHER>(this, d_allocator_p); } template <class KEY, class HASHER, class EQUAL> bslalg::BidirectionalNode<KEY> *HashSet<KEY, HASHER, EQUAL>::find( const KEY& key, std::size_t hashCode) const { return (Node *) ImpUtil::find<Policy, EQUAL>(*this, key, d_equal, hashCode); } // CREATORS template <class KEY, class HASHER, class EQUAL> HashSet<KEY, HASHER, EQUAL>::HashSet(bslma::Allocator *allocator) : HashTableAnchor(0, 0, 0) , d_maxLoadFactor(0.4) , d_numNodes(0) { enum { NUM_BUCKETS = 3 }; // 'NUM_BUCKETS' must be '2^N - 1' for // some 'N'. d_allocator_p = bslma::Default::allocator(allocator); std::size_t bucketArraySizeInBytes = NUM_BUCKETS * sizeof(Bucket); setBucketArrayAddressAndSize( (Bucket *) d_allocator_p->allocate(bucketArraySizeInBytes), NUM_BUCKETS); memset(bucketArrayAddress(), 0, bucketArraySizeInBytes); } template <class KEY, class HASHER, class EQUAL> HashSet<KEY, HASHER, EQUAL>::~HashSet() { BSLS_ASSERT_SAFE(checkInvariants()); for (Link *link = listRootAddress(); link; ) { Node *toDelete = (Node *) link; link = link->nextLink(); toDelete->value().~KEY(); d_allocator_p->deallocate(toDelete); } d_allocator_p->deallocate(bucketArrayAddress()); } // MANIPULATORS template <class KEY, class HASHER, class EQUAL> bool HashSet<KEY, HASHER, EQUAL>::erase(const KEY& key) { size_t hashCode = d_hasher(key); Node *node = find(key, hashCode); if (!node) { return false; // RETURN } size_t bucketIdx = ImpUtil::computeBucketIndex(hashCode, bucketArraySize()); Bucket& bucket = bucketArrayAddress()[bucketIdx]; BSLS_ASSERT_SAFE(bucket.first() && bucket.last()); if (bucket.first() == node) { if (bucket.last() == node) { bucket.reset(); } else { bucket.setFirst(node->nextLink()); } } else if (bucket.last() == node) { bucket.setLast(node->previousLink()); } if (listRootAddress() == node) { setListRootAddress(node->nextLink()); } ListUtil::unlink(node); node->value().~KEY(); d_allocator_p->deallocate(node); --d_numNodes; BSLS_ASSERT_SAFE(checkInvariants()); return true; } template <class KEY, class HASHER, class EQUAL> bool HashSet<KEY, HASHER, EQUAL>::insert(const KEY& key) { size_t hashCode = d_hasher(key); if (find(key, hashCode)) { // Already in set, do nothing. return false; // RETURN } if (bucketArraySize() * d_maxLoadFactor < d_numNodes + 1) { grow(); } ++d_numNodes; Node *node = (Node *) d_allocator_p->allocate(sizeof(Node)); bslalg::ScalarPrimitives::copyConstruct(&node->value(), key, d_allocator_p); ImpUtil::insertAtBackOfBucket(this, node, hashCode); BSLS_ASSERT_SAFE(find(key, hashCode)); BSLS_ASSERT_SAFE(checkInvariants()); return true; } // ACCESSORS template <class KEY, class HASHER, class EQUAL> std::size_t HashSet<KEY, HASHER, EQUAL>::count(const KEY& key) const { return 0 != find(key, d_hasher(key)); } template <class KEY, class HASHER, class EQUAL> std::size_t HashSet<KEY, HASHER, EQUAL>::size() const { return d_numNodes; }
const char *
strings. We make the simplifying assumption that the strings pointed at by the const char *
s are longer-lived that the HashSet
will be. We must provide an equality comparator so that two copies, in different locations, of the same sequence of characters will evaluate equal: struct StringEqual { bool operator()(const char *lhs, const char *rhs) const { return !strcmp(lhs, rhs); } };
const char *
to a size_t
: struct StringHash { std::size_t operator()(const char *string) const; }; std::size_t StringHash::operator()(const char *string) const { enum { BITS_IN_SIZE_T = sizeof(size_t) * 8 }; std::size_t result = 0; for (int shift = 0; *string; ++string, shift = (shift + 7) % BITS_IN_SIZE_T) { unsigned char c = *string; if (shift <= BITS_IN_SIZE_T - 8) { result += c << shift; } else { result += c << shift; result += c >> (BITS_IN_SIZE_T - shift); } } return result; };
TestAllocator
s to use during our example: bslma::TestAllocator da("defaultAllocator"); bslma::DefaultAllocatorGuard defaultGuard(&da); bslma::TestAllocator ta("testAllocator");
main
, we create an instance of our HashSet
type, configured to contain const char *
strings: HashSet<const char *, StringHash, StringEqual> hs(&ta);
assert(1 == hs.insert("woof")); assert(1 == hs.insert("arf")); assert(1 == hs.insert("meow"));
insert
method returns false
to indicate that the insert was refused: assert(0 == hs.insert("woof"));
size
method to observe that there are 3 strings stored in our HashSet
: assert(3 == hs.size());
count
method to observe, specifically, which strings are and are not in our HashSet
: assert(1 == hs.count("woof")); assert(1 == hs.count("arf")); assert(1 == hs.count("meow")); assert(0 == hs.count("ruff")); assert(0 == hs.count("chomp"));
HashSet
and observe that false
is returned, which tells us the erase
attempt was unsuccessful: assert(0 == hs.erase("ruff"));
HashSet
and observe that true
is returned, telling us the erase
attempt succeeded: assert(1 == hs.erase("meow"));
size
method to verify there are 2 strings remaining in our HashSet
: assert(2 == hs.size());
count
method to observe specifically which strings are still in our HashSet
. Note that "meow" is no longer there. We observe that the default allocator was never used. When we leave the block, our HashSet
will be destroyed, freeing its memory, then our TestAllocator
will be destroyed, verifying that our destructor worked correctly and that no memory was leaked: assert(1 == hs.count("woof")); assert(1 == hs.count("arf")); assert(0 == hs.count("meow")); assert(0 == hs.count("ruff")); assert(0 == hs.count("chomp")); assert(0 == da.numAllocations());