BDE 4.14.0 Production release
Loading...
Searching...
No Matches
bslalg_hashtableimputil

Detailed Description

Outline

Purpose

Provide algorithms for implementing a hash table.

Classes

See also
bslalg_bidirectionallinklistutil, bslalg_hashtableanchor, bslstl_hashtable

Description

This component provides a namespace for utility functions used to implement a hash table container. Almost all the functions provided by this component operate on a HashTableAnchor, a type encapsulating the key data members of a hash table.

Hash Table Structure

The utilities provided by this component are used to create and manipulate a hash table that resolves collisions using a linked-list of elements (i.e., chaining). Many of the operations provided by HashTableImpUtil operate on a HashTableAnchor, which encapsulates the key data members of a hash table. A HashTableAnchor has the address of a single, doubly linked list holding all the elements in the hash table, as well as the address of an array of buckets. Each bucket holds a reference to the first and last element in the linked-list whose adjusted hash value is equal to the index of the bucket. Further, the functions in this component ensure (and require) that all elements that fall within a bucket form a contiguous sequence in the linked list, as can be seen in the diagram below:

FIG 1: a hash table holding 5 elements
Hash Function: h(n) -> n [identity function]
F: First Element
L: Last Element
0 1 2 3 4
+-------+-------+-------+-------+-------+--
bucket array | F L | F L | F L | F L | F L | ...
+--+-+--+-------+-------+--+-+--+-------+--
| \___ _________/ /
\ \ / |
V V V V
,-. ,-. ,-. ,-. ,-.
doubly |---|0|---|0|---|3|---|3|---|3|--|
linked-list `-' `-' `-' `-' `-'

Hash Function and the Adjusted Hash Value

The C++11 standard defines a hash function as a function h(k) returning (integral) values of type size_t, such that, for two different values of k1 and k2, the probability that h(k1) == h(k2) is true should approach 1.0 / numeric_limits<size_t>::max() (see 17.6.3.4 [hash.requirements]). Such a function h(k) may return values within the entire range of values that can be described using size_t, [0 .. numeric_limits<size_t>::max()], however the array of buckets maintained by a hash table is typically significantly smaller than number_limits<size_t>::max(), therefore a hash-table implementation must adjust the returned hash function so that it falls in the valid range of bucket indices (typically either using an integer division or modulo operation) – we refer to this as the adjusted hash value. Note that currently HashTableImpUtil adjusts the value returned by a supplied hash function using operator% (modulo), which is more resilient to pathological behaviors when used in conjunction with a hash function that may produce contiguous hash values (with the div method lower order bits do not participate to the final adjusted value); however, the means of adjustment may change in the future.

Well-Formed HashTableAnchor Objects

Many of the algorithms defined in this component operate on HashTableAnchor objects, which describe the attributes of a hash table. The HashTableAnchor objects supplied to HashTableImpUtil are required to meet a series of constraints that are not enforced by the HashTableAnchor type itself. A HashTableAnchor object meeting these requirements is said to be well-formed and the method HashTableImpUtil::isWellFormed returns true for such an object. A HastTableAnchor is considered well-formed for a particular key policy, KEY_CONFIG, and hash functor, HASHER, if all of the following are true:

  1. The list refers to a well-formed doubly linked list (see bslalg_bidirectionallinklistutil ).
  2. Each link in the list is an object of type BidirectionalNode<KEY_CONFIG::ValueType>
  3. For each bucket, the range of nodes [ bucket.first(), bucket.last() ] contains all nodes in the hash table for which computeBucketIndex(HASHER(extractKey(link) is the index of the bucket, and no other nodes.

KEY_CONFIG Template Parameter

Several of the operations provided by HashTableImpUtil are template functions parameterized on the typename KEY_CONFIG.

KEY_CONFIG

The KEY_CONFIG template parameter must provide the following type aliases and functions:

/// Alias for the type of the values stored by the `BidirectionalNode`
/// elements in the hash table.
typedef <VALUE_TYPE> ValueType;
/// Alias for the type of the key value extracted from the `ValueType`
/// stored in the `BidirectionalNode` elements of a hash table.
typedef <KEY_TYPE> KeyType;
/// Return the `KeyType` information associated with the specified
/// `object`.
static const KeyType& extractKey(const ValueType& obj);

Usage

This section illustrates intended usage of this component.

Example 1: Creating and Using a Hash Set

Suppose we want to build a hash set that will keep track of keys stored in set.

First, we define an abstract template class HashSet that will provide a hash set for any type that has a copy constructor, a destructor, an equality comparator and a hash function. We inherit from the HashTableAnchor class use the BidirectionalLinkListUtil and HashTableImpUtil classes to facilitate building the table:

template <class KEY, class HASHER, class EQUAL>
class HashSet : public bslalg::HashTableAnchor {
// PRIVATE TYPES
typedef bslalg::HashTableBucket Bucket;
typedef bslalg::HashTableImpUtil ImpUtil;
typedef std::size_t size_t;
struct Policy {
typedef KEY KeyType;
typedef KEY ValueType;
static const KeyType& extractKey(const ValueType& value)
{
return value;
}
};
// DATA
double d_maxLoadFactor;
unsigned d_numNodes;
HASHER d_hasher;
EQUAL d_equal;
bslma::Allocator *d_allocator_p;
// PRIVATE MANIPULATORS
/// Roughly double the number of buckets, such that the number of
/// buckets shall always be `2^N - 1`.
void grow();
// PRIVATE ACCESSORS
/// Perform sanity checks on this table, returning 'true' if all the
/// tests pass and 'false' otherwise. Note that many of the checks
/// are done with the 'ASSERTV' macro and will cause messages to be
/// written to the console.
bool checkInvariants() const;
/// Return a pointer to the node containing the specified 'key', and
/// 0 if no such node is in the table.
Node* find(const KEY& key, size_t hashCode) const;
private:
// NOT IMPLEMENTED
HashSet(const HashSet&, bslma::Allocator *);
HashSet& operator=(const HashSet&);
public:
// CREATORS
/// Create a `HashSet`, using the specified `allocator`. If no
/// allocator is specified, use the default allocator.
explicit
HashSet(bslma::Allocator *allocator = 0);
// Destroy this `HashSet`, freeing all its memory.
~HashSet();
// MANIPULATORS
/// If the specfied `key` is not in this hash table, add it,
/// returning 'true'. If it is already in the table, return `false`
/// with no action taken.
bool insert(const KEY& key);
/// If the specfied `key` is in this hash table, remove it,
/// returning `true`. If it is not found in the table, return
/// `false` with no action taken.
bool erase(const KEY& key);
// ACCESSORS
/// Return 1 if the specified `key` is in this table and 0
/// otherwise.
std::size_t count(const KEY& key) const;
/// Return the number of discrete keys that are stored in this
/// table.
std::size_t size() const;
};
// PRIVATE MANIPULATORS
template <class KEY, class HASHER, class EQUAL>
void HashSet<KEY, HASHER, EQUAL>::grow()
{
// `bucketArraySize` will always be `2^N - 1`, so that if hashed values
// are aligned by some 2^N they're likely to be relatively prime to the
// length of the hash table.
d_allocator_p->deallocate(bucketArrayAddress());
size_t newBucketArraySize = bucketArraySize() * 2 + 1;
setBucketArrayAddressAndSize((Bucket *) d_allocator_p->allocate(
newBucketArraySize * sizeof(Bucket)),
newBucketArraySize);
ImpUtil::rehash<Policy, HASHER>(this,
listRootAddress(),
d_hasher);
}
// PRIVATE ACCESSORS
template <class KEY, class HASHER, class EQUAL>
bool HashSet<KEY, HASHER, EQUAL>::checkInvariants() const
{
Definition bslalg_bidirectionalnode.h:357
Definition bslalg_hashtableanchor.h:541
Definition bslma_allocator.h:457
bsl::size_t size(const TYPE &array)
Return the number of elements in the specified array.
deque< VALUE_TYPE, ALLOCATOR >::size_type erase(deque< VALUE_TYPE, ALLOCATOR > &deq, const BDE_OTHER_TYPE &value)
Definition bslstl_deque.h:4126
Definition bslalg_bidirectionallinklistutil.h:237
Definition bslalg_hashtablebucket.h:297
Definition bslalg_hashtableimputil.h:613

HashTableImpUtils isWellFormed will verify that all nodes are in their proper buckets, that there are no buckets containing nodes that are not in the main linked list, and no nodes in the main linked list that are not in buckets. To verify that d_numNodes is correct we have to traverse the list and count the nodes ourselves.

size_t numNodes = 0;
for (BidirectionalLink *cursor = listRootAddress;
cursor; cursor = cursor->nextLink()) {
++numNodes;
}
return size() == numNodes &&
ImpUtil::isWellFormed<Policy, HASHER>(this, d_allocator_p);
}
template <class KEY, class HASHER, class EQUAL>
bslalg::BidirectionalNode<KEY> *HashSet<KEY, HASHER, EQUAL>::find(
const KEY& key,
std::size_t hashCode) const
{
return (Node *) ImpUtil::find<Policy, EQUAL>(*this,
key,
d_equal,
hashCode);
}
// CREATORS
template <class KEY, class HASHER, class EQUAL>
HashSet<KEY, HASHER, EQUAL>::HashSet(bslma::Allocator *allocator)
: HashTableAnchor(0, 0, 0)
, d_maxLoadFactor(0.4)
, d_numNodes(0)
{
enum { NUM_BUCKETS = 3 }; // 'NUM_BUCKETS' must be '2^N - 1' for
// some 'N'.
d_allocator_p = bslma::Default::allocator(allocator);
std::size_t bucketArraySizeInBytes = NUM_BUCKETS * sizeof(Bucket);
setBucketArrayAddressAndSize(
(Bucket *) d_allocator_p->allocate(bucketArraySizeInBytes),
NUM_BUCKETS);
memset(bucketArrayAddress(), 0, bucketArraySizeInBytes);
}
template <class KEY, class HASHER, class EQUAL>
HashSet<KEY, HASHER, EQUAL>::~HashSet()
{
BSLS_ASSERT_SAFE(checkInvariants());
for (Link *link = listRootAddress(); link; ) {
Node *toDelete = (Node *) link;
link = link->nextLink();
toDelete->value().~KEY();
d_allocator_p->deallocate(toDelete);
}
d_allocator_p->deallocate(bucketArrayAddress());
}
// MANIPULATORS
template <class KEY, class HASHER, class EQUAL>
bool HashSet<KEY, HASHER, EQUAL>::erase(const KEY& key)
{
size_t hashCode = d_hasher(key);
Node *node = find(key, hashCode);
if (!node) {
return false; // RETURN
}
size_t bucketIdx = ImpUtil::computeBucketIndex(hashCode,
bucketArraySize());
Bucket& bucket = bucketArrayAddress()[bucketIdx];
BSLS_ASSERT_SAFE(bucket.first() && bucket.last());
if (bucket.first() == node) {
if (bucket.last() == node) {
bucket.reset();
}
else {
bucket.setFirst(node->nextLink());
}
}
else if (bucket.last() == node) {
bucket.setLast(node->previousLink());
}
if (listRootAddress() == node) {
setListRootAddress(node->nextLink());
}
ListUtil::unlink(node);
node->value().~KEY();
d_allocator_p->deallocate(node);
--d_numNodes;
BSLS_ASSERT_SAFE(checkInvariants());
return true;
}
template <class KEY, class HASHER, class EQUAL>
bool HashSet<KEY, HASHER, EQUAL>::insert(const KEY& key)
{
size_t hashCode = d_hasher(key);
if (find(key, hashCode)) {
// Already in set, do nothing.
return false; // RETURN
}
if (bucketArraySize() * d_maxLoadFactor < d_numNodes + 1) {
grow();
}
++d_numNodes;
Node *node = (Node *) d_allocator_p->allocate(sizeof(Node));
key,
d_allocator_p);
ImpUtil::insertAtBackOfBucket(this, node, hashCode);
BSLS_ASSERT_SAFE(find(key, hashCode));
BSLS_ASSERT_SAFE(checkInvariants());
return true;
}
// ACCESSORS
template <class KEY, class HASHER, class EQUAL>
std::size_t HashSet<KEY, HASHER, EQUAL>::count(const KEY& key) const
{
return 0 != find(key, d_hasher(key));
}
template <class KEY, class HASHER, class EQUAL>
std::size_t HashSet<KEY, HASHER, EQUAL>::size() const
{
return d_numNodes;
}
#define BSLS_ASSERT_SAFE(X)
Definition bsls_assert.h:1762
static void copyConstruct(TARGET_TYPE *address, const TARGET_TYPE &original, bslma::Allocator *allocator)
Definition bslalg_scalarprimitives.h:1599
static Allocator * allocator(Allocator *basicAllocator=0)
Definition bslma_default.h:897

Then, we customize our table to manipulate zero-terminated const char * strings. We make the simplifying assumption that the strings pointed at by the const char *s are longer-lived that the HashSet will be. We must provide an equality comparator so that two copies, in different locations, of the same sequence of characters will evaluate equal:

struct StringEqual {
bool operator()(const char *lhs, const char *rhs) const
{
return !strcmp(lhs, rhs);
}
};

Next, we must provide a string hash function to convert a const char * to a size_t:

struct StringHash {
std::size_t operator()(const char *string) const;
};
std::size_t StringHash::operator()(const char *string) const
{
enum { BITS_IN_SIZE_T = sizeof(size_t) * 8 };
std::size_t result = 0;
for (int shift = 0; *string;
++string, shift = (shift + 7) % BITS_IN_SIZE_T) {
unsigned char c = *string;
if (shift <= BITS_IN_SIZE_T - 8) {
result += c << shift;
}
else {
result += c << shift;
result += c >> (BITS_IN_SIZE_T - shift);
}
}
return result;
};
basic_string< char > string
Definition bslstl_string.h:782

Then, we declare a couple of TestAllocators to use during our example:

bslma::TestAllocator da("defaultAllocator");
bslma::DefaultAllocatorGuard defaultGuard(&da);
bslma::TestAllocator ta("testAllocator");
Definition bslma_defaultallocatorguard.h:186
Definition bslma_testallocator.h:384

Next, in main, we create an instance of our HashSet type, configured to contain const char * strings:

HashSet<const char *, StringHash, StringEqual> hs(&ta);

Then, we insert a few values:

assert(1 == hs.insert("woof"));
assert(1 == hs.insert("arf"));
assert(1 == hs.insert("meow"));

Next, we attempt to insert a redundant value, and observe that the insert method returns false to indicate that the insert was refused:

assert(0 == hs.insert("woof"));

Then, we use to size method to observe that there are 3 strings stored in our HashSet:

assert(3 == hs.size());

Next, we use the count method to observe, specifically, which strings are and are not in our HashSet:

assert(1 == hs.count("woof"));
assert(1 == hs.count("arf"));
assert(1 == hs.count("meow"));
assert(0 == hs.count("ruff"));
assert(0 == hs.count("chomp"));

Then, we attempt to erase a string which is not in our HashSet and observe that false is returned, which tells us the erase attempt was unsuccessful:

assert(0 == hs.erase("ruff"));

Next, we erase the string "meow", which is stored in our HashSet and observe that true is returned, telling us the erase attempt succeeded:

assert(1 == hs.erase("meow"));

Now, we use the size method to verify there are 2 strings remaining in our HashSet:

assert(2 == hs.size());

Finally, we use the count method to observe specifically which strings are still in our HashSet. Note that "meow" is no longer there. We observe that the default allocator was never used. When we leave the block, our HashSet will be destroyed, freeing its memory, then our TestAllocator will be destroyed, verifying that our destructor worked correctly and that no memory was leaked:

assert(1 == hs.count("woof"));
assert(1 == hs.count("arf"));
assert(0 == hs.count("meow"));
assert(0 == hs.count("ruff"));
assert(0 == hs.count("chomp"));
assert(0 == da.numAllocations());