Outline
Purpose
Provide algorithms for implementing a hash table.
Classes
- See also
- bslalg_bidirectionallinklistutil, bslalg_hashtableanchor, bslstl_hashtable
Description
This component provides a namespace for utility functions used to implement a hash table container. Almost all the functions provided by this component operate on a HashTableAnchor
, a type encapsulating the key data members of a hash table.
Hash Table Structure
The utilities provided by this component are used to create and manipulate a hash table that resolves collisions using a linked-list of elements (i.e., chaining). Many of the operations provided by HashTableImpUtil
operate on a HashTableAnchor
, which encapsulates the key data members of a hash table. A HashTableAnchor
has the address of a single, doubly linked list holding all the elements in the hash table, as well as the address of an array of buckets. Each bucket holds a reference to the first and last element in the linked-list whose adjusted hash value is equal to the index of the bucket. Further, the functions in this component ensure (and require) that all elements that fall within a bucket form a contiguous sequence in the linked list, as can be seen in the diagram below:
FIG 1: a hash table holding 5 elements
Hash Function: h(n) -> n [identity function]
F: First Element
L: Last Element
0 1 2 3 4
+-------+-------+-------+-------+-------+--
bucket array | F L | F L | F L | F L | F L | ...
+--+-+--+-------+-------+--+-+--+-------+--
| \___ _________/ /
\ \ / |
V V V V
,-. ,-. ,-. ,-. ,-.
doubly |---|0|---|0|---|3|---|3|---|3|--|
linked-list `-' `-' `-' `-' `-'
Hash Function and the Adjusted Hash Value
The C++11 standard defines a hash function as a function h(k)
returning (integral) values of type size_t
, such that, for two different values of k1
and k2
, the probability that h(k1) == h(k2)
is true should approach 1.0 / numeric_limits<size_t>::max()
(see 17.6.3.4 [hash.requirements]). Such a function h(k)
may return values within the entire range of values that can be described using size_t
, [0 .. numeric_limits<size_t>::max()], however the array of buckets maintained by a hash table is typically significantly smaller than number_limits<size_t>::max()
, therefore a hash-table implementation must adjust the returned hash function so that it falls in the valid range of bucket indices (typically either using an integer division or modulo operation) – we refer to this as the adjusted hash value. Note that currently HashTableImpUtil
adjusts the value returned by a supplied hash function using operator%
(modulo), which is more resilient to pathological behaviors when used in conjunction with a hash function that may produce contiguous hash values (with the div
method lower order bits do not participate to the final adjusted value); however, the means of adjustment may change in the future.
Well-Formed HashTableAnchor Objects
Many of the algorithms defined in this component operate on HashTableAnchor
objects, which describe the attributes of a hash table. The HashTableAnchor
objects supplied to HashTableImpUtil
are required to meet a series of constraints that are not enforced by the HashTableAnchor
type itself. A HashTableAnchor
object meeting these requirements is said to be well-formed and the method HashTableImpUtil::isWellFormed
returns true
for such an object. A HastTableAnchor
is considered well-formed for a particular key policy, KEY_CONFIG
, and hash functor, HASHER
, if all of the following are true:
- The list refers to a well-formed doubly linked list (see bslalg_bidirectionallinklistutil ).
- Each link in the list is an object of type
BidirectionalNode<KEY_CONFIG::ValueType>
- For each bucket, the range of nodes
[ bucket.first(), bucket.last() ]
contains all nodes in the hash table for which computeBucketIndex(HASHER(extractKey(link)
is the index of the bucket, and no other nodes.
KEY_CONFIG Template Parameter
Several of the operations provided by HashTableImpUtil
are template functions parameterized on the typename KEY_CONFIG
.
KEY_CONFIG
The KEY_CONFIG
template parameter must provide the following type aliases and functions:
typedef <VALUE_TYPE> ValueType;
typedef <KEY_TYPE> KeyType;
static const KeyType& extractKey(const ValueType& obj);
Usage
This section illustrates intended usage of this component.
Example 1: Creating and Using a Hash Set
Suppose we want to build a hash set that will keep track of keys stored in set.
First, we define an abstract template class HashSet
that will provide a hash set for any type that has a copy constructor, a destructor, an equality comparator and a hash function. We inherit from the HashTableAnchor
class use the BidirectionalLinkListUtil
and HashTableImpUtil
classes to facilitate building the table:
template <class KEY, class HASHER, class EQUAL>
typedef std::size_t size_t;
struct Policy {
typedef KEY KeyType;
typedef KEY ValueType;
static const KeyType& extractKey(const ValueType& value)
{
return value;
}
};
double d_maxLoadFactor;
unsigned d_numNodes;
HASHER d_hasher;
EQUAL d_equal;
void grow();
bool checkInvariants() const;
Node* find(const KEY& key, size_t hashCode) const;
private:
HashSet& operator=(const HashSet&);
public:
explicit
~HashSet();
bool insert(const KEY& key);
bool erase(
const KEY& key);
std::size_t count(const KEY& key) const;
std::size_t
size()
const;
};
template <class KEY, class HASHER, class EQUAL>
void HashSet<KEY, HASHER, EQUAL>::grow()
{
d_allocator_p->deallocate(bucketArrayAddress());
size_t newBucketArraySize = bucketArraySize() * 2 + 1;
setBucketArrayAddressAndSize((Bucket *) d_allocator_p->allocate(
newBucketArraySize * sizeof(Bucket)),
newBucketArraySize);
ImpUtil::rehash<Policy, HASHER>(this,
listRootAddress(),
d_hasher);
}
template <class KEY, class HASHER, class EQUAL>
bool HashSet<KEY, HASHER, EQUAL>::checkInvariants() const
{
Definition bslalg_bidirectionallink.h:346
Definition bslalg_bidirectionalnode.h:357
Definition bslalg_hashtableanchor.h:541
Definition bslma_allocator.h:457
bsl::size_t size(const TYPE &array)
Return the number of elements in the specified array.
deque< VALUE_TYPE, ALLOCATOR >::size_type erase(deque< VALUE_TYPE, ALLOCATOR > &deq, const BDE_OTHER_TYPE &value)
Definition bslstl_deque.h:4126
Definition bslalg_bidirectionallinklistutil.h:237
Definition bslalg_hashtablebucket.h:297
Definition bslalg_hashtableimputil.h:613
HashTableImpUtil
s isWellFormed
will verify that all nodes are in their proper buckets, that there are no buckets containing nodes that are not in the main linked list, and no nodes in the main linked list that are not in buckets. To verify that d_numNodes
is correct we have to traverse the list and count the nodes ourselves.
size_t numNodes = 0;
for (BidirectionalLink *cursor = listRootAddress;
cursor; cursor = cursor->nextLink()) {
++numNodes;
}
return size() == numNodes &&
ImpUtil::isWellFormed<Policy, HASHER>(this, d_allocator_p);
}
template <class KEY, class HASHER, class EQUAL>
const KEY& key,
std::size_t hashCode) const
{
return (Node *) ImpUtil::find<Policy, EQUAL>(*this,
key,
d_equal,
hashCode);
}
template <class KEY, class HASHER, class EQUAL>
: HashTableAnchor(0, 0, 0)
, d_maxLoadFactor(0.4)
, d_numNodes(0)
{
enum { NUM_BUCKETS = 3 };
std::size_t bucketArraySizeInBytes = NUM_BUCKETS * sizeof(Bucket);
setBucketArrayAddressAndSize(
(Bucket *) d_allocator_p->allocate(bucketArraySizeInBytes),
NUM_BUCKETS);
memset(bucketArrayAddress(), 0, bucketArraySizeInBytes);
}
template <class KEY, class HASHER, class EQUAL>
HashSet<KEY, HASHER, EQUAL>::~HashSet()
{
for (Link *link = listRootAddress(); link; ) {
Node *toDelete = (Node *) link;
link = link->nextLink();
toDelete->value().~KEY();
d_allocator_p->deallocate(toDelete);
}
d_allocator_p->deallocate(bucketArrayAddress());
}
template <class KEY, class HASHER, class EQUAL>
bool HashSet<KEY, HASHER, EQUAL>::erase(const KEY& key)
{
size_t hashCode = d_hasher(key);
Node *node = find(key, hashCode);
if (!node) {
return false;
}
size_t bucketIdx = ImpUtil::computeBucketIndex(hashCode,
bucketArraySize());
Bucket& bucket = bucketArrayAddress()[bucketIdx];
if (bucket.first() == node) {
if (bucket.last() == node) {
bucket.reset();
}
else {
bucket.setFirst(node->nextLink());
}
}
else if (bucket.last() == node) {
bucket.setLast(node->previousLink());
}
if (listRootAddress() == node) {
setListRootAddress(node->nextLink());
}
ListUtil::unlink(node);
node->value().~KEY();
d_allocator_p->deallocate(node);
--d_numNodes;
return true;
}
template <class KEY, class HASHER, class EQUAL>
bool HashSet<KEY, HASHER, EQUAL>::insert(const KEY& key)
{
size_t hashCode = d_hasher(key);
if (find(key, hashCode)) {
return false;
}
if (bucketArraySize() * d_maxLoadFactor < d_numNodes + 1) {
grow();
}
++d_numNodes;
Node *node = (Node *) d_allocator_p->allocate(sizeof(Node));
key,
d_allocator_p);
ImpUtil::insertAtBackOfBucket(this, node, hashCode);
return true;
}
template <class KEY, class HASHER, class EQUAL>
std::size_t HashSet<KEY, HASHER, EQUAL>::count(const KEY& key) const
{
return 0 != find(key, d_hasher(key));
}
template <class KEY, class HASHER, class EQUAL>
std::size_t HashSet<KEY, HASHER, EQUAL>::size() const
{
return d_numNodes;
}
#define BSLS_ASSERT_SAFE(X)
Definition bsls_assert.h:1762
static void copyConstruct(TARGET_TYPE *address, const TARGET_TYPE &original, bslma::Allocator *allocator)
Definition bslalg_scalarprimitives.h:1599
static Allocator * allocator(Allocator *basicAllocator=0)
Definition bslma_default.h:897
Then, we customize our table to manipulate zero-terminated const char *
strings. We make the simplifying assumption that the strings pointed at by the const char *
s are longer-lived that the HashSet
will be. We must provide an equality comparator so that two copies, in different locations, of the same sequence of characters will evaluate equal:
struct StringEqual {
bool operator()(const char *lhs, const char *rhs) const
{
return !strcmp(lhs, rhs);
}
};
Next, we must provide a string hash function to convert a const char *
to a size_t
:
struct StringHash {
std::size_t operator()(const char *string) const;
};
std::size_t StringHash::operator()(const char *string) const
{
enum { BITS_IN_SIZE_T = sizeof(size_t) * 8 };
std::size_t result = 0;
++
string, shift = (shift + 7) % BITS_IN_SIZE_T) {
if (shift <= BITS_IN_SIZE_T - 8) {
result += c << shift;
}
else {
result += c << shift;
result += c >> (BITS_IN_SIZE_T - shift);
}
}
return result;
};
basic_string< char > string
Definition bslstl_string.h:782
Then, we declare a couple of TestAllocator
s to use during our example:
Definition bslma_defaultallocatorguard.h:186
Definition bslma_testallocator.h:384
Next, in main
, we create an instance of our HashSet
type, configured to contain const char *
strings:
HashSet<const char *, StringHash, StringEqual> hs(&ta);
Then, we insert a few values:
assert(1 == hs.insert("woof"));
assert(1 == hs.insert("arf"));
assert(1 == hs.insert("meow"));
Next, we attempt to insert a redundant value, and observe that the insert
method returns false
to indicate that the insert was refused:
assert(0 == hs.insert("woof"));
Then, we use to size
method to observe that there are 3 strings stored in our HashSet
:
Next, we use the count
method to observe, specifically, which strings are and are not in our HashSet
:
assert(1 == hs.count("woof"));
assert(1 == hs.count("arf"));
assert(1 == hs.count("meow"));
assert(0 == hs.count("ruff"));
assert(0 == hs.count("chomp"));
Then, we attempt to erase a string which is not in our HashSet
and observe that false
is returned, which tells us the erase
attempt was unsuccessful:
assert(0 == hs.erase("ruff"));
Next, we erase the string "meow", which is stored in our HashSet
and observe that true
is returned, telling us the erase
attempt succeeded:
assert(1 == hs.erase("meow"));
Now, we use the size
method to verify there are 2 strings remaining in our HashSet
:
Finally, we use the count
method to observe specifically which strings are still in our HashSet
. Note that "meow" is no longer there. We observe that the default allocator was never used. When we leave the block, our HashSet
will be destroyed, freeing its memory, then our TestAllocator
will be destroyed, verifying that our destructor worked correctly and that no memory was leaked:
assert(1 == hs.count("woof"));
assert(1 == hs.count("arf"));
assert(0 == hs.count("meow"));
assert(0 == hs.count("ruff"));
assert(0 == hs.count("chomp"));
assert(0 == da.numAllocations());