Quick Links:

bal | bbl | bdl | bsl

Namespaces

Component bslalg_hashtableimputil
[Package bslalg]

Provide algorithms for implementing a hash table. More...

Namespaces

namespace  bslalg

Detailed Description

Outline
Purpose:
Provide algorithms for implementing a hash table.
Classes:
bslalg::HashTableImpUtil functions used to implement a hash table
See also:
Component bslalg_bidirectionallinklistutil, Component bslalg_hashtableanchor, Component bslstl_hashtable
Description:
This component provides a namespace for utility functions used to implement a hash table container. Almost all the functions provided by this component operate on a HashTableAnchor, a type encapsulating the key data members of a hash table.
Hash Table Structure:
The utilities provided by this component are used to create and manipulate a hash table that resolves collisions using a linked-list of elements (i.e., chaining). Many of the operations provided by HashTableImpUtil operate on a HashTableAnchor, which encapsulates the key data members of a hash table. A HashTableAnchor has the address of a single, doubly linked list holding all the elements in the hash table, as well as the address of an array of buckets. Each bucket holds a reference to the first and last element in the linked-list whose adjusted hash value is equal to the index of the bucket. Further, the functions in this component ensure (and require) that all elements that fall within a bucket form a contiguous sequence in the linked list, as can be seen in the diagram below:
  FIG 1: a hash table holding 5 elements

  Hash Function:  h(n) -> n  [identity function]
  F: First Element
  L: Last Element

                     0       1       2       3       4
                 +-------+-------+-------+-------+-------+--
  bucket array   |  F L  |  F L  |  F L  |  F L  |  F L  |  ...
                 +--+-+--+-------+-------+--+-+--+-------+--
                    | \___         _________/ /
                     \    \       /          |
                     V     V     V           V
                    ,-.   ,-.   ,-.   ,-.   ,-.
  doubly        |---|0|---|0|---|3|---|3|---|3|--|
  linked-list       `-'   `-'   `-'   `-'   `-'
Hash Function and the Adjusted Hash Value:
The C++11 standard defines a hash function as a function h(k) returning (integral) values of type size_t, such that, for two different values of k1 and k2, the probability that h(k1) == h(k2) is true should approach 1.0 / numeric_limits<size_t>max() (see 17.6.3.4 [hash.requirements]). Such a function h(k) may return values within the entire range of values that can be described using size_t, [0 .. numeric_limits<size_t>max()], however the array of buckets maintained by a hash table is typically significantly smaller than number_limits<size_t>max(), therefore a hash-table implementation must adjust the returned hash function so that it falls in the valid range of bucket indices (typically either using an integer division or modulo operation) -- we refer to this as the adjusted hash value. Note that currently HashTableImpUtil adjusts the value returned by a supplied hash function using operator% (modulo), which is more resilient to pathological behaviors when used in conjunction with a hash function that may produce contiguous hash values (with the div method lower order bits do not participate to the final adjusted value); however, the means of adjustment may change in the future.
Well-Formed HashTableAnchor Objects:
Many of the algorithms defined in this component operate on HashTableAnchor objects, which describe the attributes of a hash table. The HashTableAnchor objects supplied to HashTableImpUtil are required to meet a series of constraints that are not enforced by the HashTableAnchor type itself. A HashTableAnchor object meeting these requirements is said to be "well-formed" and the method HashTableImpUtil::isWellFormed returns true for such an object. A HastTableAnchor is considered well-formed for a particular key policy, KEY_CONFIG, and hash functor, HASHER, if all of the following are true:
  1. The list refers to a well-formed doubly linked list (see bslalg_bidirectionallinklistutil).
  2. Each link in the list is an object of type BidirectionalNode<KEY_CONFIG::ValueType>
  3. For each bucket, the range of nodes [ bucket.first(), bucket.last() ] contains all nodes in the hash table for which computeBucketIndex(HASHER(extractKey(link) is the index of the bucket, and no other nodes.
KEY_CONFIG Template Parameter:
Several of the operations provided by HashTableImpUtil are template functions parameterized on the typename KEY_CONFIG.
KEY_CONFIG:
The KEY_CONFIG template parameter must provide the following type aliases and functions:
  typedef <VALUE_TYPE> ValueType;
     // Alias for the type of the values stored by the 'BidirectionalNode'
     // elements in the hash table.

  typedef <KEY_TYPE> KeyType;
     // Alias for the type of the key value extracted from the 'ValueType'
     // stored in the 'BidirectionalNode' elements of a hash table.

  static const KeyType& extractKey(const ValueType& obj);
      // Return the 'KeyType' information associated with the specified
      // 'object'.
Usage:
This section illustrates intended usage of this component.
Example 1: Creating and Using a Hash Set:
Suppose we want to build a hash set that will keep track of keys stored in set.
First, we define an abstract template class HashSet that will provide a hash set for any type that has a copy constructor, a destructor, an equality comparator and a hash function. We inherit from the HashTableAnchor class use the BidirectionalLinkListUtil and HashTableImpUtil classes to facilitate building the table:
  template <class KEY, class HASHER, class EQUAL>
  class HashSet : public bslalg::HashTableAnchor {
      // PRIVATE TYPES
      typedef bslalg::BidirectionalLink         Link;
      typedef bslalg::BidirectionalNode<KEY>    Node;
      typedef bslalg::HashTableBucket           Bucket;
      typedef bslalg::BidirectionalLinkListUtil ListUtil;
      typedef bslalg::HashTableImpUtil          ImpUtil;
      typedef std::size_t                       size_t;

      struct Policy {
          typedef KEY KeyType;
          typedef KEY ValueType;

          static const KeyType& extractKey(const ValueType& value)
          {
              return value;
          }
      };

      // DATA
      double            d_maxLoadFactor;
      unsigned          d_numNodes;
      HASHER            d_hasher;
      EQUAL             d_equal;
      bslma::Allocator *d_allocator_p;

      // PRIVATE MANIPULATORS
      void grow();
          // Roughly double the number of buckets, such that the number of
          // buckets shall always be '2^N - 1'.

      // PRIVATE ACCESSORS
      bool checkInvariants() const;
          // Perform sanity checks on this table, returning 'true' if all the
          // tests pass and 'false' otherwise.  Note that many of the checks
          // are done with the 'ASSERTV' macro and will cause messages to be
          // written to the console.

      Node* find(const KEY& key,
                 size_t     hashCode) const;
          // Return a pointer to the node containing the specified 'key', and
          // 0 if no such node is in the table.

    private:
      // NOT IMPLEMENTED
      HashSet(const HashSet&, bslma::Allocator *);
      HashSet& operator=(const HashSet&);

    public:
      // CREATORS
      explicit
      HashSet(bslma::Allocator *allocator = 0);
          // Create a 'HashSet', using the specified 'allocator'.  If no
          // allocator is specified, use the default allocator.

      ~HashSet();
          // Destroy this 'HashSet', freeing all its memory.

      // MANIPULATORS
      bool insert(const KEY& key);
          // If the specfied 'key' is not in this hash table, add it,
          // returning 'true'.  If it is already in the table, return 'false'
          // with no action taken.

      bool erase(const KEY& key);
          // If the specfied 'key' is in this hash table, remove it,
          // returning 'true'.  If it is not found in the table, return
          // 'false' with no action taken.

      // ACCESSORS
      std::size_t count(const KEY& key) const;
          // Return 1 if the specified 'key' is in this table and 0
          // otherwise.

      std::size_t size() const;
          // Return the number of discrete keys that are stored in this
          // table.
  };

  // PRIVATE MANIPULATORS
  template <class KEY, class HASHER, class EQUAL>
  void HashSet<KEY, HASHER, EQUAL>::grow()
  {
      // 'bucketArraySize' will always be '2^N - 1', so that if hashed values
      // are aligned by some 2^N they're likely to be relatively prime to the
      // length of the hash table.

      d_allocator_p->deallocate(bucketArrayAddress());
      size_t newBucketArraySize = bucketArraySize() * 2 + 1;
      setBucketArrayAddressAndSize((Bucket *) d_allocator_p->allocate(
                                        newBucketArraySize * sizeof(Bucket)),
                                        newBucketArraySize);

      ImpUtil::rehash<Policy, HASHER>(this,
                                      listRootAddress(),
                                      d_hasher);
  }

  // PRIVATE ACCESSORS
  template <class KEY, class HASHER, class EQUAL>
  bool HashSet<KEY, HASHER, EQUAL>::checkInvariants() const
  {
HashTableImpUtils isWellFormed will verify that all nodes are in their proper buckets, that there are no buckets containing nodes that are not in the main linked list, and no nodes in the main linked list that are not in buckets. To verify that d_numNodes is correct we have to traverse the list and count the nodes ourselves.
      size_t numNodes = 0;
      for (BidirectionalLink *cursor = listRootAddress;
                                       cursor; cursor = cursor->nextLink()) {
          ++numNodes;
      }

      return size() == numNodes &&
                  ImpUtil::isWellFormed<Policy, HASHER>(this, d_allocator_p);
  }

  template <class KEY, class HASHER, class EQUAL>
  bslalg::BidirectionalNode<KEY> *HashSet<KEY, HASHER, EQUAL>::find(
                                           const KEY&         key,
                                           std::size_t        hashCode) const
  {
      return (Node *) ImpUtil::find<Policy, EQUAL>(*this,
                                                   key,
                                                   d_equal,
                                                   hashCode);
  }

  // CREATORS
  template <class KEY, class HASHER, class EQUAL>
  HashSet<KEY, HASHER, EQUAL>::HashSet(bslma::Allocator *allocator)
  : HashTableAnchor(0, 0, 0)
  , d_maxLoadFactor(0.4)
  , d_numNodes(0)
  {
      enum { NUM_BUCKETS = 3 };    // 'NUM_BUCKETS' must be '2^N - 1' for
                                   // some 'N'.

      d_allocator_p = bslma::Default::allocator(allocator);
      std::size_t bucketArraySizeInBytes = NUM_BUCKETS * sizeof(Bucket);
      setBucketArrayAddressAndSize(
                  (Bucket *) d_allocator_p->allocate(bucketArraySizeInBytes),
                  NUM_BUCKETS);
      memset(bucketArrayAddress(), 0, bucketArraySizeInBytes);
  }

  template <class KEY, class HASHER, class EQUAL>
  HashSet<KEY, HASHER, EQUAL>::~HashSet()
  {
      BSLS_ASSERT_SAFE(checkInvariants());

      for (Link *link = listRootAddress(); link; ) {
          Node *toDelete = (Node *) link;
          link = link->nextLink();

          toDelete->value().~KEY();
          d_allocator_p->deallocate(toDelete);
      }

      d_allocator_p->deallocate(bucketArrayAddress());
  }

  // MANIPULATORS
  template <class KEY, class HASHER, class EQUAL>
  bool HashSet<KEY, HASHER, EQUAL>::erase(const KEY& key)
  {
      size_t hashCode = d_hasher(key);
      Node *node = find(key, hashCode);

      if (!node) {
          return false;                                             // RETURN
      }

      size_t bucketIdx = ImpUtil::computeBucketIndex(hashCode,
                                                     bucketArraySize());
      Bucket& bucket = bucketArrayAddress()[bucketIdx];

      BSLS_ASSERT_SAFE(bucket.first() && bucket.last());

      if (bucket.first() == node) {
          if (bucket.last() == node) {
              bucket.reset();
          }
          else {
              bucket.setFirst(node->nextLink());
          }
      }
      else if (bucket.last() == node) {
          bucket.setLast(node->previousLink());
      }

      if (listRootAddress() == node) {
          setListRootAddress(node->nextLink());
      }

      ListUtil::unlink(node);

      node->value().~KEY();
      d_allocator_p->deallocate(node);

      --d_numNodes;
      BSLS_ASSERT_SAFE(checkInvariants());

      return true;
  }

  template <class KEY, class HASHER, class EQUAL>
  bool HashSet<KEY, HASHER, EQUAL>::insert(const KEY& key)
  {
      size_t hashCode = d_hasher(key);

      if (find(key, hashCode)) {
          // Already in set, do nothing.

          return false;                                             // RETURN
      }

      if (bucketArraySize() * d_maxLoadFactor < d_numNodes + 1) {
          grow();
      }

      ++d_numNodes;
      Node *node = (Node *) d_allocator_p->allocate(sizeof(Node));
      bslalg::ScalarPrimitives::copyConstruct(&node->value(),
                                              key,
                                              d_allocator_p);

      ImpUtil::insertAtBackOfBucket(this, node, hashCode);

      BSLS_ASSERT_SAFE(find(key, hashCode));
      BSLS_ASSERT_SAFE(checkInvariants());

      return true;
  }

  // ACCESSORS
  template <class KEY, class HASHER, class EQUAL>
  std::size_t HashSet<KEY, HASHER, EQUAL>::count(const KEY& key) const
  {
      return 0 != find(key, d_hasher(key));
  }

  template <class KEY, class HASHER, class EQUAL>
  std::size_t HashSet<KEY, HASHER, EQUAL>::size() const
  {
      return d_numNodes;
  }
Then, we customize our table to manipulate zero-terminated const char * strings. We make the simplifying assumption that the strings pointed at by the const char *s are longer-lived that the HashSet will be. We must provide an equality comparator so that two copies, in different locations, of the same sequence of characters will evaluate equal:
  struct StringEqual {
      bool operator()(const char *lhs, const char *rhs) const
      {
          return !strcmp(lhs, rhs);
      }
  };
Next, we must provide a string hash function to convert a const char * to a size_t:
  struct StringHash {
      std::size_t operator()(const char *string) const;
  };

  std::size_t StringHash::operator()(const char *string) const
  {
      enum { BITS_IN_SIZE_T = sizeof(size_t) * 8 };

      std::size_t result = 0;
      for (int shift = 0; *string;
                            ++string, shift = (shift + 7) % BITS_IN_SIZE_T) {
          unsigned char c = *string;
          if (shift <= BITS_IN_SIZE_T - 8) {
              result += c << shift;
          }
          else {
              result += c << shift;
              result += c >> (BITS_IN_SIZE_T - shift);
          }
      }

      return result;
  };
Then, we declare a couple of TestAllocators to use during our example:
  bslma::TestAllocator da("defaultAllocator");
  bslma::DefaultAllocatorGuard defaultGuard(&da);

  bslma::TestAllocator ta("testAllocator");
Next, in main, we create an instance of our HashSet type, configured to contain const char * strings:
  HashSet<const char *, StringHash, StringEqual> hs(&ta);
Then, we insert a few values:
  assert(1 == hs.insert("woof"));
  assert(1 == hs.insert("arf"));
  assert(1 == hs.insert("meow"));
Next, we attempt to insert a redundant value, and observe that the insert method returns false to indicate that the insert was refused:
  assert(0 == hs.insert("woof"));
Then, we use to size method to observe that there are 3 strings stored in our HashSet:
  assert(3 == hs.size());
Next, we use the count method to observe, specifically, which strings are and are not in our HashSet:
  assert(1 == hs.count("woof"));
  assert(1 == hs.count("arf"));
  assert(1 == hs.count("meow"));
  assert(0 == hs.count("ruff"));
  assert(0 == hs.count("chomp"));
Then, we attempt to erase a string which is not in our HashSet and observe that false is returned, which tells us the erase attempt was unsuccessful:
  assert(0 == hs.erase("ruff"));
Next, we erase the string "meow", which is stored in our HashSet and observe that true is returned, telling us the erase attempt succeeded:
  assert(1 == hs.erase("meow"));
Now, we use the size method to verify there are 2 strings remaining in our HashSet:
  assert(2 == hs.size());
Finally, we use the count method to observe specifically which strings are still in our HashSet. Note that "meow" is no longer there. We observe that the default allocator was never used. When we leave the block, our HashSet will be destroyed, freeing its memory, then our TestAllocator will be destroyed, verifying that our destructor worked correctly and that no memory was leaked:
  assert(1 == hs.count("woof"));
  assert(1 == hs.count("arf"));
  assert(0 == hs.count("meow"));
  assert(0 == hs.count("ruff"));
  assert(0 == hs.count("chomp"));

  assert(0 == da.numAllocations());