Quick Links:

bal | bbl | bdl | bsl

Namespaces

Component bdlc_hashtable
[Package bdlc]

Provide a double-hashed table with utility. More...

Namespaces

namespace  bdlc

Detailed Description

Outline
Purpose:
Provide a double-hashed table with utility.
Classes:
bdlc::HashTable double-hashed table
bdlc::HashTableDefaultTraits default traits
bdlc::HashTableDefaultHash1 default hash functor 1
bdlc::HashTableDefaultHash2 default hash functor 2
See also:
Component bdlb_hashutil
Description:
This component provides a mechanism, bdlc::HashTable, for efficiently finding elements identified by a parameterized KEY. Elements can also have an associated value by specifying an optional VALUE template parameter. Also, an optional TRAITS parameter can be supplied so that clients can override the default traits of the hash table, bdlc::HashTableDefaultTraits.
The bdlc::HashTable class achieves efficient lookup by using a double-hash algorithm, which will be explained later. Optional HASH1 and HASH2 parameters can be supplied so that clients can override the default hash functions used by the hash table, bdlc::HashTableDefaultHash1 and bdlc::HashTableDefaultHash2. Hash functors may also optionally be specified at construction time, in case the functors contain state (e.g., if bsl::function is used).
The constructor for bdlc::HashTable takes a capacityHint argument. This capacityHint is used to calculate the capacity of the hash table (i.e., the maximum number of elements that can be stored at any one time). Once constructed, the capacity cannot be changed. The capacity hint can be either a positive integer or a negative integer. If the capacity hint is positive, then the capacity of the hash table will be the first available prime number larger than, or equal to, the capacity hint. Otherwise, the capacity of the hash table will be the first available prime number smaller than, or equal to, the capacity hint. The list of available prime numbers is obtained from an array in the bdlc_hashtable.cpp file.
Traditional Hash Algorithm:
A typical hash table implementation uses only a single hash function to determine the index in the hash table to store a given element. This approach results in constant time access if there are no collisions. To handle cases where there are hash collisions, the hash table needs to maintain a linked list or tree of elements for each index in the table. This data structure is illustrated in the diagram below:
               Hash Table
               ----------

                :      :
                :      :
                :......:
                :      :
     index - 2  :      :
                :______:
                |      |
     index - 1  |      |
                |______|     ______      ______      ______
                |      |    |      |    |      |    |      |
      index     |      | -> |      | -> |      | -> |      | -> NULL
                |______|    |______|    |______|    |______|
                |      |
     index + 1  |      |    element1    element2    element3
                |______|
                |      |
     index + 2  |      |
                |______|
                :      :
                :      :
                :......:
                :      :
                :      :
In the diagram above, element1, element2, and element3 hash to the indexth bucket in the hash table. Because of this collision, they are maintained in a linked list, which results in linear time complexity.
Double-Hash Algorithm:
The double-hash algorithm improves on the traditional algorithm by using a second hash function to compute an increment value. The index is incremented by the increment value until an available bucket is found. This augmented algorithm is illustrated in the following diagrams. Suppose we have a hash table that is initially empty:
                            Hash Table
                            ----------

                             :      :
                             :      :
                             :......:
                             :      :
                  index - 2  :      :
                             :______:
                             |      |
                  index - 1  |      |
                             |______|
                             |      |
                   index     |      |
                             |______|
                             |      |
                  index + 1  |      |
                             |______|
                             |      |
                  index + 2  |      |
                             |______|
                             |      |
                  index + 3  |      |
                             |______|
                             :      :
                             :      :
                             :......:
                             :      :
                             :      :
Now suppose we insert element1. The first hash function evaluates to the indexth bucket in the hash table:
                            Hash Table
                            ----------

                             :      :
                             :      :
                             :......:
                             :      :
                  index - 2  :      :
                             :______:
                             |      |
                  index - 1  |      |
                             |______|     ______
                             |      |    |      |
                   index     |      | -> |      |   element1
                             |______|    |______|
                             |      |
                  index + 1  |      |
                             |______|
                             |      |
                  index + 2  |      |
                             |______|
                             |      |
                  index + 3  |      |
                             |______|
                             :      :
                             :      :
                             :......:
                             :      :
                             :      :
Now suppose we want to insert element2, for which the first hash function also evaluates to the indexth bucket in the hash table; however, there is a collision. So, we will calculate an increment using the second hash function. Suppose the increment value is 3, we will insert element2 at index + 3:
                            Hash Table
                            ----------

                             :      :
                             :      :
                             :......:
                             :      :
                  index - 2  :      :
                             :______:
                             |      |
                  index - 1  |      |
                             |______|     ______
                             |      |    |      |
            .----  index     |      | -> |      |   element1
            |                |______|    |______|
            |                |      |
            |     index + 1  |      |
            |                |______|
            |                |      |
            |     index + 2  |      |
            |                |______|     ______
            |                |      |    |      |
            `---> index + 3  |      | -> |      |   element2
                             |______|    |______|
                             |      |
                  index + 4  |      |
                             |______|
                             |      |
                  index + 5  |      |
                             |______|
                             |      |
                  index + 6  |      |
                             |______|
                             |      |
                  index + 7  |      |
                             |______|
                             :      :
                             :      :
                             :......:
                             :      :
                             :      :
The entry for element2 is said to be "chained" through node index.
Now suppose we want to insert element3, for which the first hash function also evaluates to the indexth bucket in the hash table. Again, there is a collision. So, we will calculate an increment using the second hash function. Suppose the increment value is 5, we will insert element3 at index + 5:
                            Hash Table
                            ----------

                             :      :
                             :      :
                             :......:
                             :      :
                  index - 2  :      :
                             :______:
                             |      |
                  index - 1  |      |
                             |______|     ______
                             |      |    |      |
      .-----.----  index     |      | -> |      |   element1
      |     |                |______|    |______|
      |     |                |      |
      |     |     index + 1  |      |
      |     |                |______|
      |     |                |      |
      |     |     index + 2  |      |
      |     |                |______|     ______
      |     |                |      |    |      |
      |     `---> index + 3  |      | -> |      |   element2
      |                      |______|    |______|
      |                      |      |
      |           index + 4  |      |
      |                      |______|     ______
      |                      |      |    |      |
      `---------> index + 5  |      | -> |      |   element3
                             |______|    |______|
                             |      |
                  index + 6  |      |
                             |______|
                             |      |
                  index + 7  |      |
                             |______|
                             :      :
                             :      :
                             :......:
                             :      :
                             :      :
The entry for element3 is also "chained" through node index.
If there is a collision even after applying the increment, then the increment can be applied again to form a longer chain, until an available bucket is found. For example, suppose we want to insert element4, for which the first hash function evaluates to the indexth bucket. Since there is a collision, we calculate an increment using the second hash function. Suppose the increment value is 3, we will get another collision because element2 occupies the bucket at index + 3. Therefore, we apply the increment again and we get index + 3 + 3, i.e., index + 6. This bucket is empty, so we can store element4 here:
                            Hash Table
                            ----------

                             :      :
                             :      :
                             :......:
                             :      :
                  index - 2  :      :
                             :______:
                             |      |
                  index - 1  |      |
                             |______|     ______
                             |      |    |      |
      .-----.----  index     |      | -> |      |   element1
      |     |                |______|    |______|
      |     |                |      |
      |     |     index + 1  |      |
      |     |                |______|
      |     |                |      |
      |     |     index + 2  |      |
      |     |                |______|     ______
      |     |                |      |    |      |
      |     `---> index + 3  |      | -> |      |   element2
      |     .----            |______|    |______|
      |     |                |      |
      |     |     index + 4  |      |
      |     |                |______|     ______
      |     |                |      |    |      |
      `-----+---> index + 5  |      | -> |      |   element3
            |                |______|    |______|
            |                |      |    |      |
            `---> index + 6  |      | -> |      |   element4
                             |______|    |______|
                             |      |
                  index + 7  |      |
                             |______|
                             :      :
                             :      :
                             :......:
                             :      :
                             :      :
The entry for element4 is chained through nodes index and index + 3.
If the total number of buckets in the hash table and the increment value are relatively prime (i.e., their greatest common divisor is 1), then it is guaranteed that every bucket will be visited before looping back to index.
The bdlc::HashTable container makes sure that the number of buckets in the hash table and the increment values are relatively prime. The bdlc::HashTable container also keeps track of the maximum chain length, number of collisions, and the total chain length, which can be used for statistical purposes when evaluating different hash functions.
Bucket Type:
The bdlc::HashTable class treats individual buckets as value-semantic types. The type of the buckets depends on the KEY and VALUE parameters used to instantiate the bdlc::HashTable template. If the VALUE parameter is bslmf::Nil, then the type of the buckets is KEY. Otherwise, the type of the buckets is bsl::pair<KEY, VALUE>. For convenience, we will refer to the bucket type as Bucket throughout this documentation.
The bdlc::HashTable class reserves two distinct values from Buckets value-space to represent a "null" bucket and a "removed" bucket. These values are determined by the TRAITS parameter, which is described in the next section. Since these two values are reserved for the internal use of the bdlc::HashTable class, the behavior is undefined if one of these values is inserted into the hash table. Taking these values from the value-space of Bucket allows the storage space required for each bucket to be as compact as possible.
Traits:
An optional TRAITS parameter can be specified when instantiating the bdlc::HashTable template. This component provides a default traits implementation, bdlc::HashTableDefaultTraits, which will be described later.
The TRAITS parameter allows clients to specify how to load a bucket and how to compare keys. It also allows clients to classify two distinct values to represent "null" and "removed" buckets (see "Bucket Type" for more information about these reserved values).
In the following description, key1 and key2 refer to objects of type KEY. bucket, dstBucket, and srcBucket refer to objects of type Bucket.
The following expressions must be supported by the TRAITS parameter:
  Expression                            Semantics
  ----------                            ---------
  TRAITS::load(&dstBucket, srcBucket)   Load the value of the specified
                                        'srcBucket' into the specified
                                        'dstBucket'.

  TRAITS::areEqual(key1, key2)          Return true if the specified 'key1'
                                        matches the specified 'key2', and
                                        false otherwise.

  TRAITS::isNull(bucket)                Return true if the specified 'bucket'
                                        has the reserved "null" value, and
                                        false otherwise.

  TRAITS::setToNull(&bucket)            Load the reserved "null" value into
                                        the specified 'bucket'.

  TRAITS::isRemoved(bucket)             Return true if the specified 'bucket'
                                        has the reserved "removed" value, and
                                        false otherwise.

  TRAITS::setToRemoved(&bucket)         Load the reserved "removed" value
                                        into the specified 'bucket'.
Default Traits:
The default traits, identified by bdlc::HashTableDefaultTraits, can be used when KEY and VALUE are either:
  • const char *
  • bsl::string
  • POD types
The following expressions are implemented as:
  Expression                                Implementation
  ----------                                --------------
  TRAITS::load(&dstBucket, srcBucket)       This function is implemented as
                                            '*dstBucket = srcBucket'.

  TRAITS::areEqual(key1, key2)              If 'KEY' is 'const char*', this
                                            function is implemented as
                                            'bsl::strcmp(key1, key2)'.
                                            Otherwise, this function is
                                            implemented as 'key1 == key2'.
The isNull, setToNull, isRemoved, and setToRemoved functions are implemented by checking for and assigning the appropriate "null" or "removed" values, respectively. These values are defined in the following table:
  Bucket Type        Null Value                    Removed Value
  -----------        ----------                    -------------
  const char*        0x00000000 address            0xFFFFFFFF address

  bsl::string        ""                            "(* REMOVED *)"

  All other types    All bytes in the footprint    All bytes in the footprint
                     are 0x00                      are 0xFF
If Bucket is of type bsl::pair<KEY, VALUE>, then the "null" and "removed" values are applied to both the KEY and the VALUE.
Since the default traits may write directly into the footprint of the bucket (except for bsl::string), it is important to note that the KEY and VALUE types should be POD types if the default traits are used.
Hash Functors:
Optional HASH1 and HASH2 parameters can be specified when instantiating the bdlc::HashTable template. This component provides a default hash functors, bdlc::HashTableDefaultHash1 and bdlc::HashTableDefaultHash2, which will be described later.
The HASH1 and HASH2 parameters allow clients to specify hash functor policies for the first and second hash functions, respectively.
In the following description, key refers to an object of type KEY, and functor refers to an immutable object of type HASH1 or HASH2.
The following expression must be supported by the supplied HASH1 and HASH2 parameters:
  Expression      Semantics                                      Return Type
  ----------      ---------                                      -----------
  functor(&key)   Return a hash value for the specified 'key'    unsigned int
Default Hash Functors:
The default hash functors, identified by bdlc::HashTableDefaultHash1 and bdlc::HashTableDefaultHash2, can be used when KEY is either:
    o const char*
    o bsl::string
    o a POD type
The bdlc::HashTableDefaultHash1 functor is implemented using bdlb::HashUtil::hash1 and the bdlc::HashTableDefaultHash2 functor is implemented using bdlb::HashUtil::hash2.
Note that bdlb::HashUtil::hash1 and bdlb::HashUtil::hash2 calculate hash value from a fixed length block of memory. This block of memory is obtained based on the following table:
  KEY Type            Block Data                             Block Length
  --------            ----------                             ------------
  const char*         key                                    bsl::strlen(key)

  bsl::string         key.data()                             key.length()

  All other types     reinterpret_cast<const char *>(&key)   sizeof(key)
Since the default hash functors use the footprint of the key (except for const char* and bsl::string) to compute hash values, it is important to note that the KEY type should be a POD type if the default hash functors are used.
Disabling Support for remove:
By default (i.e., when using the default traits), the remove method can be used to remove an element from the hash table. However, there are cases when it is desirable not to allow elements to be removed. This can be achieved by supplying the bdlc::HashTable template with a TRAITS parameter that:
    o always returns false for the 'TRAITS::isRemoved(bucket)' expression
    o AND does not implemented the 'TRAITS::setToRemoved(&bucket)' expression
This effectively describes a trait that does not define a special "removed" bucket value.
Usage:
The following snippets of code illustrate the usage of this component. Suppose we wanted to store a table of int keys with double values. We will use a capacity hint of 10, default traits, and default hash functors for demonstration purposes:
  #include <bdlc_hashtable.h>

  using namespace BloombergLP;

  void usageExample()
  {
      typedef bdlc::HashTable<int, double> TableType;

      TableType table(10);
Now we can insert elements into this object:
      TableType::Handle handles[3];

      struct {
          int    d_key;
          double d_value;
      } DATA[] = {
          {  10,   2.34   },
          {  92,   94.2   },
          { 236,   9.1    },
      };

      table.insert(&handles[0], DATA[0].d_key, DATA[0].d_value);
      assert(DATA[0].d_key   == table.key(handles[0]));
      assert(DATA[0].d_value == table.value(handles[0]));

      table.insert(&handles[1], DATA[1].d_key, DATA[1].d_value);
      assert(DATA[1].d_key   == table.key(handles[1]));
      assert(DATA[1].d_value == table.value(handles[1]));

      table.insert(&handles[2], DATA[2].d_key, DATA[2].d_value);
      assert(DATA[2].d_key   == table.key(handles[2]));
      assert(DATA[2].d_value == table.value(handles[2]));
Now we can find elements in this object using the key:
      TableType::Handle otherHandles[3];

      table.find(&otherHandles[0], DATA[0].d_key);
      assert(DATA[0].d_key   == table.key(otherHandles[0]));
      assert(DATA[0].d_value == table.value(otherHandles[0]));

      table.find(&otherHandles[1], DATA[1].d_key);
      assert(DATA[1].d_key   == table.key(otherHandles[1]));
      assert(DATA[1].d_value == table.value(otherHandles[1]));

      table.find(&otherHandles[2], DATA[2].d_key);
      assert(DATA[2].d_key   == table.key(otherHandles[2]));
      assert(DATA[2].d_value == table.value(otherHandles[2]));
  }