BDE 4.14.0 Production release
|
Provide a double-hashed table with utility.
This component provides a mechanism, bdlc::HashTable
, for efficiently finding elements identified by a parameterized KEY
. Elements can also have an associated value by specifying an optional VALUE
template parameter. Also, an optional TRAITS
parameter can be supplied so that clients can override the default traits of the hash table, bdlc::HashTableDefaultTraits
.
The bdlc::HashTable
class achieves efficient lookup by using a double-hash algorithm, which will be explained later. Optional HASH1
and HASH2
parameters can be supplied so that clients can override the default hash functions used by the hash table, bdlc::HashTableDefaultHash1
and bdlc::HashTableDefaultHash2
. Hash functors may also optionally be specified at construction time, in case the functors contain state (e.g., if bsl::function
is used).
The constructor for bdlc::HashTable
takes a capacityHint
argument. This capacityHint
is used to calculate the capacity of the hash table (i.e., the maximum number of elements that can be stored at any one time). Once constructed, the capacity cannot be changed. The capacity hint can be either a positive integer or a negative integer. If the capacity hint is positive, then the capacity of the hash table will be the first available prime number larger than, or equal to, the capacity hint. Otherwise, the capacity of the hash table will be the first available prime number smaller than, or equal to, the capacity hint. The list of available prime numbers is obtained from an array in the bdlc_hashtable.cpp
file.
A typical hash table implementation uses only a single hash function to determine the index in the hash table to store a given element. This approach results in constant time access if there are no collisions. To handle cases where there are hash collisions, the hash table needs to maintain a linked list or tree of elements for each index in the table. This data structure is illustrated in the diagram below:
In the diagram above, element1
, element2
, and element3
hash to the index
th bucket in the hash table. Because of this collision, they are maintained in a linked list, which results in linear time complexity.
The double-hash algorithm improves on the traditional algorithm by using a second hash function to compute an increment value. The index is incremented by the increment value until an available bucket is found. This augmented algorithm is illustrated in the following diagrams. Suppose we have a hash table that is initially empty:
Now suppose we insert element1
. The first hash function evaluates to the index
th bucket in the hash table:
Now suppose we want to insert element2
, for which the first hash function also evaluates to the index
th bucket in the hash table; however, there is a collision. So, we will calculate an increment using the second hash function. Suppose the increment value is 3, we will insert element2
at index + 3
:
The entry for element2
is said to be "chained" through node index
.
Now suppose we want to insert element3
, for which the first hash function also evaluates to the index
th bucket in the hash table. Again, there is a collision. So, we will calculate an increment using the second hash function. Suppose the increment value is 5, we will insert element3
at index + 5
:
The entry for element3
is also "chained" through node index
.
If there is a collision even after applying the increment, then the increment can be applied again to form a longer chain, until an available bucket is found. For example, suppose we want to insert element4
, for which the first hash function evaluates to the index
th bucket. Since there is a collision, we calculate an increment using the second hash function. Suppose the increment value is 3, we will get another collision because element2
occupies the bucket at index + 3
. Therefore, we apply the increment again and we get index + 3 + 3
, i.e., index + 6
. This bucket is empty, so we can store element4
here:
The entry for element4
is chained through nodes index
and index + 3
.
If the total number of buckets in the hash table and the increment value are relatively prime (i.e., their greatest common divisor is 1), then it is guaranteed that every bucket will be visited before looping back to index
.
The bdlc::HashTable
container makes sure that the number of buckets in the hash table and the increment values are relatively prime. The bdlc::HashTable
container also keeps track of the maximum chain length, number of collisions, and the total chain length, which can be used for statistical purposes when evaluating different hash functions.
The bdlc::HashTable
class treats individual buckets as value-semantic types. The type of the buckets depends on the KEY
and VALUE
parameters used to instantiate the bdlc::HashTable
template. If the VALUE
parameter is bslmf::Nil
, then the type of the buckets is KEY
. Otherwise, the type of the buckets is bsl::pair<KEY, VALUE>
. For convenience, we will refer to the bucket type as Bucket
throughout this documentation.
The bdlc::HashTable
class reserves two distinct values from Bucket
s value-space to represent a "null" bucket and a "removed" bucket. These values are determined by the TRAITS
parameter, which is described in the next section. Since these two values are reserved for the internal use of the bdlc::HashTable
class, the behavior is undefined if one of these values is inserted into the hash table. Taking these values from the value-space of Bucket
allows the storage space required for each bucket to be as compact as possible.
An optional TRAITS
parameter can be specified when instantiating the bdlc::HashTable
template. This component provides a default traits implementation, bdlc::HashTableDefaultTraits
, which will be described later.
The TRAITS
parameter allows clients to specify how to load a bucket and how to compare keys. It also allows clients to classify two distinct values to represent "null" and "removed" buckets (see "Bucket Type" for more information about these reserved values).
In the following description, key1
and key2
refer to objects of type KEY
. bucket
, dstBucket
, and srcBucket
refer to objects of type Bucket
.
The following expressions must be supported by the TRAITS
parameter:
The default traits, identified by bdlc::HashTableDefaultTraits
, can be used when KEY
and VALUE
are either:
const char *
bsl::string
The following expressions are implemented as:
The isNull
, setToNull
, isRemoved
, and setToRemoved
functions are implemented by checking for and assigning the appropriate "null" or "removed" values, respectively. These values are defined in the following table:
If Bucket
is of type bsl::pair<KEY, VALUE>
, then the "null" and "removed" values are applied to both the KEY
and the VALUE
.
Since the default traits may write directly into the footprint of the bucket (except for bsl::string
), it is important to note that the KEY
and VALUE
types should be POD types if the default traits are used.
Optional HASH1
and HASH2
parameters can be specified when instantiating the bdlc::HashTable
template. This component provides a default hash functors, bdlc::HashTableDefaultHash1
and bdlc::HashTableDefaultHash2
, which will be described later.
The HASH1
and HASH2
parameters allow clients to specify hash functor policies for the first and second hash functions, respectively.
In the following description, key
refers to an object of type KEY
, and functor
refers to an immutable object of type HASH1
or HASH2
.
The following expression must be supported by the supplied HASH1
and HASH2
parameters:
The default hash functors, identified by bdlc::HashTableDefaultHash1
and bdlc::HashTableDefaultHash2
, can be used when KEY
is either:
The bdlc::HashTableDefaultHash1
functor is implemented using bdlb::HashUtil::hash1
and the bdlc::HashTableDefaultHash2
functor is implemented using bdlb::HashUtil::hash2
.
Note that bdlb::HashUtil::hash1
and bdlb::HashUtil::hash2
calculate hash value from a fixed length block of memory. This block of memory is obtained based on the following table:
Since the default hash functors use the footprint of the key (except for const char*
and bsl::string
) to compute hash values, it is important to note that the KEY
type should be a POD type if the default hash functors are used.
By default (i.e., when using the default traits), the remove
method can be used to remove an element from the hash table. However, there are cases when it is desirable not to allow elements to be removed. This can be achieved by supplying the bdlc::HashTable
template with a TRAITS
parameter that:
This effectively describes a trait that does not define a special "removed" bucket value.
This section illustrates intended use of this component.
The following snippets of code illustrate the usage of this component. Suppose we wanted to store a table of int
keys with double
values. We will use a capacity hint of 10, default traits, and default hash functors for demonstration purposes:
Now we can insert elements into this object:
Now we can find elements in this object using the key: