Provide a double-hashed table with utility.
More...
Namespaces |
namespace | bdlc |
Detailed Description
- Outline
-
-
- Purpose:
- Provide a double-hashed table with utility.
-
- Classes:
-
- See also:
- Component bdlb_hashutil
-
- Description:
- This component provides a mechanism,
bdlc::HashTable
, for efficiently finding elements identified by a parameterized KEY
. Elements can also have an associated value by specifying an optional VALUE
template parameter. Also, an optional TRAITS
parameter can be supplied so that clients can override the default traits of the hash table, bdlc::HashTableDefaultTraits
.
- The
bdlc::HashTable
class achieves efficient lookup by using a double-hash algorithm, which will be explained later. Optional HASH1
and HASH2
parameters can be supplied so that clients can override the default hash functions used by the hash table, bdlc::HashTableDefaultHash1
and bdlc::HashTableDefaultHash2
. Hash functors may also optionally be specified at construction time, in case the functors contain state (e.g., if bsl::function
is used).
- The constructor for
bdlc::HashTable
takes a capacityHint
argument. This capacityHint
is used to calculate the capacity of the hash table (i.e., the maximum number of elements that can be stored at any one time). Once constructed, the capacity cannot be changed. The capacity hint can be either a positive integer or a negative integer. If the capacity hint is positive, then the capacity of the hash table will be the first available prime number larger than, or equal to, the capacity hint. Otherwise, the capacity of the hash table will be the first available prime number smaller than, or equal to, the capacity hint. The list of available prime numbers is obtained from an array in the bdlc_hashtable.cpp
file.
-
- Traditional Hash Algorithm:
- A typical hash table implementation uses only a single hash function to determine the index in the hash table to store a given element. This approach results in constant time access if there are no collisions. To handle cases where there are hash collisions, the hash table needs to maintain a linked list or tree of elements for each index in the table. This data structure is illustrated in the diagram below:
Hash Table
----------
: :
: :
:......:
: :
index - 2 : :
:______:
| |
index - 1 | |
|______| ______ ______ ______
| | | | | | | |
index | | -> | | -> | | -> | | -> NULL
|______| |______| |______| |______|
| |
index + 1 | | element1 element2 element3
|______|
| |
index + 2 | |
|______|
: :
: :
:......:
: :
: :
In the diagram above, element1
, element2
, and element3
hash to the index
th bucket in the hash table. Because of this collision, they are maintained in a linked list, which results in linear time complexity.
-
- Double-Hash Algorithm:
- The double-hash algorithm improves on the traditional algorithm by using a second hash function to compute an increment value. The index is incremented by the increment value until an available bucket is found. This augmented algorithm is illustrated in the following diagrams. Suppose we have a hash table that is initially empty:
Hash Table
----------
: :
: :
:......:
: :
index - 2 : :
:______:
| |
index - 1 | |
|______|
| |
index | |
|______|
| |
index + 1 | |
|______|
| |
index + 2 | |
|______|
| |
index + 3 | |
|______|
: :
: :
:......:
: :
: :
Now suppose we insert element1
. The first hash function evaluates to the index
th bucket in the hash table: Hash Table
----------
: :
: :
:......:
: :
index - 2 : :
:______:
| |
index - 1 | |
|______| ______
| | | |
index | | -> | | element1
|______| |______|
| |
index + 1 | |
|______|
| |
index + 2 | |
|______|
| |
index + 3 | |
|______|
: :
: :
:......:
: :
: :
Now suppose we want to insert element2
, for which the first hash function also evaluates to the index
th bucket in the hash table; however, there is a collision. So, we will calculate an increment using the second hash function. Suppose the increment value is 3, we will insert element2
at index + 3
: Hash Table
----------
: :
: :
:......:
: :
index - 2 : :
:______:
| |
index - 1 | |
|______| ______
| | | |
.---- index | | -> | | element1
| |______| |______|
| | |
| index + 1 | |
| |______|
| | |
| index + 2 | |
| |______| ______
| | | | |
`---> index + 3 | | -> | | element2
|______| |______|
| |
index + 4 | |
|______|
| |
index + 5 | |
|______|
| |
index + 6 | |
|______|
| |
index + 7 | |
|______|
: :
: :
:......:
: :
: :
The entry for element2
is said to be "chained" through node index
.
- Now suppose we want to insert
element3
, for which the first hash function also evaluates to the index
th bucket in the hash table. Again, there is a collision. So, we will calculate an increment using the second hash function. Suppose the increment value is 5, we will insert element3
at index + 5
: Hash Table
----------
: :
: :
:......:
: :
index - 2 : :
:______:
| |
index - 1 | |
|______| ______
| | | |
.-----.---- index | | -> | | element1
| | |______| |______|
| | | |
| | index + 1 | |
| | |______|
| | | |
| | index + 2 | |
| | |______| ______
| | | | | |
| `---> index + 3 | | -> | | element2
| |______| |______|
| | |
| index + 4 | |
| |______| ______
| | | | |
`---------> index + 5 | | -> | | element3
|______| |______|
| |
index + 6 | |
|______|
| |
index + 7 | |
|______|
: :
: :
:......:
: :
: :
The entry for element3
is also "chained" through node index
.
- If there is a collision even after applying the increment, then the increment can be applied again to form a longer chain, until an available bucket is found. For example, suppose we want to insert
element4
, for which the first hash function evaluates to the index
th bucket. Since there is a collision, we calculate an increment using the second hash function. Suppose the increment value is 3, we will get another collision because element2
occupies the bucket at index + 3
. Therefore, we apply the increment again and we get index + 3 + 3
, i.e., index + 6
. This bucket is empty, so we can store element4
here: Hash Table
----------
: :
: :
:......:
: :
index - 2 : :
:______:
| |
index - 1 | |
|______| ______
| | | |
.-----.---- index | | -> | | element1
| | |______| |______|
| | | |
| | index + 1 | |
| | |______|
| | | |
| | index + 2 | |
| | |______| ______
| | | | | |
| `---> index + 3 | | -> | | element2
| .---- |______| |______|
| | | |
| | index + 4 | |
| | |______| ______
| | | | | |
`-----+---> index + 5 | | -> | | element3
| |______| |______|
| | | | |
`---> index + 6 | | -> | | element4
|______| |______|
| |
index + 7 | |
|______|
: :
: :
:......:
: :
: :
The entry for element4
is chained through nodes index
and index + 3
.
- If the total number of buckets in the hash table and the increment value are relatively prime (i.e., their greatest common divisor is 1), then it is guaranteed that every bucket will be visited before looping back to
index
.
- The
bdlc::HashTable
container makes sure that the number of buckets in the hash table and the increment values are relatively prime. The bdlc::HashTable
container also keeps track of the maximum chain length, number of collisions, and the total chain length, which can be used for statistical purposes when evaluating different hash functions.
-
- Bucket Type:
- The
bdlc::HashTable
class treats individual buckets as value-semantic types. The type of the buckets depends on the KEY
and VALUE
parameters used to instantiate the bdlc::HashTable
template. If the VALUE
parameter is bslmf::Nil
, then the type of the buckets is KEY
. Otherwise, the type of the buckets is bsl::pair<KEY, VALUE>
. For convenience, we will refer to the bucket type as Bucket
throughout this documentation.
- The
bdlc::HashTable
class reserves two distinct values from Bucket
s value-space to represent a "null" bucket and a "removed" bucket. These values are determined by the TRAITS
parameter, which is described in the next section. Since these two values are reserved for the internal use of the bdlc::HashTable
class, the behavior is undefined if one of these values is inserted into the hash table. Taking these values from the value-space of Bucket
allows the storage space required for each bucket to be as compact as possible.
-
- Traits:
- An optional
TRAITS
parameter can be specified when instantiating the bdlc::HashTable
template. This component provides a default traits implementation, bdlc::HashTableDefaultTraits
, which will be described later.
- The
TRAITS
parameter allows clients to specify how to load a bucket and how to compare keys. It also allows clients to classify two distinct values to represent "null" and "removed" buckets (see "Bucket Type" for more information about these reserved values).
- In the following description,
key1
and key2
refer to objects of type KEY
. bucket
, dstBucket
, and srcBucket
refer to objects of type Bucket
.
- The following expressions must be supported by the
TRAITS
parameter: Expression Semantics
---------- ---------
TRAITS::load(&dstBucket, srcBucket) Load the value of the specified
'srcBucket' into the specified
'dstBucket'.
TRAITS::areEqual(key1, key2) Return true if the specified 'key1'
matches the specified 'key2', and
false otherwise.
TRAITS::isNull(bucket) Return true if the specified 'bucket'
has the reserved "null" value, and
false otherwise.
TRAITS::setToNull(&bucket) Load the reserved "null" value into
the specified 'bucket'.
TRAITS::isRemoved(bucket) Return true if the specified 'bucket'
has the reserved "removed" value, and
false otherwise.
TRAITS::setToRemoved(&bucket) Load the reserved "removed" value
into the specified 'bucket'.
-
- Default Traits:
- The default traits, identified by
bdlc::HashTableDefaultTraits
, can be used when KEY
and VALUE
are either:
-
const char *
-
bsl::string
-
POD types
- The following expressions are implemented as:
Expression Implementation
---------- --------------
TRAITS::load(&dstBucket, srcBucket) This function is implemented as
'*dstBucket = srcBucket'.
TRAITS::areEqual(key1, key2) If 'KEY' is 'const char*', this
function is implemented as
'bsl::strcmp(key1, key2)'.
Otherwise, this function is
implemented as 'key1 == key2'.
The isNull
, setToNull
, isRemoved
, and setToRemoved
functions are implemented by checking for and assigning the appropriate "null" or "removed" values, respectively. These values are defined in the following table: Bucket Type Null Value Removed Value
----------- ---------- -------------
const char* 0x00000000 address 0xFFFFFFFF address
bsl::string "" "(* REMOVED *)"
All other types All bytes in the footprint All bytes in the footprint
are 0x00 are 0xFF
If Bucket
is of type bsl::pair<KEY, VALUE>
, then the "null" and "removed" values are applied to both the KEY
and the VALUE
.
- Since the default traits may write directly into the footprint of the bucket (except for
bsl::string
), it is important to note that the KEY
and VALUE
types should be POD types if the default traits are used.
-
- Hash Functors:
- Optional
HASH1
and HASH2
parameters can be specified when instantiating the bdlc::HashTable
template. This component provides a default hash functors, bdlc::HashTableDefaultHash1
and bdlc::HashTableDefaultHash2
, which will be described later.
- The
HASH1
and HASH2
parameters allow clients to specify hash functor policies for the first and second hash functions, respectively.
- In the following description,
key
refers to an object of type KEY
, and functor
refers to an immutable object of type HASH1
or HASH2
.
- The following expression must be supported by the supplied
HASH1
and HASH2
parameters: Expression Semantics Return Type
---------- --------- -----------
functor(&key) Return a hash value for the specified 'key' unsigned int
-
- Default Hash Functors:
- The default hash functors, identified by
bdlc::HashTableDefaultHash1
and bdlc::HashTableDefaultHash2
, can be used when KEY
is either: o const char*
o bsl::string
o a POD type
The bdlc::HashTableDefaultHash1
functor is implemented using bdlb::HashUtil::hash1
and the bdlc::HashTableDefaultHash2
functor is implemented using bdlb::HashUtil::hash2
.
- Note that
bdlb::HashUtil::hash1
and bdlb::HashUtil::hash2
calculate hash value from a fixed length block of memory. This block of memory is obtained based on the following table: KEY Type Block Data Block Length
-------- ---------- ------------
const char* key bsl::strlen(key)
bsl::string key.data() key.length()
All other types reinterpret_cast<const char *>(&key) sizeof(key)
Since the default hash functors use the footprint of the key (except for const char*
and bsl::string
) to compute hash values, it is important to note that the KEY
type should be a POD type if the default hash functors are used.
-
- Disabling Support for remove:
- By default (i.e., when using the default traits), the
remove
method can be used to remove an element from the hash table. However, there are cases when it is desirable not to allow elements to be removed. This can be achieved by supplying the bdlc::HashTable
template with a TRAITS
parameter that: o always returns false for the 'TRAITS::isRemoved(bucket)' expression
o AND does not implemented the 'TRAITS::setToRemoved(&bucket)' expression
This effectively describes a trait that does not define a special "removed" bucket value.
-
- Usage:
- The following snippets of code illustrate the usage of this component. Suppose we wanted to store a table of
int
keys with double
values. We will use a capacity hint of 10, default traits, and default hash functors for demonstration purposes: Now we can insert elements into this object: TableType::Handle handles[3];
struct {
int d_key;
double d_value;
} DATA[] = {
{ 10, 2.34 },
{ 92, 94.2 },
{ 236, 9.1 },
};
table.insert(&handles[0], DATA[0].d_key, DATA[0].d_value);
assert(DATA[0].d_key == table.key(handles[0]));
assert(DATA[0].d_value == table.value(handles[0]));
table.insert(&handles[1], DATA[1].d_key, DATA[1].d_value);
assert(DATA[1].d_key == table.key(handles[1]));
assert(DATA[1].d_value == table.value(handles[1]));
table.insert(&handles[2], DATA[2].d_key, DATA[2].d_value);
assert(DATA[2].d_key == table.key(handles[2]));
assert(DATA[2].d_value == table.value(handles[2]));
Now we can find elements in this object using the key: TableType::Handle otherHandles[3];
table.find(&otherHandles[0], DATA[0].d_key);
assert(DATA[0].d_key == table.key(otherHandles[0]));
assert(DATA[0].d_value == table.value(otherHandles[0]));
table.find(&otherHandles[1], DATA[1].d_key);
assert(DATA[1].d_key == table.key(otherHandles[1]));
assert(DATA[1].d_value == table.value(otherHandles[1]));
table.find(&otherHandles[2], DATA[2].d_key);
assert(DATA[2].d_key == table.key(otherHandles[2]));
assert(DATA[2].d_value == table.value(otherHandles[2]));
}