BDE 4.14.0 Production release
|
Provide an open-addressed unordered map container.
This component defines a single class template, bdlc::FlatHashMap
, that implements an open-addressed unordered map of items with unique keys.
Unordered maps are useful in situations when there is no meaningful way to order key values, when the order of the keys is irrelevant to the problem domain, or (even if there is a meaningful ordering) the value of ordering the keys is outweighed by the higher performance provided by unordered maps (compared to ordered maps). On platforms that support relevant SIMD instructions (e.g., SSE2), bdlc::FlatHashMap
generally exhibits better performance than bsl::unordered_map
.
An instantiation of bdlc::FlatHashMap
is an allocator-aware, value-semantic type whose salient attributes are the collection of KEY-VALUE
pairs contained, without regard to order. An instantiation may be provided with custom hash and key-equality functors, but those are not salient attributes. In particular, when comparing element values for equality between two different bdlc::FlatHashMap
objects, the elements are compared using operator==
.
The implemented data structure is inspired by Google's flat_hash_map
CppCon presentations (available on YouTube). The implementation draws from Google's open source raw_hash_set.h
file at: https://github.com/abseil/abseil-cpp/blob/master/absl/container/internal.
bdlc::FlatHashMap
is recommended for Intel platforms only (i.e., Linux and Windows, and pre-ARM Macs); on platforms using other processors (i.e., Sun and AIX), bdlc::FlatHashMap
may have slower performance than bsl::unordered_map
. However, note that bdlc::FlatHashMap
will use significantly less memory than bsl::unordered_map
on all platforms. Given the Intel-only performance caveat, it is recommended to benchmark before using bdlc::FlatHashMap
– particularly on non-Intel production environments.
A bdlc::FlatHashMap
meets most of the requirements of an unordered associative container with forward iterators in the C++11 Standard [23.2.5]. It does not have the bucket interface, and locations of elements may change when the container is modified (and therefore iterators become invalid too). Allocator use follows BDE style, and the various allocator propagation attributes are not present (e.g., the allocator trait propagate_on_container_copy_assignment
). The maximum load factor of the container (the ratio of size to capacity) is maintained by the container itself and is not settable (the maximum load factor is implementation defined and fixed).
An invariant of bdlc::FlatHashMap
is that 0 <= load_factor() <= max_load_factor() <= 1.0
. Any operation that would result in load_factor() > max_load_factor()
for a bdlc::FlatHashMap
causes the capacity to increase. This resizing allocates new memory, copies or moves all elements to the new memory, and reclaims the original memory. The transfer of elements involves rehashing each element to determine its new location. As such, all iterators, pointers, and references to elements of the bdlc::FlatHashMap
are invalidated on a resize.
The template parameter type KEY
must be copy or move constructible. The template parameter types HASH
and EQUAL
must be default and copy constructible function objects.
HASH
must support a function-call operator compatible with the following statements for an object key
of type KEY
:
EQUAL
must support a function-call operator compatible with the following statements for objects key1
and key2
of type KEY
:
where the definition of the called function defines an equivalence relationship on keys that is both reflexive and transitive.
HASH
and EQUAL
function objects are further constrained: if the comparator determines that two values are equal, the hasher must produce the same hash value for each.
Any change in capacity of a bdlc::FlatHashMap
invalidates all pointers, references, and iterators. A bdlc::FlatHashMap
manipulator that erases an element invalidates all pointers, references, and iterators to the erased element.
A bdlc::FlatHashMap
is exception neutral, and all of the methods of bdlc::FlatHashMap
provide the basic exception safety guarantee (see {bsldoc_glossary |Basic Guarantee}).
Move-only types are supported by bdlc::FlatHashMap
on C++11, and later, platforms only (where BSLMF_MOVABLEREF_USES_RVALUE_REFERENCES
is defined), and are not supported on C++03 platforms. Unfortunately, in C++03, there are user-defined types where a bslmf::MovableRef
will not safely degrade to an lvalue reference when a move constructor is not available (types providing a constructor template taking any type), so bslmf::MovableRefUtil::move
cannot be used directly on a user-supplied template parameter type.
In this section we show intended use of this component.
Suppose one wished to gather statistics on the words appearing in a large set of documents on disk or in a database. Gathering those statistics is intrusive (as one is competing for access to the documents with the regular users) and must be done as quickly as possible. Moreover, the set of unique words appearing in those documents may be high. The English language has in excess of a million words (albeit many appear infrequently), and, if the documents contain serial numbers, or Social Security numbers, or chemical formulas, etc., then the O[log(n)]
insertion time of ordered maps may well be inadequate. An unordered map, having an O[1]
typical insertion cost, is a viable alternative.
This example illustrates the use of bdlc::FlatHashMap
to gather one simple statistic (counts of unique words) on a portion of a single document. To avoid irrelevant details of acquiring the data, the data is stored in static arrays:
First, we define an alias to make our code more comprehensible:
Next, we create an (empty) flat hash map to hold our word tallies:
Then, we define the set of characters that define word boundaries:
Next, we extract the words from our document. Note that strtok
modifies the document array (which was not made const
).
For each iteration of the loop, a map entry matching the key value parsed by strtok
is obtained. On the first occurrence of a word, the map has no such entry, so one is created with a default value of the mapped value (0, just what we want in this case) and inserted into the map where it is found on any subsequent occurrences of the word. The operator[]
method returns a reference providing modifiable access to the mapped value. Here, we apply the ++
operator to that reference to maintain a tally for the word:
Now that the data has been (quickly) gathered, we can indulge in analysis that is more time consuming. For example, we can define a comparison function, copy the data to another container (e.g., bsl::vector
), sort the entries, and determine the 10 most commonly used words in the given documents:
Notice that partial_sort suffices here since we seek only the 10 most used words, not a complete distribution of word counts.
Finally, we print the sorted portion of array
:
and standard output shows: