BDE 4.14.0 Production release
|
Provide an open-addressed unordered set container.
This component defines a single class template, bdlc::FlatHashSet
, that implements an open-addressed unordered set of items with unique values.
Unordered sets are useful in situations when there is no meaningful way to order key values, when the order of the values is irrelevant to the problem domain, or (even if there is a meaningful ordering) the value of ordering the results is outweighed by the higher performance provided by unordered sets (compared to ordered sets). On platforms that support relevant SIMD instructions (e.g., SSE2), bdlc::FlatHashSet
generally exhibits better performance than bsl::unordered_set
.
An instantiation of bdlc::FlatHashSet
is an allocator-aware, value-semantic type whose salient attributes are the set of values contained, without regard to order. An instantiation may be provided with custom hash and equality functors, but those are not salient attributes. In particular, when comparing element values for equality between two different bdlc::FlatHashSet
objects, the elements are compared using operator==
.
The implemented data structure is inspired by Google's flat_hash_map
CppCon presentations (available on YouTube). The implementation draws from Google's open source raw_hash_set.h
file at: https://github.com/abseil/abseil-cpp/blob/master/absl/container/internal.
bdlc::FlatHashSet
is recommended for Intel platforms only (i.e., Linux and Windows, and pre-ARM Macs); on platforms using other processors (i.e., Sun and AIX), bdlc::FlatHashSet
may have slower performance than bsl::unordered_set
. However, note that bdlc::FlatHashSet
will use significantly less memory than bsl::unordered_set
on all platforms. Given the Intel-only performance caveat, it is recommended to benchmark before using bdlc::FlatHashSet
– particularly on non-Intel production environments.
A bdlc::FlatHashSet
meets most of the requirements of an unordered associative container with forward iterators in the C++11 Standard [23.2.5]. It does not have the bucket interface, and locations of elements may change when the container is modified (and therefore iterators become invalid too). Allocator use follows BDE style, and the various allocator propagation attributes are not present (e.g., the allocator trait propagate_on_container_copy_assignment
). The maximum load factor of the container (the ratio of size to capacity) is maintained by the container itself and is not settable (the maximum load factor is implementation defined and fixed).
An invariant of bdlc::FlatHashSet
is that 0 <= load_factor() <= max_load_factor() <= 1.0
. Any operation that would result in load_factor() > max_load_factor()
for a bdlc::FlatHashSet
causes the capacity to increase. This resizing allocates new memory, copies or moves all elements to the new memory, and reclaims the original memory. The transfer of elements involves rehashing each element to determine its new location. As such, all iterators, pointers, and references to elements of the bdlc::FlatHashSet
are invalidated on a resize.
The template parameter type KEY
must be copy or move constructible. The template parameter types HASH
and EQUAL
must be default and copy constructible function objects.
HASH
must support a function-call operator compatible with the following statements for an object key
of type KEY
:
EQUAL
must support a function-call operator compatible with the following statements for objects key1
and key2
of type KEY
:
where the definition of the called function defines an equivalence relationship on keys that is both reflexive and transitive.
HASH
and EQUAL
function objects are further constrained: if the comparator determines that two values are equal, the hasher must produce the same hash value for each.
Any change in capacity of a bdlc::FlatHashSet
invalidates all pointers, references, and iterators. A bdlc::FlatHashSet
manipulator that erases an element invalidates all pointers, references, and iterators to the erased element.
A bdlc::FlatHashSet
is exception neutral, and all of the methods of bdlc::FlatHashSet
provide the basic exception safety guarantee (see {bsldoc_glossary |Basic Guarantee}).
Move-only types are supported by bdlc::FlatHashSet
on C++11, and later, platforms only (where BSLMF_MOVABLEREF_USES_RVALUE_REFERENCES
is defined), and are not supported on C++03 platforms. Unfortunately, in C++03, there are user-defined types where a bslmf::MovableRef
will not safely degrade to an lvalue reference when a move constructor is not available (types providing a constructor template taking any type), so bslmf::MovableRefUtil::move
cannot be used directly on a user-supplied template parameter type.
In this section we show intended use of this component.
Suppose one is analyzing data on a set of customers, and each customer is categorized by several attributes: customer type, geographic area, and (internal) project code; and that each attribute takes on one of a limited set of values. This data can be handled by creating an enumeration for each of the attributes:
The data set (randomly generated for this example) is provided in a statically initialized array:
Suppose, as the first step in our analysis, we wish to determine the number of unique combinations of customer attributes that exist in our data set. We can do that by inserting each data item into a flat hash set: the first insert of a combination will succeed, the others will fail, but at the end of the process, the set will contain one entry for every unique combination in our data.
First, as there are no standard methods for hashing or comparing our user-defined types, we define CustomerProfileHash
and CustomerProfileEqual
classes, each a stateless functor. Note that there is no meaningful ordering of the attribute values, they are merely arbitrary code numbers; nothing is lost by using an unordered set instead of an ordered set:
The hash function combines the several enumerated values from the class (each a small int
value) into a single, unique int
value, and then applies the default hash function for int
.
Notice that many of the required methods of the hash and comparator types are compiler generated. (The declarations of those methods are commented out and suffixed by an = default
comment.)
Then, we define the type of the flat hash set:
Next, we create a flat hash set and insert each item of customerProfiles
:
Notice that we ignore the status returned by the insert
method. We fully expect some operations to fail.
Finally, the size of profileCategories
matches the number of unique customer profiles in this data set:
Standard output shows: