BDE 4.14.0 Production release
|
Provide an STL-compliant unordered_map
container.
unordered_map
containerCanonical header: bsl_unordered_map.h
This component defines a single class template, bsl::unordered_map
, implementing the standard container holding a collection of unique keys, each mapped to an associated value with no guarantees on ordering.
An instantiation of unordered_map
is an allocator-aware, value-semantic type whose salient attributes are its size (number of keys) and the set of KEY-VALUE
pairs the unordered_map
contains, without regard to their order. If unordered_map
is instantiated with a key type or mapped type that is not itself value-semantic, then it will not retain all of its value-semantic qualities. In particular, if the key or mapped type cannot be tested for equality, then an unordered_map
containing that type cannot be tested for equality. It is even possible to instantiate unordered_map
with types that do not have an accessible copy-constructor, in which case the unordered_map
will not be copyable. Note if a hasher and/or equality-comparison functor are supplied at container construction, they are copied to the container, and those copies, rather than the object(s) supplied, are used for hashing and equality comparison of keys.
When comparing unordered map containers for equality, the keys are compared using operator==
, rather than the EQUALS
parameter function type.
An unordered_map
meets the requirements of an unordered associative container with forward iterators in the C++11 standard [23.2.5]. The unordered_map
implemented here adheres to the C++11 standard when compiled with a C++11 compiler, and makes the best approximation when compiled with a C++03 compiler. In particular, for C++03 we emulate move semantics, but limit forwarding (in emplace
) to const
lvalues, and make no effort to emulate noexcept
or initializer-lists. The unordered_map
implemented here adheres to the C++ standard, except that it may rehash when setting the max_load_factor
in order to preserve the property that the factor is always respected (which is a potentially throwing operation).
An unordered_map
is a fully Value-Semantic Type (see bsldoc_glossary ) only if the supplied KEY
and VALUE
template parameters are themselves fully value-semantic. The alias value_type
is defined as pair<const KEY, VALUE>
. It is possible to instantiate an unordered_map
with KEY
and VALUE
parameter arguments that do not provide a full set of value-semantic operations, but then some methods of the container may not be instantiable. The following terminology, adopted from the C++11 standard, is used in the function documentation of map
to describe a function's requirements for the KEY
and VALUE
template parameters. These terms are also defined in section [17.6.3.1] of the C++11 standard.
The following terms are used to more precisely specify the requirements on template parameter types in function-level documentation.
default-insertable: T
has a default constructor. More precisely, T
is default-insertable
into X
means that the following expression is well-formed: allocator_traits<A>::construct(m, p)
move-insertable: T
provides a constructor that takes an rvalue of type (non-const
) T
. More precisely, T
is move-insertable
into X
means that the following expression is well-formed: allocator_traits<A>::construct(m, p, rv)
Note that since the first
field of T
is const
, T
is not move-insertable unless key_type
is copy-insertable.
copy-insertable: T
provides a constructor that takes an lvalue or rvalue of type (possibly const
) T
. More precisely, T
is copy-insertable
into X
means that the following expression is well-formed: allocator_traits<A>::construct(m, p, v)
move-assignable: T
provides an assignment operator that takes an rvalue of type (non-const
) T
. Note that since the first
element of value_type
is const
, value_type
is not move-assignable
. Note that since the first
field of T
is const
, T
is not move-assignable unless key_type
is copy-assignable.
copy-assignable: T
provides an assignment operator that takes an lvalue or rvalue of type (possibly const
) T
.
emplace-constructible: T
is emplace-constructible
into X
from args
means that the following expression is well-formed: allocator_traits<A>::construct(m, p, args)
erasable: T
provides a destructor. More precisely, T
is erasable
from X
means that the following expression is well-formed: allocator_traits<A>::destroy(m, p)
equality-comparable: The type provides an equality-comparison operator that defines an equivalence relationship and is both reflexive and transitive.
The (template parameter) types HASH
and EQUAL
must be copy-constructible function-objects. Note that this requirement is somewhat stronger than the requirement currently in the standard; see the discussion for Issue 2215 (http://cplusplus.github.com/LWG/lwg-active.html#2215);
Naturally, if either HASH
or EQUAL
is to be the default for its type, it must be default-constructible as well.
HASH
shall support a function call operator compatible with the following statements:
where the definition of the called function meets the requirements of a hash function as specified in {bslstl_hash |Standard Hash Function}.
EQUAL
shall support the a function call operator compatible with the following statements:
where the definition of the called function defines an equivalence relationship on keys that is both reflexive and transitive.
The HASH
and EQUAL
function-objects are further constrained, such that for any two objects whose keys compare equivalent by the comparator, shall also produce the same return value from the hasher.
The type supplied as the ALLOCATOR
template parameter determines how memory will be allocated. The unordered_map
template supports allocators meeting the requirements of the C++11 standard [allocator.requirements], and, in addition, it supports scoped-allocators derived from the bslma::Allocator
memory allocation protocol. Clients intending to use bslma
style allocators should use the template's default ALLOCATOR
type. The default type for the ALLOCATOR
template parameter, bsl::allocator
, provides a C++11 standard-compatible adapter for a bslma::Allocator
object.
If the (template parameter) type ALLOCATOR
of an unordered_map
instantiation is bsl::allocator
, then objects of that unordered map type will conform to the standard behavior of a bslma
-allocator-enabled type. Such an unordered map accepts an optional bslma::Allocator
argument at construction. If the address of a bslma::Allocator
object is explicitly supplied at construction, it is used to supply memory for the unordered_map
throughout its lifetime; otherwise, the unordered_map
will use the default allocator installed at the time of the unordered_map
s construction (see bslma_default ). In addition to directly allocating memory from the indicated bslma::Allocator
, an unordered_map
supplies that allocator's address to the constructors of contained objects of the (template parameter) types KEY
and VALUE
if, respectively, those types define the bslma::UsesBslmaAllocator
trait to true
.
This section describes the run-time complexity of operations on instances of unordered_map
:
No method of unordered_map
invalidates a pointer or reference to an element in the unordered map, unless it also erases that element, such as any erase
overload, clear
, or the destructor (that erases all elements). Pointers and references are stable through a rehash.
Iterators to elements in the container are invalidated by any rehash, so iterators may be invalidated by an insert
or emplace
call if it triggers a rehash (but not otherwise). Iterators to specific elements are also invalidated when that element is erased. Note that although the end
iterator is not an iterator referring to any element in the container, it may be invalidated by any non-const
method.
The unordered map has interfaces that can provide insight into and control of its inner workings. The unordered map is implemented using a hash table (see bslstl_hashtable ), a dynamically sized array of "buckets". If two elements hash to the same bucket (termed a "collision"), then that bucket will house multiple elements. As elements are added to the unordered map, the number of buckets is increased (and the existing elements redistributed) to keep the average number of elements per bucket (the "loading factor") below the specified maximum (the "maximum load factor", 1 by default). {Example 2: Examining and Setting Unordered Map Configuration} illustrates the use of these interfaces.
An important factor in the performance an unordered map (and any of the other unordered containers) is the choice of hash function. In general, one wants the hash function to return uniformly distributed values that can be assigned to buckets (see {Unordered Map Configuration}) with few collisions.
The bsl
package provides general purpose, default hash functions for bsl::string
, bslstl::StringRef
, and the arithmetic types (e.g., int
); however, custom defined hash functions may do better, especially if one has information about the distribution of keys; there is considerable literature on designing hash functions.
When a user-defined class is used as a key, hasher must be provided (and equality functor, if equality is not otherwise defined). Two examples, {Example 3} and {bslstl_unorderedset |Example 1}, address this issue by adapting the existing default hash functions for primitive types, an approach that may not always prove adequate.
In this section we show intended use of this component.
Unordered maps are useful in situations when there is no meaningful way to order the key values, when the order of the keys is irrelevant to the problem domain (see {Example 3}), and (even if there is a meaningful ordering) the value of ordering the results is outweighed by the higher performance provided by unordered maps (compared to ordered maps).
Suppose one wished to gather statistics on the words appearing in a large set of documents on disk or in a data base. Gathering those statistics is intrusive (as one is competing for access to the documents with the regular users) and must be done as quickly as possible. Moreover, the set of unique words appearing in those documents may be high. The English language has in excess of a million words (albeit many appear infrequently), and, if the documents contain serial numbers, or Social Security numbers, or chemical formulas, etc., then the O[log(n)]
insertion time of ordered maps may well be inadequate. The unordered map, having an O[1]
typical insertion cost, is a viable alternative. In many problem domains, sorting, if needed, can be done after the data is gathered.
This example illustrates the use of bsl::unordered_map
to gather one simple statistic (counts of unique words) on a single document. To avoid irrelevant details of acquiring the data, several modestly sized documents are stored in static arrays:
First, we define an alias to make our code more comprehensible.
Next, we create an (empty) unordered map to hold our word tallies. The output from the printf
statements will be discussed in {Example 2}.
Then, we define the set of characters that define word boundaries:
Next, we extract the words from our documents. Note that strtok
modifies the document arrays (which were not made const
).
For each iteration of the inner loop, that method looks for a map entry matching the given key value. On the first occurrence of a word, the map has no such entry, so one is created with a default value of the mapped value (0, just what we want in this case) and inserted into the map where it is found on any subsequent occurrences of the word. The operator[]
method returns a reference providing modifiable access to the mapped value. Here, we apply the ++
operator to that reference to maintain a tally for the word.
Now that the data has been (quickly) gathered, we can indulge in analysis that is more time consuming. For example, we can define a comparison function, copy the data to another container (e.g., bsl::vector
), sort the entries, and determine the 20 most commonly used words in the given documents:
Notice that partial_sort suffices here since we seek only the 20 most used words, not a complete distribution of word counts.
Finally, we print the sorted portion of array
:
and standard output shows:
Notice that "-" (used as an header underscore in our markup) appears in the word count. That could be eliminated by adding -
to the set of delimiters; however, that would partition hyphenated words into separate words. In practice, one defines a "stop list" of common words (e.g., "the", "of", "and", "is") that one does not wish to tally. We could easily add "-" to the stop list.
Suppose we wish to examine (and possibly influence) the performance of an unordered map. The unordered map provides several interfaces that allow us to do so. Several of these were used in {Example 1} (code repeated below):
First, we examine the metrics of this newly created (empty) unordered map:
Notice that even when there are no elements (size
is 0) there is one bucket. Since there are no elements, the average number of elements per bucket (the load_factor above) must be 0.
Next, after wordTally
has been loaded, we examine its metrics:
and find at standard output:
Notice how the number of buckets has increased. (Sampling this metric as the map was loaded would show that the increase was done in several stages.)
Then, we see that the load factor is indeed below the specified maximum; however we obtain further details of how the buckets are used.
Using the bucket_count method, the unordered map's interface for the number of elements in each bucket, we can easily determine the bucket with the greatest number of elements (i.e., the greatest number of collisions):
and find on standard output:
We can also count the number of empty buckets, and the number of buckets at maxBucketSize
.
which shows on standard output:
Suppose we are not satisfied with this distribution. (Perhaps the load factor is too high.) We can create a second, differently configured table.
Next, create a new table wordTally2
with twice the bucket count shown by the first table (wordTally
), and examine its initial metrics.
Standard output shows:
Notice that although we requested 4198 buckets (2 * 2099), we created a table with 4201 buckets. (4201 is the smallest prime number greater than 4198).
Then, we load our new table and examine its metrics. For simplicity, we load data from the first table rather than re-tokenize our documents.
Finally, we see on standard output:
Notice that the loading factor has been (roughly) cut in half; we have achieved our goal. Also notice that the bucket count is unchanged since construction; thus, there were no rehashes during the loading this unordered map. Finally, notice that the number of empty (unused) buckets is significantly higher, and there's been a modest decrease in the largest bucket size, but more instances of them.
Thus, the unordered map provides facilities by which we can make trade-offs in performance characteristics of the containers we create.
If one has a concordance for a set of documents (an index of the position of every unique word in those documents), then words of interest can be efficiently located. Suppose after locating a word of interest one also needs the surrounding words (for context). Searching in the original document requires re-tokenization (time consuming). Alternatively, one can use the concordance to create an inverse concordance to provide a fast lookup of the words at given locations in a document and then examine words near the word of interest.
First, we define the types required (and convenient aliases) to create an unordered map from a word location to the corresponding word. The "key" value will be WordLocation
, a pair of int
values: the first being the document code number (arbitrarily assigned), and second the word offset in that document (the first word of the document is at offset 0). The "mapped" value of each entry is a bsl::string
containing the word at that location.
Notice that the WordLocation
, the type of the key value, has no natural ordering. The assignment of document codes is arbitrary so there is no reason to consider the words on one document to sort below those in any another.
Then, since there is no default hash function for the WordLocation
type, we define one. The document code and the word offset are individually hashed using the default hasher for the int
type and those results bitwise exclusive OR-ed a combined result. This trivial combination formula suffices for this problem, but is not a general solution for combining hashes; see {Practical Requirements on HASH
}.
Notice that many of the required methods of the hash type are compiler generated. (The declaration of those methods are commented out and suffixed by an = default
comment.)
In addition to a hash functor, the unordered map requires an equality comparison functor. In this example, the unordered map uses operator==
method of std::pair
by default. If the mapped type has no such method, a equality-comparison functor must be provided explicitly.
Next, we define the type of the unordered map and associated convenience aliases:
Next, we obtain a concordance for the document set (see {bslstl_unorderedmultimap |Example 1}). Here, the concordance is provided as a statically initialized array:
Then, we create inverseConcordance
, an unordered map, and initialize it with values obtained from concordance
.
Notice that we expect every insert
to be successful, as the concordance should not show more than one word at any location.
Next, suppose we knew the location of the word "unalienable" in the document set (see {bslstl_unorderedmultimap |Example 1}) and want to know its context?
We use the find
method of inverseConcordance
to determine the words within offset delta
of "unalienable". Note that we must check the validity of the returned iterator, in case we probe beyond the boundaries of the document.
Notice that the assertion confirms that "unalienable" is found in our inverse location at the location we obtained from the concordance.
Finally, we find on standard output: