BDE 4.14.0 Production release
Loading...
Searching...
No Matches
bslh_defaulthashalgorithm.h
Go to the documentation of this file.
1/// @file bslh_defaulthashalgorithm.h
2///
3/// The content of this file has been pre-processed for Doxygen.
4///
5
6
7// bslh_defaulthashalgorithm.h -*-C++-*-
8#ifndef INCLUDED_BSLH_DEFAULTHASHALGORITHM
9#define INCLUDED_BSLH_DEFAULTHASHALGORITHM
10
11#include <bsls_ident.h>
12BSLS_IDENT("$Id: $")
13
14/// @defgroup bslh_defaulthashalgorithm bslh_defaulthashalgorithm
15/// @brief Provide a reasonable hashing algorithm for default use.
16/// @addtogroup bsl
17/// @{
18/// @addtogroup bslh
19/// @{
20/// @addtogroup bslh_defaulthashalgorithm
21/// @{
22///
23/// <h1> Outline </h1>
24/// * <a href="#bslh_defaulthashalgorithm-purpose"> Purpose</a>
25/// * <a href="#bslh_defaulthashalgorithm-classes"> Classes </a>
26/// * <a href="#bslh_defaulthashalgorithm-description"> Description </a>
27/// * <a href="#bslh_defaulthashalgorithm-security"> Security </a>
28/// * <a href="#bslh_defaulthashalgorithm-speed"> Speed </a>
29/// * <a href="#bslh_defaulthashalgorithm-hash-distribution"> Hash Distribution </a>
30/// * <a href="#bslh_defaulthashalgorithm-hash-consistency"> Hash Consistency </a>
31/// * <a href="#bslh_defaulthashalgorithm-usage"> Usage </a>
32/// * <a href="#bslh_defaulthashalgorithm-example-creating-and-using-a-hash-table"> Example: Creating and Using a Hash Table </a>
33///
34/// # Purpose {#bslh_defaulthashalgorithm-purpose}
35/// Provide a reasonable hashing algorithm for default use.
36///
37/// # Classes {#bslh_defaulthashalgorithm-classes}
38///
39/// - bslh::DefaultHashAlgorithm: a default hashing algorithm
40///
41/// @see bslh_hash, bslh_defaultseededhashalgorithm
42///
43/// # Description {#bslh_defaulthashalgorithm-description}
44/// `bslh::DefaultHashAlgorithm` provides an unspecified default
45/// hashing algorithm. The supplied algorithm is suitable for general purpose
46/// use in a hash table. The underlying algorithm is subject to change in
47/// future releases.
48///
49/// This class satisfies the requirements for regular `bslh` hashing algorithms,
50/// defined in bslh_hash.h. More information can be found in the package level
51/// documentation for `bslh` (internal users can also find information here
52/// {TEAM BDE:USING MODULAR HASHING<GO>})
53///
54/// ## Security {#bslh_defaulthashalgorithm-security}
55///
56///
57/// In this context "security" refers to the ability of the algorithm to produce
58/// hashes that are not predictable by an attacker. Security is a concern when
59/// an attacker may be able to provide malicious input into a hash table,
60/// thereby causing hashes to collide to buckets, which degrades performance.
61/// There are *no* security guarantees made by `bslh::DefaultHashAlgorithm`,
62/// meaning attackers may be able to engineer keys that will cause a Denial of
63/// Service (DoS) attack in hash tables using this algorithm. If security is
64/// required, an algorithm that documents better secure properties should be
65/// used, such as @ref bslh_siphashalgorithm .
66///
67/// ## Speed {#bslh_defaulthashalgorithm-speed}
68///
69///
70/// The default hash algorithm will compute a hash on the order of O(n) where
71/// `n` is the length of the input data. Note that this algorithm will produce
72/// hashes fast enough to be used to hash keys in a hash table. The chosen
73/// algorithm will be quicker than specialized algorithms such as SipHash, but
74/// not as fast as hashing using the identity function.
75///
76/// ## Hash Distribution {#bslh_defaulthashalgorithm-hash-distribution}
77///
78///
79/// The default hash algorithm will distribute hashes in a pseudo-random
80/// distribution across the key space. The hash function will exhibit avalanche
81/// behavior, meaning changing one bit of input will result in a 50% chance of
82/// each output bit changing. Avalanche behavior is enough to guarantee good
83/// key distribution, even when values are consecutive.
84///
85/// ## Hash Consistency {#bslh_defaulthashalgorithm-hash-consistency}
86///
87///
88/// The default hash algorithm guarantees only that hashes will remain
89/// consistent within a single process, meaning different hashes may be produced
90/// on machines of different endianness or even between runs on the same
91/// machine. Therefor it is not recommended to send hashes from
92/// `bslh::DefaultHashAlgorithm` over a network. It is also not recommended to
93/// write hashes from `bslh::DefaultHashAlgorithm` to any memory accessible by
94/// multiple machines.
95///
96/// ## Usage {#bslh_defaulthashalgorithm-usage}
97///
98///
99/// This section illustrates intended usage of this component.
100///
101/// ### Example: Creating and Using a Hash Table {#bslh_defaulthashalgorithm-example-creating-and-using-a-hash-table}
102///
103///
104/// Suppose we have any array of types that define `operator==`, and we want a
105/// fast way to find out if values are contained in the array. We can create a
106/// `HashTable` data structure that is capable of looking up values in O(1)
107/// time.
108///
109/// Further suppose that we will be storing futures (the financial instruments)
110/// in this table. Since futures have standardized names, we don't have to
111/// worry about any malicious values causing collisions. We will want to use a
112/// general purpose hashing algorithm with a good hash distribution and good
113/// speed. This algorithm will need to be in the form of a hash functor -- an
114/// object that will take objects stored in our array as input, and yield an
115/// integer value. The functor can pass the attributes of the `TYPE` that are
116/// salient to hashing into the hashing algorithm, and then return the hash that
117/// is produced.
118///
119/// We can use the result of the hash function to index into our array of
120/// `buckets`. Each `bucket` is simply a pointer to a value in our original
121/// array of `TYPE` objects. We will resolve hash collisions in our array
122/// through `linear probing`, where we will search consecutive buckets following
123/// the bucket where the collision occurred, testing occupied buckets for
124/// equality with the value we are searching on, and concluding that the value
125/// is not in the table if we encounter an empty bucket before we encounter one
126/// referring to an equal element.
127///
128/// First, we define our `HashTable` template class, with the two type
129/// parameters: `TYPE` (the type being referenced) and `HASHER` (a functor that
130/// produces the hash).
131/// @code
132/// /// This class template implements a hash table providing fast lookup of
133/// /// an external, non-owned, array of values of (template parameter)
134/// /// `TYPE`.
135/// ///
136/// /// The (template parameter) `TYPE` shall have a transitive, symmetric
137/// /// `operator==` function. There is no requirement that it have any
138/// /// kind of creator defined.
139/// ///
140/// /// The `HASHER` template parameter type must be a functor with a method
141/// /// having the following signature:
142/// ///..
143/// /// size_t operator()(TYPE) const;
144/// /// -OR-
145/// /// size_t operator()(const TYPE&) const;
146/// ///..
147/// /// and `HASHER` shall have a publicly accessible default constructor
148/// // and destructor.
149/// ///
150/// /// Note that this hash table has numerous simplifications because we
151/// /// know the size of the array and never have to resize the table.
152/// template <class TYPE, class HASHER>
153/// class HashTable {
154///
155/// // DATA
156/// const TYPE *d_values; // Array of values table is to
157/// // hold
158/// size_t d_numValues; // Length of 'd_values'.
159/// const TYPE **d_bucketArray; // Contains ptrs into 'd_values'
160/// size_t d_bucketArrayMask; // Will always be '2^N - 1'.
161/// HASHER d_hasher; // User supplied hashing algorithm
162///
163/// private:
164/// // PRIVATE ACCESSORS
165///
166/// /// Look up the specified `value`, having the specified `hashValue`,
167/// /// and load its index in `d_bucketArray` into the specified `idx`.
168/// /// If not found, return the vacant entry in `d_bucketArray` where
169/// /// it should be inserted. Return `true` if `value` is found and
170/// /// `false` otherwise.
171/// bool lookup(size_t *idx,
172/// const TYPE& value,
173/// size_t hashValue) const;
174///
175/// public:
176/// // CREATORS
177///
178/// /// Create a hash table referring to the specified `valuesArray`
179/// /// having length of the specified `numValues`. No value in
180/// /// `valuesArray` shall have the same value as any of the other
181/// /// values in `valuesArray`
182/// HashTable(const TYPE *valuesArray,
183/// size_t numValues);
184///
185/// /// Free up memory used by this hash table.
186/// ~HashTable();
187///
188/// // ACCESSORS
189///
190/// /// Return true if the specified 'value' is found in the table and
191/// /// false otherwise.
192/// bool contains(const TYPE& value) const;
193/// };
194/// @endcode
195/// Then, we define a `Future` class, which holds a c-string `name`, char
196/// `callMonth`, and short `callYear`.
197/// @code
198/// /// This class identifies a future contract. It tracks the name, call
199/// /// month and year of the contract it represents, and allows equality
200/// /// comparison.
201/// class Future {
202///
203/// // DATA
204/// const char *d_name; // held, not owned
205/// const char d_callMonth;
206/// const short d_callYear;
207///
208/// public:
209/// // CREATORS
210///
211/// /// Create a `Future` object out of the specified `name`,
212/// /// `callMonth`, and `callYear`.
213/// Future(const char *name, const char callMonth, const short callYear)
214/// : d_name(name), d_callMonth(callMonth), d_callYear(callYear)
215/// {}
216///
217/// /// Create a 'Future' with default values.
218/// Future() : d_name(""), d_callMonth('\0'), d_callYear(0)
219/// {}
220///
221/// // ACCESSORS
222///
223/// /// Return the month that this future expires.
224/// const char * getMonth() const
225/// {
226/// return &d_callMonth;
227/// }
228///
229/// /// Return the name of this future.
230/// const char * getName() const
231/// {
232/// return d_name;
233/// }
234///
235/// /// Return the year that this future expires.
236/// const short * getYear() const
237/// {
238/// return &d_callYear;
239/// }
240///
241/// /// Compare this to the specified `other` object and return true if
242/// /// they are equal.
243/// bool operator==(const Future& other) const
244/// {
245/// return (!strcmp(d_name, other.d_name)) &&
246/// d_callMonth == other.d_callMonth &&
247/// d_callYear == other.d_callYear;
248/// }
249/// };
250///
251/// /// Compare compare the specified `lhs` and `rhs` objects and return
252/// /// true if they are not equal.
253/// bool operator!=(const Future& lhs, const Future& rhs)
254/// {
255/// return !(lhs == rhs);
256/// }
257/// @endcode
258/// Next, we need a hash functor for `Future`. We are going to use the
259/// `DefaultHashAlgorithm` because it is a fast, general purpose hashing
260/// algorithm that will provide an easy way to combine the attributes of
261/// `Future` objects that are salient to hashing into one reasonable hash that
262/// will distribute the items evenly throughout the hash table. Moreover, when
263/// a new hashing algorithm is discovered to be a better default, we can be
264/// automatically be upgraded to use it as soon as `bslh::DefaultHashAlgorithm`
265/// is updated.
266/// @code
267/// /// This struct is a functor that will apply the `DefaultHashAlgorithm`
268/// /// to objects of type `Future`.
269/// struct HashFuture {
270///
271/// /// Return the hash of the of the specified `future`. Note that
272/// /// this uses the `DefaultHashAlgorithm` to quickly combine the
273/// /// attributes of `Future` objects that are salient to hashing into
274/// /// a hash suitable for a hash table.
275/// size_t operator()(const Future& future) const
276/// {
277/// DefaultHashAlgorithm hash;
278///
279/// hash(future.getName(), strlen(future.getName()));
280/// hash(future.getMonth(), sizeof(char));
281/// hash(future.getYear(), sizeof(short));
282///
283/// return static_cast<size_t>(hash.computeHash());
284/// }
285/// };
286/// @endcode
287/// Then, we want to actually use our hash table on `Future` objects. We create
288/// an array of `Future`s based on data that was originally from some external
289/// source:
290/// @code
291/// Future futures[] = { Future("Swiss Franc", 'F', 2014),
292/// Future("US Dollar", 'G', 2015),
293/// Future("Canadian Dollar", 'Z', 2014),
294/// Future("British Pound", 'M', 2015),
295/// Future("Deutsche Mark", 'X', 2016),
296/// Future("Eurodollar", 'Q', 2017)};
297/// enum { NUM_FUTURES = sizeof futures / sizeof *futures };
298/// @endcode
299/// Next, we create our HashTable `hashTable`. We pass the functor that we
300/// defined above as the second argument:
301/// @code
302/// HashTable<Future, HashFuture> hashTable(futures, NUM_FUTURES);
303/// @endcode
304/// Now, we verify that each element in our array registers with count:
305/// @code
306/// for ( int i = 0; i < 6; ++i) {
307/// ASSERT(hashTable.contains(futures[i]));
308/// }
309/// @endcode
310/// Finally, we verify that futures not in our original array are correctly
311/// identified as not being in the set:
312/// @code
313/// ASSERT(!hashTable.contains(Future("French Franc", 'N', 2019)));
314/// ASSERT(!hashTable.contains(Future("Swiss Franc", 'X', 2014)));
315/// ASSERT(!hashTable.contains(Future("US Dollar", 'F', 2014)));
316/// @endcode
317/// @}
318/** @} */
319/** @} */
320
321/** @addtogroup bsl
322 * @{
323 */
324/** @addtogroup bslh
325 * @{
326 */
327/** @addtogroup bslh_defaulthashalgorithm
328 * @{
329 */
330
331#include <bslscm_version.h>
332
333#include <bsls_assert.h>
334
336
337
338
339namespace bslh {
340
341/// This class wraps an unspecified default hashing algorithm that is
342/// appropriate for general purpose use such as generating hashes for a hash
343/// table.
344///
345/// See @ref bslh_defaulthashalgorithm
347
348 private:
349 // PRIVATE TYPES
350
351 /// Typedef indicating the algorithm currently being used by
352 /// `bslh::DefualtHashAlgorithm` to compute hashes. This algorithm is
353 /// subject to change.
355
356 // DATA
357
358 // Object storing the state of the chosen `InternalHashAlgorithm`.
359 InternalHashAlgorithm d_state;
360
361 // NOT IMPLEMENTED
362
363 /// Do not allow copy construction.
364 DefaultHashAlgorithm(const DefaultHashAlgorithm& original); // = delete;
365
366 DefaultHashAlgorithm& operator=(const DefaultHashAlgorithm& rhs);
367 // = delete;
368 // Do not allow assignment.
369
370 public:
371 // TYPES
372
373 /// Typedef indicating the value type returned by this algorithm.
375
376 // CREATORS
377
378 /// Create a `bslh::DefaultHashAlgorithm`, default constructing the
379 /// algorithm being used internally.
381
382 /// Destroy this object.
384
385 // MANIPULATORS
386
387 /// Incorporate the specified `data`, of at least the specified
388 /// `numBytes`, into the internal state of the hashing algorithm. Every
389 /// bit of data incorporated into the internal state of the algorithm
390 /// will contribute to the final hash produced by `computeHash()`. The
391 /// same hash will be produced regardless of whether a sequence of bytes
392 /// is passed in all at once or through multiple calls to this member
393 /// function. Input where `numBytes` is 0 will have no effect on the
394 /// internal state of the algorithm. The behaviour is undefined unless
395 /// `data` points to a valid memory location with at least `numBytes`
396 /// bytes of initialized memory or `numBytes` is zero.
397 void operator()(const void *data, size_t numBytes);
398
399 /// Return the finalized version of the hash that has been accumulated.
400 /// Note that this changes the internal state of the object, so calling
401 /// `computeHash()` multiple times in a row will return different
402 /// results, and only the first result returned will match the expected
403 /// result of the algorithm. Also note that a value will be returned,
404 /// even if data has not been passed into `operator()`
406};
407
408// ============================================================================
409// INLINE DEFINITIONS
410// ============================================================================
411
412// CREATORS
413inline
418
419// MANIPULATORS
420inline
421void DefaultHashAlgorithm::operator()(const void *data, size_t numBytes)
422{
423 BSLS_ASSERT(0 != data || 0 == numBytes);
424 d_state(data, numBytes);
425}
426
427inline
432
433} // close package namespace
434
435
436#endif
437
438// ----------------------------------------------------------------------------
439// Copyright 2017 Bloomberg Finance L.P.
440//
441// Licensed under the Apache License, Version 2.0 (the "License");
442// you may not use this file except in compliance with the License.
443// You may obtain a copy of the License at
444//
445// http://www.apache.org/licenses/LICENSE-2.0
446//
447// Unless required by applicable law or agreed to in writing, software
448// distributed under the License is distributed on an "AS IS" BASIS,
449// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
450// See the License for the specific language governing permissions and
451// limitations under the License.
452// ----------------------------- END-OF-FILE ----------------------------------
453
454/** @} */
455/** @} */
456/** @} */
Definition bslh_defaulthashalgorithm.h:346
InternalHashAlgorithm::result_type result_type
Typedef indicating the value type returned by this algorithm.
Definition bslh_defaulthashalgorithm.h:374
~DefaultHashAlgorithm()=default
Destroy this object.
DefaultHashAlgorithm()
Definition bslh_defaulthashalgorithm.h:414
void operator()(const void *data, size_t numBytes)
Definition bslh_defaulthashalgorithm.h:421
result_type computeHash()
Definition bslh_defaulthashalgorithm.h:428
Definition bslh_wyhashincrementalalgorithm.h:447
result_type computeHash()
Definition bslh_wyhashincrementalalgorithm.h:875
bsls::Types::Uint64 result_type
Typedef indicating the value type returned by this algorithm.
Definition bslh_wyhashincrementalalgorithm.h:459
#define BSLS_ASSERT(X)
Definition bsls_assert.h:1804
#define BSLS_IDENT(str)
Definition bsls_ident.h:195
Definition bslh_defaulthashalgorithm.h:339