BDE 4.14.0 Production release
Loading...
Searching...
No Matches
bslh_defaultseededhashalgorithm.h
Go to the documentation of this file.
1/// @file bslh_defaultseededhashalgorithm.h
2///
3/// The content of this file has been pre-processed for Doxygen.
4///
5
6
7// bslh_defaultseededhashalgorithm.h -*-C++-*-
8#ifndef INCLUDED_BSLH_DEFAULTSEEDEDHASHALGORITHM
9#define INCLUDED_BSLH_DEFAULTSEEDEDHASHALGORITHM
10
11#include <bsls_ident.h>
12BSLS_IDENT("$Id: $")
13
14/// @defgroup bslh_defaultseededhashalgorithm bslh_defaultseededhashalgorithm
15/// @brief Provide a reasonable seeded hashing algorithm for default use.
16/// @addtogroup bsl
17/// @{
18/// @addtogroup bslh
19/// @{
20/// @addtogroup bslh_defaultseededhashalgorithm
21/// @{
22///
23/// <h1> Outline </h1>
24/// * <a href="#bslh_defaultseededhashalgorithm-purpose"> Purpose</a>
25/// * <a href="#bslh_defaultseededhashalgorithm-classes"> Classes </a>
26/// * <a href="#bslh_defaultseededhashalgorithm-description"> Description </a>
27/// * <a href="#bslh_defaultseededhashalgorithm-security"> Security </a>
28/// * <a href="#bslh_defaultseededhashalgorithm-speed"> Speed </a>
29/// * <a href="#bslh_defaultseededhashalgorithm-hash-distribution"> Hash Distribution </a>
30/// * <a href="#bslh_defaultseededhashalgorithm-hash-consistency"> Hash Consistency </a>
31/// * <a href="#bslh_defaultseededhashalgorithm-usage"> Usage </a>
32/// * <a href="#bslh_defaultseededhashalgorithm-example-creating-and-using-a-hash-table"> Example: Creating and Using a Hash Table </a>
33///
34/// # Purpose {#bslh_defaultseededhashalgorithm-purpose}
35/// Provide a reasonable seeded hashing algorithm for default use.
36///
37/// # Classes {#bslh_defaultseededhashalgorithm-classes}
38///
39/// - bslh::DefaultSeededHashAlgorithm: a default seeded hashing algorithm
40///
41/// @see bslh_hash, bslh_siphashalgorithm, bslh_defaulthashalgorithm
42///
43/// # Description {#bslh_defaultseededhashalgorithm-description}
44/// `bslh::DefaultSeededHashAlgorithm` provides an unspecified
45/// default seeded hashing algorithm. The supplied algorithm is suitable for
46/// general purpose use in a hash table. The underlying algorithm is subject to
47/// change in future releases.
48///
49/// This class satisfies the requirements for seeded `bslh` hashing algorithms,
50/// defined in `bslh_seededhash.h`. More information can be found in the
51/// package level documentation for `bslh` (internal users can also find
52/// information here {TEAM BDE:USING MODULAR HASHING<GO>})
53///
54/// ## Security {#bslh_defaultseededhashalgorithm-security}
55///
56///
57/// In this context "security" refers to the ability of the algorithm to produce
58/// hashes that are not predictable by an attacker. Security is a concern when
59/// an attacker may be able to provide malicious input into a hash table,
60/// thereby causing hashes to collide to buckets, which degrades performance.
61/// There are *no* security guarantees made by `bslh::DefaultHashAlgorithm`,
62/// meaning attackers may be able to engineer keys that will cause a Denial of
63/// Service (DoS) attack in hash tables using this algorithm. Note that even if
64/// an attacker does not know the seed used to initialize this algorithm, they
65/// may still be able to produce keys that will cause a DoS attack in hash
66/// tables using this algorithm. If security is required, an algorithm that
67/// documents better secure properties should be used, such as
68/// `bslh::SipHashAlgorithm`.
69///
70/// ## Speed {#bslh_defaultseededhashalgorithm-speed}
71///
72///
73/// The default hash algorithm will compute a hash on the order of O(n) where
74/// `n` is the length of the input data. Note that this algorithm will produce
75/// hashes fast enough to be used to hash keys in a hash table. The chosen
76/// algorithm will be quicker than specialized algorithms such as SipHash, but
77/// not as fast as hashing using the identity function.
78///
79/// ## Hash Distribution {#bslh_defaultseededhashalgorithm-hash-distribution}
80///
81///
82/// The default hash algorithm will distribute hashes in a pseudo-random
83/// distribution across the key space. The hash function will exhibit avalanche
84/// behavior, meaning changing one bit of input will result in a 50% chance of
85/// each output bit changing. Avalanche behavior is enough to guarantee good
86/// key distribution, even when values are consecutive.
87///
88/// ## Hash Consistency {#bslh_defaultseededhashalgorithm-hash-consistency}
89///
90///
91/// The default hash algorithm guarantees only that hashes will remain
92/// consistent within a single process, meaning different hashes may be produced
93/// on machines of different endianness or even between runs on the same
94/// machine. Therefor it is not recommended to send hashes from
95/// `bslh::DefaultSeededHashAlgorithm` over a network. It is also not
96/// recommended to write hashes from `bslh::DefaultSeededHashAlgorithm` to any
97/// memory accessible by multiple machines.
98///
99/// ## Usage {#bslh_defaultseededhashalgorithm-usage}
100///
101///
102/// This section illustrates intended usage of this component.
103///
104/// ### Example: Creating and Using a Hash Table {#bslh_defaultseededhashalgorithm-example-creating-and-using-a-hash-table}
105///
106///
107/// Suppose we have any array of types that define `operator==`, and we want a
108/// fast way to find out if values are contained in the array. We can create a
109/// `HashTable` data structure that is capable of looking up values in O(1)
110/// time.
111///
112/// Further suppose that we will be storing futures (the financial instruments)
113/// in this table. Since futures have standardized names, we don't have to
114/// worry about any malicious values causing collisions. We will want to use a
115/// general purpose hashing algorithm with a good hash distribution and good
116/// speed. This algorithm will need to be in the form of a hash functor -- an
117/// object that will take objects stored in our array as input, and yield an
118/// integer value. The functor can pass the attributes of `TYPE` that are
119/// salient to hashing into the hashing algorithm, and then return the hash that
120/// is produced.
121///
122/// We can use the result of the hash function to index into our array of
123/// `buckets`. Each `bucket` is simply a pointer to a value in our original
124/// array of `TYPE` objects.
125///
126/// First, we define our `HashTable` template class, with the two type
127/// parameters: `TYPE` (the type being referenced) and `HASHER` (a functor that
128/// produces the hash).
129/// @code
130/// /// This class template implements a hash table providing fast lookup of
131/// /// an external, non-owned, array of values of (template parameter) `TYPE`.
132/// ///
133/// /// The (template parameter) `TYPE` shall have a transitive, symmetric
134/// /// `operator==` function. There is no requirement that it have any
135/// /// kind of creator defined.
136/// ///
137/// /// The `HASHER` template parameter type must be a functor with a method
138/// /// having the following signature:
139/// ///..
140/// /// size_t operator()(TYPE) const;
141/// /// -OR-
142/// /// size_t operator()(const TYPE&) const;
143/// ///..
144/// /// and `HASHER` shall have a publicly accessible default constructor
145/// /// and destructor.
146/// ///
147/// /// Note that this hash table has numerous simplifications because we
148/// /// know the size of the array and never have to resize the table.
149/// template <class TYPE, class HASHER>
150/// class HashTable {
151///
152/// // DATA
153/// const TYPE *d_values; // Array of values table is to
154/// // hold
155/// size_t d_numValues; // Length of 'd_values'.
156/// const TYPE **d_bucketArray; // Contains ptrs into d_values'
157/// size_t d_bucketArrayMask; // Will always be '2^N - 1'.
158/// HASHER d_hasher; // User supplied hashing algorithm
159///
160/// private:
161/// // PRIVATE ACCESSORS
162///
163/// /// Look up the specified `value`, having the specified `hashValue`,
164/// /// and load its index in `d_bucketArray` into the specified `idx`.
165/// /// If not found, return the vacant entry in `d_bucketArray` where
166/// /// it should be inserted. Return `true` if `value` is found and
167/// /// `false` otherwise.
168/// bool lookup(size_t *idx,
169/// const TYPE& value,
170/// size_t hashValue) const;
171///
172/// public:
173/// // CREATORS
174///
175/// /// Create a hash table referring to the specified `valuesArray`
176/// /// having length of the specified `numValues`. No value in
177/// /// `valuesArray` shall have the same value as any of the other
178/// /// values in `valuesArray`
179/// HashTable(const TYPE *valuesArray, size_t numValues);
180///
181/// /// Free up memory used by this hash table.
182/// ~HashTable();
183///
184/// // ACCESSORS
185///
186/// /// Return true if the specified `value` is found in the table and
187/// /// false otherwise.
188/// bool contains(const TYPE& value) const;
189/// };
190/// @endcode
191/// Then, we define a `Future` class, which holds a c-string `name`, char
192/// `callMonth`, and short `callYear`.
193/// @code
194/// /// This class identifies a future contract. It tracks the name, call
195/// /// month and year of the contract it represents, and allows equality
196/// /// comparison.
197/// class Future {
198///
199/// // DATA
200/// const char *d_name; // held, not owned
201/// const char d_callMonth;
202/// const short d_callYear;
203///
204/// public:
205/// // CREATORS
206///
207/// /// Create a `Future` object out of the specified `name`,
208/// /// `callMonth`, and `callYear`.
209/// Future(const char *name, const char callMonth, const short callYear)
210/// : d_name(name), d_callMonth(callMonth), d_callYear(callYear)
211/// {}
212///
213/// /// Create a `Future` with default values.
214/// Future() : d_name(""), d_callMonth('\0'), d_callYear(0)
215/// {}
216///
217/// // ACCESSORS
218///
219/// /// Return the month that this future expires.
220/// const char * getMonth() const
221/// {
222/// return &d_callMonth;
223/// }
224///
225/// /// Return the name of this future.
226/// const char * getName() const
227/// {
228/// return d_name;
229/// }
230///
231/// /// Return the year that this future expires.
232/// const short * getYear() const
233/// {
234/// return &d_callYear;
235/// }
236///
237/// /// Compare this to the specified `other` object and return true if
238/// /// they are equal.
239/// bool operator==(const Future& other) const
240/// {
241/// return (!strcmp(d_name, other.d_name)) &&
242/// d_callMonth == other.d_callMonth &&
243/// d_callYear == other.d_callYear;
244/// }
245/// };
246///
247/// /// Compare compare the specified `lhs` and `rhs` objects and return true
248/// /// if they are not equal.
249/// bool operator!=(const Future& lhs, const Future& rhs)
250/// {
251/// return !(lhs == rhs);
252/// }
253///
254/// @endcode
255/// Next, we need a hash functor for `Future`. We are going to use the
256/// `DefaultSeededHashAlgorithm` because it is a fast, general purpose hashing
257/// algorithm that will provide an easy way to combine the attributes of
258/// `Future` objects that are salient to hashing into one reasonable hash that
259/// will distribute the items evenly throughout the hash table. Moreover, when
260/// a new hashing algorithm is discovered to be a better default, we can be
261/// automatically be upgraded to use it as soon as
262/// `bslh::DefaultSeededHashAlgorithm` is updated.
263/// @code
264/// /// This struct is a functor that will apply the
265/// /// `DefaultSeededHashAlgorithm` to objects of type `Future`, using a
266/// /// generated seed.
267/// struct HashFuture {
268///
269/// /// Return the hash of the of the specified `future`. Note that
270/// /// this uses the `DefaultSeededHashAlgorithm` to quickly combine
271/// /// the attributes of `Future` objects that are salient to hashing
272/// /// into a hash suitable for a hash table.
273/// size_t operator()(const Future& future) const
274/// {
275/// @endcode
276/// Then, we use a `bslh::SeedGenerator` combined with a RNG (implementation not
277/// shown), to generate the seeds for our algorithm.
278/// @code
279/// char seed[DefaultSeededHashAlgorithm::k_SEED_LENGTH];
280/// SeedGenerator<SomeRNG> seedGenerator;
281/// seedGenerator.generateSeed(seed,
282/// DefaultSeededHashAlgorithm::k_SEED_LENGTH);
283///
284/// DefaultSeededHashAlgorithm hash(seed);
285/// @endcode
286/// Next, after seeding our algorithm, we pass data into it and operate on it
287/// just as easily as for a non-seeded algorithm
288/// @code
289///
290/// hash(future.getName(), strlen(future.getName()));
291/// hash(future.getMonth(), sizeof(char));
292/// hash(future.getYear(), sizeof(short));
293///
294/// return static_cast<size_t>(hash.computeHash());
295/// }
296/// };
297/// @endcode
298/// Then, we want to actually use our hash table on `Future` objects. We create
299/// an array of `Future`s based on data that was originally from some external
300/// source:
301/// @code
302/// Future futures[] = { Future("Swiss Franc", 'F', 2014),
303/// Future("US Dollar", 'G', 2015),
304/// Future("Canadian Dollar", 'Z', 2014),
305/// Future("British Pound", 'M', 2015),
306/// Future("Deutsche Mark", 'X', 2016),
307/// Future("Eurodollar", 'Q', 2017)};
308/// enum { NUM_FUTURES = sizeof futures / sizeof *futures };
309/// @endcode
310/// Next, we create our HashTable `hashTable`. We pass the functor that we
311/// defined above as the second argument:
312/// @code
313/// HashTable<Future, HashFuture> hashTable(futures, NUM_FUTURES);
314/// @endcode
315/// Now, we verify that each element in our array registers with count:
316/// @code
317/// for ( int i = 0; i < 6; ++i) {
318/// ASSERT(hashTable.contains(futures[i]));
319/// }
320/// @endcode
321/// Finally, we verify that futures not in our original array are correctly
322/// identified as not being in the set:
323/// @code
324/// ASSERT(!hashTable.contains(Future("French Franc", 'N', 2019)));
325/// ASSERT(!hashTable.contains(Future("Swiss Franc", 'X', 2014)));
326/// ASSERT(!hashTable.contains(Future("US Dollar", 'F', 2014)));
327/// @endcode
328/// @}
329/** @} */
330/** @} */
331
332/** @addtogroup bsl
333 * @{
334 */
335/** @addtogroup bslh
336 * @{
337 */
338/** @addtogroup bslh_defaultseededhashalgorithm
339 * @{
340 */
341
342#include <bslscm_version.h>
343
345
346#include <bslmf_assert.h>
347
348#include <bsls_assert.h>
349
350
351
352namespace bslh {
353
354/// This class wraps an unspecified default hashing algorithm, which takes a
355/// seed, that is appropriate for general purpose use such as generating
356/// hashes for a hash table.
357///
358/// See @ref bslh_defaultseededhashalgorithm
360
361 private:
362 // PRIVATE TYPES
363
364 /// Typedef indicating the algorithm currently being used by
365 /// `bslh::DefualtHashAlgorithm` to compute hashes. This algorithm is
366 /// subject to change.
368
369 // DATA
370
371 // Object storing the state of the chosen `InternalHashAlgorithm`.
372 InternalHashAlgorithm d_state;
373
374 // NOT IMPLEMENTED
376 // = delete;
377 // Do not allow copy construction.
378
379 /// Do not allow assignment.
381 const DefaultSeededHashAlgorithm& rhs); // = delete;
382
383 public:
384 // TYPES
385
386 /// Typedef indicating the value type returned by this algorithm.
388
389 // CONSTANTS
390
391 /// Seed length in bytes.
393
395
396 // CREATORS
397
398 /// Create a `bslh::DefaultSeededHashAlgorithm`, seeded with the
399 /// (`k_SEED_LENGTH` bytes) seed pointed to by the specified `seed`.
400 /// Each bit of the supplied seed will contribute to the final hash
401 /// produced by `computeHash()`. The behaviour is undefined unless
402 /// `seed` points to at least `k_SEED_LENGTH` bytes of initialized
403 /// memory.
404 explicit DefaultSeededHashAlgorithm(const char *seed);
405
406 /// Destroy this object.
408
409 // MANIPULATORS
410
411 /// Incorporate the specified `data`, of at least the specified
412 /// `numBytes`, into the internal state of the hashing algorithm. Every
413 /// bit of data incorporated into the internal state of the algorithm
414 /// will contribute to the final hash produced by `computeHash()`. The
415 /// same hash will be produced regardless of whether a sequence of bytes
416 /// is passed in all at once or through multiple calls to this member
417 /// function. Input where `numBytes` is 0 will have no effect on the
418 /// internal state of the algorithm. The behaviour is undefined unless
419 /// `data` points to a valid memory location with at least `numBytes`
420 /// bytes of initialized memory or `numBytes` is zero.
421 void operator()(const void *data, size_t numBytes);
422
423 /// Return the finalized version of the hash that has been accumulated.
424 /// Note that this changes the internal state of the object, so calling
425 /// `computeHash()` multiple times in a row will return different
426 /// results, and only the first result returned will match the expected
427 /// result of the algorithm. Also note that a value will be returned,
428 /// even if data has not been passed into `operator()`.
430};
431
432// ============================================================================
433// INLINE DEFINITIONS
434// ============================================================================
435
436// CREATORS
437inline
438DefaultSeededHashAlgorithm::DefaultSeededHashAlgorithm(const char *seed)
439: d_state(seed)
440{
441 BSLS_ASSERT(seed);
442}
443
444// MANIPULATORS
445inline
446void DefaultSeededHashAlgorithm::operator()(const void *data, size_t numBytes)
447{
448 BSLS_ASSERT(0 != data || 0 == numBytes);
449 d_state(data, numBytes);
450}
451
452inline
455{
456 return d_state.computeHash();
457}
458
459} // close package namespace
460
461
462#endif
463
464// ----------------------------------------------------------------------------
465// Copyright 2014 Bloomberg Finance L.P.
466//
467// Licensed under the Apache License, Version 2.0 (the "License");
468// you may not use this file except in compliance with the License.
469// You may obtain a copy of the License at
470//
471// http://www.apache.org/licenses/LICENSE-2.0
472//
473// Unless required by applicable law or agreed to in writing, software
474// distributed under the License is distributed on an "AS IS" BASIS,
475// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
476// See the License for the specific language governing permissions and
477// limitations under the License.
478// ----------------------------- END-OF-FILE ----------------------------------
479
480/** @} */
481/** @} */
482/** @} */
Definition bslh_defaultseededhashalgorithm.h:359
@ k_SEED_LENGTH
Definition bslh_defaultseededhashalgorithm.h:392
InternalHashAlgorithm::result_type result_type
Typedef indicating the value type returned by this algorithm.
Definition bslh_defaultseededhashalgorithm.h:387
void operator()(const void *data, size_t numBytes)
Definition bslh_defaultseededhashalgorithm.h:446
~DefaultSeededHashAlgorithm()=default
Destroy this object.
result_type computeHash()
Definition bslh_defaultseededhashalgorithm.h:454
Definition bslh_wyhashincrementalalgorithm.h:447
result_type computeHash()
Definition bslh_wyhashincrementalalgorithm.h:875
@ k_SEED_LENGTH
Definition bslh_wyhashincrementalalgorithm.h:461
bsls::Types::Uint64 result_type
Typedef indicating the value type returned by this algorithm.
Definition bslh_wyhashincrementalalgorithm.h:459
#define BSLS_ASSERT(X)
Definition bsls_assert.h:1804
#define BSLS_IDENT(str)
Definition bsls_ident.h:195
Definition bslh_defaulthashalgorithm.h:339