BDE 4.14.0 Production release
Loading...
Searching...
No Matches
bslh_spookyhashalgorithm.h
Go to the documentation of this file.
1/// @file bslh_spookyhashalgorithm.h
2///
3/// The content of this file has been pre-processed for Doxygen.
4///
5
6
7// bslh_spookyhashalgorithm.h -*-C++-*-
8#ifndef INCLUDED_BSLH_SPOOKYHASHALGORITHM
9#define INCLUDED_BSLH_SPOOKYHASHALGORITHM
10
11#include <bsls_ident.h>
12BSLS_IDENT("$Id: $")
13
14/// @defgroup bslh_spookyhashalgorithm bslh_spookyhashalgorithm
15/// @brief Provide an implementation of the SpookyHash algorithm.
16/// @addtogroup bsl
17/// @{
18/// @addtogroup bslh
19/// @{
20/// @addtogroup bslh_spookyhashalgorithm
21/// @{
22///
23/// <h1> Outline </h1>
24/// * <a href="#bslh_spookyhashalgorithm-purpose"> Purpose</a>
25/// * <a href="#bslh_spookyhashalgorithm-classes"> Classes </a>
26/// * <a href="#bslh_spookyhashalgorithm-description"> Description </a>
27/// * <a href="#bslh_spookyhashalgorithm-security"> Security </a>
28/// * <a href="#bslh_spookyhashalgorithm-speed"> Speed </a>
29/// * <a href="#bslh_spookyhashalgorithm-hash-distribution"> Hash Distribution </a>
30/// * <a href="#bslh_spookyhashalgorithm-hash-consistency"> Hash Consistency </a>
31/// * <a href="#bslh_spookyhashalgorithm-subdivision-invariance"> Subdivision-Invariance </a>
32/// * <a href="#bslh_spookyhashalgorithm-usage"> Usage </a>
33/// * <a href="#bslh_spookyhashalgorithm-example-creating-and-using-a-hash-table"> Example: Creating and Using a Hash Table </a>
34///
35/// # Purpose {#bslh_spookyhashalgorithm-purpose}
36/// Provide an implementation of the SpookyHash algorithm.
37///
38/// # Classes {#bslh_spookyhashalgorithm-classes}
39///
40/// - bslh::SpookyHashAlgorithm: functor implementing the SpookyHash algorithm
41///
42/// @see bslh_hash, bslh_spookyhashalgorithmimp
43///
44/// # Description {#bslh_spookyhashalgorithm-description}
45/// `bslh::SpookyHashAlgorithm` implements the SpookyHash algorithm
46/// by Bob Jenkins. This algorithm is a general purpose algorithm that is known
47/// to quickly reach good avalanche performance and execute in time that is
48/// comparable to or faster than other industry standard algorithms such as
49/// CityHash. It is a good default choice for hashing values in unordered
50/// associative containers. For more information, see:
51/// http://burtleburtle.net/bob/hash/spooky.html
52///
53/// This class satisfies the requirements for regular `bslh` hashing algorithms
54/// and seeded `bslh` hashing algorithms, defined in `bslh_hash.h` and
55/// `bslh_seededhash.h` respectively. More information can be found in the
56/// package level documentation for `bslh` (internal users can also find
57/// information here {TEAM BDE:USING MODULAR HASHING<GO>})
58///
59/// ## Security {#bslh_spookyhashalgorithm-security}
60///
61///
62/// In this context "security" refers to the ability of the algorithm to produce
63/// hashes that are not predictable by an attacker. Security is a concern when
64/// an attacker may be able to provide malicious input into a hash table,
65/// thereby causing hashes to collide to buckets, which degrades performance.
66/// There are *no* security guarantees made by `bslh::SpookyHashAlgorithm`,
67/// meaning attackers may be able to engineer keys that will cause a Denial of
68/// Service (DoS) attack in hash tables using this algorithm. Note that even if
69/// an attacker does not know the seed used to initialize this algorithm, they
70/// may still be able to produce keys that will cause a DoS attack in hash
71/// tables using this algorithm. If security is required, an algorithm that
72/// documents better secure properties should be used, such as
73/// `bslh::SipHashAlgorithm`.
74///
75/// ## Speed {#bslh_spookyhashalgorithm-speed}
76///
77///
78/// This algorithm will compute a hash on the order of O(n) where `n` is the
79/// length of the input data. Note that this algorithm will produce hashes fast
80/// enough to be used to hash keys in a hash table. It is quicker than
81/// specialized algorithms such as SipHash, but not as fast as hashing using the
82/// identity function.
83///
84/// ## Hash Distribution {#bslh_spookyhashalgorithm-hash-distribution}
85///
86///
87/// Output hashes will be well distributed and will avalanche, which means
88/// changing one bit of the input will change approximately 50% of the output
89/// bits. This will prevent similar values from funneling to the same hash or
90/// bucket.
91///
92/// ## Hash Consistency {#bslh_spookyhashalgorithm-hash-consistency}
93///
94///
95/// This hash algorithm is endian-specific. It is designed for little-endian
96/// machines, however, it will run on big-endian machines. On big-endian
97/// machines, the Performance and Security Guarantees still apply, however the
98/// hashes produced will be different from those produced by the canonical
99/// implementation. The creator of this algorithm acknowledges this and says
100/// that the big-endian hashes are just as good as the little-endian ones. It
101/// is not recommended to send hashes from `bslh::SpookyHashAlgorihtm` over a
102/// network because of the differences in hashes across architectures. It is
103/// also not recommended to write hashes from `bslh::SpookyHashAlgorihtm` to any
104/// memory accessible by multiple machines.
105///
106/// ## Subdivision-Invariance {#bslh_spookyhashalgorithm-subdivision-invariance}
107///
108///
109/// Note that this algorithm is *subdivision-invariant* (see
110/// {@ref bslh_hash |Subdivision-Invariance}).
111///
112/// ## Usage {#bslh_spookyhashalgorithm-usage}
113///
114///
115/// This section illustrates intended usage of this component.
116///
117/// ### Example: Creating and Using a Hash Table {#bslh_spookyhashalgorithm-example-creating-and-using-a-hash-table}
118///
119///
120/// Suppose we have any array of types that define `operator==`, and we want a
121/// fast way to find out if values are contained in the array. We can create a
122/// `HashTable` data structure that is capable of looking up values in O(1)
123/// time.
124///
125/// Further suppose that we will be storing futures (the financial instruments)
126/// in this table. Since futures have standardized names, we don't have to
127/// worry about any malicious values causing collisions. We will want to use a
128/// general purpose hashing algorithm with a good hash distribution and good
129/// speed. This algorithm will need to be in the form of a hash functor -- an
130/// object that will take objects stored in our array as input, and yield a
131/// 64-bit int value. The functor can pass the attributes of the `TYPE` that
132/// are salient to hashing into the hashing algorithm, and then return the hash
133/// that is produced.
134///
135/// We can use the result of the hash function to index into our array of
136/// `buckets`. Each `bucket` is simply a pointer to a value in our original
137/// array of `TYPE` objects.
138///
139/// First, we define our `HashTable` template class, with the two type
140/// parameters: `TYPE` (the type being referenced) and `HASHER` (a functor that
141/// produces the hash).
142/// @code
143/// template <class TYPE, class HASHER>
144/// class HashTable {
145/// // This class template implements a hash table providing fast lookup of
146/// // an external, non-owned, array of values of (template parameter)
147/// // 'TYPE'.
148/// //
149/// // The (template parameter) 'TYPE' shall have a transitive, symmetric
150/// // 'operator==' function. There is no requirement that it have any
151/// // kind of creator defined.
152/// //
153/// // The 'HASHER' template parameter type must be a functor with a method
154/// // having the following signature:
155/// //..
156/// // size_t operator()(TYPE) const;
157/// // -OR-
158/// // size_t operator()(const TYPE&) const;
159/// //..
160/// // and 'HASHER' shall have a publicly accessible default constructor
161/// // and destructor.
162/// //
163/// // Note that this hash table has numerous simplifications because we
164/// // know the size of the array and never have to resize the table.
165///
166/// // DATA
167/// const TYPE *d_values; // Array of values table is to
168/// // hold
169/// size_t d_numValues; // Length of 'd_values'.
170/// const TYPE **d_bucketArray; // Contains ptrs into
171/// // 'd_values'
172/// size_t d_bucketArrayMask; // Will always be '2^N - 1'.
173/// HASHER d_hasher;
174///
175/// private:
176/// // PRIVATE ACCESSORS
177/// bool lookup(size_t *idx,
178/// const TYPE& value,
179/// size_t hashValue) const;
180/// // Look up the specified 'value', having the specified 'hashValue',
181/// // and load its index in 'd_bucketArray' into the specified 'idx'.
182/// // If not found, return the vacant entry in 'd_bucketArray' where
183/// // it should be inserted. Return 'true' if 'value' is found and
184/// // 'false' otherwise.
185///
186/// public:
187/// // CREATORS
188/// HashTable(const TYPE *valuesArray,
189/// size_t numValues);
190/// // Create a hash table referring to the specified 'valuesArray'
191/// // having length of the specified 'numValues'. No value in
192/// // 'valuesArray' shall have the same value as any of the other
193/// // values in 'valuesArray'
194///
195/// ~HashTable();
196/// // Free up memory used by this hash table.
197///
198/// // ACCESSORS
199/// bool contains(const TYPE& value) const;
200/// // Return true if the specified 'value' is found in the table and
201/// // false otherwise.
202/// };
203/// @endcode
204/// Then, we define a `Future` class, which holds a c-string `name`, char
205/// `callMonth`, and short `callYear`.
206/// @code
207/// class Future {
208/// // This class identifies a future contract. It tracks the name, call
209/// // month and year of the contract it represents, and allows equality
210/// // comparison.
211///
212/// // DATA
213/// const char *d_name; // held, not owned
214/// const char d_callMonth;
215/// const short d_callYear;
216///
217/// public:
218/// // CREATORS
219/// Future(const char *name, const char callMonth, const short callYear)
220/// : d_name(name), d_callMonth(callMonth), d_callYear(callYear)
221/// // Create a 'Future' object out of the specified 'name',
222/// // 'callMonth', and 'callYear'.
223/// {}
224///
225/// Future() : d_name(""), d_callMonth('\0'), d_callYear(0)
226/// // Create a 'Future' with default values.
227/// {}
228///
229/// // ACCESSORS
230/// const char * getMonth() const
231/// // Return the month that this future expires.
232/// {
233/// return &d_callMonth;
234/// }
235///
236/// const char * getName() const
237/// // Return the name of this future
238/// {
239/// return d_name;
240/// }
241///
242/// const short * getYear() const
243/// // Return the year that this future expires
244/// {
245/// return &d_callYear;
246/// }
247///
248/// bool operator==(const Future& other) const
249/// // Compare this to the specified 'other' object and return true if
250/// // they are equal
251/// {
252/// return (!strcmp(d_name, other.d_name)) &&
253/// d_callMonth == other.d_callMonth &&
254/// d_callYear == other.d_callYear;
255/// }
256/// };
257///
258/// bool operator!=(const Future& lhs, const Future& rhs)
259/// // Compare compare the specified 'lhs' and 'rhs' objects and return
260/// // true if they are not equal
261/// {
262/// return !(lhs == rhs);
263/// }
264/// @endcode
265/// Next, we need a hash functor for `Future`. We are going to use the
266/// `SpookyHashAlgorithm` because it is a fast, general purpose hashing
267/// algorithm that will provide an easy way to combine the attributes of
268/// `Future` objects that are salient to hashing into one reasonable hash that
269/// will distribute the items evenly throughout the hash table.
270/// @code
271/// struct HashFuture {
272/// // This struct is a functor that will apply the SpookyHashAlgorithm to
273/// // objects of type 'Future'.
274///
275/// size_t operator()(const Future& future) const
276/// // Return the hash of the of the specified 'future'. Note that
277/// // this uses the 'SpookyHashAlgorithm' to quickly combine the
278/// // attributes of 'Future' objects that are salient to hashing into
279/// // a hash suitable for a hash table.
280/// {
281/// SpookyHashAlgorithm hash;
282///
283/// hash(future.getName(), strlen(future.getName()));
284/// hash(future.getMonth(), sizeof(char));
285/// hash(future.getYear(), sizeof(short));
286///
287/// return static_cast<size_t>(hash.computeHash());
288/// }
289/// };
290/// @endcode
291/// Then, we want to actually use our hash table on `Future` objects. We create
292/// an array of `Future`s based on data that was originally from some external
293/// source:
294/// @code
295/// Future futures[] = { Future("Swiss Franc", 'F', 2014),
296/// Future("US Dollar", 'G', 2015),
297/// Future("Canadian Dollar", 'Z', 2014),
298/// Future("British Pound", 'M', 2015),
299/// Future("Deutsche Mark", 'X', 2016),
300/// Future("Eurodollar", 'Q', 2017)};
301/// enum { NUM_FUTURES = sizeof futures / sizeof *futures };
302/// @endcode
303/// Next, we create our HashTable `hashTable`. We pass the functor that we
304/// defined above as the second argument:
305/// @code
306/// HashTable<Future, HashFuture> hashTable(futures, NUM_FUTURES);
307/// @endcode
308/// Now, we verify that each element in our array registers with count:
309/// @code
310/// for ( int i = 0; i < 6; ++i) {
311/// assert(hashTable.contains(futures[i]));
312/// }
313/// @endcode
314/// Finally, we verify that futures not in our original array are correctly
315/// identified as not being in the set:
316/// @code
317/// assert(!hashTable.contains(Future("French Franc", 'N', 2019)));
318/// assert(!hashTable.contains(Future("Swiss Franc", 'X', 2014)));
319/// assert(!hashTable.contains(Future("US Dollar", 'F', 2014)));
320///
321/// @endcode
322/// @}
323/** @} */
324/** @} */
325
326/** @addtogroup bsl
327 * @{
328 */
329/** @addtogroup bslh
330 * @{
331 */
332/** @addtogroup bslh_spookyhashalgorithm
333 * @{
334 */
335
336#include <bslscm_version.h>
337
339
341
342#include <bsls_assert.h>
343#include <bsls_byteorder.h>
344#include <bsls_platform.h>
345#include <bsls_types.h>
346
347#include <stddef.h> // for 'size_t'
348#include <string.h> // for 'memcpy'
349
350
351namespace bslh {
352
353
354 // ===============================
355 // class bslh::SpookyHashAlgorithm
356 // ===============================
357
358/// This class wraps an implementation of the "SpookyHash" hash algorithm in
359/// an interface that is usable in the modular hashing system in `bslh`.
360///
361/// See @ref bslh_spookyhashalgorithm
363
364 private:
365 // PRIVATE TYPES
366
367 /// Typedef for a 64-bit integer type used in the hashing algorithm.
368 typedef bsls::Types::Uint64 Uint64;
369
370 // DATA
371
372 // Object that contains the actual implementation of the SpookHash
373 // algorithm.
375
376 // NOT IMPLEMENTED
377
378 /// Do not allow copy construction.
379 SpookyHashAlgorithm(const SpookyHashAlgorithm& original); // = delete;
380
381 /// Do not allow assignment.
382 SpookyHashAlgorithm& operator=(const SpookyHashAlgorithm& rhs);// = delete;
383
384 public:
385 // TYPES
386
387 /// Typedef indicating the value type returned by this algorithm.
389
390
391 // CONSTANTS
392 enum { k_SEED_LENGTH = 16 }; // Seed length in bytes.
393
394 private:
395 // PRIVATE CLASS METHODS
396
397 /// The specified `seed` points to a possibly non-aligned `Uint64`,
398 /// return the value of that `Uint64` without violating the host
399 /// machine's alignment requirements.
400 static
401 Uint64 getSeed(const char *seed);
402
403 public:
404 // CREATORS
405
406 /// Create a `SpookyHashAlgorithm` using a default initial seed.
408
409 /// Create a `bslh::SpookyHashAlgorithm`, seeded with a 128-bit
410 /// (`k_SEED_LENGTH` bytes) seed pointed to by the specified `seed`.
411 /// Each bit of the supplied seed will contribute to the final hash
412 /// produced by `computeHash()`. The behaviour is undefined unless
413 /// `seed` points to at least 16 bytes of initialized memory.
414 explicit SpookyHashAlgorithm(const char *seed);
415
417 // Destroy this object.
418
419 // MANIPULATORS
420
421 /// Incorporate the specified `data`, of at least the specified
422 /// `numBytes`, into the internal state of the hashing algorithm. Every
423 /// bit of data incorporated into the internal state of the algorithm
424 /// will contribute to the final hash produced by `computeHash()`. The
425 /// same hash value will be produced regardless of whether a sequence of
426 /// bytes is passed in all at once or through multiple calls to this
427 /// member function. Input where `numBytes` is 0 will have no effect on
428 /// the internal state of the algorithm. The behaviour is undefined
429 /// unless `data` points to a valid memory location with at least
430 /// `numBytes` bytes of initialized memory or `numBytes` is zero.
431 void operator()(const void *data, size_t numBytes);
432
433 /// Return the finalized version of the hash that has been accumulated.
434 /// Note that this changes the internal state of the object, so calling
435 /// `computeHash()` multiple times in a row will return different
436 /// results, and only the first result returned will match the expected
437 /// result of the algorithm. Also note that a value will be returned,
438 /// even if data has not been passed into `operator()`
440};
441
442// ============================================================================
443// INLINE DEFINITIONS
444// ============================================================================
445
446// PRIVATE CLASS METHODS
447inline
448SpookyHashAlgorithm::Uint64 SpookyHashAlgorithm::getSeed(const char *seed)
449{
450 Uint64 ret;
451 ::memcpy(&ret, seed, sizeof(ret));
452 return ret;
453}
454
455// CREATORS
456inline
460
461inline
463: d_state(getSeed(seed), getSeed(seed + sizeof(Uint64)))
464{}
465
466// MANIPULATORS
467inline
468void SpookyHashAlgorithm::operator()(const void *data, size_t numBytes)
469{
470 BSLS_ASSERT(0 != data || 0 == numBytes);
471
472 d_state.update(data, numBytes);
473}
474
475inline
477{
478 bsls::Types::Uint64 h1, h2;
479 d_state.finalize(&h1, &h2);
480 return h1;
481}
482
483} // close package namespace
484
485// ============================================================================
486// TYPE TRAITS
487// ============================================================================
488
489namespace bslmf {
490template <>
491struct IsBitwiseMoveable<bslh::SpookyHashAlgorithm>
492 : bsl::true_type {};
493} // close namespace bslmf
494
495
496
497#endif
498
499// ----------------------------------------------------------------------------
500// Copyright 2014 Bloomberg Finance L.P.
501//
502// Licensed under the Apache License, Version 2.0 (the "License");
503// you may not use this file except in compliance with the License.
504// You may obtain a copy of the License at
505//
506// http://www.apache.org/licenses/LICENSE-2.0
507//
508// Unless required by applicable law or agreed to in writing, software
509// distributed under the License is distributed on an "AS IS" BASIS,
510// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
511// See the License for the specific language governing permissions and
512// limitations under the License.
513// ----------------------------- END-OF-FILE ----------------------------------
514
515/** @} */
516/** @} */
517/** @} */
Definition bslh_spookyhashalgorithmimp.h:291
void update(const void *message, size_t length)
void finalize(Uint64 *hash1, Uint64 *hash2)
Definition bslh_spookyhashalgorithm.h:362
void operator()(const void *data, size_t numBytes)
Definition bslh_spookyhashalgorithm.h:468
@ k_SEED_LENGTH
Definition bslh_spookyhashalgorithm.h:392
SpookyHashAlgorithm()
Create a SpookyHashAlgorithm using a default initial seed.
Definition bslh_spookyhashalgorithm.h:457
result_type computeHash()
Definition bslh_spookyhashalgorithm.h:476
bsls::Types::Uint64 result_type
Typedef indicating the value type returned by this algorithm.
Definition bslh_spookyhashalgorithm.h:388
#define BSLS_ASSERT(X)
Definition bsls_assert.h:1804
#define BSLS_IDENT(str)
Definition bsls_ident.h:195
Definition bslh_defaulthashalgorithm.h:339
Definition bdlbb_blob.h:576
Definition bslmf_isbitwisemoveable.h:718
unsigned long long Uint64
Definition bsls_types.h:137