BDE 4.14.0 Production release
Loading...
Searching...
No Matches
bslh.h
Go to the documentation of this file.
1
/// @file bslh.h
2
///
3
///
4
/// @defgroup bslh Package bslh
5
/// @brief Basic Standard Library Hashing (bslh)
6
/// @addtogroup bsl
7
/// @{
8
/// @addtogroup bslh
9
/// [bslh]: group__bslh.html
10
/// @{
11
///
12
/// # Purpose {#bslh-purpose}
13
/// Provide a framework for hashing types using swappable algorithms.
14
///
15
/// # Mnemonic {#bslh-mnemonic}
16
/// Basic Standard Library Hashing (bslh)
17
///
18
/// # Description {#bslh-description}
19
/// The 'bslh' package provides components for a more modular hashing
20
/// implementation than is found in the standard. This implementation is based on
21
/// ISO C++ Proposal N3980. An internal proposal for this is available at {TEAM
22
/// BDE:MODULAR HASHING<GO>}. This package provides hashing algorithms as well as
23
/// 'Hash' and 'SeededHash' structs which allow different algorithms to be applied
24
/// to any type that has a 'hashAppend' function. This document will explain the
25
/// overall benefits of the system, what type implementers need to do to make
26
/// their types hashable in this system, what type users need to do to hash
27
/// different types, how to extend different pieces of this system, and will
28
/// provide a summary of all the components in this package. All sections are
29
/// independent and can be read and used without having read the other sections.
30
///
31
/// ## Table of Contents
32
///
33
/// * Terminology
34
/// * Avalanche
35
/// * Denial of Service (DoS)
36
/// * Funneling
37
/// * Salient to Hashing
38
///
39
/// * Why Use This System?
40
/// * Better and More Predictable Performance
41
/// * Less Code Duplication
42
/// * Easier to Make Types Hashable
43
/// * Hash Combining
44
/// * Swappable Algorithms
45
///
46
/// * Type Implementers
47
/// * 'hashAppend'
48
/// * Determining what to Hash
49
/// * Hashing Other User Defined Types
50
/// * Hashing Pointers (especially 'const char *')
51
/// * Converting Existing Types
52
///
53
/// * Type Users
54
/// * bsl::hash
55
/// * bslh::Hash
56
/// * bslh::SeededHash, bslh::SeedGenerator, and Secure Hashing
57
/// * Seeded Algorithms
58
/// * bslh::SeedGenerator
59
/// * bslh::SeededHash
60
/// * Hashing Performance and Fundamental Integer Types
61
/// * Choosing a Hashing Algorithm
62
///
63
/// * Extending the System
64
/// * Hashing Algorithm Functors
65
/// * Hashing Algorithm Wrappers (bslh::Hash)
66
/// * Seed Generator
67
///
68
/// * Hierarchical Synopsis
69
///
70
/// * Component Synopsis
71
///
72
/// * Component Overview
73
/// * @ref bslh_defaulthashalgorithm
74
/// * @ref bslh_defaultseededhashalgorithm
75
/// * @ref bslh_hash
76
/// * @ref bslh_seededhash
77
/// * @ref bslh_seedgenerator
78
/// * @ref bslh_siphashalgorithm
79
/// * @ref bslh_spookyhashalgorithm
80
/// * @ref bslh_spookyhashalgorithmimp
81
/// * @ref bslh_wyhashincrementalalgorithm
82
///
83
/// ## Terminology
84
///
85
///
86
/// ### Avalanche
87
///
88
/// Changing one bit in the input to the hashing algorithm results in an
89
/// "avalanche", which causes each output bit to have a 50% probability of
90
/// changing. The avalanche property means that two very similar values will
91
/// produce completely dissimilar hashes.
92
///
93
/// ### Denial of Service (DoS)
94
///
95
/// Within the context of hash tables, Denial of Service (DoS) attacks refer to an
96
/// attacker causing many hash table keys to collide to the same bucket. These
97
/// collisions cause look-ups in the hash table to become very time consuming
98
/// linear searches.
99
///
100
/// ### Funneling
101
///
102
/// An undesirable property of hashing algorithms that results in a large number
103
/// of collisions when the inputs differ by only a few bits. Funneling is related
104
/// to the avalanche property of a hashing algorithm and is essentially the
105
/// opposite of avalanche. More information can be found at
106
/// 'http://burtleburtle.net/bob/hash/evahash.html'.
107
///
108
/// ### Salient to Hashing
109
///
110
/// A property of attributes (or fields) of a type. An attribute is considered
111
/// salient to hashing if it should be incorporated into the bytes that are used
112
/// to produce a hash of a given object. As a general rule is that every
113
/// attribute used in 'operator==' is usually salient to hashing. This term is
114
/// explored more in depth later.
115
///
116
/// ## Why Use This System
117
///
118
/// There are numerous benefits to both type creators and users in this new
119
/// system:
120
///
121
/// ### Better and More Predictable Performance
122
///
123
/// This modular hashing system allows users to use known good hashing algorithms
124
/// to avoid performance issues. The graph below depicts 'unordered_map'
125
/// (expected case O(1)) taking longer to find elements that map. This is a
126
/// real-world benchmark. Internal users can read more about these tests at {TEAM
127
/// BDE:FLAT MAP<GO>}. The anomalous behavior arose from the use of a poorly
128
/// written, non-standard, hashing algorithm, which caused similar strings to hash
129
/// to the same value resulting in many collisions:
130
/// @code
131
/// Time to Find a pair<string, string> in map vs unordered_map
132
///
133
/// 18 |
134
/// | O = map X = unordered_map
135
/// 16 |
136
/// T | X
137
/// i 14 | |
138
/// m | /
139
/// e 12 | /-O
140
/// | /
141
/// i 10 | - /
142
/// n | - /
143
/// 8 | O /
144
/// S | -- /
145
/// e 6 | - /
146
/// c | -- /
147
/// o 4 | ----O X
148
/// n | ----O--- --
149
/// d 2 | ---O----- ----
150
/// s | XO-------XO-------X--------X--------X
151
/// |____________________________________________________________________
152
/// | | | | | | |
153
/// 1 10 100 1000 10000 100000 1000000
154
/// Elements in Collection
155
/// @endcode
156
///
157
/// ### Less Code Duplication
158
///
159
/// In the standard C++03 hashing system, hashing algorithms were often copied
160
/// into the 'bsl::hash' template specialization on each type. This resulted in
161
/// lots of code duplication. In the new system, algorithms are not implemented
162
/// directly on the type, so no algorithm duplication occurs. The little bit of
163
/// code that is written on the type is specific to that type and wouldn't be
164
/// copied elsewhere.
165
///
166
/// ### Easier to Make Types Hashable
167
///
168
/// In the standard C++03 hashing system, type implementers had to worry about
169
/// writing a hashing algorithm for their type. This required figuring out what
170
/// constituted a good hash value and how to generate one with a given set of data
171
/// members. Thinking about what makes a good hash is no longer the job of the
172
/// type implementer. They simply have to declare what data members they want to
173
/// contribute the hash and then they are done.
174
///
175
/// ### Hash Combining
176
///
177
/// In the standard C++03 hashing system, if a type needed to use more than one
178
/// data member in its hash calculation, there was no proper way for them to
179
/// combine the hashes from multiple data members into one final hash. Poor
180
/// methods of hash combining such as XORing could result in huge numbers of
181
/// collisions. The new, modular hashing system offers 'hashAppend' to combine an
182
/// unlimited number of data members into one, good, hash.
183
///
184
/// ### Swappable Algorithms
185
///
186
/// In the standard C++03 hashing system, anyone who wanted to hash a type had to
187
/// use the algorithm written by the type implementer, or they had to write their
188
/// own algorithm in a functor (without access to private data members of course).
189
/// Unfortunately, hashing algorithms are not one size fits all. In some cases, a
190
/// fast identity hash is fine. In other cases a slower hash that is secure
191
/// against Denial of Service (DoS) attacks is required. The new, modular hashing
192
/// system offers a suite of hashing algorithms that have been vetted and are know
193
/// to have good characteristics for different situations. Swapping out one
194
/// algorithm for another only requires a type's users to change one line of code,
195
/// as shown here:
196
/// @code
197
/// // Uses implicitly 'DefaultHashAlgorithm'.
198
/// bsl::unordered_map<MyType, int, bslh::Hash<>> unorderedMap;
199
///
200
/// // Only a single line change is required to use 'SpookyHashAlgorithm'
201
/// bsl::unordered_map<MyType,
202
/// int,
203
/// bslh::Hash<bslh::SpookyHashAlgorithm>> unorderedMap;
204
/// @endcode
205
///
206
/// ## Type Implementers
207
///
208
/// It is the type implementer's job to implement 'hashAppend' (explained below)
209
/// on their type, much like they have to implement 'swap' on their type. It is
210
/// important to note that implementing 'hashAppend' on 'MyType' is fully backward
211
/// compatible with 'bsl::hash<MyType>'. That means that when you implement
212
/// 'hashAppend' on 'MyNewType', 'bsl::hash<MyNewType>' will automatically pick it
213
/// up without you ever having to write your own template specialization. For
214
/// existing types that already have a 'bsl::hash<MyExistingType>' specialization,
215
/// it is recommended that type implementers delete 'bsl::hash<MyExistingType>'
216
/// once they have implemented 'hashAppend'. Deleting the 'bsl::hash' template
217
/// specialization for 'MyExistingType' will not break any code because
218
/// 'bsl::hash<TYPE>' automatically redirects to 'bslh::Hash<>'. Note that
219
/// deleting the 'bsl::hash' template specialization for 'MyExistingType' will
220
/// cause the hash value returned by calls to 'bsl::hash<MyExistingType>' to
221
/// change, but this is explicitly allowed in the 'bsl::hash' contract.
222
///
223
/// ### hashAppend
224
///
225
/// The fundamental piece of this system, at least for type implementers, is the
226
/// 'hashAppend' free function. What this free function does is pass the data
227
/// members of a class which need to be hashed, into a hashing algorithm. It
228
/// removes the need for the type implementer to actually write the hashing
229
/// algorithm. An example implementation can be seen below:
230
/// @code
231
/// namespace BloombergLP {
232
/// namespace NamespaceForBoxes {
233
///
234
/// class Box {
235
/// // A value semantic type that represents a box drawn on to a Cartesian
236
/// // plane.
237
/// private:
238
/// Point d_position;
239
/// int d_length;
240
/// int d_width;
241
/// public:
242
/// Box(Point position, int length, int width);
243
/// // Create a box with the specified 'length' and 'width', with its
244
/// // upper left corner at the specified 'position'
245
///
246
/// friend bool operator==(const Box &left, const Box &right);
247
///
248
/// template <class HASH_ALGORITHM>
249
/// friend
250
/// void hashAppend(HASH_ALGORITHM &hashAlg, const Box &box);
251
/// // Apply the specified 'hashAlg' to the specified 'box'
252
/// };
253
///
254
/// Box::Box(Point position, int length, int width)
255
/// : d_position(position)
256
/// , d_length(length)
257
/// , d_width(width)
258
/// {
259
/// }
260
///
261
/// bool operator==(const Box &left, const Box &right)
262
/// {
263
/// return (left.d_position == right.d_position)
264
/// && (left.d_length == right.d_length)
265
/// && (left.d_width == right.d_width);
266
/// }
267
///
268
/// template <class HASH_ALGORITHM>
269
/// void hashAppend(HASH_ALGORITHM &hashAlg, const Box &box)
270
/// {
271
/// using bslh::hashAppend;
272
/// hashAppend(hashAlg, box.d_position);
273
/// hashAppend(hashAlg, box.d_length);
274
/// hashAppend(hashAlg, box.d_width);
275
/// }
276
///
277
/// } // close package namespace
278
/// } // close enterprise namespace
279
/// @endcode
280
/// A few key features of the 'hashAppend' fuction:
281
///
282
/// 1. 'hashAppend' is a free function that can be picked up through argument
283
/// dependent look-up (ADL).
284
///
285
/// 1. In order to ensure 'hashAppend' can be found through ADL, the function
286
/// must begin with 'using bslh::hashAppend'. Note that most of the time
287
/// this isn't required because most 'HASH_ALGORITHM's will be implemented in
288
/// 'bslh' and thus ADL will already be looking in 'bslh'. The using
289
/// statement is still required, however, to support algorithms that are
290
/// implemented outside of 'blsh'.
291
///
292
/// 2. 'hashAppend' is defined the header file of the type for which it is
293
/// implemented.
294
///
295
/// 3. If 'hashAppend' requires access to private data, it is declared as a friend
296
/// to the type which it is hashing.
297
///
298
/// 4. 'hashAppend' accepts a reference to a templated hashing algorithm functor,
299
/// and a 'const' reference to the type which it is hashing.
300
///
301
/// 5. 'hashAppend' recursively calls 'hashAppend' on each of the
302
/// salient-attributes of the type that it is hashing, propagating the hash
303
/// algorithm functor with each invocation.
304
///
305
/// This is the extent of the work for a type implementer. Once a type
306
/// implementer knows what members need to contribute to a hash value, the type
307
/// implementer simply calls 'hashAppend' on members as shown above. Those
308
/// members will then be passed into and used by whatever algorithm a consumer of
309
/// the type wants to apply.
310
///
311
/// ### Determining what to Hash
312
///
313
/// Type implementers should be familiar with the rules for hash functions. There
314
/// are two main rules:
315
///
316
/// 1. If 'x == y', then both 'x' and 'y' shall produce the same hash.
317
///
318
/// 2. If 'x != y', then 'x' and 'y' should not produce the same hash.
319
///
320
/// For example, if the first rule is violated, 'unordered_map' will encounter
321
/// runtime errors. If the second rule is violated, collisions will occur and the
322
/// performance of 'unordered_map' will suffer.
323
///
324
/// Since the rule about hashing are predicated on 'operator==', the easiest way
325
/// to determine what to hash is to look at 'operator=='. For example, lets
326
/// reexamine 'operator==' from our earlier code sample:
327
/// @code
328
/// bool operator==(const Box &left, const Box &right)
329
/// {
330
/// return (left.d_position == right.d_position)
331
/// && (left.d_length == right.d_length)
332
/// && (left.d_width == right.d_width);
333
/// }
334
/// @endcode
335
/// In order to ensure that the first rule of hashing if followed, we must only
336
/// include data members in our hash if one of the following is true:
337
///
338
/// * The data member is used in the 'operator==' comparison.
339
///
340
/// * The data member is a entirely dependent on data that is used in the
341
/// 'operator==' comparison.
342
///
343
/// * The data member is constant.
344
///
345
/// If we do not meet any of the above criteria, we open ourselves to the
346
/// possibility that two objects that compare equal will hash to different values.
347
/// If two equal objects hash to different values, hash-based data structures
348
/// ('unordered_map' being the primary concern) will break. And example of
349
/// something not to include in 'hashAppend' would be the 'capacity()' of a
350
/// vector.
351
///
352
/// In order to make the second rule of hashing remain true, we should include
353
/// everything that appears in 'operator==' in our hash. The more data we can put
354
/// into the hashing algorithm, the higher the chances of us getting unique
355
/// outputs. By following these suggestions, we produced the 'hashAppend'
356
/// function from our earlier code example:
357
/// @code
358
/// template <class HASH_ALGORITHM>
359
/// void hashAppend(HASH_ALGORITHM &hashAlg, const Box &box)
360
/// {
361
/// using bslh::hashAppend;
362
/// hashAppend(hashAlg, box.d_position);
363
/// hashAppend(hashAlg, box.d_length);
364
/// hashAppend(hashAlg, box.d_width);
365
/// }
366
/// @endcode
367
/// It is worth noting that sometimes even data members that don't actually
368
/// contribute entropy can still help us produce a more unique unique outputs.
369
/// For example consider 'vector<vector<int> >'. If we just hashed the elements
370
/// in the vector, then an empty 'vector<vector<int> >' would generate the same
371
/// hash value as a 'vector<vector<int> >' containing one (or more) empty
372
/// 'vector<int>' objects. This violates the second rule of hashing (above). By
373
/// also passing 'vector.length()' (which is used in equality) into the algorithm,
374
/// the two vectors will hash to different values.
375
///
376
/// ### Hashing Other User Defined Types
377
///
378
/// As we can see in code sample above showing 'hashAppend', 'hashAppend' is
379
/// called on a data member of the user-defined type, 'Point'. This code will not
380
/// compile if 'Point' is a type for which 'hashAppend' has not been implemented.
381
/// The best route to take is to have the creator of 'Point' go back and add a
382
/// 'hashAppend' function. If the creator, or somebody knowledgeable about the
383
/// type, is not available to add the 'hashAppend' function, there is a work
384
/// around. Assuming 'Point' supports the old system and has a 'bsl::hash<Point>'
385
/// specialization, you can just hash the 'Point' and then pass the resulting
386
/// 'size_t' into 'hashAppend' just like you would with any other integer data
387
/// member. The following code sample shows how we can modify 'Box's 'hashAppend'
388
/// function to handle 'Point' without a 'hashAppend' free function for 'Point':
389
/// @code
390
/// template <class HASH_ALGORITHM>
391
/// void hashAppend(HASH_ALGORITHM &hashAlg, const Box &box)
392
/// {
393
/// using bslh::hashAppend;
394
/// hashAppend(hashAlg, bsl::hash<Point>()(box.d_position));
395
/// hashAppend(hashAlg, box.d_length);
396
/// hashAppend(hashAlg, box.d_width);
397
/// }
398
/// @endcode
399
/// In the above code sample, the first call to 'hashAppend' is now effectively
400
/// calling 'hashAppend' on an integer type (the resulting hash from 'bsl::hash'),
401
/// rather than on 'Point'. This trick allows new development to use this modular
402
/// hashing system without having to wait for existing code to be upgraded.
403
///
404
/// ### Hashing Pointers (especially const char )
405
///
406
/// Pointers, particularly C-Strings (in the 'const char *' form), are an
407
/// unfortunate exception in hashing. Because the standard mandates that we must
408
/// be able to hash pointers, there is no way to distinguish a null terminated
409
/// 'const char *' (C-String) from a regular pointer to a 'char'. Because of
410
/// this, hashing C-Strings has slightly different semantics. Instead of calling
411
/// 'hashAppend' on the pointer, we must pass our C-String directly into the
412
/// hashing algorithm functor. All of the hashing algorithm functors in 'bslh'
413
/// have the function signature shown below:
414
/// @code
415
/// void operator()(const char *data, size_t length);
416
/// // Incorporates the specified 'data' of 'length' bytes into the internal
417
/// // state of the hashing algorithm.
418
/// @endcode
419
/// As we can see, the hashing algorithm functor takes a pointer to the start of
420
/// the data and a length in bytes. To hash a C-String, we call the hashing
421
/// algorithm functor with our pointer and the length of the C-String which we
422
/// have stored or pre-calculated. An example of this is the 'hashAppend'
423
/// function for string, shown here:
424
/// @code
425
/// template <class HASHALG,
426
/// class CHAR_TYPE,
427
/// class CHAR_TRAITS,
428
/// class ALLOCATOR>
429
/// void hashAppend(HASHALG& hashAlg,
430
/// const basic_string<CHAR_TYPE,
431
/// CHAR_TRAITS,
432
/// ALLOCATOR>& input)
433
/// {
434
/// using bslh::hashAppend;
435
/// hashAlg(input.data(), sizeof(CHAR_TYPE)*input.size());
436
/// }
437
/// @endcode
438
/// This technique can be applied to any case where you want to hash contiguous
439
/// data. It is especially important that the your data is completely contiguous,
440
/// because padding bits and the like could result in the same data hashing to
441
/// different values, which violates the first rule of hashing.
442
///
443
/// ### Converting Existing Types
444
///
445
/// The same process as above should be followed when implementing 'hashAppend' on
446
/// user defined types which already specialize 'bsl::hash'. One extra step that
447
/// is required is the 'bsl::hash' template specialization must be removed once
448
/// 'hashAppend' has been implemented. 'unordered_map's will still function after
449
/// the 'bsl::hash' specialization has been removed, since 'bsl::hash<TYPE>'
450
/// automatically calls 'bslh::Hash<>' when no specialization exists. Note that
451
/// deleting the 'bsl::hash' template specialization will cause the hash value
452
/// returned by 'bsl::hash<YourType>' to change, however, this is explicitly
453
/// allowed by the contract of 'bsl::hash'.
454
///
455
/// ## Type Users
456
///
457
/// The primary use case for 'bslh::Hash' is producing hash values for hash tables
458
/// ('unordered_map's), so we will be focusing on that for the next example, but
459
/// this section does apply generally to any use of the modular hashing system.
460
/// The basic background knowledge required for this section is:
461
///
462
/// * 'hashAppend' is a free function, implemented by type implementers, which
463
/// makes a type hashable.
464
///
465
/// * 'bslh::Hash<>' is a hashing functor that has the same interface as
466
/// 'bsl::hash<SomeType>', but allows you to supply a hashing algorithm as a
467
/// template parameter.
468
///
469
/// * Standard hashing algorithm functors are available in the 'bslh' package.
470
///
471
/// Beyond this basic summary, some knowledge of how 'bslh::Hash' and 'bsl::hash'
472
/// interact is required for type users to get the most out of this system.
473
///
474
/// ### bsl::hash
475
///
476
/// Unfortunately, 'bsl::hash' is impossible to completely escape, even with the
477
/// modular hashing system. The standard mandates that 'unordered_map's with a
478
/// key of 'SomeType' will call 'bsl::hash<SomeType>' by default. We have done
479
/// our best to make 'bsl::hash' and 'bslh::Hash' interact nicely in most
480
/// scenarios, and the section below shows what behavior can be expected from
481
/// calling 'bsl::hash<SomeType>'.
482
///
483
/// * 'bsl::hash' is specialized for 'SomeType' *and* 'hashAppend' is implemented
484
/// for 'SomeType'.
485
///
486
/// * Calling 'bsl::hash<SomeType>' will go to the 'bsl::hash' template
487
/// specialization. Note that 'bsl::hash<SomeType>' should normally be
488
/// deleted once 'hashAppend' has been implemented on 'SomeType'. If
489
/// 'bsl::hash<SomeType>' has not been deleted, please contact the type's
490
/// creator and ask them to do so.
491
///
492
/// * 'bslh::Hash<AnyImplementedAlgorithm>' can be called directly
493
///
494
/// * 'bsl::hash' is not specialized for 'SomeType' *and* 'hashAppend' is
495
/// implemented for 'SomeType'.
496
///
497
/// * Calling 'bsl::hash<SomeType>' will automatically redirect to
498
/// 'bslh::Hash<>'.
499
///
500
/// * 'bslh::Hash<AnyImplementedAlgorithm>' can be called directly
501
///
502
/// * 'bsl::hash' is specialized for 'SomeType' *and* 'hashAppend' is not
503
/// implemented for 'SomeType'.
504
///
505
/// * Calling 'bsl::hash<SomeType>' will go to the 'bsl::hash' template
506
/// specialization. Please implement 'hashAppend' on 'SomeType'.
507
///
508
/// * 'bsl::hash' is not specialized for 'SomeType' *and* 'hashAppend' is not
509
/// implemented for 'SomeType'.
510
///
511
/// * Does not compile, please implement 'hashAppend' on 'SomeType'.
512
///
513
/// Note that when 'bsl::hash<SomeType>' redirects to 'bslh::Hash<>',
514
/// 'bslh::Hash<>' is always using the defualt hashing algorithm. This is fine
515
/// for most use cases, but if we need to use a special algorithm, such as a
516
/// secure one to prevent Denial of Service (DoS) attacks in a hash table, we must
517
/// directly use 'bslh::Hash' in order to swap out the algorithm template
518
/// parameter.
519
///
520
/// ### bslh::Hash
521
///
522
/// There are various algorithms that can be swapped into 'bslh::Hash' as template
523
/// parameters. Algorithms such as SipHash and SpookyHash
524
/// ('bslh::SipHashAlgorithm' and 'bslh::SpookyHashAlgorithm' respectively) are
525
/// implemented and can be swapped into 'bslh::Hash'. There are also a number of
526
/// wrapper classes such as 'bslh::DefaultHashAlgorithm' and
527
/// 'balh::DefaultSeededHashAlgorithm' which are named to allow you to pick them
528
/// based on what you need, meaning you don't need an in depth knowledge of the
529
/// individual hashing algorithms. The usage of these algorithms can be seen
530
/// below:
531
/// @code
532
/// // Implicitly uses 'bsl::hash<MyType>', may or may not redirect to
533
/// // 'bslh::Hash<>'.
534
/// bsl::unordered_map<MyType, int> unorderedMap;
535
///
536
/// // Implicitly uses 'bslh::DefaultHashAlgorithm', which redirects to the
537
/// // current best default algorithm for hashing.
538
/// bsl::unordered_map<MyType, int, bslh::Hash<>> unorderedMap;
539
///
540
/// // Explicitly uses 'bslh::DefaultHashAlgorithm', which redirects to the
541
/// // current best default algorithm for hashing.
542
/// bsl::unordered_map<MyType, int, bslh::Hash<bslh::DefaultHashAlgorithm>>
543
/// unorderedMap;
544
///
545
/// // Explicitly uses 'bslh::SpookyHashAlgorithm', one of the algorithms
546
/// // available in 'bslh'.
547
/// bsl::unordered_map<MyType, int, bslh::Hash<bslh::SpookyHashAlgorithm>>
548
/// unorderedMap;
549
/// @endcode
550
///
551
/// ### bslh::SeededHash, bslh::SeedGenerator, and Secure Hashing
552
///
553
/// Some hashing algorithms, such as 'bslh::SipHashAlgorithm', require seeds to
554
/// function. 'bslh::SipHashAlgorithm' was designed by its creators to provide
555
/// protection against hash table DoS attacks. In order to get the most
556
/// protection, 'bslh::SipHashAlgorithm' requires a cryptographically secure
557
/// random number in order to produce hashes that will be secure against an
558
/// attacker. 'bslh::SeededHash' and 'bslh::SeedGenerator' exist to facilitate
559
/// passing seeds into hashing algorithms.
560
///
561
////Seeded Algorithms
562
//// - - - - - -
563
/// Different algorithms have different seed requirements. Some require a seed to
564
/// function and others take one optionally. The table below shows the seed
565
/// requirements of the algorithms in 'bslh':
566
/// @code
567
///+-----------------------------------+-----------------------------------------+
568
///| Algorithm | Takes Seed? | Requires Seed? | Crypto?* |
569
///+-----------------------------------+-------------+----------------+----------+
570
///|'bslh::DefaultHashAlgorithm' | N | N | N |
571
///+-----------------------------------+-----------------------------------------+
572
///|'bslh::DefaultSeededHashAlgorithm' | Y | Y | N |
573
///+-----------------------------------+-----------------------------------------+
574
///|'bslh::SipHashAlgorithm' | Y | Y | Y |
575
///+-----------------------------------+-----------------------------------------+
576
///|'bslh::SpookyHashAlgorithm' | Y | N | N |
577
///+-----------------------------------+-----------------------------------------+
578
/// [*] "Crypto" is reverting to the requirement on the seed, not the quality of
579
/// the algorithm. I.e., 'bslh::SipHashAlgorithm' is not a cryptographically
580
/// secure algorithm, but it *is* a cryptographically strong pseudo-random
581
/// function, *if* it's provided a cryptographically secure seed (see
582
/// @ref bslh_siphashalgorithm for more information).
583
/// @endcode
584
/// Algorithms can require different sized seeds, and different quality of seeds.
585
/// These variances are handled by 'bslh::SeedGenerator'.
586
///
587
////'bslh::SeedGenerator'
588
//// - - - - - - -
589
/// bslh::SeedGenerator allows users to choose their Random Number Generator (RNG)
590
/// and then handles the actual seed generation for algorithms. It presents the
591
/// following interface:
592
/// @code
593
/// template<class RANDOM_NUM_GEN>
594
/// class SeedGenerator : private RANDOM_NUM_GEN {
595
/// private:
596
/// // PRIVATE TYPES
597
/// typedef typename RANDOM_NUM_GEN::result_type result_type;
598
///
599
/// // DATA
600
/// enum { k_RNGOUTPUTSIZE = sizeof(typename RANDOM_NUM_GEN::result_type)};
601
///
602
/// public:
603
/// // CREATORS
604
/// SeedGenerator();
605
///
606
/// explicit SeedGenerator(const RANDOM_NUM_GEN &randomNumberGenerator);
607
///
608
/// // MANIPULATORS
609
/// void generateSeed(char *seedLocation, size_t seedLength);
610
/// };
611
/// @endcode
612
/// Note that 'bslh::SeedGenerator' takes advantage of the empty base optimization
613
/// when possible.
614
///
615
/// 'bslh::SeedGenerator' takes a RNG as a template parameter. The quality of
616
/// this RNG will determine the quality of the seed produced. That is, if the RNG
617
/// is cryptographically secure, the seed will be as well. 'bslh::SeedGenerator'
618
/// can be either default constructed or constructed with an instance of the
619
/// (template parameter) type 'RANDOM_NUM_GEN'. Default construction is preferred
620
/// if possible, but the parameterized constructor exists for cases when
621
/// 'RANDOM_NUM_GEN' is not default constructable, or when you need to pass in an
622
/// instance of 'RANDOM_NUM_GEN' with a particular state.
623
///
624
/// The 'generateSeed' method will be used by 'bslh::SeededHash' to generate seeds
625
/// for any algorithm that takes a seed.
626
///
627
////'bslh::SeededHash'
628
//// - - - - - -
629
/// 'bslh::SeededHash' is very similar to 'bslh::Hash'. Both are wrappers for the
630
/// hashing algorithm functors in 'bslh and both present an interface that meets
631
/// the requirements of the standard for 'std::hash'. The interface of
632
/// 'bslh::SeededHash' can be seen below.
633
/// @code
634
/// template <class SEED_GENERATOR, class HASH_ALGORITHM =
635
/// bslh::DefaultSeededHashAlgorithm>
636
/// struct SeededHash {
637
/// private:
638
/// // DATA
639
/// char seed[HASH_ALGORITHM::k_SEED_LENGTH];
640
///
641
/// public:
642
/// // TYPES
643
/// typedef size_t result_type;
644
///
645
/// // CREATORS
646
/// SeededHash();
647
///
648
/// explicit SeededHash(SEED_GENERATOR& seedGenerator);
649
///
650
/// // ACCESSORS
651
/// template <class TYPE>
652
/// result_type operator()(const TYPE& type) const;
653
/// };
654
/// @endcode
655
/// Like 'bslh::SeedGenerator', 'bslh::SeededHash' has both default and
656
/// parameterized constructors. If the (template parameter) type 'SEED_GENERATOR'
657
/// is default constructible, then 'bslh:SeededHash' will be default
658
/// constructible, and it can be used in the exact same way as 'bslh::Hash', as
659
/// shown here:
660
/// @code
661
/// // Construct an unordered map with a secure hashing algorithm
662
/// bsl::unordered_map<MyType,
663
/// int,
664
/// bslh::SeededHash<bslh::SeedGenerator<CryptoRNG>,
665
/// bslh::SecureHashAlgorithm> > unorderedMap;
666
/// @endcode
667
/// If 'bslh::SeededHash' is not default constructable, or you want to use a
668
/// specific instance of a RNG or seed generator, then 'unordered_map' can no
669
/// longer be default constructed as previously shown. Instead,
670
/// 'bslh::SeededHash' must be passed through the constructor of the unordered
671
/// map, seen here:
672
/// @code
673
/// // typedefs to make the code smaller
674
/// typedef bslh::SeedGenerator<CryptoRNG> CryptoSeedGen;
675
/// typedef bslh::SeededHash<CryptoSeedGen, bslh::SecureHashAlgorithm>
676
/// CryptoHasher;
677
///
678
/// // Construct the required seed generator and hashing algorithm wrapper from
679
/// // 'someRNGObject'
680
/// CryptoSeedGen seedGenerator(someRNGObject);
681
/// CryptoHasher hashAlg(seedGenerator);
682
///
683
/// // Construct our unordered map
684
/// bsl::unordered_map<MyType, int, CryptoHasher> secureUnorderedMap(
685
/// startIterator,
686
/// endIterator,
687
/// hashAlg);
688
/// @endcode
689
/// One important difference between 'bslh::Hash' and 'bslh::SeededHash' is that
690
/// 'bslh::SeededHash' needs to hold the seed, so it can not benefit from the
691
/// empty base optimization.
692
///
693
/// ### Hashing Performance and Fundamental Integer Types
694
///
695
/// Fundamental integer types are a notable exception to the pattern of
696
/// redirecting 'bsl::hash' specializations to 'bslh::Hash'. Fundamental integer
697
/// types will retain their 'bsl::hash' template specializations that are identity
698
/// functions (they return the supplied integer value its own hash value). This
699
/// is done for performance reasons, as identity hashing is the fastest possible
700
/// hash. Please note that this is not a good hashing algorithm and should only
701
/// be used in cases where performance is critical and the data is predicable
702
/// enough that we know minimal numbers of bucket collisions will occur.
703
///
704
/// ### Choosing a Hashing Algorithm
705
///
706
/// For most purposes, the default supplied hashing algorithm will be best. It
707
/// has a good combination of speed and key distribution. In cases where user
708
/// input is directly included in the 'unordered_map', it is recommended to use a
709
/// secure hashing algorithm instead, to prevent Denial of Service (DoS) attacks
710
/// where an attacker causes all of the keys to collide to the same bucket. Make
711
/// sure to read the component level documentation when looking for an algorithm,
712
/// to be sure that a hashing algorithm has the right trade offs for your use
713
/// case.
714
///
715
/// ## Extending the System
716
///
717
/// Every piece of the modular hashing system can be extended and swapped out in
718
/// favor of user defined pieces. This section defines how to extend the various
719
/// pieces, and the canonical interfaces that must be adhered to. Both type
720
/// implementers and users may have need of this section.
721
///
722
/// ### Hashing Algorithm Functors
723
///
724
/// Users are free write their own hashing algorithms and make them available via
725
/// functors. In order to plug into 'bslh::Hash' the algorithms must implement
726
/// the following interface:
727
/// @code
728
/// class SomeHashAlgorithm
729
/// {
730
/// public:
731
/// // TYPES
732
/// typedef uint64 result_type;
733
///
734
/// // CREATORS
735
/// SomeHashAlgorithm();
736
///
737
/// // MANIPULATORS
738
/// void operator()(const char * key, size_t len);
739
///
740
/// result_type computeHash();
741
/// };
742
/// @endcode
743
/// The @ref result_type 'typedef' must define the return type of this particular
744
/// algorithm. A default constructor (either implicit or explicit) must be
745
/// supplied that creates an algorithm functor that is in a usable state. An
746
/// 'operator()' must be supplied that takes a 'const char' pointer to the data to
747
/// be hashed and a 'size_t' length of bytes to be hashed. This operator must
748
/// operate on all data uniformly, meaning that regardless of whether data is
749
/// passed in all at once, or one byte at a time, the result returned by
750
/// 'computeHash()' will be the same. 'computeHash()' will return the final
751
/// result of the hashing algorithm, in the form of a @ref result_type .
752
/// 'computeHash()' is allowed to modify the internal state of the algorithm,
753
/// meaning calling 'computeHash()' more than once might not return the correct
754
/// value.
755
///
756
/// Hashing algorithm functors containing algorithms that require seeds must
757
/// implement the interface shown above, with the exception of the default
758
/// constructor. Seeded algorithm functors must also implement the following
759
/// interface:
760
/// @code
761
/// class SomeHashAlgorithm
762
/// {
763
/// public:
764
/// // CONSTANTS
765
/// enum { k_SEED_LENGTH = XXX };
766
///
767
/// // CREATORS
768
/// explicit SomeHashAlgorithm(const char *seed);
769
/// };
770
/// @endcode
771
/// The 'k_SEED_LENGTH' enum must be in the public interface, and 'XXX' must be
772
/// replaced with an integer literal indicating the number of bytes of seed the
773
/// algorithm requires. The parameterized constructor must accept a 'const char'
774
/// pointer. This pointer will point to a seed of 'XXX' bytes in size.
775
///
776
/// ### Hashing Algorithm Wrappers (bslh::Hash)
777
///
778
/// Users are free to write their own versions of 'bslh::Hash<>' for whatever use
779
/// case they require (in fact, 'bslh::SeededHash<>' is one such example).
780
/// Because no other parts of the modular hashing system rely on 'bslh::Hash<>',
781
/// you are free to use whatever interface is necessary. The recommended
782
/// interface for maintaining compatibility with components that previously used
783
/// 'bsl::hash<>' can be seen here:
784
/// @code
785
/// template <class HASH_ALGORITHM>
786
/// struct YourHash
787
/// {
788
/// // TYPES
789
/// typedef size_t result_type;
790
///
791
/// // ACCESSORS
792
/// template <class TYPE>
793
/// result_type operator()(TYPE const& type) const;
794
/// };
795
/// @endcode
796
/// The hashing algorithm wrapper that you create to replace 'blsh::Hash<>' should
797
/// be templated to operate using various 'HASH_ALGORITHM's. Whether or not there
798
/// is a default option for the template is optional. The @ref result_type should
799
/// define the type of the hash value that you will return. The 'operator()'
800
/// should be templated to operate on any 'TYPE'. 'operator()' should take a
801
/// single 'const' reference to 'TYPE' and should return a @ref result_type . Given
802
/// two 'TYPEs' that compare equal with 'operator==', 'operator()' *must* return
803
/// the same hash.
804
///
805
/// ### Seed Generator
806
///
807
/// Users are free to write their own seed generator, a class of component
808
/// required by 'bslh::SeededHash'. The seed generator must conform to the
809
/// interface shown here:
810
/// @code
811
/// class YourSeedGenerator
812
/// {
813
/// // ACCESSORS
814
/// void generateSeed(char *seedLocation, size_t seedLength);
815
/// };
816
/// @endcode
817
/// The only mandatory piece of the seed generator interface is the 'generateSeed'
818
/// method which accepts a 'char' pointer to memory to be written and a 'size_t'
819
/// length in bytes. The generateSeed method must fill the size_t bytes of the
820
/// memory pointed to by the 'char' pointer with a seed. If possible, it is
821
/// better for the seed generator to be default constructible, however, any sort
822
/// of constructor is acceptable because the seed generator can be constructed and
823
/// passed directly into 'bslh::SeededHash' if required. Be aware that having a
824
/// non-default constructor makes it more difficult to use the seed generator (see
825
/// the 'bslh::SeededHash' section above to see the difference).
826
///
827
/// ## Hierarchical Synopsis
828
///
829
/// The 'bslh' package currently has 15 components having 4 levels of physical
830
/// dependency. The list below shows the hierarchical ordering of the components.
831
/// The order of components within each level is not architecturally significant,
832
/// just alphabetical.
833
/// @code
834
/// 4. bslh_filesystem
835
/// bslh_hashoptional
836
/// bslh_hashpair
837
/// bslh_hashtuple
838
/// bslh_hashvariant
839
/// bslh_seededhash
840
///
841
/// 3. bslh_hash
842
///
843
/// 2. bslh_defaulthashalgorithm
844
/// bslh_defaultseededhashalgorithm
845
/// bslh_spookyhashalgorithm
846
///
847
/// 1. bslh_fibonaccibadhashwrapper
848
/// bslh_seedgenerator
849
/// bslh_siphashalgorithm
850
/// bslh_spookyhashalgorithmimp
851
/// bslh_wyhashincrementalalgorithm
852
/// @endcode
853
///
854
/// ## Component Synopsis
855
///
856
/// @ref bslh_defaulthashalgorithm :
857
/// Provide a reasonable hashing algorithm for default use.
858
///
859
/// @ref bslh_defaultseededhashalgorithm :
860
/// Provide a reasonable seeded hashing algorithm for default use.
861
///
862
/// @ref bslh_fibonaccibadhashwrapper :
863
/// Provide a wrapper to improve "bad" hash algorithms.
864
///
865
/// @ref bslh_filesystem :
866
/// Provide `hash` for `std::filesystem::path`.
867
///
868
/// @ref bslh_hash :
869
/// Provide a struct to run `bslh` hash algorithms on supported types.
870
///
871
/// @ref bslh_hashoptional :
872
/// Provide `hashAppend` for `std::optional`.
873
///
874
/// @ref bslh_hashpair :
875
/// Provide `hashAppend` for `std::pair`.
876
///
877
/// @ref bslh_hashtuple :
878
/// Provide `hashAppend` for `std::tuple`.
879
///
880
/// @ref bslh_hashvariant :
881
/// Provide `hashAppend` for `std::variant`.
882
///
883
/// @ref bslh_seededhash :
884
/// Provide a struct to run seeded `bslh` hash algorithms on types.
885
///
886
/// @ref bslh_seedgenerator :
887
/// Provide a class to generate arbitrary length seeds for algorithms.
888
///
889
/// @ref bslh_siphashalgorithm :
890
/// Provide an implementation of the SipHash algorithm.
891
///
892
/// @ref bslh_spookyhashalgorithm :
893
/// Provide an implementation of the SpookyHash algorithm.
894
///
895
/// @ref bslh_spookyhashalgorithmimp :
896
/// Provide BDE style encapsulation of 3rd party SpookyHash code.
897
///
898
/// @ref bslh_wyhashincrementalalgorithm :
899
/// Provide an implementation of the WyHash algorithm final v3.
900
///
901
/// ## Component Overview
902
///
903
/// This section provides a brief introduction to each of the components in the
904
/// 'bslh' package. Full details are available in the documentation of each
905
/// component.
906
///
907
/// ### @ref bslh_defaulthashalgorithm
908
///
909
/// The @ref bslh_defaulthashalgorithm component provides an unspecified default
910
/// hashing algorithm. The supplied algorithm is suitable for general purpose use
911
/// in a hash table. The underlying algorithm is subject to change in future
912
/// releases.
913
///
914
/// This class satisfies the requirements for regular 'bslh' hashing algorithms,
915
/// as defined in @ref bslh_hash .
916
///
917
/// ### @ref bslh_defaultseededhashalgorithm
918
///
919
/// The @ref bslh_defaultseededhashalgorithm component provides an unspecified
920
/// default seeded hashing algorithm. The supplied algorithm is suitable for
921
/// general purpose use in a hash table. The underlying algorithm is subject to
922
/// change in future releases.
923
///
924
/// This class satisfies the requirements for seeded 'bslh' hashing algorithms, as
925
/// defined in @ref bslh_seededhash .
926
///
927
/// ### @ref bslh_hash
928
///
929
/// The {@ref bslh_hash } component provides a templated 'struct', 'bslh::Hash', which
930
/// provides hashing functionality. This struct is a drop in replacement for
931
/// 'bsl::hash'. 'bslh::Hash' is a wrapper that adapts hashing algorithms from
932
/// 'bslh' and 'hashAppend' free functions to match the interface of 'bsl::hash'.
933
/// This component also contains 'hashAppend' definitions for fundamental types,
934
/// which are required to make the hashing algorithms in 'bslh' work.
935
///
936
/// ### @ref bslh_seededhash
937
///
938
/// The {@ref bslh_seededhash } component provides a templated struct,
939
/// 'bslh::SeededHash', which provides hashing functionality. This 'struct' is a
940
/// drop in replacement for 'bsl::hash'. It is similar to 'bslh::Hash', however,
941
/// it is meant for hashes that require a seed. It takes a seed generator and
942
/// uses that to create seeds to give the hashing algorithm. 'bslh::SeededHash'
943
/// is a wrapper which adapts hashing algorithms from 'bslh' to match the
944
/// interface of 'bsl::hash'. 'bslh::SeededHash' is a universal hashing functor
945
/// that will hash any type that implements 'hashAppend' using the hashing
946
/// algorithm provided as a template parameter.
947
///
948
/// ### @ref bslh_seedgenerator
949
///
950
/// The {@ref bslh_seedgenerator } component provides a class, 'bslh::SeedGenerator',
951
/// which utilizes a user-supplied random number generator (RNG) to generate
952
/// arbitrary length seeds. The quality of the seeds will only be as good as the
953
/// quality of the supplied RNG. A cryptographically secure RNG must be supplied
954
/// in order for 'SeedGenerator' to produce seeds suitable for a cryptographically
955
/// secure algorithm.
956
///
957
/// This class satisfies the requirements for a seed generator, as defined in
958
/// @ref bslh_seededhash .
959
///
960
/// ### @ref bslh_siphashalgorithm
961
///
962
/// The @ref bslh_siphashalgorithm component provides an implementation of the
963
/// SipHash algorithm. SipHash is an algorithm designed for speed and security.
964
/// A primary use case for this algorithm is to provide an extra line of defense
965
/// in hash tables (such as the underlying implementation of 'unordered_map')
966
/// against malicious input that could cause Denial of Service (DoS) attacks. It
967
/// is based on one of the finalists for the SHA-3 cryptographic hash standard.
968
/// Full details of the hash function can be found here:
969
/// '{https://131002.net/siphash/siphash.pdf}'. This particular implementation
970
/// has been derived from 'siphash.h' in Howard Hinnant's work here:
971
/// '{https://github.com/HowardHinnant/hash_append}' and as much of the original
972
/// code as possible, including comment headers, has been preserved.
973
///
974
/// This class satisfies the requirements for seeded 'bslh' hashing algorithms, as
975
/// defined in @ref bslh_seededhash .
976
///
977
/// ### @ref bslh_spookyhashalgorithm
978
///
979
/// The @ref bslh_spookyhashalgorithm component provides an implementation of the
980
/// SpookyHash algorithm by Bob Jenkins. This algorithm is a general purpose
981
/// algorithm that is known to quickly reach good avalanche performance and
982
/// execute in time that is comparable to or faster than other industry standard
983
/// algorithms such as CityHash. It is a good default choice for hashing values
984
/// in unordered associative containers. For more information, see
985
/// 'http://burtleburtle.net/bob/hash/spooky.html'.
986
///
987
/// This class satisfies the requirements for regular 'bslh' hashing algorithms
988
/// and seeded 'bslh' hashing algorithms, as defined in @ref bslh_hash and
989
/// @ref bslh_seededhash respectively.
990
///
991
/// ### @ref bslh_spookyhashalgorithmimp
992
///
993
/// The @ref bslh_spookyhashalgorithmimp component provides BDE-style encapsulation
994
/// of Bob Jenkins canonical SpookyHash implementation. SpookyHash provides a way
995
/// to hash contiguous data all at once, or non-contiguous data in pieces. More
996
/// information is available at 'http://burtleburtle.net/bob/hash/spooky.html'.
997
///
998
/// ### @ref bslh_wyhashincrementalalgorithm
999
///
1000
/// The 'WyHash' algorithm, 'wyhash_final_version_3' with the 'incremental'
1001
/// property added, meaning that hashing a segment in a single pass will yield the
1002
/// same result as hashing it in contiguous pieces, but, unlike the original
1003
/// 'WyHash', this algorithm yields different results depending on the byte order
1004
/// of the host.
1005
///
1006
/// @}
1007
/** @} */
doxygen_input
bde
groups
bsl
bslh
doc
bslh.h
Generated by
1.9.8