BDE 4.14.0 Production release
Loading...
Searching...
No Matches
bslh_siphashalgorithm.h
Go to the documentation of this file.
1/// @file bslh_siphashalgorithm.h
2///
3/// The content of this file has been pre-processed for Doxygen.
4///
5
6
7// bslh_siphashalgorithm.h -*-C++-*-
8#ifndef INCLUDED_BSLH_SIPHASHALGORITHM
9#define INCLUDED_BSLH_SIPHASHALGORITHM
10
11#include <bsls_ident.h>
12BSLS_IDENT("$Id: $")
13
14/// @defgroup bslh_siphashalgorithm bslh_siphashalgorithm
15/// @brief Provide an implementation of the SipHash algorithm.
16/// @addtogroup bsl
17/// @{
18/// @addtogroup bslh
19/// @{
20/// @addtogroup bslh_siphashalgorithm
21/// @{
22///
23/// <h1> Outline </h1>
24/// * <a href="#bslh_siphashalgorithm-purpose"> Purpose</a>
25/// * <a href="#bslh_siphashalgorithm-classes"> Classes </a>
26/// * <a href="#bslh_siphashalgorithm-description"> Description </a>
27/// * <a href="#bslh_siphashalgorithm-security"> Security </a>
28/// * <a href="#bslh_siphashalgorithm-denial-of-service-protection"> Denial of Service (DoS) Protection </a>
29/// * <a href="#bslh_siphashalgorithm-speed"> Speed </a>
30/// * <a href="#bslh_siphashalgorithm-hash-distribution"> Hash Distribution </a>
31/// * <a href="#bslh_siphashalgorithm-hash-consistency"> Hash Consistency </a>
32/// * <a href="#bslh_siphashalgorithm-usage"> Usage </a>
33/// * <a href="#bslh_siphashalgorithm-example-creating-and-using-a-hash-table-containing-user-input"> Example: Creating and Using a Hash Table containing User Input </a>
34/// * <a href="#bslh_siphashalgorithm-changes"> Changes </a>
35/// * <a href="#bslh_siphashalgorithm-third-party-documentation"> Third-Party Documentation </a>
36///
37/// # Purpose {#bslh_siphashalgorithm-purpose}
38/// Provide an implementation of the SipHash algorithm.
39///
40/// # Classes {#bslh_siphashalgorithm-classes}
41///
42/// - bslh::SipHashAlgorithm: functor implementing the SipHash algorithm
43///
44/// @see bslh_hash
45///
46/// # Description {#bslh_siphashalgorithm-description}
47/// `bslh::SipHashAlgorithm` implements the SipHash algorithm.
48/// SipHash is an algorithm designed for speed and security. A primary use case
49/// for this algorithm is to provide an extra line of defense in hash tables
50/// (such as the hash table that is used to implement `unordered_map`) against
51/// malicious input that could cause Denial of Service (DoS) attacks. It is
52/// based on one of the finalists for the SHA-3 cryptographic hash standard.
53/// Full details of the hash function can be found here:
54/// `{https://131002.net/siphash/siphash.pdf}`. This particular implementation
55/// has been derived from `siphash.h` in Howard Hinnant's work here:
56/// `{https://github.com/HowardHinnant/hash_append}` and as much of the original
57/// code as possible, including comment headers, has been preserved.
58///
59/// This class satisfies the requirements for seeded `bslh` hashing algorithms,
60/// defined in `bslh_seededhash.h`. More information can be found in the
61/// package level documentation for `bslh` (internal users can also find
62/// information here {TEAM BDE:USING MODULAR HASHING<GO>})
63///
64/// ## Security {#bslh_siphashalgorithm-security}
65///
66///
67/// SipHash is *not* a cryptographically secure hash. In the paper linked
68/// above, the creators of this hash function describe it as "Cryptographically
69/// Strong", but explicitly avoid calling it cryptographically secure. In order
70/// to be cryptographically secure, an algorithm must, among other things,
71/// provide "collision resistance". "Collision resistance" means that it should
72/// be difficult to find two different messages m1 and m2 such that
73/// `hash(m1) == hash(m2)`. Because of the limited sized output (only 2^64
74/// possibilities) and the fast execution time of the algorithm, it is feasible
75/// to find collisions by brute force, making the algorithm not
76/// cryptographically secure.
77///
78/// SipHash *is*, however, a cryptographically strong PRF (pseudo-random
79/// function). This means, assuming a cryptographically secure random seed is
80/// given, the output of this algorithm will be indistinguishable from a uniform
81/// random distribution. This property is enough for the algorithm to be able
82/// to protect a hash table from malicious Denial of Service (DoS) attacks.
83///
84/// ### Denial of Service (DoS) Protection {#bslh_siphashalgorithm-denial-of-service-protection}
85///
86///
87/// Given a cryptographically secure seed, this algorithm will produce hashes
88/// with a distribution that is indistinguishable from random. This
89/// distribution means that there is no way for an attacker to predict which
90/// keys will cause collisions, meaning that this algorithm can help mitigate
91/// Denial of Service (DoS) attacks on a hash table. DoS attacks occur when an
92/// attacker deliberately degrades the performance of the hash table by
93/// inserting data that will collide to the same bucket, causing an average
94/// constant time lookup to become a linear search. This protection is only
95/// effective if the seed provided is a cryptographically secure random number
96/// that is not available to the attacker.
97///
98/// ## Speed {#bslh_siphashalgorithm-speed}
99///
100///
101/// This algorithm is designed to be fast in comparison to other algorithms
102/// making similar guarantees. It is still slower than other commonly accepted
103/// and used hashes such as WyHash or SpookyHash. This algorithm should only be
104/// used when protection from malicious input is required. Otherwise, an
105/// algorithm that documents better performance properties should be used, such
106/// as `bslh::WyHashAlgorithm`.
107///
108/// ## Hash Distribution {#bslh_siphashalgorithm-hash-distribution}
109///
110///
111/// Output hashes will be well distributed and will avalanche, which means
112/// changing one bit of the input will change approximately 50% of the output
113/// bits. This will prevent similar values from funneling to the same hash or
114/// bucket.
115///
116/// ## Hash Consistency {#bslh_siphashalgorithm-hash-consistency}
117///
118///
119/// This hash algorithm is endian-independent. The hashes produced for a given
120/// 16-byte key sequence and given data will be the same on big-endian and
121/// little-endian platforms. (In the literature the key is sometimes presented
122/// as a large integer or sequence of integers, such as
123/// `0xDEADBEEF, 0xCAFEBABE, 0x8BADF00D, 0x1BADB002`, in which case care must be
124/// taken to supply the individual key bytes in the same order on both platforms
125/// if the same hash results are desired.) However, if the "given data" is not
126/// just a character string but has internal structure, such as being integral
127/// or floating-point, it is likely ordered in different ways depending on the
128/// platform, and thus will not hash to the same value.
129///
130/// ## Usage {#bslh_siphashalgorithm-usage}
131///
132///
133/// This section illustrates intended usage of this component.
134///
135/// ### Example: Creating and Using a Hash Table containing User Input {#bslh_siphashalgorithm-example-creating-and-using-a-hash-table-containing-user-input}
136///
137///
138/// Suppose we have any array of types that define `operator==`, and we want a
139/// fast way to find out if values are contained in the array. We can create a
140/// `HashTable` data structure that is capable of looking up values in O(1)
141/// time.
142///
143/// Further suppose that we will be storing arbitrary user input in our table.
144/// It is possible that an attacker with knowledge of the hashing algorithm we
145/// are using could specially craft input that will cause collisions in our hash
146/// table, degrading performance to O(n). To avoid this we will need to use a
147/// secure hash algorithm with a random seed. This algorithm will need to be in
148/// the form of a hash functor -- an object that will take objects stored in our
149/// array as input, and yield an integer value which is hard enough for an
150/// outside observer to predict that it appear random. The functor can pass the
151/// attributes of the `TYPE` that are salient to hashing into the hashing
152/// algorithm, and then return the hash that is produced.
153///
154/// We can use the result of the hash function to index into our array of
155/// `buckets`. Each `bucket` is simply a pointer to a value in our original
156/// array of `TYPE` objects.
157///
158/// First, we define our `HashTable` template class, with the two type
159/// parameters: `TYPE` (the type being referenced) and `HASHER` (a functor that
160/// produces the hash).
161/// @code
162///
163/// template <class TYPE, class HASHER>
164/// class HashTable {
165/// // This class template implements a hash table providing fast lookup of
166/// // an external, non-owned, array of values of (template parameter)
167/// // 'TYPE'.
168/// //
169/// // The (template parameter) 'TYPE' shall have a transitive, symmetric
170/// // 'operator==' function. There is no requirement that it have any
171/// // kind of creator defined.
172/// //
173/// // The 'HASHER' template parameter type must be a functor with a method
174/// // having the following signature:
175/// //..
176/// // size_t operator()(TYPE) const;
177/// // -OR-
178/// // size_t operator()(const TYPE&) const;
179/// //..
180/// // and 'HASHER' shall have a publicly accessible default constructor
181/// // and destructor.
182/// //
183/// // Note that this hash table has numerous simplifications because we
184/// // know the size of the array and never have to resize the table.
185///
186/// // DATA
187/// const TYPE *d_values; // Array of values table is to
188/// // hold
189/// size_t d_numValues; // Length of 'd_values'.
190/// const TYPE **d_bucketArray; // Contains ptrs into
191/// // 'd_values'
192/// size_t d_bucketArrayMask; // Will always be '2^N - 1'.
193/// HASHER d_hasher;
194///
195/// private:
196/// // PRIVATE ACCESSORS
197/// bool lookup(size_t *idx,
198/// const TYPE& value,
199/// size_t hashValue) const;
200/// // Look up the specified 'value', having the specified 'hashValue',
201/// // and load its index in 'd_bucketArray' into the specified 'idx'.
202/// // If not found, return the vacant entry in 'd_bucketArray' where
203/// // it should be inserted. Return 'true' if 'value' is found and
204/// // 'false' otherwise.
205///
206/// public:
207/// // CREATORS
208/// HashTable(const TYPE *valuesArray,
209/// size_t numValues);
210/// // Create a hash table referring to the specified 'valuesArray'
211/// // having length of the specified 'numValues'. No value in
212/// // 'valuesArray' shall have the same value as any of the other
213/// // values in 'valuesArray'
214///
215/// ~HashTable();
216/// // Free up memory used by this hash table.
217///
218/// // ACCESSORS
219/// bool contains(const TYPE& value) const;
220/// // Return true if the specified 'value' is found in the table and
221/// // false otherwise.
222/// };
223///
224/// @endcode
225/// Then, we define a `Future` class, which holds a cstring `name`, char
226/// `callMonth`, and short `callYear`. This class can be used to store custom
227/// futures that the users have uploaded.
228/// @code
229///
230/// class Future {
231/// // This class identifies a future contract. It tracks the name, call
232/// // month and year of the contract it represents, and allows equality
233/// // comparison.
234///
235/// // DATA
236/// const char *d_name; // held, not owned
237/// const char d_callMonth;
238/// const short d_callYear;
239///
240/// public:
241/// // CREATORS
242/// Future(const char *name, const char callMonth, const short callYear)
243/// : d_name(name), d_callMonth(callMonth), d_callYear(callYear)
244/// // Create a 'Future' object out of the specified 'name',
245/// // 'callMonth', and 'callYear'.
246/// {}
247///
248/// Future() : d_name(""), d_callMonth('\0'), d_callYear(0)
249/// // Create a 'Future' with default values.
250/// {}
251///
252/// // ACCESSORS
253/// const char * getMonth() const
254/// // Return the month that this future expires.
255/// {
256/// return &d_callMonth;
257/// }
258///
259/// const char * getName() const
260/// // Return the name of this future
261/// {
262/// return d_name;
263/// }
264///
265/// const short * getYear() const
266/// // Return the year that this future expires
267/// {
268/// return &d_callYear;
269/// }
270///
271/// bool operator==(const Future& other) const
272/// // Compare this to the specified 'other' object and return true if
273/// // they are equal
274/// {
275/// return (!strcmp(d_name, other.d_name)) &&
276/// d_callMonth == other.d_callMonth &&
277/// d_callYear == other.d_callYear;
278/// }
279/// };
280///
281/// bool operator!=(const Future& lhs, const Future& rhs)
282/// // Compare compare the specified 'lhs' and 'rhs' objects and return
283/// // true if they are not equal
284/// {
285/// return !(lhs == rhs);
286/// }
287///
288/// @endcode
289/// Next, we need a hash functor for `Future`. We are going to use the
290/// `SipHashAlgorithm` because, it is a secure hash algorithm that will provide
291/// a way to securely combine the attributes of `Future` objects that are
292/// salient to hashing into one reasonable hash that an malicious user will not
293/// be able to predict.
294/// @code
295///
296/// struct HashFuture {
297/// // This struct is a functor that will apply the SipHashAlgorithm to
298/// // objects of type 'Future'.
299///
300/// size_t operator()(const Future& future) const
301/// // Return the hash of the of the specified 'future'. Note that
302/// // this uses the 'SipHashAlgorithm' to safely combine the
303/// // attributes of 'Future' objects that are salient to hashing into
304/// // a hash that is not predictable by an attacker.
305/// {
306/// char seed[SipHashAlgorithm::k_SEED_LENGTH];
307/// SeedGenerator<CryptoSecureRNG> seedGenerator;
308/// seedGenerator.generateSeed(seed, SipHashAlgorithm::k_SEED_LENGTH);
309///
310/// SipHashAlgorithm hash(seed);
311///
312/// hash(future.getName(), strlen(future.getName()));
313/// hash(future.getMonth(), sizeof(char));
314/// hash(future.getYear(), sizeof(short));
315///
316/// return static_cast<size_t>(hash.computeHash());
317/// }
318/// };
319///
320/// @endcode
321/// Then, we want to actually use our hash table on `Future` objects. We create
322/// an array of `Future`s based on data that was originally from some external
323/// source:
324/// @code
325/// Future futures[] = { Future("Swiss Franc", 'F', 2014),
326/// Future("US Dollar", 'G', 2015),
327/// Future("Canadian Dollar", 'Z', 2014),
328/// Future("British Pound", 'M', 2015),
329/// Future("Deutsche Mark", 'X', 2016),
330/// Future("Eurodollar", 'Q', 2017)};
331/// enum { NUM_FUTURES = sizeof futures / sizeof *futures };
332///
333/// Next, we create our HashTable 'hashTable'. We pass the functor that we
334/// defined above as the second argument:
335///
336/// HashTable<Future, HashFuture> hashTable(futures, NUM_FUTURES);
337/// @endcode
338/// Now, we verify that each element in our array registers with count:
339/// @code
340/// for ( int i = 0; i < 6; ++i) {
341/// ASSERT(hashTable.contains(futures[i]));
342/// }
343/// @endcode
344/// Finally, we verify that futures not in our original array are correctly
345/// identified as not being in the set:
346/// @code
347/// ASSERT(!hashTable.contains(Future("French Franc", 'N', 2019)));
348/// ASSERT(!hashTable.contains(Future("Swiss Franc", 'X', 2014)));
349/// ASSERT(!hashTable.contains(Future("US Dollar", 'F', 2014)));
350/// @endcode
351///
352/// ## Changes {#bslh_siphashalgorithm-changes}
353///
354///
355/// The third party code begins with the `siphash.h` header below, and continues
356/// until the TYPE TRAITS banner below. Changes made to the original code
357/// include:
358///
359/// 1. Adding `BloombergLP` and `bslh` namespaces
360/// 2. Renaming `siphash` to `SipHashAlgorithm`
361/// 3. Whitespace changes for formatting
362/// 4. Added comments
363/// 5. Removed C++11 features including class member initializer, `noexcept`,
364/// `std::Uint64_t`, explicit conversion operator, and an `= default`
365/// constructor.
366/// 6. Added `typedef` to replace removed `std::Uint64_t`
367/// 7. Added `computeHash()` to replace the removed explicit conversion
368/// 8. Added `k_SEED_LENGTH` and changed the constructor to accept a
369/// `const char *`
370/// 9. Included headers and added `include` guards
371/// 10. Changed variables to use `size_t` rather than `unsigned int`
372///
373/// ## Third-Party Documentation {#bslh_siphashalgorithm-third-party-documentation}
374///
375///
376///------------------------------- siphash.h -----------------------------------
377///
378/// This software is in the public domain. The only restriction on its use is
379/// that no one can remove it from the public domain by claiming ownership of
380/// it, including the original authors.
381///
382/// There is no warranty of correctness on the software contained herein. Use
383/// at your own risk.
384///
385/// Derived from:
386///
387/// SipHash reference C implementation
388///
389/// Written in 2012 by Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
390/// Daniel J. Bernstein <djb@cr.yp.to>
391///
392/// To the extent possible under law, the author(s) have dedicated all copyright
393/// and related and neighboring rights to this software to the public domain
394/// worldwide. This software is distributed without any warranty.
395///
396/// You should have received a copy of the CC0 Public Domain Dedication along
397/// with this software. If not, see
398/// <http://creativecommons.org/publicdomain/zero/1.0/>.
399///
400///-----------------------------------------------------------------------------
401/// @}
402/** @} */
403/** @} */
404
405/** @addtogroup bsl
406 * @{
407 */
408/** @addtogroup bslh
409 * @{
410 */
411/** @addtogroup bslh_siphashalgorithm
412 * @{
413 */
414
415#include <bslscm_version.h>
416
418
419#include <bsls_types.h>
420
421#include <stddef.h> // for 'size_t'
422
423
424
425
426namespace bslh {
427
428 // ============================
429 // class bslh::SipHashAlgorithm
430 // ============================
431
432/// This class wraps an implementation of the "SipHash" algorithm in an
433/// interface that is usable in the modular hashing system in `bslh`.
434///
435/// See @ref bslh_siphashalgorithm
437
438 private:
439 // PRIVATE TYPES
440
441 /// Typedef for a 64-bit integer type used in the hashing algorithm.
442 typedef bsls::Types::Uint64 Uint64;
443
444 // DATA
445 Uint64 d_v0;
446 Uint64 d_v1;
447 Uint64 d_v2;
448
449 // Stores the intermediate state of the algorithm as values are
450 // accumulated
451 Uint64 d_v3;
452
453 /// Provides alignment
454 union {
456
457 /// Used to buffer data until we have enough to do a full round of
458 /// computation as specified by the algorithm.
459 unsigned char d_buf [8];
460 };
461
462 // The length of the data currently stored in the buffer.
463 size_t d_bufSize;
464
465 // The total length of all data that has been passed into the
466 // algorithm.
467 size_t d_totalLength;
468
469 // NOT IMPLEMENTED
470
471 /// Do not allow copy construction.
472 SipHashAlgorithm(const SipHashAlgorithm& original); // = delete;
473
474 /// Do not allow assignment.
475 SipHashAlgorithm& operator=(const SipHashAlgorithm& rhs); // = delete;
476
477 public:
478 // TYPES
479
480 /// Typedef indicating the value type returned by this algorithm.
481 typedef Uint64 result_type;
482
483 // CONSTANTS
484 enum { k_SEED_LENGTH = 16 }; // Seed length in bytes.
485
486 // CREATORS
487
488 /// Create a `bslh::SipHashAlgorithm`, seeded with a 128-bit
489 /// (`k_SEED_LENGTH` bytes) seed pointed to by the specified `seed`.
490 /// Each bit of the supplied seed will contribute to the final hash
491 /// produced by `computeHash()`. The behaviour is undefined unless
492 /// `seed` points to at least 16 bytes of initialized memory. Note that
493 /// if data in `seed` is not random, all guarantees of security and
494 /// Denial of Service (DoS) protection are void.
495 explicit SipHashAlgorithm(const char *seed);
496
497 ~SipHashAlgorithm() = default;
498 // Destroy this object.
499
500 // MANIPULATORS
501
502 /// Incorporate the specified `data`, of at least the specified
503 /// `numBytes`, into the internal state of the hashing algorithm. Every
504 /// bit of data incorporated into the internal state of the algorithm
505 /// will contribute to the final hash produced by `computeHash()`. The
506 /// same hash will be produced regardless of whether a sequence of bytes
507 /// is passed in all at once or through multiple calls to this member
508 /// function. Input where `numBytes` is 0 will have no effect on the
509 /// internal state of the algorithm. The behaviour is undefined unless
510 /// `data` points to a valid memory location with at least `numBytes`
511 /// bytes of initialized memory.
512 void operator()(const void *data, size_t numBytes);
513
514 /// Return the finalized version of the hash that has been accumulated.
515 /// Note that this changes the internal state of the object, so calling
516 /// `computeHash()` multiple times in a row will return different
517 /// results, and only the first result returned will match the expected
518 /// result of the algorithm. Also note that a value will be returned,
519 /// even if data has not been passed into `operator()`
521};
522
523} // close package namespace
524
525// ============================================================================
526// TYPE TRAITS
527// ============================================================================
528
529namespace bslmf {
530template <>
531struct IsBitwiseMoveable<bslh::SipHashAlgorithm>
532 : bsl::true_type {};
533} // close namespace bslmf
534
535
536
537#endif
538
539// ----------------------------------------------------------------------------
540// Copyright 2014 Bloomberg Finance L.P.
541//
542// Licensed under the Apache License, Version 2.0 (the "License");
543// you may not use this file except in compliance with the License.
544// You may obtain a copy of the License at
545//
546// http://www.apache.org/licenses/LICENSE-2.0
547//
548// Unless required by applicable law or agreed to in writing, software
549// distributed under the License is distributed on an "AS IS" BASIS,
550// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
551// See the License for the specific language governing permissions and
552// limitations under the License.
553// ----------------------------- END-OF-FILE ----------------------------------
554
555/** @} */
556/** @} */
557/** @} */
Definition bslh_siphashalgorithm.h:436
Uint64 d_alignment
Definition bslh_siphashalgorithm.h:455
unsigned char d_buf[8]
Definition bslh_siphashalgorithm.h:459
void operator()(const void *data, size_t numBytes)
result_type computeHash()
Uint64 result_type
Typedef indicating the value type returned by this algorithm.
Definition bslh_siphashalgorithm.h:481
@ k_SEED_LENGTH
Definition bslh_siphashalgorithm.h:484
SipHashAlgorithm(const char *seed)
#define BSLS_IDENT(str)
Definition bsls_ident.h:195
Definition bslh_defaulthashalgorithm.h:339
Definition bdlbb_blob.h:576
Definition bslmf_isbitwisemoveable.h:718
unsigned long long Uint64
Definition bsls_types.h:137