BDE 4.14.0 Production release
Loading...
Searching...
No Matches
bslh_spookyhashalgorithmimp

Macros

#define BSLH_SPOOKYHASHALGORITHMIMP_INLINE   inline
 See implementation notes for the explanation of this macro.
 

Detailed Description

Outline

Purpose

Provide BDE style encapsulation of 3rd party SpookyHash code.

Classes

See also
bslh_hash, bslh_spookyhashalgorithm

Description

bslh::SpookyHashAlgorithmImp provides BDE style encapsulation around Bob Jenkins' canonical SpookyHash implementation. SpookyHash provides a way to hash contiguous data all at once, or discontiguous data in pieces. More information is available at: http://burtleburtle.net/bob/hash/spooky.html

Usage

This section illustrates intended usage of this component.

Example: Creating 128-bit checksums

Suppose we have a library of 4 billion pieces of data and we want to store checksums for this data. For a 64-bit hash, there is a 35% chance of two of these checksums colliding (according to the approximation found here: http://en.wikipedia.org/wiki/Birthday_problem). We want to avoid checksum collision, so we will use the 128-bit hashing functionality provided by SpookyHashAlgorithmImp.

First, we will declare a class CheckedData which will store some data as well as the checksum associated with it.

class CheckedData {
// This class holds a pointer to data and provides a way of verifying
// that the data has not changed.
// TYPES
typedef bsls::Types::Uint64 Uint64;
// DATA
size_t d_length;
const char *d_data;
Uint64 d_checksum1;
Uint64 d_checksum2;
public:
CheckedData(const char *data, size_t length);
// Creates an instance of this class having the specified 'length'
// bytes of 'data'. The behavior is undefined unless 'data' is
// initialized with at least 'length' bytes or 'length' is zero,
// and remains valid for the lifetime of this object. Note that
// only a pointer to the data will be maintained, it will not be
// copied.
const char *getData();
// Return a pointer to the data being tracked by this class.
bool isDataValid();
// Return 'true' if the data stored in this class matches the
// stored checksum, and 'false' otherwise.
};
unsigned long long Uint64
Definition bsls_types.h:137

Then, we define the CheckedData constructor. Here we will use SpookyHashImp to calculate a 128-bit checksum.

CheckedData::CheckedData(const char *data, size_t length)
: d_length(length)
, d_data(data)
, d_checksum1(0)
, d_checksum2(0)
{
BSLS_ASSERT(0 != data || 0 == length);
SpookyHashAlgorithmImp hashAlg(1, 2);
hashAlg.hash128(d_data, d_length, &d_checksum1, &d_checksum2);
}
const char *CheckedData::getData() {
return d_data;
}
#define BSLS_ASSERT(X)
Definition bsls_assert.h:1804

Next, we define isDataValid. We will generate a checksum from the contained data and then compare it to the checksum we generated when the class was created. If the two hashes match, then we can be reasonably certain that the data is still in a valid state (the chance of an accidental collision is very low). If the checksums do not match, we know that the data has been corrupted. We will not be able to restore it, but we will know not to trust it.

bool CheckedData::isDataValid() {
SpookyHashAlgorithmImp hashAlg(1, 2);
Uint64 checksum1 = 0;
Uint64 checksum2 = 0;
hashAlg.hash128(d_data, d_length, &checksum1, &checksum2);
return (d_checksum1 == checksum1) && (d_checksum2 == checksum2);
}

Then, we store some data in our CheckedData class for safekeeping.

char data[] = "To be, or not to be--that is the question:"
"Whether 'tis nobler in the mind to suffer"
"The slings and arrows of outrageous fortune"
"Or to take arms against a sea of troubles"
"And by opposing end them. To die, to sleep--"
"No more--and by a sleep to say we end"
"The heartache, and the thousand natural shocks"
"That flesh is heir to. 'Tis a consummation"
"Devoutly to be wished. To die, to sleep--"
"To sleep--perchance to dream: ay, there's the rub,"
"For in that sleep of death what dreams may come"
"When we have shuffled off this mortal coil,"
"Must give us pause. There's the respect"
"That makes calamity of so long life."
"For who would bear the whips and scorns of time,"
"Th' oppressor's wrong, the proud man's contumely"
"The pangs of despised love, the law's delay,"
"The insolence of office, and the spurns"
"That patient merit of th' unworthy takes,"
"When he himself might his quietus make"
"With a bare bodkin? Who would fardels bear,"
"To grunt and sweat under a weary life,"
"But that the dread of something after death,"
"The undiscovered country, from whose bourn"
"No traveller returns, puzzles the will,"
"And makes us rather bear those ills we have"
"Than fly to others that we know not of?"
"Thus conscience does make cowards of us all,"
"And thus the native hue of resolution"
"Is sicklied o'er with the pale cast of thought,"
"And enterprise of great pitch and moment"
"With this regard their currents turn awry"
"And lose the name of action. -- Soft you now,"
"The fair Ophelia! -- Nymph, in thy orisons"
"Be all my sins remembered.";
CheckedData checkedData(data, strlen(data));

Now, we check that the CheckedData recognizes that it is still valid.

ASSERT(checkedData.isDataValid());

Finally, we tamper with the data and check that our CheckedData class can detect this.

data[34] = 'z';
ASSERT(!checkedData.isDataValid());

Changes

The third party code begins with the "SpookyHash" header below, and continues until the BloombergLP copyright notice. Changes made to the original code include:

  1. Added BloombergLP and bslh namespaces
  2. Renamed SpookyHash to SpookyHashAlgorithmImp
  3. Removed usage of stdint.h (which might not be available on all platforms) and updated associated typedefs
  4. Added include guards
  5. Made some methods private
  6. Reformatted comments and added comments
  7. Updated indenting to BDE style
  8. Moved typedefs within class
  9. Changed C-style casts to static_cast s
  10. Reordered methods according to BDE style
  11. Added inline to Hash32 and Hash64
  12. Changed static constants to enums to avoid storage overhead
  13. Added constructor in place of init
  14. Made function names lower case (had to change Final to finalize and Short to shortHash to avoid using a keyword)
  15. Reformatted comments
  16. Replaced some uses of 'inline' with the new BSLH_SPOOKYHASHALGORITHMIMP_INLINE macro, which forces inlining on GCC

Third-Party Documentation

SpookyHash: a 128-bit non cryptographic hash function
By Bob Jenkins, public domain
Oct 31 2010: alpha, framework + SpookyHash::Mix appears right
Oct 31 2011: alpha again, Mix only good to 2^^69 but rest appears right
Dec 31 2011: beta, improved Mix, tested it for 2-bit deltas
Feb 2 2012: production, same bits as beta
Feb 5 2012: adjusted definitions of uint* to be more portable
Mar 30 2012: 3 bytes/cycle, not 4. Alpha was 4 but wasn't thorough enough.
August 5 2012: SpookyV2 (different results)
Up to 3 bytes/cycle for long messages. Reasonably fast for short messages.
All 1 or 2 bit deltas achieve avalanche within 1% bias per output bit.
This was developed for and tested on 64-bit x86-compatible processors. It
assumes the processor is little-endian. There is a macro controlling
whether unaligned reads are allowed (by default they are). This should be
an equally good hash on big-endian machines, but it will compute different
results on them than on little-endian machines.
Google's CityHash has similar specs to SpookyHash, and CityHash is faster on
new Intel boxes. MD4 and MD5 also have similar specs, but they are orders
of magnitude slower. CRCs are two or more times slower, but unlike
SpookyHash, they have nice math for combining the CRCs of pieces to form the
CRCs of wholes. There are also cryptographic hashes, but those are even
slower than MD5.

Macro Definition Documentation

◆ BSLH_SPOOKYHASHALGORITHMIMP_INLINE

#define BSLH_SPOOKYHASHALGORITHMIMP_INLINE   inline