Quick Links:

bal | bbl | bdl | bsl

Namespaces

Component bslh_spookyhashalgorithmimp
[Package bslh]

Provide BDE style encapsulation of 3rd party SpookyHash code. More...

Namespaces

namespace  bslh

Detailed Description

Outline
Purpose:
Provide BDE style encapsulation of 3rd party SpookyHash code.
Classes:
bslh::SpookyHashAlgorithmImp encapsulation of 3rd party SpookyHash code
See also:
Component bslh_hash, Component bslh_spookyhashalgorithm
Description:
bslh::SpookyHashAlgorithmImp provides BDE style encapsulation around Bob Jenkins' canonical SpookyHash implementation. SpookyHash provides a way to hash contiguous data all at once, or discontiguous data in pieces. More information is available at: http://burtleburtle.net/bob/hash/spooky.html
Usage:
This section illustrates intended usage of this component.
Example: Creating 128-bit checksums:
Suppose we have a library of 4 billion pieces of data and we want to store checksums for this data. For a 64-bit hash, there is a 35% chance of two of these checksums colliding (according to the approximation found here: http://en.wikipedia.org/wiki/Birthday_problem). We want to avoid checksum collision, so we will use the 128-bit hashing functionality provided by SpookyHashAlgorithmImp.
First, we will declare a class CheckedData which will store some data as well as the checksum associated with it.
  class CheckedData {
      // This class holds a pointer to data and provides a way of verifying
      // that the data has not changed.

      // TYPES
      typedef bsls::Types::Uint64 Uint64;

      // DATA
      size_t      d_length;
      const char *d_data;
      Uint64      d_checksum1;
      Uint64      d_checksum2;

    public:
      CheckedData(const char *data, size_t length);
          // Creates an instance of this class having the specified 'length'
          // bytes of 'data'.  The behavior is undefined unless 'data' is
          // initialized with at least 'length' bytes or 'length' is zero,
          // and remains valid for the lifetime of this object.  Note that
          // only a pointer to the data will be maintained, it will not be
          // copied.

      const char *getData();
          // Return a pointer to the data being tracked by this class.

      bool isDataValid();
          // Return 'true' if the data stored in this class matches the
          // stored checksum, and 'false' otherwise.
  };
Then, we define the CheckedData constructor. Here we will use SpookyHashImp to calculate a 128-bit checksum.
  CheckedData::CheckedData(const char *data, size_t length)
  : d_length(length)
  , d_data(data)
  , d_checksum1(0)
  , d_checksum2(0)
  {
      BSLS_ASSERT(0 != data || 0 == length);

      SpookyHashAlgorithmImp hashAlg(1, 2);

      hashAlg.hash128(d_data, d_length, &d_checksum1, &d_checksum2);
  }

  const char *CheckedData::getData() {
      return d_data;
  }
Next, we define isDataValid. We will generate a checksum from the contained data and then compare it to the checksum we generated when the class was created. If the two hashes match, then we can be reasonably certain that the data is still in a valid state (the chance of an accidental collision is very low). If the checksums do not match, we know that the data has been corrupted. We will not be able to restore it, but we will know not to trust it.
  bool CheckedData::isDataValid() {
      SpookyHashAlgorithmImp hashAlg(1, 2);
      Uint64 checksum1 = 0;
      Uint64 checksum2 = 0;

      hashAlg.hash128(d_data, d_length, &checksum1, &checksum2);

      return (d_checksum1 == checksum1) && (d_checksum2 == checksum2);
  }
Then, we store some data in our CheckedData class for safekeeping.
      char data[] = "To be, or not to be--that is the question:"
                    "Whether 'tis nobler in the mind to suffer"
                    "The slings and arrows of outrageous fortune"
                    "Or to take arms against a sea of troubles"
                    "And by opposing end them.  To die, to sleep--"
                    "No more--and by a sleep to say we end"
                    "The heartache, and the thousand natural shocks"
                    "That flesh is heir to.  'Tis a consummation"
                    "Devoutly to be wished.  To die, to sleep--"
                    "To sleep--perchance to dream: ay, there's the rub,"
                    "For in that sleep of death what dreams may come"
                    "When we have shuffled off this mortal coil,"
                    "Must give us pause.  There's the respect"
                    "That makes calamity of so long life."
                    "For who would bear the whips and scorns of time,"
                    "Th' oppressor's wrong, the proud man's contumely"
                    "The pangs of despised love, the law's delay,"
                    "The insolence of office, and the spurns"
                    "That patient merit of th' unworthy takes,"
                    "When he himself might his quietus make"
                    "With a bare bodkin? Who would fardels bear,"
                    "To grunt and sweat under a weary life,"
                    "But that the dread of something after death,"
                    "The undiscovered country, from whose bourn"
                    "No traveller returns, puzzles the will,"
                    "And makes us rather bear those ills we have"
                    "Than fly to others that we know not of?"
                    "Thus conscience does make cowards of us all,"
                    "And thus the native hue of resolution"
                    "Is sicklied o'er with the pale cast of thought,"
                    "And enterprise of great pitch and moment"
                    "With this regard their currents turn awry"
                    "And lose the name of action. -- Soft you now,"
                    "The fair Ophelia! -- Nymph, in thy orisons"
                    "Be all my sins remembered.";
      CheckedData checkedData(data, strlen(data));
Now, we check that the CheckedData recognizes that it is still valid.
      ASSERT(checkedData.isDataValid());
Finally, we tamper with the data and check that our CheckedData class can detect this.
      data[34] = 'z';
      ASSERT(!checkedData.isDataValid());
Changes:
The third party code begins with the "SpookyHash" header below, and continues until the BloombergLP copyright notice. Changes made to the original code include:
  1. Added BloombergLP and bslh namespaces
  2. Renamed SpookyHash to SpookyHashAlgorithmImp
  3. Removed usage of stdint.h (which might not be available on all platforms) and updated associated typedefs
  4. Added include guards
  5. Made some methods private
  6. Reformatted comments and added comments
  7. Updated indenting to BDE style
  8. Moved typedefs within class
  9. Changed C-style casts to static_casts
10 Reordered methods according to BDE style
11 Added inline to Hash32 and Hash64
12 Changed static constants to enums to avoid storage overhead
13 Added constructor in place of init
14 Made function names lower case (had to change Final to finalize and Short to shortHash to avoid using a keyword)
Third-Party Documentation:
 SpookyHash: a 128-bit non cryptographic hash function

 By Bob Jenkins, public domain
   Oct 31 2010: alpha, framework + SpookyHash::Mix appears right
   Oct 31 2011: alpha again, Mix only good to 2^^69 but rest appears right
   Dec 31 2011: beta, improved Mix, tested it for 2-bit deltas
   Feb  2 2012: production, same bits as beta
   Feb  5 2012: adjusted definitions of uint* to be more portable
   Mar 30 2012: 3 bytes/cycle, not 4. Alpha was 4 but wasn't thorough enough.
   August 5 2012: SpookyV2 (different results)

 Up to 3 bytes/cycle for long messages.  Reasonably fast for short messages.
 All 1 or 2 bit deltas achieve avalanche within 1% bias per output bit.

 This was developed for and tested on 64-bit x86-compatible processors.  It
 assumes the processor is little-endian.  There is a macro controlling
 whether unaligned reads are allowed (by default they are).  This should be
 an equally good hash on big-endian machines, but it will compute different
 results on them than on little-endian machines.

 Google's CityHash has similar specs to SpookyHash, and CityHash is faster on
 new Intel boxes.  MD4 and MD5 also have similar specs, but they are orders
 of magnitude slower.  CRCs are two or more times slower, but unlike
 SpookyHash, they have nice math for combining the CRCs of pieces to form the
 CRCs of wholes.  There are also cryptographic hashes, but those are even
 slower than MD5.