Quick Links: |
Provide BDE style encapsulation of 3rd party SpookyHash code. More...
Namespaces | |
namespace | bslh |
bslh::SpookyHashAlgorithmImp | encapsulation of 3rd party SpookyHash code |
bslh::SpookyHashAlgorithmImp
provides BDE style encapsulation around Bob Jenkins' canonical SpookyHash implementation. SpookyHash provides a way to hash contiguous data all at once, or discontiguous data in pieces. More information is available at: http://burtleburtle.net/bob/hash/spooky.html SpookyHashAlgorithmImp
. CheckedData
which will store some data as well as the checksum associated with it. class CheckedData { // This class holds a pointer to data and provides a way of verifying // that the data has not changed. // TYPES typedef bsls::Types::Uint64 Uint64; // DATA size_t d_length; const char *d_data; Uint64 d_checksum1; Uint64 d_checksum2; public: CheckedData(const char *data, size_t length); // Creates an instance of this class having the specified 'length' // bytes of 'data'. The behavior is undefined unless 'data' is // initialized with at least 'length' bytes or 'length' is zero, // and remains valid for the lifetime of this object. Note that // only a pointer to the data will be maintained, it will not be // copied. const char *getData(); // Return a pointer to the data being tracked by this class. bool isDataValid(); // Return 'true' if the data stored in this class matches the // stored checksum, and 'false' otherwise. };
CheckedData
constructor. Here we will use SpookyHashImp
to calculate a 128-bit checksum. CheckedData::CheckedData(const char *data, size_t length) : d_length(length) , d_data(data) , d_checksum1(0) , d_checksum2(0) { BSLS_ASSERT(0 != data || 0 == length); SpookyHashAlgorithmImp hashAlg(1, 2); hashAlg.hash128(d_data, d_length, &d_checksum1, &d_checksum2); } const char *CheckedData::getData() { return d_data; }
isDataValid
. We will generate a checksum from the contained data and then compare it to the checksum we generated when the class was created. If the two hashes match, then we can be reasonably certain that the data is still in a valid state (the chance of an accidental collision is very low). If the checksums do not match, we know that the data has been corrupted. We will not be able to restore it, but we will know not to trust it. bool CheckedData::isDataValid() { SpookyHashAlgorithmImp hashAlg(1, 2); Uint64 checksum1 = 0; Uint64 checksum2 = 0; hashAlg.hash128(d_data, d_length, &checksum1, &checksum2); return (d_checksum1 == checksum1) && (d_checksum2 == checksum2); }
CheckedData
class for safekeeping. char data[] = "To be, or not to be--that is the question:" "Whether 'tis nobler in the mind to suffer" "The slings and arrows of outrageous fortune" "Or to take arms against a sea of troubles" "And by opposing end them. To die, to sleep--" "No more--and by a sleep to say we end" "The heartache, and the thousand natural shocks" "That flesh is heir to. 'Tis a consummation" "Devoutly to be wished. To die, to sleep--" "To sleep--perchance to dream: ay, there's the rub," "For in that sleep of death what dreams may come" "When we have shuffled off this mortal coil," "Must give us pause. There's the respect" "That makes calamity of so long life." "For who would bear the whips and scorns of time," "Th' oppressor's wrong, the proud man's contumely" "The pangs of despised love, the law's delay," "The insolence of office, and the spurns" "That patient merit of th' unworthy takes," "When he himself might his quietus make" "With a bare bodkin? Who would fardels bear," "To grunt and sweat under a weary life," "But that the dread of something after death," "The undiscovered country, from whose bourn" "No traveller returns, puzzles the will," "And makes us rather bear those ills we have" "Than fly to others that we know not of?" "Thus conscience does make cowards of us all," "And thus the native hue of resolution" "Is sicklied o'er with the pale cast of thought," "And enterprise of great pitch and moment" "With this regard their currents turn awry" "And lose the name of action. -- Soft you now," "The fair Ophelia! -- Nymph, in thy orisons" "Be all my sins remembered."; CheckedData checkedData(data, strlen(data));
CheckedData
recognizes that it is still valid. ASSERT(checkedData.isDataValid());
CheckedData
class can detect this. data[34] = 'z';
ASSERT(!checkedData.isDataValid());
BloombergLP
and bslh
namespaces SpookyHash
to SpookyHashAlgorithmImp
stdint.h
(which might not be available on all platforms) and updated associated typedef
s include
guards typedef
s within class static_cast
s Hash32
and Hash64
enum
s to avoid storage overheadinit
Final
to finalize
and Short
to shortHash
to avoid using a keyword)SpookyHash: a 128-bit non cryptographic hash function By Bob Jenkins, public domain Oct 31 2010: alpha, framework + SpookyHash::Mix appears right Oct 31 2011: alpha again, Mix only good to 2^^69 but rest appears right Dec 31 2011: beta, improved Mix, tested it for 2-bit deltas Feb 2 2012: production, same bits as beta Feb 5 2012: adjusted definitions of uint* to be more portable Mar 30 2012: 3 bytes/cycle, not 4. Alpha was 4 but wasn't thorough enough. August 5 2012: SpookyV2 (different results) Up to 3 bytes/cycle for long messages. Reasonably fast for short messages. All 1 or 2 bit deltas achieve avalanche within 1% bias per output bit. This was developed for and tested on 64-bit x86-compatible processors. It assumes the processor is little-endian. There is a macro controlling whether unaligned reads are allowed (by default they are). This should be an equally good hash on big-endian machines, but it will compute different results on them than on little-endian machines. Google's CityHash has similar specs to SpookyHash, and CityHash is faster on new Intel boxes. MD4 and MD5 also have similar specs, but they are orders of magnitude slower. CRCs are two or more times slower, but unlike SpookyHash, they have nice math for combining the CRCs of pieces to form the CRCs of wholes. There are also cryptographic hashes, but those are even slower than MD5.