Quick Links: |
Provide an STL-compliant boyer_moore_horspool_searcher
class.
More...
Namespaces | |
namespace | bslstl |
namespace | bsl |
boyer_moore_horspool_searcher
class. bsl::boyer_moore_horspool_searcher | class template to search via BMH |
bslstl::BoyerMooreHorspoolSearcher | class template to search via BMH |
bsl::boyer_moore_horspool_searcher
and bslstl::BoyerMooreHorspoolSearcher
. Both are compliant with section [func.search.bmh]
of the C++ Standard (C++17 and later). bsl::boyer_moore_horspool_searcher
is strictly limited to the Standard and is provided for clients for whom standard compliance is a priority. bslslt::BoyerMoreHorspoolSearcher
provides several accessors that are not mentioned in the Standard. Moreover, bslslt::BoyerMoreHorspoolSearcher
is "plumbed" for BDE allocators and can be used with BDE standard containers whereas the compliant strict class always uses the currently installed default allocator. See Example 4 below. RNDACC_ITR_NEEDLE
: HASH
: HASH
and EQUAL
.EQUAL
: HASH
and EQUAL
.O(N)
for a haystack of length N
. EQUAL
, must meet the requirements of BinaryPredicate: operator()
method that, given a *RandomAccessIterator', iterator
, can be invoked as operator()(*iterator, *iterator)
. bool
. EQUAL
functor generate the same value by the HASH
functor. That is: true == searcher.equal()(a, b);
searcher.hash()(a) == searcher.hash()(b);
value_type
is char
(either signed
or unsigned
flavors). For needles of typical size, this choice results in somewhat more memory use than it would have if some dynamically sized container were used; however, the faster access during searches warrants the tradeoff. char
values. const char document[] = " IN CONGRESS, July 4, 1776.\n" // 28 "\n" // 1 " The unanimous Declaration of the thirteen united States of America,\n" //----^----|----^----|----^----|----^----|---- // 44 // // -- // // 73rd character // "\n" // ... " declare, That these United Colonies are, and of Right ought to be\n" // ... "Honor."; const char *word = "United";
default_searcher
object (a functor) using the given word
: bsl::boyer_moore_horspool_searcher<const char *> searchForUnited( word, word + bsl::strlen(word));
searchForUnited
will use bsl::equal_to<char>
by default. bsl::pair<const char *, const char *> result = searchForUnited( document, document + sizeof document); bsl::size_t offset = result.first - document; assert(120 == offset); assert(static_cast<bsl::size_t>(result.second - result.first) == bsl::strlen(word));
bslstl_defaultsearcher
|Example 1 shows how the same problem is addressed using bsl::default_searcher
. struct MyCaseInsensitiveCharComparer { bool operator()(const char& a, const char& b) const { return bsl::tolower(a) == bsl::tolower(b); } };
struct MyCaseInsensitiveCharHasher { bool operator()(const char& value) const { static bsl::hash<char> s_hash; return s_hash(static_cast<char>(bsl::tolower(value))); } };
bsl::boyer_moore_horspool_searcher
type for and create a searcher object to search for word
: bsl::boyer_moore_horspool_searcher<const char *, MyCaseInsensitiveCharHasher, MyCaseInsensitiveCharComparer> searchForUnitedInsensitive( word, word + bsl::strlen(word));
MyCaseInsensitiveCharHasher
and MyCaseInsensitiveCharComparer
classes. If stateful functors are required such objects can be passed in the optional constructor arguments. bsl::pair<const char *, const char *> resultInsensitive = searchForUnitedInsensitive( document, document + sizeof document); bsl::size_t offsetInsensitive = resultInsensitive.first - document; assert( 72 == offsetInsensitive); assert(static_cast<bsl::size_t>(resultInsensitive.second - resultInsensitive.first) == bsl::strlen(word));
word
by reusing the same searcher object, this time instructing it to begin its search just after the previous occurrence of word
was found: bslstl_defaultsearcher
|Example 2 shows how the same problem is addressed using bsl::default_searcher
. char
values. Searches can be done on other types (see Iterator Requirements). Moreover the container of the sequence being sought (the "needle") need not the same as the sequence being searched (the "haystack"). float
values and that inserts the sequence { FLT_MAX, FLT_MIN, FLT_MAX }
as a marker for the start and end of a test run. We can assume the probably of the instrument reporting this sequence as readings is negligible and that data reported outside of the test runs is random noise. Here is how we can search for the first test run data in the data sequence. const float markerSequence[] = { FLT_MAX , FLT_MIN , FLT_MAX }; const bsl::size_t markerSequenceLength = sizeof markerSequence / sizeof *markerSequence;
bsl::vector<float> data; // Container provides random access iterators. doTestRun(&data);
bsl::boyer_moore_horspool_searcher<const float *> searchForMarker(markerSequence, markerSequence + markerSequenceLength);
searchForMarker
will use bsl::equal_to<float>
by default. typedef bsl::vector<float>::const_iterator DataConstItr; const bsl::pair<DataConstItr, DataConstItr> notFound(data.cend(), data.cend()); bsl::pair<DataConstItr, DataConstItr> markerPosition = searchForMarker( data.cbegin(), data.cend()); assert(notFound != markerPosition); DataConstItr startOfTestRun = markerPosition.second;
markerPosition = searchForMarker(markerPosition.second, data.cend()); assert(notFound != markerPosition); DataConstItr endOfTestRun = markerPosition.first; processTestRun(startOfTestRun, endOfTestRun);
bslstl_defaultsearcher
|Example 3 shows how the same problem is addressed using bsl::default_searcher
. Notice that other example uses data
from a container that provides bidirectional iterators (and forward iterators would have sufficed), whereas here random access iterators are required. bsl::boyer_moore_horspool_searcher
objects is small (the needle must be scanned, meta-data calculated, and results saved) but can be non-neglibible when one needs a great number of them. When there is a reasonable chance that one will have to repeat a given search, it can be worthwhile to cache the searcher objects for reuse. bsl::unordered_map
container. Allocating types must meet certain requirements to work properly with allocator-enabled containers such as bsl::unordered_map
. bsl::boyer_moore_horspool_searcher
does not, so we will use bslstl::BoyerMooreHorspoolSearcher
, which does. // ==================================== // class MyCaseInsensitiveSearcherCache // ==================================== class MyCaseInsensitiveSearcherCache { // TYPES public: typedef bslstl::BoyerMooreHorspoolSearcher< bsl::string::const_iterator, MyCaseInsensitiveCharHasher, MyCaseInsensitiveCharComparer> Searcher; // PRIVATE TYPES private: typedef bsl::unordered_map<bsl::string, Searcher> Map; // DATA Map d_map; // PRIVATE MANIPULATORS const Searcher& insertSearcher(const bsl::string& key); // Insert into this cache a key-value pair where the key is the // specified 'key' and the value is a 'Searcher' object created to // seek the needle specified by the key part. Note that this // arrangement guarantees that the iterators used by this cached // searcher object remain valid for the life of the searcher // object. public: // CREATORS explicit MyCaseInsensitiveSearcherCache(bslma::Allocator *basicAllocator = 0); // Create an empty 'MyCaseInsensitiveSearcherCache' object. // Optionally specify a 'basicAllocator' used to supply memory. If // 'basicAllocator' is 0, the currently installed default // allocator is used. // MANIPULATORS const Searcher& getSearcher(const char *needle); // Return a 'const'-reference to the cached server that can do a // case-insensitive search for the specified 'needle'. If such a // searcher does not exist in the cache on entry, such a searcher // is constructed, added to the cache, and returned (by // 'const'-reference). // ACCESSORS bsl::size_t numSearchers() const; // Return the number of searcher objects in this cache. };
typedef
for Searcher
) that we reuse the hash functor, MyCaseInsensitiveCharHasher
, and equality comparison functor, MyCaseInsensitiveCharComparer
, that were defined in Example 2. MyCaseInsensitiveSearcherCache
itself is an allocating type. If we needed to make it compatible with BDE containers (e.g., to allow a cache of caches) a few additional features are needed. As we have no such need, those features are deferred. // ------------------------------------ // class MyCaseInsensitiveSearcherCache // ------------------------------------ // CREATORS MyCaseInsensitiveSearcherCache:: MyCaseInsensitiveSearcherCache(bslma::Allocator *basicAllocator) : d_map(basicAllocator) { }
basicAllocator
is simply forwarded to d_map
. // MANIPULATORS const MyCaseInsensitiveSearcherCache::Searcher& MyCaseInsensitiveSearcherCache::getSearcher(const char *needle) { bsl::string key(needle); Map::iterator findResult = d_map.find(key); if (d_map.end() == findResult) { return insertSearcher(key); // RETURN } else { return findResult->second; // RETURN } } // ACCESSORS bsl::size_t MyCaseInsensitiveSearcherCache::numSearchers() const { return d_map.size(); }
// PRIVATE MANIPULATORS const MyCaseInsensitiveSearcherCache::Searcher& MyCaseInsensitiveSearcherCache::insertSearcher(const bsl::string& key) { Searcher dummy(key.begin(), key.begin()); // to be overwritten Map::value_type value(key, dummy); bsl::pair<Map::iterator, bool> insertResult = d_map.insert(value); assert(true == insertResult.second); Map::iterator iterator = insertResult.first; iterator->second = Searcher(iterator->first.begin(), iterator->first.end()); return iterator->second; }
struct { const char *d_givenName_p; const char *d_surname_p; } DATA[] = { // GIVEN SURNAME // -------- ------------ { "Donald" , "McDonald" } , { "John" , "Johnson" } , { "John" , "Saint Johns" } , { "Jon" , "Literjon" } , { "Jean" , "Valjean" } , { "James" , "Jameson" } , { "Will" , "Freewill" } , { "John" , "Johns" } , { "John" , "John" } , { "John" , "Jones" } , { "J'onn" , "J'onzz" } , { "Donald" , "Donalds" } , { "Donald" , "Mac Donald" } , { "William", "Williams" } , { "Durand" , "Durand" } , { "John" , "Johnstown" } , { "Major" , "Major" } , { "Donald" , "MacDonald" } , { "Patrick", "O'Patrick" } , { "Chris", "Christie" } , { "Don", "London" } // ... , { "Ivan" , "Ivanovich" } }; bsl::size_t NUM_NAMES = sizeof DATA / sizeof *DATA; typedef bsl::pair<const char *, const char *> Result; MyAllocator myAllocator; MyCaseInsensitiveSearcherCache searcherCache(&myAllocator); bsl::string output; for (bsl::size_t ti = 0; ti < NUM_NAMES; ++ti) { const char * const givenName = DATA[ti].d_givenName_p; const char * const surname = DATA[ti].d_surname_p; const MyCaseInsensitiveSearcherCache::Searcher& searcher = searcherCache.getSearcher(givenName); assert(&myAllocator == searcher.allocator());
const Result result = searcher(surname, surname + bsl::strlen(surname)); const Result notFound = bsl::make_pair(surname + bsl::strlen(surname), surname + bsl::strlen(surname)); char buffer[32]; if (notFound == result) { sprintf(buffer, "ng: %-10s %-11s\n", givenName, surname); } else { sprintf(buffer, "OK: %-10s %-11s\n", givenName, surname); } output.append(buffer); }
output
and confirm that our code is properly identifying the names of interest. assert(0 == bsl::strcmp(output.c_str(), "OK: Donald McDonald \n" "OK: John Johnson \n" "OK: John Saint Johns\n" "OK: Jon Literjon \n" "OK: Jean Valjean \n" "OK: James Jameson \n" "OK: Will Freewill \n" "OK: John Johns \n" "OK: John John \n" "ng: John Jones \n" "ng: J'onn J'onzz \n" "OK: Donald Donalds \n" "OK: Donald Mac Donald \n" "OK: William Williams \n" "OK: Durand Durand \n" "OK: John Johnstown \n" "OK: Major Major \n" "OK: Donald MacDonald \n" "OK: Patrick O'Patrick \n" "OK: Chris Christie \n" "OK: Don London \n" "OK: Ivan Ivanovich \n")); assert(searcherCache.numSearchers() < NUM_NAMES);