BDE 4.14.0 Production release
|
Provide an STL-compliant default_searcher class.
Canonical header: bsl_functional.h
This component defines two class templates, bsl::default_searcher
and bslstl::DefaultSearcher
. Both are compliant with section [func.search.default]
of the C++ Standard (C++17 and later).
bsl::default_searcher
is strictly limited to the Standard and is provided for clients for whom standard compliance is a priority. bslstl::DefaultSearcher
provides several additional accessors that are not mentioned in the Standard.
Except where there is a relevant difference, both are described below as if they were one.
This class template has two parameters:
FORWARD_ITR_NEEDLE
: The iterator type that defines on construction the range of values to be sought (the "needle"), and
EQUAL
: an optional parameter that defines the class used to compare those values.
The class also provides a functor-style interface that accepts two iterators that define the range of values to be searched (the "haystack"). Once constructed the searcher object can be re-used to search multiple haystacks (for the same needle).
The iterators defining the haystack need not be of the same type as those that define the needle. Moreover, the search method of the searcher can be overloaded for an arbitrary different haystack iterators (subject to {Iterator Requirements}).
The bslstl::DefaultSearcher
class uses the classic, "naive" algorithm. The needle is sought at the beginning of the haystack and, if not found there, the start position in the haystack is incremented. That is repeated until the needle is found in the haystack or the end of the haystack is encountered. The time complexity is O(M * N)
, where M
is the length of the needle and N
is the length of the haystack.
There are more sophisticated algorithms available; however, those typically require creating and retaining metadata derived from the needle and/or haystack. If the needle is short and the haystack of moderate length, the naive algorithm may be faster than generating that metadata, especially if the search is one-time and the metadata cost cannot be amortized.
Another advantage of the "default" searcher is that it accepts (relatively simple) *ForwardIterator*s whereas more sophisticated algorithms typically require (more fully-featured) *RandomIterator*s.
The two independent iterator types associated with this class – one defining the needle, the other defining the haystack – must meet the requirements of ForwardIterator.
Additionally:
Either of the iterator types are allowed to throw exceptions.
Iterators defining needles are required to remain valid as long as the searcher object might be used.
The comparator class must meet the requirements of BinaryPredicate:
operator()
method, which, given an ForwardIterator, iterator
, may be invoked as operator()(*iterator, *iterator)
.bool
.The comparator class is allowed to throw exceptions.
For certain template arguments this implementation improves performance by utilizing low-level operations. The requirements to do so are:
bsl::equal_to<value_type>
).value_type
is bitwise-equality-comparable.Users supplying their own data types are advised to set the bslmf::IsBitwiseEqualityComparable
trait when applicable so that bslstl::DefaultSearcher
knows that the last optimization condition listed above is met. See bslmf_isbitwiseequalitycomparable .
In this section we show the intended usage of this component.
The problem of searching a sequence of characters for a particular sub-sequence arises in many applications. For example, one might need to know if some document contains a particular word of interest. For example, suppose we would like to know the first occurrence of the word "United" in the Declaration of Independence (of the United States):
First, we obtain the text of document and word of interest as sequences of char
values.
Then, we create a bsl::default_searcher
object (a functor) using the given word
:
Notice that no equality comparison functor was specified so searchForUnited
will use bsl::equal_to<char>
by default.
Now, we invoke our functor, specifying the range of the document to be searched:
Finally, we notice that the search correctly ignored the appearance of the word "united" (all lower case) in the second sentence.
{bslstl_boyermoorehorspoolsearcher |Example 1} shows how the same problem is addressed using bsl::boyer_moore_horspool_searcher
.
As seen in {Example 1} above, the default equality comparison functor is case sensitive. If one needs case in-sensitive searches, a non-default equality comparison class must be specified.
First, define (at file scope if using a pre-C++11 compiler) an equality comparison class that provides the required functor interface:
Then, define a new bsl::default_searcher
type and create a searcher object to search for word
:
Note that the new searcher object will used a default constructed MyCaseInsensitiveCharComparator
class. If an equality comparison object requires state supplied on construction, such an object be explicitly created and supplied as the final constructor argument.
Now, we invoke our new functor, specifying that the same document searched in {Example 1}:
Finally, we find the next occurrence of word
by reusing the same searcher object, this time instructing it to begin its search just after the previous occurrence of word
was found:
{bslstl_boyermoorehorspoolsearcher |Example 2} shows how the same problem is addressed using bsl::boyer_moore_horspool_searcher
.
The "default" searcher class template is not constrained to searching for char
values. Searches can be done on other types (see {Iterator Requirements}). Moreover the container of the sequence being sought (the "needle") need not the same as the sequence being searched (the "haystack").
Suppose one has data from an instrument that reports float
values and that inserts the sequence { FLT_MAX, FLT_MIN, FLT_MAX }
as a marker for the start and end of a test run. We can assume the probably of the instrument reporting this sequence as readings is negligible and that data reported outside of the test runs is random noise. Here is how we can search for the first test run data in the data sequence.
First, we create a representation of the sequence that denotes the limit of a test run.
Next, we obtain the data to be searched. (In this example, we will use simulated data.)
Then, we define and create our searcher object:
Notice that no equality comparator was specified so searchForMarker
will use bsl::equal_to<float>
by default.
Now, we invoke our searcher on the instrument data.
Finally, we locate the marker of the end of the first test run and pass the location of the first test run data to some other function for processing.
{bslstl_boyermoorehorspoolsearcher |Example 3} shows how the same problem is addressed using bsl::boyer_moore_horspool_searcher
. Notice that other example uses data
from a container that provides random access iterators; whereas here, bidirectional iterators are used (and forward iterators would have sufficed).