bdlb::Tokenizer Class Reference

#include <bdlb_tokenizer.h>

Public Types

typedef TokenizerIterator iterator

Public Member Functions

 Tokenizer (const char *input, const bsl::string_view &soft)
 Tokenizer (const bsl::string_view &input, const bsl::string_view &soft)
 Tokenizer (const char *input, const bsl::string_view &soft, const bsl::string_view &hard)
 Tokenizer (const bsl::string_view &input, const bsl::string_view &soft, const bsl::string_view &hard)
 ~Tokenizer ()
Tokenizer & operator++ ()
void reset (const char *input)
void reset (const bsl::string_view &input)
bool hasPreviousSoft () const
bool hasTrailingSoft () const
bool isPreviousHard () const
bool isTrailingHard () const
bool isValid () const
bslstl::StringRef previousDelimiter () const
bslstl::StringRef token () const
bslstl::StringRef trailingDelimiter () const
iterator begin () const
iterator end () const

Detailed Description

This class provides (read-only) sequential access to tokens delimited by two user-supplied character sets consisting, respectively, of soft and hard delimiter characters. Access to the previous and current (trailing) delimiter, as well as to the current token itself, is provided efficiently via bslstl::StringRef.

See Component bdlb_tokenizer


Member Typedef Documentation


Constructor & Destructor Documentation

bdlb::Tokenizer::Tokenizer ( const char *  input,
const bsl::string_view &  soft 
)
bdlb::Tokenizer::Tokenizer ( const bsl::string_view &  input,
const bsl::string_view &  soft 
)
bdlb::Tokenizer::Tokenizer ( const char *  input,
const bsl::string_view &  soft,
const bsl::string_view &  hard 
)
bdlb::Tokenizer::Tokenizer ( const bsl::string_view &  input,
const bsl::string_view &  soft,
const bsl::string_view &  hard 
)

Create a Tokenizer object bound to the specified sequence of input characters having the specified set of (unique) soft delimiter characters to be used to separate tokens (i.e., maximal sequence of non-delimiter characters) in input. Optionally specify a disjoint set of (unique) hard delimiter characters to be used to explicitly terminate tokens. Delimiters within input consist of a maximal sequence of one or more delimiter characters, at most one of which may be hard; when there is a contiguous sequence of delimiter characters containing two or more hard delimiter characters in input, any intervening soft delimiter characters are associated with the previous (hard) delimiter. Any leading soft delimiter characters -- i.e., those preceding the first token or hard delimiter character (referred to as the leader) -- are available immediately after construction via the previousDelimiter method. The behavior is undefined unless all supplied delimiter characters are unique. Note that the behavior is also undefined if this object is used in any way (other than to reset or destroy it) after its underlying input string is modified. Also note that the current token and (trailing) delimiter may be accessed only while this object is in the valid state; however, the previous delimiter (or leader) is always accessible. Also note that all token and delimiter strings are returned as references into the underlying input string, and hence remain valid so long as that string is not modified or destroyed -- irrespective of the state (or even the existence) of this object. Finally note that supplying a default constructed string_view is equivalent to supplying an empty c-string (i.e., "").
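The grouping rule described above (a delimiter is a run of soft characters, optionally followed by one hard character and any soft characters after it) can be sketched in standard C++17. This is a simplified model for illustration only, not the BDE implementation: the names Field, Split, and splitSoftHard are hypothetical, and the sketch assumes that input ending immediately after a delimiter yields no further (empty) token.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical types for illustration; they are not part of BDE.
struct Field {
    std::string_view token;              // possibly empty
    std::string_view trailingDelimiter;  // empty only at end of input
};

struct Split {
    std::string_view   leader;  // leading soft delimiter characters
    std::vector<Field> fields;  // tokens, each with its trailing delimiter
};

// Model of the grouping rule: a delimiter is 'soft* [hard soft*]'.
Split splitSoftHard(std::string_view input,
                    std::string_view soft,
                    std::string_view hard)
{
    // Precondition taken from the text: the two sets are disjoint.
    assert(soft.find_first_of(hard) == std::string_view::npos);

    auto isSoft = [&](char c) { return soft.find(c) != std::string_view::npos; };
    auto isHard = [&](char c) { return hard.find(c) != std::string_view::npos; };

    Split             result;
    std::size_t       pos = 0;
    const std::size_t end = input.size();

    // The leader is the run of soft characters before the first token or
    // hard delimiter.
    while (pos < end && isSoft(input[pos])) { ++pos; }
    result.leader = input.substr(0, pos);

    while (pos < end) {
        // Token: maximal run of non-delimiter characters (may be empty).
        const std::size_t tokenBegin = pos;
        while (pos < end && !isSoft(input[pos]) && !isHard(input[pos])) { ++pos; }

        // Delimiter: soft characters, then at most one hard character,
        // then the soft characters that follow that hard character.
        const std::size_t delimBegin = pos;
        while (pos < end && isSoft(input[pos])) { ++pos; }
        if (pos < end && isHard(input[pos])) {
            ++pos;
            while (pos < end && isSoft(input[pos])) { ++pos; }
        }

        result.fields.push_back({input.substr(tokenBegin, delimBegin - tokenBegin),
                                 input.substr(delimBegin, pos - delimBegin)});
    }
    return result;
}
```

In this model, the input "a , ,b" with soft " " and hard "," splits into the token "a" with trailing delimiter " , ", an empty token with trailing delimiter ",", and the token "b" with an empty trailing delimiter. The soft character between the two hard characters is attached to the first hard delimiter, as the paragraph above describes.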

bdlb::Tokenizer::~Tokenizer (  ) 

Destroy this object.


Member Function Documentation

Tokenizer& bdlb::Tokenizer::operator++ (  ) 

Advance the iteration state of this object to refer to the next sequence of previous delimiter, current token, and current (trailing) delimiter in the underlying input sequence, and return a reference providing modifiable access to this object. The current delimiter reference becomes the previous one. If there is another token remaining in the input, the current token and delimiter are updated to refer to the respective new token and (trailing) delimiter values -- either of which (but not both) might be empty. If there are no tokens remaining in the input, the iteration state of this object becomes invalid. The behavior is undefined unless the iteration state of this object is initially valid and the underlying input has not been modified or destroyed since this object was most recently reset (or created).

void bdlb::Tokenizer::reset ( const char *  input  ) 
void bdlb::Tokenizer::reset ( const bsl::string_view &  input  ) 

Rebind this object to refer to the specified sequence of input characters. The state of the tokenizer following this call is as if it had been constructed with input and its current sets of soft and hard delimiter characters. The behavior is undefined if this object is used in any way (other than to reset or destroy it) after its underlying input string is modified. Note that supplying a default constructed string_view is equivalent to supplying an empty c-string (i.e., "").

bool bdlb::Tokenizer::hasPreviousSoft (  )  const

Return true if the previous delimiter (or leader) contains a soft delimiter character, and false otherwise. The behavior is undefined if the underlying input itself has been modified or destroyed since this object was most recently reset (or created).

bool bdlb::Tokenizer::hasTrailingSoft (  )  const

Return true if the current (trailing) delimiter contains a soft delimiter character, and false otherwise. The behavior is undefined if the iteration state of this object is initially invalid, or if the underlying input itself has been modified or destroyed since this object was most recently reset (or created).

bool bdlb::Tokenizer::isPreviousHard (  )  const

Return true if the previous delimiter contains a hard delimiter character, and false otherwise. The behavior is undefined if the underlying input itself has been modified or destroyed since this object was most recently reset (or created).

bool bdlb::Tokenizer::isTrailingHard (  )  const

Return true if the current (trailing) delimiter contains a hard delimiter character, and false otherwise. The behavior is undefined if the iteration state of this object is initially invalid, or if the underlying input itself has been modified or destroyed since this object was most recently reset (or created).

bool bdlb::Tokenizer::isValid (  )  const

Return true if the iteration state of this object is valid, and false otherwise. Note that the behavior of advancing the iteration state as well as accessing the current token or (trailing) delimiter is undefined unless the current iteration state of this object is valid.

bslstl::StringRef bdlb::Tokenizer::previousDelimiter (  )  const

Return a reference to the non-modifiable previous delimiter (or leader) in the input string. The behavior is undefined if the underlying input has been modified or destroyed since this object was most recently reset (or created).

bslstl::StringRef bdlb::Tokenizer::token (  )  const

Return a reference to the non-modifiable current token (i.e., maximal sequence of non-delimiter characters) in the input string. The returned reference remains valid so long as the underlying input is not modified or destroyed -- irrespective of the state (or existence) of this object. The behavior is undefined unless the iteration state of this object is initially valid and the underlying input has not been modified or destroyed since this object was most recently reset (or created).

bslstl::StringRef bdlb::Tokenizer::trailingDelimiter (  )  const

Return a reference to the non-modifiable current (trailing) delimiter (maximal sequence of one or more delimiter characters containing at most one hard delimiter character) in the input string. The returned reference remains valid so long as the underlying input is not modified or destroyed -- irrespective of the state (or existence) of this object. The behavior is undefined unless the iteration state of this object is initially valid and the underlying input has not been modified or destroyed since this object was most recently reset (or created).

iterator bdlb::Tokenizer::begin (  )  const

Return an iterator referring to the first token in this object's input string (the past-the-end iterator if this object's iteration state is initially invalid). The returned iterator remains valid as long as the underlying input has not been modified or destroyed since this object was most recently reset (or created).

iterator bdlb::Tokenizer::end (  )  const

Return an iterator referring to the position beyond the last token in this object's input string. The returned iterator remains valid as long as the underlying input has not been modified or destroyed since this object was most recently reset (or created).


The documentation for this class was generated from the following file: