BDE 4.14.0 Production release
|
#include <bdljsn_tokenizer.h>
Public Types
enum TokenType { e_BEGIN = 1, e_ELEMENT_NAME, e_START_OBJECT, e_END_OBJECT, e_START_ARRAY, e_END_ARRAY, e_ELEMENT_VALUE, e_ERROR, BAEJSN_ELEMENT_NAME = e_ELEMENT_NAME, BAEJSN_START_OBJECT = e_START_OBJECT, BAEJSN_END_OBJECT = e_END_OBJECT, BAEJSN_START_ARRAY = e_START_ARRAY, BAEJSN_END_ARRAY = e_END_ARRAY, BAEJSN_ELEMENT_VALUE = e_ELEMENT_VALUE, BAEJSN_ERROR = e_ERROR }
enum { k_EOF = +1 }
enum ConformanceMode { e_RELAXED = 0, e_STRICT_20240119 }
typedef bsls::Types::IntPtr IntPtr
typedef bsls::Types::Uint64 Uint64
This class
provides a mechanism for traversing JSON data stored in a bsl::streambuf
one node at a time and allows clients to access the data associated with that node, including its type and data value.
See bdljsn_tokenizer
|
inline explicit |
Create a Tokenizer
object. Optionally specify a basicAllocator
used to supply memory. If basicAllocator
is 0, the currently installed default allocator is used. By default, the conformanceMode
is e_RELAXED
and the values of the Tokenizer
options are:
The reset
method must be called before any calls to advanceToNextToken
or resetStreamBufGetPointer
.
|
inline |
int bdljsn::Tokenizer::advanceToNextToken()
Move to the next token in the data stream. Return 0 on success and a non-zero value otherwise. Each call to advanceToNextToken
invalidates the string references returned by the value
accessor for prior nodes. This function may fail to move to the next token if doing so would advance past a character sequence that is not valid JSON, and is guaranteed to do so (fail to move) if e_RELAXED != conformanceMode()
. The behavior is undefined unless reset
has been called.
|
inline |
Return the value of the allowConsecutiveSeparators
option of this tokenizer.
|
inline |
Return the value of the allowFormFeedAsWhitespace
option of this tokenizer.
|
inline |
Return the value of the allowHeterogenousArrays
option of this tokenizer.
|
inline |
Return the value of the allowNonUtf8StringLiterals
option of this tokenizer.
|
inline |
Return the value of the allowStandAloneValues
option of this tokenizer.
|
inline |
Return the value of the allowTrailingTopLevelComma
option of this tokenizer.
|
inline |
Return the value of the allowUnescapedControlCharacters
option of this tokenizer.
|
inline |
|
inline |
Return the offset of the current octet being tokenized in the stream supplied to reset
, or if an error occurred, the position where the failed attempt to tokenize a token occurred. Note that this operation is intended to provide additional information in the case of an error.
|
inline |
Return the last read position relative to when reset
was called. Note that readOffset() >= currentPosition()
– the readOffset
is the offset of the last octet read from the stream supplied to reset
, and is at or beyond the current position being tokenized.
|
inline |
Return the status of the last call to reloadStringBuffer()
:
0 if reloadStringBuffer() has not been called or if a token was successfully read.
k_EOF (which is positive) if no data could be read before reaching EOF.
A negative value if the allowNonUtf8StringLiterals option is false and a UTF-8 error occurred. The specific value returned is one of the enumerators of the bdlde::Utf8Util::ErrorStatus enum type, indicating the nature of the UTF-8 error.
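The outcomes listed above can be told apart by the sign of the returned status. The following is a minimal sketch, not part of bdljsn: describeReadStatus is a hypothetical helper, the local k_EOF constant merely mirrors the k_EOF = +1 enumerator listed under Public Types, and the sketch assumes that success is reported as 0 and that UTF-8 errors are reported as negative values.

```cpp
#include <cassert>
#include <string>

// Local stand-in mirroring the 'k_EOF = +1' enumerator of the tokenizer.
const int k_EOF = +1;

// Hypothetical helper (not part of bdljsn): classify the specified
// 'status', assumed to be a value obtained from 'readStatus()'.
std::string describeReadStatus(int status)
{
    if (0 == status) {
        return "ok";          // nothing reloaded yet, or a token was read
    }
    if (k_EOF == status) {
        return "eof";         // no data could be read before reaching EOF
    }
    // Assumption: every remaining status is a negative UTF-8 error code.
    assert(status < 0);
    return "utf8-error";
}
```

A caller could log describeReadStatus(status) next to currentPosition() when a call to advanceToNextToken fails.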
|
inline |
Reset this tokenizer to read data from the specified streambuf
. Note that the reader will not be on a valid node until advanceToNextToken
is called. Note that this function does not change the conformanceMode
nor the values of any of the individual token options:
allowConsecutiveSeparators
allowFormFeedAsWhitespace
allowHeterogenousArrays
allowNonUtf8StringLiterals
allowStandAloneValues
allowTrailingTopLevelComma
allowUnescapedControlCharacters
int bdljsn::Tokenizer::resetStreamBufGetPointer()
Reset the get pointer of the streambuf
held by this object to refer to the byte following the last processed byte, if the held streambuf
supports seeking; otherwise, return an error and leave this object unchanged. Return 0 on success, and a non-zero value otherwise. The behavior is undefined unless reset
has been called. Note that after a successful function return users can read data from the streambuf
that was specified during reset
from where this object stopped. Also note that this call implies the end of processing for this object; any subsequent method should be invoked on this object only after calling reset
and specifying a new streambuf
.
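The repositioning described above relies on the streambuf supporting seeks. The following is a minimal sketch using only the standard library: seekGetPointer is a hypothetical helper (not part of bdljsn, and not necessarily how the tokenizer is implemented) that moves the get pointer to an absolute offset with pubseekpos and reports whether the seek succeeded.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <streambuf>

// Hypothetical helper (not part of bdljsn): move the get pointer of the
// specified 'sb' to the specified absolute 'offset'.  Return 0 on success,
// and a non-zero value if 'sb' does not support seeking to 'offset'.
int seekGetPointer(std::streambuf *sb, std::streamoff offset)
{
    assert(sb);
    assert(0 <= offset);

    // 'pubseekpos' signals failure by returning 'pos_type(off_type(-1))'.
    const std::streampos failure = std::streampos(std::streamoff(-1));
    const std::streampos result  =
                    sb->pubseekpos(std::streampos(offset), std::ios_base::in);

    return failure == result ? -1 : 0;
}
```

For example, a reader that pulled 16 bytes from a std::stringbuf but consumed only the first 7 can call seekGetPointer(&sb, 7) so that the next read from the same streambuf starts at the byte following the last processed byte.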
|
inline |
Set the allowConsecutiveSeparators
option to the specified value
and return a non-const
reference to this tokenizer. JSON defines two separator tokens: the colon (:
) and the comma (,
). If the allowConsecutiveSeparators
value is true
this tokenizer will accept multiple consecutive sequences of a given separator (e.g., "a"::b, "c":::d
and "a":b,, "c":d,,, "e":f
) as if a single separator had appeared (i.e., "a":b, "c":d
and "a":b, "c":d, "e":f
, respectively). Otherwise the tokenizer returns an error when multiple consecutive colons are found. By default, the value of the allowConsecutiveSeparators
option is true
. The behavior is undefined unless e_RELAXED == conformanceMode()
. Note that consecutive sequences using both tokens (e.g., ::,,::
) are always an error.
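The effect described above (a run of one repeated separator treated as a single separator, a run mixing both separators rejected) can be sketched outside the tokenizer. The following is an illustration only: collapseSeparators is a hypothetical helper, it is not part of bdljsn and does not reflect how the tokenizer is implemented, and it considers only directly adjacent separator characters outside string literals.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of bdljsn): load into the specified
// 'result' a copy of the specified 'text' in which every run of one
// repeated separator (':' or ',') outside string literals is collapsed to
// a single separator.  Return 0 on success, and a non-zero value (leaving
// 'result' unchanged) if a run mixes both separators.
int collapseSeparators(std::string *result, const std::string& text)
{
    assert(result);

    std::string out;
    bool inString = false;  // currently inside a JSON string literal
    bool escaped  = false;  // previous character was an escaping backslash
    char previous = '\0';   // most recent character kept in 'out'

    for (char c : text) {
        if (inString) {
            out += c;
            if (escaped)        { escaped = false; }
            else if ('\\' == c) { escaped = true; }
            else if ('"' == c)  { inString = false; }
            previous = c;
            continue;
        }
        const bool isSeparator  = (':' == c || ',' == c);
        const bool wasSeparator = (':' == previous || ',' == previous);
        if (isSeparator && wasSeparator) {
            if (c != previous) {
                return -1;      // mixed run such as "::,,::"
            }
            continue;           // drop the repeated separator
        }
        if ('"' == c) {
            inString = true;
        }
        out += c;
        previous = c;
    }
    *result = out;
    return 0;
}
```

Under these assumptions, the input "a"::b, "c":::d yields "a":b, "c":d, while the input ::,,:: produces a non-zero return value.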
|
inline |
Set the allowFormFeedAsWhitespace
option to the specified value and return a non-const
reference to this tokenizer. If the allowFormFeedAsWhitespace
value is true
the formfeed character ('\f') is recognized as a whitespace character in addition to '\n', '\t', '\r', and '\v'. Otherwise, formfeed is disallowed as whitespace.
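The whitespace rule above amounts to a character classification with one optional member. The following is a minimal sketch, not part of bdljsn: isWhitespace and skipWhitespace are hypothetical helpers, and the space character is included on the assumption that it is always treated as whitespace.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of bdljsn): return 'true' if the specified
// 'c' counts as whitespace given the specified option value.
bool isWhitespace(char c, bool allowFormFeedAsWhitespace)
{
    switch (c) {
      case ' ':    // assumption: space is always whitespace
      case '\n':
      case '\t':
      case '\r':
      case '\v':
        return true;
      case '\f':
        return allowFormFeedAsWhitespace;
      default:
        return false;
    }
}

// Hypothetical helper: return the index of the first character of the
// specified 'text', at or after the specified 'position', that is not
// whitespace, or 'text.size()' if there is none.
std::size_t skipWhitespace(const std::string& text,
                           std::size_t        position,
                           bool               allowFormFeedAsWhitespace)
{
    assert(position <= text.size());

    while (position < text.size()
        && isWhitespace(text[position], allowFormFeedAsWhitespace)) {
        ++position;
    }
    return position;
}
```

With the option enabled, leading whitespace that contains a formfeed is skipped entirely; with it disabled, skipping stops at the formfeed, which is where a tokenizer following this rule would be expected to report an error.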
|
inline |
Set the allowHeterogenousArrays
option to the specified value
and return a non-const
reference to this tokenizer. If the allowHeterogenousArrays
value is true
this tokenizer will successfully tokenize heterogeneous values within an array. If the option's value is false
then the tokenizer will return an error for arrays having heterogeneous values. By default, the value of the allowHeterogenousArrays
option is true
. The behavior is undefined unless e_RELAXED == conformanceMode()
.
|
inline |
Set the allowNonUtf8StringLiterals
option to the specified value
and return a non-const
reference to this tokenizer. If the allowNonUtf8StringLiterals
value is false
this tokenizer will check string literal tokens for invalid UTF-8, enter an error mode if it encounters a string literal token that has any content that is not UTF-8, and fail to advance to subsequent tokens until reset
is called. By default, the value of the allowNonUtf8StringLiterals
option is true
. The behavior is undefined unless e_RELAXED == conformanceMode()
.
|
inline |
Set the allowStandAloneValues
option to the specified value
and return a non-const
reference to this tokenizer. If the allowStandAloneValues
value is true
this tokenizer will successfully tokenize JSON values (strings and numbers). If the option's value is false
then the tokenizer will only tokenize complete JSON documents (JSON objects and arrays) and return an error for stand alone JSON values. By default, the value of the allowStandAloneValues
option is true
. The behavior is undefined unless e_RELAXED == conformanceMode()
.
|
inline |
Set the allowTrailingTopLevelComma
option to the specified value
and return a non-const
reference to this tokenizer. If the allowTrailingTopLevelComma
value is true
this tokenizer will successfully tokenize JSON values where a comma follows the top-level JSON element. If the option's value is false
then the tokenizer will reject documents with such trailing commas, such as {},
. By default, the value of the allowTrailingTopLevelComma
option is true
for backwards compatibility. Note that a document without any JSON elements is invalid whether or not it contains commas. The behavior is undefined unless e_RELAXED == conformanceMode()
.
|
inline |
Set the allowUnescapedControlCharacters
option of this tokenizer to the specified value
. If true
, characters in the range [ 0x00 .. 0x1F ]
are allowed in JSON strings. If the option is false
, these characters must be represented by their six byte escape sequences [ \u0000 .. \u001F ]
. Several values in that range are also (conveniently) represented by two byte sequences:
The DEL
control character (0x7F
) is accepted even in strict mode.
The behavior is undefined unless e_RELAXED == conformanceMode()
. Note that the representation of these byte sequences as C/C++ string literals requires that the escape character itself must be escaped:
Also note that the two resulting strings do not compare equal.
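The escaping note above can be illustrated with plain C++ string literals. The following is a minimal sketch, not part of bdljsn: hasUnescapedControlCharacter and demonstrateEscapes are hypothetical names, and the sample strings are illustrative only. It shows a raw TAB octet, its two byte escape, and its six byte escape, and that the two escaped spellings are different strings.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of bdljsn): return 'true' if the specified
// 'text' contains a raw octet in the range '[ 0x00 .. 0x1F ]'.
bool hasUnescapedControlCharacter(const std::string& text)
{
    for (unsigned char c : text) {
        if (c <= 0x1F) {
            return true;
        }
    }
    return false;
}

// Illustrate how the escape character itself is escaped in C++ literals.
void demonstrateEscapes()
{
    const std::string raw     = "\"a\tb\"";       // raw TAB octet (0x09)
    const std::string twoByte = "\"a\\tb\"";      // backslash, then 't'
    const std::string sixByte = "\"a\\u0009b\"";  // backslash, then 'u0009'

    assert( hasUnescapedControlCharacter(raw));
    assert(!hasUnescapedControlCharacter(twoByte));
    assert(!hasUnescapedControlCharacter(sixByte));

    // Both escapes denote TAB in JSON, yet the C++ strings differ.
    assert(twoByte != sixByte);
    assert(twoByte.size() + 4 == sixByte.size());
}
```

Note that the helper returns false for the DEL character (0x7F), which lies outside the range checked.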
|
inline |
Set the conformanceMode
of this tokenizer to the specified mode
and return a non-const
reference to this tokenizer. If mode
is e_STRICT_20240119
the option values of this tokenizer are set to be fully compliant with RFC 8259 (see https://www.rfc-editor.org/rfc/rfc8259).
Specifically, those option values are:
Otherwise (i.e., mode
is e_RELAXED
), those option values can be set in any combination. Note that the behavior is undefined if individual options are set when conformanceMode
is not e_RELAXED
.
|
inline |
int bdljsn::Tokenizer::value(bsl::string_view *data) const
Load into the specified data
the value of the current token if the current token's type is e_ELEMENT_NAME
or e_ELEMENT_VALUE
or leave data
unmodified otherwise. Return 0 on success and a non-zero value otherwise. Note that the returned data
is only valid until the next manipulator call on this object.