BDE 4.14.0 Production release
bdljsn_tokenizer

Detailed Description

Purpose

Provide a tokenizer for extracting JSON data from a streambuf.

Classes

See also
baljsn_decoder

Description

This component provides a class, bdljsn::Tokenizer, that traverses data stored in a bsl::streambuf one node at a time and provides clients access to the data associated with that node, including its type and data value. Client code can use the reset function to associate a bsl::streambuf containing JSON data with a tokenizer object and then call the advanceToNextToken function to extract individual data values.

This class was created for use by other components in the bdljsn and baljsn packages; in most cases, clients should use the bdljsn_jsonutil, baljsn_decoder, or bdljsn_datumutil components instead of this class.
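For instance, decoding an entire document is typically a single call to bdljsn::JsonUtil rather than a hand-written tokenizer loop. The sketch below assumes the JsonUtil::read overload that takes an error object; consult bdljsn_jsonutil for the exact overload set.

    #include <bdljsn_error.h>
    #include <bdljsn_json.h>
    #include <bdljsn_jsonutil.h>

    // Sketch only: parse a whole JSON document in one call instead of
    // driving the tokenizer by hand.
    bdljsn::Json  json;
    bdljsn::Error error;
    int rc = bdljsn::JsonUtil::read(&json, &error, "{\"floorCount\": 55}");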

On malformed JSON, tokenization may fail before the end of input is reached, but not all such errors are detected. In particular, callers should check that closing brackets and braces match opening ones.

Strict Conformance

The bdljsn::Tokenizer class allows several convenient variances from the JSON grammar as described in RFC8259 (see https://www.rfc-editor.org/rfc/rfc8259). If strict conformance is needed, users can put the tokenizer into strict conformance mode (see setConformanceMode). The behavioral differences are each controlled by options. The differences between a default constructed tokenizer and one in strict mode are:

Option Default Strict
-------------------------------- ------- ------
allowConsecutiveSeparators true false
allowFormFeedAsWhitespace true false
allowHeterogenousArrays true true
allowNonUtf8StringLiterals true false
allowStandAloneValues true true
allowTrailingTopLevelComma true false
allowUnescapedControlCharacters true false

The default-constructed bdljsn::Tokenizer is created having the options shown above (in the "Default" column) and a conformance mode of bdljsn::e_RELAXED. Accordingly, users are free to change any of the option values to any combination that may be needed; however, once a tokenizer is set to strict mode, the options are set to the values shown above (in the "Strict" column) and changes are not allowed (doing so leads to undefined behavior) unless the conformance mode is again set to relaxed.
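Switching modes is a single call on the tokenizer after construction. This is a sketch: we assume the strict enumerator is spelled e_STRICT_20240119 and nested in the tokenizer class; check bdljsn_tokenizer.h in your BDE version for the exact names.

    bdljsn::Tokenizer tokenizer;

    // Switch every option in the table above to its "Strict" value; the
    // individual option setters must not be called while this mode is in
    // effect.
    tokenizer.setConformanceMode(bdljsn::Tokenizer::e_STRICT_20240119);

    // ... tokenize ...

    // Returning to relaxed mode re-enables individual option changes.
    tokenizer.setConformanceMode(bdljsn::Tokenizer::e_RELAXED);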

Usage

This section illustrates intended use of this component.

Example 1: Extracting JSON Data into an Object

For this example, we will use bdljsn::Tokenizer to read each node in a JSON document and populate a simple Employee object.

First, we will define the JSON data that the tokenizer will traverse over:

const char *INPUT = " {\n"
                    "     \"street\"     : \"Lexington Ave\",\n"
                    "     \"state\"      : \"New York\",\n"
                    "     \"zipcode\"    : \"10022-1331\",\n"
                    "     \"floorCount\" : 55\n"
                    " }";

Next, we will construct a streambuf populated with this data:

bdlsb::FixedMemInStreamBuf isb(INPUT, bsl::strlen(INPUT));

Then, we will create a bdljsn::Tokenizer object and associate the above streambuf with it:

bdljsn::Tokenizer tokenizer;
tokenizer.reset(&isb);

Next, we will define an address record type and create an object of that type:

struct Address {
    bsl::string d_street;
    bsl::string d_state;
    bsl::string d_zipcode;
    int         d_floorCount;
} address = { "", "", "", 0 };

Then, we will traverse the JSON data one node at a time:

// Read '{'

int rc = tokenizer.advanceToNextToken();
assert(!rc);

bdljsn::Tokenizer::TokenType token = tokenizer.tokenType();
assert(bdljsn::Tokenizer::e_START_OBJECT == token);

rc = tokenizer.advanceToNextToken();
assert(!rc);
token = tokenizer.tokenType();

// Continue reading elements until '}' is encountered

while (bdljsn::Tokenizer::e_END_OBJECT != token) {
    assert(bdljsn::Tokenizer::e_ELEMENT_NAME == token);

    // Read element name

    bslstl::StringRef nodeValue;
    rc = tokenizer.value(&nodeValue);
    assert(!rc);

    bsl::string elementName = nodeValue;

    // Read element value

    rc = tokenizer.advanceToNextToken();
    assert(!rc);

    token = tokenizer.tokenType();
    assert(bdljsn::Tokenizer::e_ELEMENT_VALUE == token);

    rc = tokenizer.value(&nodeValue);
    assert(!rc);

    // Extract the simple type with the data

    if (elementName == "street") {
        rc = bdljsn::StringUtil::readString(&address.d_street, nodeValue);
        assert(!rc);
    }
    else if (elementName == "state") {
        rc = bdljsn::StringUtil::readString(&address.d_state, nodeValue);
        assert(!rc);
    }
    else if (elementName == "zipcode") {
        rc = bdljsn::StringUtil::readString(&address.d_zipcode, nodeValue);
        assert(!rc);
    }
    else if (elementName == "floorCount") {
        rc = bdljsn::NumberUtil::asInt(&address.d_floorCount, nodeValue);
        assert(!rc);
    }

    rc = tokenizer.advanceToNextToken();
    assert(!rc);
    token = tokenizer.tokenType();
}

Finally, we will verify that the address aggregate has the correct values:

assert("Lexington Ave" == address.d_street);
assert("New York" == address.d_state);
assert("10022-1331" == address.d_zipcode);
assert(55 == address.d_floorCount);