BDE 4.14.0 Production release
bdljsn_stringutil

Detailed Description

Purpose

Provide utility functions for JSON strings.

Classes

Description

This component defines a utility struct, bdljsn::StringUtil, that is a namespace for functions that convert arbitrary UTF-8 codepoint sequences to JSON strings and vice versa. The rules for these conversions are outlined below in {JSON Strings} and detailed in: https://www.rfc-editor.org/rfc/rfc8259#section-7 (RFC8259)

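This utility provides two key functions:

* writeString: encode a UTF-8 codepoint sequence as a JSON string.
* readString: decode a JSON string into a UTF-8 codepoint sequence.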

When using these functions, a UTF-8 codepoint sequence is always preserved on the round trip to JSON string and back; however, since there are equivalent allowed representations of a JSON string, the converse is not guaranteed.
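To see why the converse is not guaranteed, consider that the escapes described in {JSON Strings} give several spellings to the same character. The following is a minimal, standard-library-only sketch of a decoder, not the bdljsn::StringUtil implementation; the name decodeSketch is hypothetical, and the sketch deliberately rejects '\uXXXX' escapes at or above U+0080 to avoid UTF-8 encoding logic.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Decode the specified 'json' string (including its delimiting double
// quotes) into the specified 'result'.  Return 0 on success and a non-zero
// value otherwise.  Hypothetical helper: handles the eight 2-byte escapes
// and '\uXXXX' escapes below U+0080 only; does not validate UTF-8.
int decodeSketch(std::string *result, const std::string& json)
{
    assert(result);

    const std::string::size_type n = json.size();
    if (n < 2 || json[0] != '"' || json[n - 1] != '"') {
        return -1;                              // missing delimiters
    }

    std::string out;
    for (std::string::size_type i = 1; i + 1 < n; ++i) {
        if (json[i] == '"') {
            return -1;                          // unescaped interior quote
        }
        if (json[i] != '\\') {
            out += json[i];                     // ordinary byte, copied as is
            continue;
        }
        if (++i + 1 >= n) {
            return -1;                          // nothing follows backslash
        }
        switch (json[i]) {
          case '"':  out += '"';  break;
          case '\\': out += '\\'; break;
          case '/':  out += '/';  break;
          case 'b':  out += '\b'; break;
          case 'f':  out += '\f'; break;
          case 'n':  out += '\n'; break;
          case 'r':  out += '\r'; break;
          case 't':  out += '\t'; break;
          case 'u': {
            if (i + 5 >= n) {
                return -1;                      // fewer than four digits
            }
            unsigned value = 0;
            for (int k = 1; k <= 4; ++k) {
                const unsigned char c =
                                    static_cast<unsigned char>(json[i + k]);
                if (!std::isxdigit(c)) {
                    return -1;                  // not a hexadecimal digit
                }
                value = value * 16 + (std::isdigit(c)
                                      ? c - '0'
                                      : std::tolower(c) - 'a' + 10);
            }
            if (value >= 0x80) {
                return -1;                      // outside this sketch's scope
            }
            out += static_cast<char>(value);
            i += 4;                             // skip the four digits
          } break;
          default:
            return -1;                          // unknown escape
        }
    }
    *result = out;
    return 0;
}
```

With this sketch, the JSON strings "\u0041\/" and "A/" both decode to the two bytes A/, so re-encoding the decoded value cannot reproduce both originals.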

JSON Strings

JSON strings consist of UTF-8 codepoints surrounded by double quotes (i.e., '"'). Within those double quotes certain characters must be escaped (i.e., replaced with some alternative, multi-byte representation). Those characters are:

* quotation mark (U+0022)
* backslash (U+005C)
* the control characters (U+0000 through U+001F)

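Each of the above characters can be escaped by replacing it with the six byte sequence consisting of:

* a backslash, followed by
* a lower-case u, followed by
* four hexadecimal digits that encode the Unicode value of the character.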

For example, the character that rings the console bell is represented as '\u0007'. Note that the hexadecimal digits can use upper or lower case letters but the lead u character must be lower case. See {Strictness}.
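As an illustration of the 6-byte form, the following standard-library-only sketch builds the escape for a 16-bit value. The function name sixByteEscape is hypothetical and is not part of bdljsn::StringUtil; the sketch emits lower-case hexadecimal digits, although upper-case digits would be equally valid.

```cpp
#include <cassert>
#include <string>

// Return the 6-byte JSON escape (backslash, 'u', four hexadecimal digits)
// for the specified 'codeUnit'.  The behavior is undefined unless
// 'codeUnit <= 0xFFFF'.  Hypothetical helper, for illustration only.
std::string sixByteEscape(unsigned codeUnit)
{
    assert(codeUnit <= 0xFFFF);          // must fit in four hex digits

    static const char k_HEX[] = "0123456789abcdef";

    std::string result = "\\u";          // backslash and lower-case 'u'
    result += k_HEX[(codeUnit >> 12) & 0xF];
    result += k_HEX[(codeUnit >>  8) & 0xF];
    result += k_HEX[(codeUnit >>  4) & 0xF];
    result += k_HEX[ codeUnit        & 0xF];
    return result;
}
```

For instance, sixByteEscape(0x07) yields the six characters \u0007, matching the BELL example above.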

Eight characters can alternatively be represented by special, 2-byte sequences. Note that RFC8259 permits, but does not require, escaping the slash:

+---------+-----------------+---------------+---------------+
| Unicode | Description     | 6-byte escape | 2-byte escape |
+---------+-----------------+---------------+---------------+
| U+0022  | quotation mark  | \u0022        | \"            |
| U+005C  | backslash       | \u005c        | \\            |
| U+002F  | slash           | \u002f        | \/            |
| U+0008  | backspace       | \u0008        | \b            |
| U+000C  | form feed       | \u000C        | \f            |
| U+000A  | line feed       | \u000A        | \n            |
| U+000D  | carriage return | \u000D        | \r            |
| U+0009  | tab             | \u0009        | \t            |
+---------+-----------------+---------------+---------------+

Note that the above set is similar to, but not identical to, the set of two-byte char literals supported by C++. For example, '\0' (null) and '\a' (bell) are not included above.
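The table above can be sketched as a lookup, together with a small encoder that prefers the 2-byte form and falls back to the 6-byte form for the remaining control characters. This is a standard-library-only illustration, not the writeString implementation: the names twoByteEscape and toJsonStringSketch are hypothetical, the sketch does not validate UTF-8, and it escapes the slash only because the table lists it (this page does not say whether writeString does so).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Return the 2-byte JSON escape for the specified 'c', or 0 (null) if the
// table defines none.  Hypothetical helper, for illustration only.
const char *twoByteEscape(char c)
{
    switch (c) {
      case '"':  return "\\\"";
      case '\\': return "\\\\";
      case '/':  return "\\/";
      case '\b': return "\\b";
      case '\f': return "\\f";
      case '\n': return "\\n";
      case '\r': return "\\r";
      case '\t': return "\\t";
      default:   return 0;       // e.g., '\0' and '\a' have no 2-byte form
    }
}

// Return a JSON string (with delimiting double quotes) equivalent to the
// specified 'utf8'.  Bytes at or above 0x20 that have no 2-byte escape are
// copied unchanged; 'utf8' is assumed, not checked, to be valid UTF-8.
std::string toJsonStringSketch(const std::string& utf8)
{
    std::string result = "\"";
    for (std::string::size_type i = 0; i < utf8.size(); ++i) {
        const unsigned char uc = static_cast<unsigned char>(utf8[i]);
        if (const char *shortForm = twoByteEscape(utf8[i])) {
            result += shortForm;                 // 2-byte escape exists
        }
        else if (uc < 0x20) {
            char      buffer[7];                 // 6 bytes plus terminator
            const int n = std::snprintf(buffer,
                                        sizeof buffer,
                                        "\\u%04x",
                                        static_cast<unsigned>(uc));
            assert(6 == n);
            (void)n;                             // unused if NDEBUG is set
            result += buffer;                    // 6-byte escape
        }
        else {
            result += utf8[i];                   // no escape required
        }
    }
    result += '"';
    return result;
}
```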

Guarantees: Arbitrary UTF-8 to JSON String

Strictness

By default, the bdljsn::StringUtil read and write methods strictly follow the RFC8259 standard. Variances from those rules are expressed using bdljsn::StringUtil::Flags, an enum of flag values that can be set in the optional flags parameter of the decoding methods. Multiple flags can be combined (bitwise OR) in flags; however, currently, just one variance flag is defined.

Example Variance

RFC8259 specifies that the 6-byte Unicode escape sequence starts with a backslash, \, and a lower-case u. However, if the bdljsn::StringUtil::e_ACCEPT_CAPITAL_UNICODE_ESCAPE flag is set, an upper-case U is accepted as well. Thus, both '\u0007' and '\U0007' would be interpreted as the BELL character.
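The effect of such a variance flag can be sketched with a small parser for a single 6-byte escape. Everything here is hypothetical and standard-library-only: the enum SketchFlags and the function parseUnicodeEscape merely imitate the idea of an optional flags parameter, and their names and values are not those of bdljsn::StringUtil.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical flags for this sketch only (not the BDE enumerators).
enum SketchFlags {
    k_NONE             = 0,
    k_ACCEPT_CAPITAL_U = 1 << 0   // also accept '\U' as the escape lead
};

// Load into the specified 'codeUnit' the value of the specified 6-byte
// 'escape' (e.g., "\u0007").  Optionally specify 'flags' to relax the
// strict rules.  Return 0 on success and a non-zero value otherwise.
int parseUnicodeEscape(unsigned           *codeUnit,
                       const std::string&  escape,
                       int                 flags = k_NONE)
{
    assert(codeUnit);

    if (escape.size() != 6 || escape[0] != '\\') {
        return -1;                               // wrong length or lead
    }

    const bool leadOk = escape[1] == 'u'
                     || ((flags & k_ACCEPT_CAPITAL_U) && escape[1] == 'U');
    if (!leadOk) {
        return -1;                               // strict: lower-case only
    }

    unsigned value = 0;
    for (int i = 2; i < 6; ++i) {
        const unsigned char c = static_cast<unsigned char>(escape[i]);
        if (!std::isxdigit(c)) {
            return -1;                           // not a hexadecimal digit
        }
        // Hexadecimal digits may be upper or lower case in either mode.
        value = value * 16 + (std::isdigit(c)
                              ? c - '0'
                              : std::tolower(c) - 'a' + 10);
    }
    *codeUnit = value;
    return 0;
}
```

Under the default (strict) flags this sketch rejects "\U0007" and accepts "\u0007"; with k_ACCEPT_CAPITAL_U set, both are accepted and yield 7.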

Usage

This section illustrates intended use of this component.

Example 1: Encoding and Decoding a JSON String

First, we initialize a string with a valid sequence of UTF-8 codepoints.

bsl::string initial("Does the name \"Ivan Pavlov\" ring a bell\a?\n");
assert(bdlde::Utf8Util::isValid(initial));

Notice that, as required by C++ syntax, several characters are represented by their two-character escape sequence: double quote (twice), bell, and newline.

Then, we examine the string as output:

bsl::cout << initial << bsl::endl;

and observe:

Does the name "Ivan Pavlov" ring a bell?

Notice that the backslash characters (having served their purpose of giving special meaning to the subsequent character) are not shown. The BELL and NEWLINE characters are output but are not visible.

Now, we generate the JSON string equivalent of the initial string.

bsl::ostringstream oss;

int rcEncode = bdljsn::StringUtil::writeString(oss, initial);
assert(0 == rcEncode);
bsl::string jsonCompatibleString = oss.str();
bsl::cout << jsonCompatibleString << bsl::endl;

and observe how the initial string is represented for JSON:

"Does the name \"Ivan Pavlov\" ring a bell\u0007?\n"

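Notice that:

* The entire string is delimited by double quotes.
* The interior double quotes and the newline are represented by 2-byte escape sequences (as they were in the C++ string literal).
* Since JSON has no 2-byte escape sequence for the BELL character, the 6-byte Unicode representation, '\u0007', is used.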

Finally, we convert the jsonCompatibleString back to its original content:

bsl::string fromJsonString;
const int rcDecode = bdljsn::StringUtil::readString(
&fromJsonString,
jsonCompatibleString);
assert(0 == rcDecode);
assert(initial == fromJsonString);
bsl::cout << fromJsonString << bsl::endl;

and observe (again):

Does the name "Ivan Pavlov" ring a bell?