BDE 4.14.0 Production release
Provide utility functions for JSON strings.
This component defines a utility struct, bdljsn::StringUtil, that is a namespace for functions that convert arbitrary UTF-8 codepoint sequences to JSON strings and vice versa. The rules for these conversions are outlined below in {JSON Strings} and detailed in: https://www.rfc-editor.org/rfc/rfc8259#section-7 (RFC8259).
This utility provides two key functions:
writeString: Given an arbitrary UTF-8 codepoint sequence, generate a JSON string representing the same codepoints.
readString: Given a JSON string (e.g., the output of writeString), generate the equivalent sequence of UTF-8 code points.
When using these functions, a UTF-8 codepoint sequence is always preserved on the round trip to JSON string and back; however, since there are equivalent allowed representations of a JSON string, the converse is not guaranteed.
JSON strings consist of UTF-8 codepoints surrounded by double quotes (i.e., '"'). Within those double quotes certain characters must be escaped (i.e., replaced with some alternative, multi-byte representation). Those characters are:
the quotation mark ('"'),
the reverse solidus ('\'), and
the control characters U+0000 to U+001F (inclusive).
Each of the above characters can be escaped by replacing it with the six-byte sequence consisting of:
a backslash ('\'),
a lower-case u, and
four hexadecimal digits giving the code point.
For example, the character that rings the console bell is represented as '\u0007'. Note that the hexadecimal digits can use upper or lower case letters but the lead u character must be lower case. See {Strictness}.
Eight characters can alternatively be represented by special two-byte sequences:
\"  quotation mark
\\  reverse solidus
\/  solidus
\b  backspace
\f  form feed
\n  line feed
\r  carriage return
\t  tab
Note that the above set is similar to but not identical to the set of two-byte char literals supported by C++. For example, '\0' (null) and '\a' (bell) are not included above.
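The escaping rules above can be sketched in standard C++ as follows. This is a minimal illustration, not the bdljsn implementation: the function name escapeJsonString is hypothetical, the input is assumed to be valid UTF-8 already, and the solidus is left unescaped because RFC8259 permits but does not require escaping it.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of bdljsn): return 'input', assumed to be
// valid UTF-8, as a double-quoted JSON string per RFC8259 section 7.
std::string escapeJsonString(const std::string& input)
{
    std::string out = "\"";
    for (unsigned char c : input) {
        switch (c) {
          // Characters that have a two-byte escape sequence.
          case '"':  out += "\\\""; break;
          case '\\': out += "\\\\"; break;
          case '\b': out += "\\b";  break;
          case '\f': out += "\\f";  break;
          case '\n': out += "\\n";  break;
          case '\r': out += "\\r";  break;
          case '\t': out += "\\t";  break;
          default:
            if (c < 0x20) {
                // Remaining control characters need the six-byte form.
                char buf[7];
                std::snprintf(buf, sizeof buf, "\\u%04x",
                              static_cast<unsigned>(c));
                out += buf;
            }
            else {
                // Everything else, including UTF-8 multi-byte sequences,
                // is copied through unchanged.
                out += static_cast<char>(c);
            }
        }
    }
    out += '"';
    return out;
}
```

Under these assumptions, a string containing the bell character yields the six-byte form (there is no two-byte JSON escape for it), while a newline yields the two-byte form.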
By default, the bdljsn::StringUtil read and write methods strictly follow the RFC8259 standard. Variances from those rules are expressed using bdljsn::StringUtil::Flags, an enum of flag values that can be set in the optional flags parameter of the decoding methods. Multiple flags can be bitwise set in flags; however, currently, just one variance flag is defined.
RFC8259 specifies that the 6-byte Unicode escape sequence start with a backslash, \, and lower-case u. However, if the bdljsn::StringUtil::e_ACCEPT_CAPITAL_UNICODE_ESCAPE flag is set, an upper-case U is accepted as well. Thus, both '\u0007' and '\U0007' would be interpreted as the BELL character.
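The effect of this variance can be sketched in standard C++. The helper below is hypothetical and is not the bdljsn implementation; its acceptCapitalU parameter merely stands in for the e_ACCEPT_CAPITAL_UNICODE_ESCAPE flag, and it parses only a single six-byte escape at the start of its input.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not part of bdljsn): parse the six-byte escape at
// the start of 'text' (e.g., "\u0007") and load its value into
// '*codePoint'.  Return 'true' on success.  An upper-case 'U' is accepted
// only if 'acceptCapitalU' is 'true'.
bool parseUnicodeEscape(unsigned           *codePoint,
                        const std::string&  text,
                        bool                acceptCapitalU = false)
{
    assert(codePoint);

    if (text.size() < 6 || text[0] != '\\') {
        return false;
    }
    const bool leadOk = text[1] == 'u'
                     || (acceptCapitalU && text[1] == 'U');
    if (!leadOk) {
        return false;
    }

    unsigned value = 0;
    for (int i = 2; i < 6; ++i) {
        const unsigned char c = text[i];
        if (!std::isxdigit(c)) {
            return false;
        }
        // Hexadecimal digits may be upper or lower case.
        value = value * 16
              + (std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10);
    }
    *codePoint = value;
    return true;
}
```

In this sketch the strict behavior is the default, mirroring the component's documented default of following RFC8259.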
This section illustrates intended use of this component.
First, we initialize a string with a valid sequence of UTF-8 codepoints.
Notice that, as required by C++ syntax, several characters are represented by their two-character escape sequence: double quote (twice), bell, and newline.
Then, we examine the string as output:
and observe:
Notice that the backslash characters (having served their purpose of giving special meaning to the subsequent character) are not shown. The BELL and NEWLINE characters are output but are not visible.
Now, we generate the JSON string equivalent of the initial string.
and observe how the initial string is represented for JSON:
Notice that:
Finally, we convert the jsonCompatibleString back to its original content:
and observe (again):
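The decoding direction of the round trip can also be sketched in standard C++. This is a simplified illustration under stated assumptions, not the bdljsn implementation: the names unescapeJsonString, appendUtf8, and hexValue are hypothetical, only the strict lower-case u escape is accepted, and surrogate pairs are rejected rather than combined.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: append the UTF-8 encoding of 'cp' (below U+10000).
static void appendUtf8(std::string *out, unsigned cp)
{
    if (cp < 0x80) {
        *out += static_cast<char>(cp);
    }
    else if (cp < 0x800) {
        *out += static_cast<char>(0xC0 | (cp >> 6));
        *out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else {
        *out += static_cast<char>(0xE0 | (cp >> 12));
        *out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        *out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: return the value of hex digit 'c', or -1.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper (not part of bdljsn): decode the double-quoted JSON
// string 'json' into '*result'.  Return 0 on success and a non-zero value
// (leaving '*result' unchanged) otherwise.
int unescapeJsonString(std::string *result, const std::string& json)
{
    assert(result);
    const std::size_t n = json.size();
    if (n < 2 || json[0] != '"' || json[n - 1] != '"') return -1;

    std::string out;
    for (std::size_t i = 1; i + 1 < n; ++i) {
        const char c = json[i];
        // Unescaped quotes and control characters are not allowed.
        if (c == '"' || static_cast<unsigned char>(c) < 0x20) return -1;
        if (c != '\\') { out += c; continue; }
        if (++i >= n - 1) return -1;  // dangling backslash
        switch (json[i]) {
          case '"':  out += '"';  break;
          case '\\': out += '\\'; break;
          case '/':  out += '/';  break;
          case 'b':  out += '\b'; break;
          case 'f':  out += '\f'; break;
          case 'n':  out += '\n'; break;
          case 'r':  out += '\r'; break;
          case 't':  out += '\t'; break;
          case 'u': {
            if (i + 4 >= n - 1) return -1;  // fewer than four digits
            unsigned cp = 0;
            for (int k = 1; k <= 4; ++k) {
                const int h = hexValue(json[i + k]);
                if (h < 0) return -1;
                cp = cp * 16 + static_cast<unsigned>(h);
            }
            // Simplification: surrogate pairs are not supported here.
            if (cp >= 0xD800 && cp <= 0xDFFF) return -1;
            appendUtf8(&out, cp);
            i += 4;
          } break;
          default: return -1;  // unknown escape
        }
    }
    *result = out;
    return 0;
}
```

Because a JSON string may spell the same character in more than one way (for instance, '\u0041' and 'A'), a decoder like this maps several JSON strings to one UTF-8 sequence, which is why only the UTF-8 to JSON to UTF-8 direction of the round trip is guaranteed to be lossless.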