static int utf8ToUcs2(unsigned short *dstBuffer, bsl::size_t dstCapacity, const char *srcString, bsl::size_t *numCharsWritten = 0, unsigned short errorCharacter = '?')
static int utf8ToUcs2(bsl::vector<unsigned short> *result, const char *srcString, unsigned short errorCharacter = '?')
static int utf8ToUcs2(std::vector<unsigned short> *result, const char *srcString, unsigned short errorCharacter = '?')
static int ucs2ToUtf8(char *dstBuffer, bsl::size_t dstCapacity, const unsigned short *srcString, bsl::size_t *numCharsWritten = 0, bsl::size_t *numBytesWritten = 0)
static int ucs2ToUtf8(bsl::string *result, const unsigned short *srcString, bsl::size_t *numCharsWritten = 0)
static int ucs2ToUtf8(std::string *result, const unsigned short *srcString, bsl::size_t *numCharsWritten = 0)
This struct provides a namespace for a suite of pure procedures to convert character buffers between UTF-8 and UCS-2. UCS-2 conversions are performed to and from the full 2^16 code-point (16-bit) space: the UTF-16 surrogate "hole" U+D800-U+DFFF is not treated as a special case. Note that all C-style routines in this component honor strlcpy semantics, meaning that every returned C-style string is null-terminated as long as the destination buffer size is positive (i.e., dstCapacity > 0). Note also that since all UCS-2 operations take place on unsigned short values, byte order is not taken into consideration, and Byte Order Mark (BOM) characters are not generated. If a BOM is present in the input, it is translated into the output.
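As a minimal sketch of what converting the full 2^16 space means in practice, the hypothetical helper encodeUcs2CodeUnit below (not part of bdlde) encodes a single UCS-2 code unit using the standard 1-, 2-, or 3-byte UTF-8 forms. Consistent with the description above, it applies no special handling to U+D800-U+DFFF and encodes a BOM (U+FEFF) like any other character; the actual bdlde implementation may be structured differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of bdlde): encode one 16-bit UCS-2 code unit
// as UTF-8 into 'out', which must have room for at least 3 bytes, and return
// the number of bytes written.  Every value 0x0000-0xFFFF is encoded as an
// ordinary code point: the surrogate range U+D800-U+DFFF is not special-cased,
// and a BOM (U+FEFF) is encoded like any other character.
std::size_t encodeUcs2CodeUnit(unsigned short codeUnit, char *out)
{
    assert(out);
    const unsigned int ch = codeUnit;
    if (ch < 0x80) {                                   // 0xxxxxxx
        out[0] = static_cast<char>(ch);
        return 1;
    }
    if (ch < 0x800) {                                  // 110xxxxx 10xxxxxx
        out[0] = static_cast<char>(0xC0 | (ch >> 6));
        out[1] = static_cast<char>(0x80 | (ch & 0x3F));
        return 2;
    }
    // 1110xxxx 10xxxxxx 10xxxxxx
    out[0] = static_cast<char>(0xE0 | (ch >> 12));
    out[1] = static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
    out[2] = static_cast<char>(0x80 | (ch & 0x3F));
    return 3;
}
```

For instance, U+00E9 encodes as the two bytes C3 A9, while the lone surrogate U+D800 encodes as the three bytes ED A0 80 rather than being rejected.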
static int bdlde::CharConvertUcs2::ucs2ToUtf8(std::string *result, const unsigned short *srcString, bsl::size_t *numCharsWritten = 0)
Load, into the specified dstBuffer of the specified dstCapacity, the result of converting the specified null-terminated UCS-2 srcString to its UTF-8 equivalent. Optionally specify numCharsWritten which (if not 0) indicates the modifiable integer into which the number of UTF-8 characters written (including the null terminator) is to be loaded. Optionally specify numBytesWritten which (if not 0) indicates the modifiable integer into which the number of bytes written (including the null terminator) is to be loaded. Return 0 on success and a bitwise-or of the masks specified by CharConvertStatus::Enum otherwise, with CharConvertStatus::k_INVALID_INPUT_BIT set to indicate that at least one invalid input sequence was encountered, and CharConvertStatus::k_OUT_OF_SPACE_BIT set to indicate that dstCapacity was insufficient to accommodate the output. If dstCapacity was insufficient, the maximal null-terminated prefix of the properly converted result string is loaded into dstBuffer. The behavior is undefined unless 0 <= dstCapacity, dstBuffer refers to an array of at least dstCapacity elements, and srcString is null-terminated. Note that if dstCapacity is 0, this function returns exactly 2 (i.e., CharConvertStatus::k_OUT_OF_SPACE_BIT), and *numCharsWritten and *numBytesWritten (if not null) are loaded with 0, since there is insufficient space for the null terminator even for an empty input string. Also note that since UTF-8 is a variable-length encoding, numBytesWritten may be greater than numCharsWritten, and therefore an input srcString of dstCapacity - 1 characters may not fit into dstBuffer.
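The buffer contract described above can be illustrated with a standalone sketch. The names sketchUcs2ToUtf8 and k_SKETCH_OUT_OF_SPACE_BIT are hypothetical and not part of bdlde; the value 2 for the out-of-space bit is taken from the note that a zero dstCapacity returns exactly 2. The sketch assumes every 16-bit value is encodable, so it never reports invalid input, and it stops at the last whole character that fits, leaving the maximal null-terminated prefix in the buffer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constant: 2 matches the documented return value for a zero
// 'dstCapacity', i.e., the out-of-space case.
const int k_SKETCH_OUT_OF_SPACE_BIT = 0x2;

// Sketch of the documented 'ucs2ToUtf8' buffer contract (not the bdlde code).
// Every 16-bit value is treated as encodable, so no invalid-input bit is set.
int sketchUcs2ToUtf8(char                 *dstBuffer,
                     std::size_t           dstCapacity,
                     const unsigned short *srcString,
                     std::size_t          *numCharsWritten = 0,
                     std::size_t          *numBytesWritten = 0)
{
    assert(srcString);

    if (0 == dstCapacity) {               // no room even for the terminator
        if (numCharsWritten) *numCharsWritten = 0;
        if (numBytesWritten) *numBytesWritten = 0;
        return k_SKETCH_OUT_OF_SPACE_BIT;
    }

    int         status = 0;
    std::size_t chars  = 0;
    std::size_t bytes  = 0;
    for (; *srcString; ++srcString) {
        const unsigned int ch  = *srcString;
        const std::size_t  len = ch < 0x80 ? 1 : ch < 0x800 ? 2 : 3;
        if (bytes + len > dstCapacity - 1) {  // reserve one byte for '\0'
            status |= k_SKETCH_OUT_OF_SPACE_BIT;
            break;                            // keep only whole characters
        }
        char *out = dstBuffer + bytes;
        if (1 == len) {
            out[0] = static_cast<char>(ch);
        }
        else if (2 == len) {
            out[0] = static_cast<char>(0xC0 | (ch >> 6));
            out[1] = static_cast<char>(0x80 | (ch & 0x3F));
        }
        else {
            out[0] = static_cast<char>(0xE0 | (ch >> 12));
            out[1] = static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
            out[2] = static_cast<char>(0x80 | (ch & 0x3F));
        }
        bytes += len;
        ++chars;
    }
    dstBuffer[bytes++] = '\0';            // the terminator counts as one byte
    ++chars;                              // and as one character
    if (numCharsWritten) *numCharsWritten = chars;
    if (numBytesWritten) *numBytesWritten = bytes;
    return status;
}
```

For example, with dstCapacity set to 4, the three-character input {'a', 'b', 0x20AC} does not fit, because the euro sign alone needs three bytes: the sketch writes "ab", sets both counts to 3, and returns the out-of-space bit. With dstCapacity set to 6 the same input fits, giving 4 characters but 6 bytes.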
static int bdlde::CharConvertUcs2::utf8ToUcs2(std::vector<unsigned short> *result, const char *srcString, unsigned short errorCharacter = '?')
Load, into the specified dstBuffer of the specified dstCapacity, the result of converting the specified null-terminated UTF-8 srcString to its UCS-2 equivalent. Optionally specify numCharsWritten which (if not 0) indicates the modifiable integer into which the number of characters written (including the null terminator) is to be loaded. Optionally specify errorCharacter to be substituted for invalid (i.e., not convertible to UCS-2) input characters. If errorCharacter is 0, invalid input characters are ignored (i.e., produce no corresponding output characters). Return 0 on success and a bitwise-or of the masks specified by CharConvertStatus::Enum otherwise, with CharConvertStatus::k_INVALID_INPUT_BIT set to indicate that at least one invalid input sequence was encountered, and CharConvertStatus::k_OUT_OF_SPACE_BIT set to indicate that dstCapacity was insufficient to accommodate the output. If dstCapacity was insufficient, the maximal null-terminated prefix of the properly converted result string is loaded into dstBuffer, and (unless null) *numCharsWritten is set to dstCapacity. The behavior is undefined unless 0 <= dstCapacity, dstBuffer refers to an array of at least dstCapacity elements, and srcString is null-terminated. Note that if dstCapacity is 0, this function returns exactly 2 (i.e., CharConvertStatus::k_OUT_OF_SPACE_BIT), and *numCharsWritten (if specified) is loaded with 0, since there is insufficient space for the null terminator even for an empty input string.