Provide a wrapper for Reader to check input UTF-8 validity.
Detailed Description
- Purpose:
- Provide a wrapper for Reader to check input UTF-8 validity.
-
- Classes:
-
- See also:
- balxml_reader, balxml_errorinfo, bdlde_utf8streambufinputwrapper
-
- Description:
- This component supplies a mechanism, balxml::Utf8ReaderWrapper, which holds another object of type balxml::Reader and forwards operations to the held object. The held reader operates on a bsl::streambuf that is in fact a bdlde::Utf8CheckingInStreamBufWrapper contained in the wrapper; that stream buffer in turn holds another bsl::streambuf and forwards actions to it.
- The bdlde::Utf8CheckingInStreamBufWrapper detects invalid UTF-8. If the input contains nothing but valid UTF-8, the bdlde::Utf8CheckingInStreamBufWrapper simply forwards all operations to the bsl::streambuf it holds, and the wrapper has no influence on behavior.
- Similarly, if the input contains nothing but valid UTF-8, the reader wrapper simply forwards all operations to the held Reader and has no influence on behavior.
- If invalid UTF-8 occurs in the input, errorInfo().message() will reflect the nature of the UTF-8 error.
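The forwarding-with-checking arrangement described above can be illustrated with a minimal sketch in standard C++. This is not the BDE implementation: ToyUtf8CheckingInBuf, sawInvalidUtf8, and readThroughChecker are hypothetical names introduced here, and the check covers only lead-byte and continuation-byte structure (it does not reject overlong encodings, surrogates, or values above U+10FFFF). It assumes the wrapper holds, but does not own, another stream buffer and passes bytes through unchanged until a malformed byte is seen.

```cpp
#include <cassert>
#include <iterator>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical stand-in for a UTF-8 checking input wrapper; NOT the BDE class.
class ToyUtf8CheckingInBuf : public std::streambuf {
    std::streambuf *d_held_p;   // held stream buffer (not owned)
    char            d_ch;       // one-byte get area
    int             d_pending;  // continuation bytes still expected
    bool            d_invalid;  // set once a malformed byte is seen

    int_type fail()
    {
        d_invalid = true;
        return traits_type::eof();
    }

  protected:
    int_type underflow() override
    {
        if (d_invalid) {
            return traits_type::eof();
        }
        const int_type c = d_held_p->sbumpc();
        if (traits_type::eq_int_type(c, traits_type::eof())) {
            // End of input in the middle of a sequence is an error.
            return d_pending > 0 ? fail() : traits_type::eof();
        }
        const unsigned char b =
            static_cast<unsigned char>(traits_type::to_char_type(c));
        if (d_pending > 0) {
            if ((b & 0xC0) != 0x80) {
                return fail();                  // expected 10xxxxxx
            }
            --d_pending;
        }
        else if (b < 0x80)           { d_pending = 0; }  // ASCII
        else if ((b & 0xE0) == 0xC0) { d_pending = 1; }  // 110xxxxx
        else if ((b & 0xF0) == 0xE0) { d_pending = 2; }  // 1110xxxx
        else if ((b & 0xF8) == 0xF0) { d_pending = 3; }  // 11110xxx
        else {
            return fail();      // stray continuation or bad lead byte
        }
        d_ch = traits_type::to_char_type(c);
        setg(&d_ch, &d_ch, &d_ch + 1);          // forward the byte as-is
        return traits_type::to_int_type(d_ch);
    }

  public:
    explicit ToyUtf8CheckingInBuf(std::streambuf *held)
    : d_held_p(held), d_ch(0), d_pending(0), d_invalid(false)
    {
        assert(held);
    }

    bool sawInvalidUtf8() const { return d_invalid; }
};

// Read all of 'input' through the checker into '*output'.  Return 'true'
// if no malformed byte was seen.
bool readThroughChecker(const std::string& input, std::string *output)
{
    std::istringstream   source(input);
    ToyUtf8CheckingInBuf checker(source.rdbuf());
    std::istream         in(&checker);
    output->assign(std::istreambuf_iterator<char>(in),
                   std::istreambuf_iterator<char>());
    return !checker.sawInvalidUtf8();
}
```

With valid input the bytes read through the checker are identical to the bytes in the held buffer, which mirrors the "no influence on behavior" property. With invalid input the sketch reports end-of-input and sets a flag; the real component reports the error through errorInfo().message() instead.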
-
- Usage:
- This section illustrates intended use of this component.
-
- Example 1: Routine Parsing:
- We use a utility function, advancePastWhiteSpace, to skip past white space. Then, in main, we parse an XML string using the UTF-8 reader wrapper:
- The following string describes XML for a very simple user directory. The top level element contains one XML namespace attribute, with one embedded entry describing a user. The person's name contains some non-ASCII UTF-8.
static const char TEST_XML_STRING[] =
"<?xml version='1.0' encoding='UTF-8'?>\n"
"<directory-entry xmlns:dir='http://bloomberg.com/schemas/directory'>\n"
" <name>John Smith\xe7\x8f\x8f</name>\n"
" <phone dir:phonetype='cell'>212-318-2000</phone>\n"
" <address/>\n"
"</directory-entry>\n";
In order to read the XML, we first need to construct a balxml::NamespaceRegistry object, a balxml::PrefixStack object, and a Utf8ReaderWrapper object. The reader uses a balxml::PrefixStack to manage namespace prefixes, so we need to set it before we call open. Now we call the open method to set up the reader for parsing using the data contained in the XML string.
reader.open(TEST_XML_STRING, sizeof(TEST_XML_STRING) -1, 0, "UTF-8");
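The length passed to open is a byte count that excludes the string's terminating NUL, and the non-ASCII character in the name occupies three of those bytes. The following sketch, in standard C++ only, illustrates both points. The names literalLength, decodeThreeByteUtf8, and NAME_ELEMENT are hypothetical helpers introduced for illustration and are not part of the component.

```cpp
#include <cassert>
#include <cstddef>

// Length in bytes of a character array holding a string literal, not
// counting the terminating NUL (the same value as 'sizeof(array) - 1').
template <std::size_t N>
constexpr std::size_t literalLength(const char (&)[N])
{
    return N - 1;
}

// Decode one three-byte UTF-8 sequence (lead byte 1110xxxx followed by two
// continuation bytes 10xxxxxx) into its code point.
unsigned long decodeThreeByteUtf8(unsigned char b0,
                                  unsigned char b1,
                                  unsigned char b2)
{
    assert((b0 & 0xF0) == 0xE0);
    assert((b1 & 0xC0) == 0x80);
    assert((b2 & 0xC0) == 0x80);
    return (static_cast<unsigned long>(b0 & 0x0F) << 12)
         | (static_cast<unsigned long>(b1 & 0x3F) <<  6)
         |  static_cast<unsigned long>(b2 & 0x3F);
}

// The 'name' element from the test string: 23 ASCII bytes plus one
// three-byte character.
static const char NAME_ELEMENT[] = "<name>John Smith\xe7\x8f\x8f</name>";

static_assert(sizeof(NAME_ELEMENT) == 27, "26 bytes of text plus the NUL");
static_assert(literalLength(NAME_ELEMENT) == 26, "length without the NUL");
```

Under these definitions, decodeThreeByteUtf8(0xE7, 0x8F, 0x8F) yields 0x73CF, so the bytes \xe7\x8f\x8f encode the single code point U+73CF.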
Confirm that the reader has opened properly.
Advance through all the nodes and assert that all information contained at each node is correct.
- Assert the next node's document type is xml.
int rc = advancePastWhiteSpace(reader);
assert( 0 == rc);
assert( reader.nodeType() ==
balxml::Reader::e_NODE_TYPE_XML_DECLARATION);
assert(!bsl::strcmp(reader.nodeName(), "xml"));
assert( reader.nodeHasValue());
assert(!bsl::strcmp(reader.nodeValue(), "version='1.0' encoding='UTF-8'"));
assert( reader.nodeDepth() == 1);
assert(!reader.numAttributes());
assert(!reader.isEmptyElement());
assert( 0 == rc);
assert( reader.nodeDepth() == 1);
Advance to the top level element, which has one attribute, the XML namespace. Assert that the namespace information has been added correctly to the prefix stack.
The XML being read contains one entry describing a user. Advance to the user's name and assert that all information can be read correctly.
Advance to the user's phone number and assert that all information can be read correctly. The phone node has one attribute; look it up and assert that the balxml::ElementAttribute contains valid information and that the prefix returns the correct namespace URI from the prefix stack.
Advance to the user's address and assert that all information can be read correctly.
Advance to the end element.
rc = advancePastWhiteSpace(reader);
assert( 0 == rc);
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_END_ELEMENT);
assert(!bsl::strcmp(reader.nodeName(), "directory-entry"));
assert(!reader.nodeHasValue());
assert( reader.nodeDepth() == 1);
assert( reader.numAttributes() == 0);
assert(!reader.isEmptyElement());
Close the reader.
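The definition of the advancePastWhiteSpace helper used throughout this example is not shown here. As a rough sketch of the idea only, the following standard C++ code skips nodes that are either white-space nodes or text nodes consisting entirely of white space. ToyNode, ToyNodeType, isAllWhiteSpace, isSkippable, and firstSignificantFrom are hypothetical names operating on a plain vector; the real helper works on a balxml::Reader, and its exact logic is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical node model, standing in for what a reader would report.
enum ToyNodeType { e_TOY_WHITESPACE, e_TOY_TEXT, e_TOY_ELEMENT };

struct ToyNode {
    ToyNodeType  type;
    const char  *value;  // never null in this sketch
};

// Return 'true' if 'value' is empty or contains only white-space characters.
bool isAllWhiteSpace(const char *value)
{
    assert(value);
    static const char whiteSpace[] = "\n\r\t ";
    return std::strlen(value) == std::strspn(value, whiteSpace);
}

// Return 'true' if 'node' carries no information a caller would inspect.
bool isSkippable(const ToyNode& node)
{
    return node.type == e_TOY_WHITESPACE
        || (node.type == e_TOY_TEXT && isAllWhiteSpace(node.value));
}

// Return the index of the first node at or after 'start' that is not
// skippable, or 'nodes.size()' if there is none.
std::size_t firstSignificantFrom(const std::vector<ToyNode>& nodes,
                                 std::size_t                 start)
{
    std::size_t i = start;
    while (i < nodes.size() && isSkippable(nodes[i])) {
        ++i;
    }
    return i;
}
```

Treating all-white-space text nodes as skippable is a design choice in this sketch: it lets the assertions in the example address only the nodes that carry element names or values, regardless of how the parser classifies indentation between elements.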