Outline
Purpose
Provide wrapper for Reader to check input UTF-8 validity.
Classes
balxml::Utf8ReaderWrapper
See also
balxml_reader, balxml_errorinfo, bdlde_utf8streambufinputwrapper
Description
This component supplies a mechanism, balxml::Utf8ReaderWrapper, which holds another object of type balxml::Reader and forwards operations to the held object. The held reader operates on a bsl::streambuf that is in fact a bdlde::Utf8CheckingInStreamBufWrapper contained in the wrapper object; that streambuf wrapper in turn holds another bsl::streambuf and forwards actions to it.
The bdlde::Utf8CheckingInStreamBufWrapper detects invalid UTF-8. If the input contains nothing but valid UTF-8, it simply forwards all operations to the bsl::streambuf it holds and has no influence on behavior.
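To make the layering concrete, here is a minimal sketch in standard C++ (std::streambuf rather than bsl::streambuf) of an input streambuf that holds another streambuf and forwards every read to it. The class ForwardingCheckBuf is hypothetical and is not bdlde::Utf8CheckingInStreamBufWrapper: it only flags bytes that can never appear in UTF-8 instead of validating whole sequences, and it never alters the data it forwards.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical illustration (not the bdlde implementation): an input
// streambuf that forwards all reads to a held streambuf and records whether
// it has seen a byte that never occurs in UTF-8 (0xC0, 0xC1, 0xF5..0xFF).
// Full sequence validation is deliberately omitted.
class ForwardingCheckBuf : public std::streambuf {
  public:
    explicit ForwardingCheckBuf(std::streambuf *held)
    : d_held(held), d_sawBadByte(false) {}

    bool sawBadByte() const { return d_sawBadByte; }

  protected:
    // No local get area is kept, so peeking ('sgetc') lands here: look at
    // the next character of the held buffer without consuming it.
    int_type underflow() override {
        return d_held->sgetc();
    }

    // Consuming reads ('sbumpc') land here: take one character from the
    // held buffer, inspect it, and pass it through unchanged.
    int_type uflow() override {
        const int_type c = d_held->sbumpc();
        if (!traits_type::eq_int_type(c, traits_type::eof())) {
            const unsigned char b = static_cast<unsigned char>(
                                               traits_type::to_char_type(c));
            if (b == 0xC0 || b == 0xC1 || b >= 0xF5) {
                d_sawBadByte = true;
            }
        }
        return c;
    }

  private:
    std::streambuf *d_held;        // buffer that actually supplies the data
    bool            d_sawBadByte;  // set once a never-valid byte is read
};
```

Because the sketch keeps no get area of its own, every character passes through underflow or uflow, which is where a checking wrapper can observe the stream without changing what the reader sees.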
Similarly, if the input contains nothing but valid UTF-8, the reader wrapper simply forwards all operations to the held Reader
and has no influence on behavior.
If invalid UTF-8 occurs in the input, errorInfo().message() will reflect the nature of the UTF-8 error.
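The kinds of input the checking layer objects to can be illustrated with a standalone validator. This is a sketch that follows the well-formedness rules of RFC 3629; the function name firstInvalidUtf8Offset is hypothetical, and it does not reproduce the bdlde error codes or the text produced by errorInfo().message().

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of BDE): return the byte offset at which the
// first ill-formed UTF-8 sequence in 's' starts, or -1 if 's' is well formed.
// Follows RFC 3629: rejects stray continuation bytes, overlong encodings,
// surrogates (U+D800..U+DFFF), code points above U+10FFFF, and truncation.
long firstInvalidUtf8Offset(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t       i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        if (b0 < 0x80) {                       // ASCII: a single byte
            ++i;
            continue;
        }
        std::size_t   len = 0;
        unsigned char lo  = 0x80;              // allowed range for byte 2
        unsigned char hi  = 0xBF;
        if      (b0 >= 0xC2 && b0 <= 0xDF) { len = 2; }
        else if (b0 == 0xE0)               { len = 3; lo = 0xA0; } // overlong
        else if (b0 >= 0xE1 && b0 <= 0xEC) { len = 3; }
        else if (b0 == 0xED)               { len = 3; hi = 0x9F; } // surrogate
        else if (b0 >= 0xEE && b0 <= 0xEF) { len = 3; }
        else if (b0 == 0xF0)               { len = 4; lo = 0x90; } // overlong
        else if (b0 >= 0xF1 && b0 <= 0xF3) { len = 4; }
        else if (b0 == 0xF4)               { len = 4; hi = 0x8F; } // >10FFFF
        else { return static_cast<long>(i); }  // 0x80..0xC1 or 0xF5..0xFF

        if (i + len > n) {                     // sequence is cut off
            return static_cast<long>(i);
        }
        const unsigned char b1 = static_cast<unsigned char>(s[i + 1]);
        if (b1 < lo || b1 > hi) {
            return static_cast<long>(i);
        }
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char bk = static_cast<unsigned char>(s[i + k]);
            if (bk < 0x80 || bk > 0xBF) {      // must be a continuation byte
                return static_cast<long>(i);
            }
        }
        i += len;
    }
    return -1;
}
```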
Usage
This section illustrates intended use of this component.
Example 1: Routine Parsing:
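Utility function to skip past white space: advance until the current node is neither a white-space node nor a text node consisting only of white space, or until advanceToNextNode fails, and return the last status.
int advancePastWhiteSpace(balxml::Reader& reader)
{
    static const char whiteSpace[] = "\n\r\t ";
    const char *value = 0;
    int         type  = 0;
    int         rc    = 0;

    do {
        rc    = reader.advanceToNextNode();
        value = reader.nodeValue();
        type  = reader.nodeType();
    } while (0 == rc &&
             (type == balxml::Reader::e_NODE_TYPE_WHITESPACE ||
              (type == balxml::Reader::e_NODE_TYPE_TEXT &&
               bsl::strlen(value) == bsl::strspn(value, whiteSpace))));

    return rc;
}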
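The text-node test inside advancePastWhiteSpace relies on a standard-library idiom: a string is all white space exactly when its leading white-space span (strspn) covers its whole length (strlen). Isolated as a hypothetical free function using std:: rather than bsl::, it looks like this:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper mirroring the loop condition in advancePastWhiteSpace:
// 'value' counts as white space when every character is one of "\n\r\t ".
bool isAllWhiteSpace(const char *value)
{
    static const char whiteSpace[] = "\n\r\t ";
    return std::strlen(value) == std::strspn(value, whiteSpace);
}
```

An empty string satisfies the predicate, since both lengths are zero.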
Then, in main, we parse an XML string using the UTF-8 reader wrapper:
The following string describes XML for a very simple user directory. The top-level element contains one XML namespace attribute and one embedded entry describing a user. The user's name contains some non-ASCII UTF-8.
static const char TEST_XML_STRING[] =
"<?xml version='1.0' encoding='UTF-8'?>\n"
"<directory-entry xmlns:dir='http://bloomberg.com/schemas/directory'>\n"
" <name>John Smith\xe7\x8f\x8f</name>\n"
" <phone dir:phonetype='cell'>212-318-2000</phone>\n"
" <address/>\n"
"</directory-entry>\n";
In order to read the XML, we first need to construct a balxml::NamespaceRegistry object, a balxml::PrefixStack object, and a Utf8ReaderWrapper object. The wrapper holds the reader that does the actual parsing, here a balxml::MiniReader.
balxml::NamespaceRegistry namespaces;
balxml::PrefixStack       prefixStack(&namespaces);
balxml::MiniReader        miniReader;
balxml::Utf8ReaderWrapper reader(&miniReader);

assert(!reader.isOpen());
The reader uses a balxml::PrefixStack to manage namespace prefixes, so we need to set it before we call open.
reader.setPrefixStack(&prefixStack);
assert(reader.prefixStack() == &prefixStack);
Now we call the open method to set up the reader for parsing the data contained in the XML string.
reader.open(TEST_XML_STRING,
            sizeof(TEST_XML_STRING) - 1,
            0,
            "UTF-8");
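The length passed to open is sizeof(TEST_XML_STRING) - 1 because the array generated for a string literal includes its terminating null byte, which is not part of the XML document. The length is counted in bytes, so each byte of a multi-byte UTF-8 sequence counts separately. A small standalone check, where SAMPLE and contentLength are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sample: the array holds the characters plus a trailing '\0'.
static const char SAMPLE[] = "<a>John Smith\xe7\x8f\x8f</a>\n";

// Number of content bytes, excluding the terminating '\0'.  For a literal
// without embedded '\0' bytes this equals what strlen reports.
std::size_t contentLength()
{
    return sizeof(SAMPLE) - 1;
}
```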
Confirm that the reader has opened properly and that no node is current yet.
assert( reader.isOpen());
assert(!bsl::strncmp(reader.documentEncoding(), "UTF-8", 5));
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_NONE);
assert(!reader.nodeName());
assert(!reader.nodeHasValue());
assert(!reader.nodeDepth());
assert(!reader.numAttributes());
assert(!reader.isEmptyElement());
Advance through all the nodes and assert that the information at each node is correct.
Assert that the next node is the XML declaration.
int rc = advancePastWhiteSpace(reader);
assert( 0 == rc);
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_XML_DECLARATION);
assert(!bsl::strcmp(reader.nodeName(), "xml"));
assert(!bsl::strcmp(reader.nodeValue(), "version='1.0' encoding='UTF-8'"));
Advance to the top-level element, which has one attribute, the XML namespace declaration. Assert that the namespace information has been added correctly to the prefix stack.
rc = advancePastWhiteSpace(reader);
assert( 0 == rc);
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_ELEMENT);
assert(!bsl::strcmp(reader.nodeName(), "directory-entry"));
assert(!bsl::strcmp(prefixStack.lookupNamespacePrefix("dir"), "dir"));
assert(prefixStack.lookupNamespaceId("dir") == 0);
assert(!bsl::strcmp(prefixStack.lookupNamespaceUri("dir"),
                    "http://bloomberg.com/schemas/directory"));
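Conceptually, reading the xmlns:dir declaration records a binding so that the prefix dir can later be resolved to its URI. The following sketch models only that lookup with a std::map; the class ToyPrefixMap is hypothetical and ignores the scoping and namespace ids that the component's PrefixStack and NamespaceRegistry handle.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, greatly simplified model of prefix-to-namespace lookup.
// It keeps only the most recent binding for each prefix; there is no
// scoping and no namespace-id registry.
class ToyPrefixMap {
  public:
    // Record that 'prefix' now denotes the namespace 'uri'.
    void bind(const std::string& prefix, const std::string& uri) {
        d_map[prefix] = uri;
    }

    // Return the URI bound to 'prefix', or an empty string if unbound.
    std::string lookupUri(const std::string& prefix) const {
        const std::map<std::string, std::string>::const_iterator it =
                                                          d_map.find(prefix);
        return it == d_map.end() ? std::string() : it->second;
    }

  private:
    std::map<std::string, std::string> d_map;
};
```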
The XML being read contains one entry describing a user. Advance to the user's name and assert that all information can be read correctly.
rc = advancePastWhiteSpace(reader);
assert( 0 == rc);
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_ELEMENT);
assert(!bsl::strcmp(reader.nodeName(), "name"));

rc = reader.advanceToNextNode();
assert( 0 == rc);
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_TEXT);
assert(!bsl::strcmp(reader.nodeValue(), "John Smith\xe7\x8f\x8f"));

rc = reader.advanceToNextNode();
assert( 0 == rc);
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_END_ELEMENT);
assert(!bsl::strcmp(reader.nodeName(), "name"));
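The expected value of the name element ends in the bytes \xe7\x8f\x8f, which form a single three-byte UTF-8 sequence (pattern 1110xxxx 10xxxxxx 10xxxxxx). Decoding them by hand shows they encode the code point U+73CF, a CJK ideograph. This hypothetical helper handles only the three-byte form and assumes well-formed input.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decode one well-formed three-byte UTF-8 sequence
// into its code point.  No validation is performed.
std::uint32_t decodeThreeByteUtf8(unsigned char b0,
                                  unsigned char b1,
                                  unsigned char b2)
{
    return (static_cast<std::uint32_t>(b0 & 0x0F) << 12)   // 4 payload bits
         | (static_cast<std::uint32_t>(b1 & 0x3F) << 6)    // 6 payload bits
         |  static_cast<std::uint32_t>(b2 & 0x3F);         // 6 payload bits
}
```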
Advance to the user's phone number and assert that all information can be read correctly.
rc = advancePastWhiteSpace(reader);
assert( 0 == rc);
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_ELEMENT);
assert(!bsl::strcmp(reader.nodeName(), "phone"));
The phone node has one attribute. Look it up and assert that the balxml::ElementAttribute contains valid information and that its prefix resolves to the correct namespace URI in the prefix stack.
balxml::ElementAttribute elemAttr;

rc = reader.lookupAttribute(&elemAttr, 0);
assert( 0 == rc);
assert(!elemAttr.isNull());
assert(!bsl::strcmp(elemAttr.qualifiedName(), "dir:phonetype"));
assert(!bsl::strcmp(elemAttr.value(), "cell"));
assert(!bsl::strcmp(elemAttr.prefix(), "dir"));
assert(!bsl::strcmp(elemAttr.localName(), "phonetype"));
assert(!bsl::strcmp(elemAttr.namespaceUri(),
                    "http://bloomberg.com/schemas/directory"));
assert(!bsl::strcmp(prefixStack.lookupNamespaceUri(elemAttr.prefix()),
                    elemAttr.namespaceUri()));

rc = advancePastWhiteSpace(reader);
assert( 0 == rc);
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_TEXT);
assert(!bsl::strcmp(reader.nodeValue(), "212-318-2000"));

rc = advancePastWhiteSpace(reader);
assert( 0 == rc);
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_END_ELEMENT);
assert(!bsl::strcmp(reader.nodeName(), "phone"));
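The attribute's prefix() and localName() are the two halves of its qualified name, dir:phonetype, split at the colon. This hypothetical helper shows only that split; it is not the balxml implementation and performs no namespace resolution.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: split a qualified name such as "dir:phonetype" at
// the first ':' into a prefix and a local name.  A name without a colon
// gets an empty prefix and is returned whole as the local name.
void splitQualifiedName(const std::string&  qname,
                        std::string        *prefix,
                        std::string        *localName)
{
    const std::string::size_type colon = qname.find(':');
    if (colon == std::string::npos) {
        prefix->clear();
        *localName = qname;
    }
    else {
        *prefix    = qname.substr(0, colon);
        *localName = qname.substr(colon + 1);
    }
}
```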
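Advance to the user's address and assert that all information can be read correctly. The address element is written as an empty element.
rc = advancePastWhiteSpace(reader);
assert( 0 == rc);
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_ELEMENT);
assert(!bsl::strcmp(reader.nodeName(), "address"));
assert( reader.isEmptyElement());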
Advance to the end element.
rc = advancePastWhiteSpace(reader);
assert( 0 == rc);
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_END_ELEMENT);
assert(!bsl::strcmp(reader.nodeName(), "directory-entry"));
Close the reader.
reader.close();
assert(!reader.isOpen());