BDE 4.14.0 Production release
balxml_utf8readerwrapper

Detailed Description

Purpose

Provide wrapper for Reader to check input UTF-8 validity.

Classes

balxml::Utf8ReaderWrapper: wrapper for a balxml::Reader that checks input UTF-8 validity

See also
balxml_reader, balxml_errorinfo, bdlde_utf8streambufinputwrapper

Description

This component supplies a mechanism, balxml::Utf8ReaderWrapper, which holds another object of type balxml::Reader and forwards operations to the held object. The held reader operates on a bsl::streambuf that is in fact a bdlde::Utf8CheckingInStreamBufWrapper contained in the wrapper object; that stream buffer wrapper in turn holds another bsl::streambuf and forwards actions to it.
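The forwarding arrangement can be pictured with a small stand-alone sketch. The names ToyReader, ListReader, and ForwardingReader are hypothetical stand-ins invented for illustration; they are not balxml::Reader or balxml::Utf8ReaderWrapper, and the real reader interface has many more virtual functions. The point is only that the wrapper contains no parsing logic of its own: every call is delegated to the held object.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, much-reduced reader interface (NOT balxml::Reader).
class ToyReader {
  public:
    virtual ~ToyReader() = default;
    // Return 0 on success and 1 when there are no more nodes.
    virtual int advanceToNextNode() = 0;
    virtual std::string nodeName() const = 0;
};

// Hypothetical concrete reader that yields a fixed list of node names.
class ListReader : public ToyReader {
  public:
    explicit ListReader(std::vector<std::string> names)
    : d_names(std::move(names)) {}

    int advanceToNextNode() override
    {
        if (d_pos == d_names.size()) {
            return 1;                                  // no more nodes
        }
        ++d_pos;
        return 0;
    }

    std::string nodeName() const override
    {
        return d_pos == 0 ? std::string() : d_names[d_pos - 1];
    }

  private:
    std::vector<std::string> d_names;
    std::size_t              d_pos = 0;                // nodes consumed so far
};

// Hypothetical wrapper: holds another reader and forwards every operation.
class ForwardingReader : public ToyReader {
  public:
    explicit ForwardingReader(ToyReader *held) : d_held(held) {}

    int advanceToNextNode() override { return d_held->advanceToNextNode(); }
    std::string nodeName() const override { return d_held->nodeName(); }

  private:
    ToyReader *d_held;                                 // not owned
};
```

Because the wrapper only delegates, a caller driving ForwardingReader observes exactly the node sequence the held ListReader produces.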

The bdlde::Utf8CheckingInStreamBufWrapper detects invalid UTF-8. If the input contains nothing but valid UTF-8, the stream buffer wrapper simply forwards all operations to the bsl::streambuf it holds and has no influence on behavior.
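To make the "forward unless invalid" behavior concrete, here is a simplified, hypothetical input stream buffer in standard C++. CheckingInBuf and readThroughChecker are names invented for this sketch, not the bdlde API, and the real wrapper's buffering and error reporting may differ. The sketch pulls one byte at a time from the held buffer, passes it through unchanged, and reports end-of-input at the first byte that cannot belong to well-formed UTF-8 (bad lead byte, bad continuation byte, overlong form, surrogate, or a code point above U+10FFFF); it also flags an error if the input ends in the middle of a sequence.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical sketch (NOT bdlde::Utf8CheckingInStreamBufWrapper).
class CheckingInBuf : public std::streambuf {
  public:
    explicit CheckingInBuf(std::streambuf *held) : d_held(held) {}

    // Return true if invalid or truncated UTF-8 was encountered.
    bool hasError() const { return d_error; }

  protected:
    int_type underflow() override
    {
        if (gptr() < egptr()) {
            return traits_type::to_int_type(*gptr());
        }
        if (d_error) {
            return traits_type::eof();
        }
        const int_type c = d_held->sbumpc();         // forward the read
        if (traits_type::eq_int_type(c, traits_type::eof())) {
            if (d_remaining != 0) {
                d_error = true;                      // input ended mid-sequence
            }
            return traits_type::eof();
        }
        d_ch = traits_type::to_char_type(c);
        if (!accept(static_cast<unsigned char>(d_ch))) {
            d_error = true;                          // stop before the bad byte
            return traits_type::eof();
        }
        setg(&d_ch, &d_ch, &d_ch + 1);               // expose the byte unchanged
        return c;
    }

  private:
    // Feed one byte to a UTF-8 state machine; return false if the byte is
    // not valid at this position.
    bool accept(unsigned char b)
    {
        if (d_remaining == 0) {
            if (b < 0x80) return true;                           // ASCII
            if      (b >= 0xC2 && b <= 0xDF) d_remaining = 1;
            else if (b >= 0xE0 && b <= 0xEF) d_remaining = 2;
            else if (b >= 0xF0 && b <= 0xF4) d_remaining = 3;
            else return false;                                   // bad lead byte
            // Narrow the second-byte range to reject overlong forms,
            // surrogates (U+D800..U+DFFF), and code points above U+10FFFF.
            d_lo = (b == 0xE0) ? 0xA0 : (b == 0xF0) ? 0x90 : 0x80;
            d_hi = (b == 0xED) ? 0x9F : (b == 0xF4) ? 0x8F : 0xBF;
            return true;
        }
        if (b < d_lo || b > d_hi) return false;                  // bad continuation
        d_lo = 0x80;
        d_hi = 0xBF;
        --d_remaining;
        return true;
    }

    std::streambuf *d_held;                // not owned
    char            d_ch = 0;              // one-byte get area
    int             d_remaining = 0;       // continuation bytes still expected
    unsigned char   d_lo = 0x80;
    unsigned char   d_hi = 0xBF;
    bool            d_error = false;
};

// Usage helper: push 'bytes' through a CheckingInBuf and return what comes out.
std::string readThroughChecker(const std::string& bytes, bool *hadError)
{
    std::stringbuf source(bytes);
    CheckingInBuf  checker(&source);
    std::istream   in(&checker);
    std::string    out((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
    *hadError = checker.hasError();
    return out;
}
```

For valid input the bytes that come out are exactly the bytes that went in, which is the "no influence on behavior" property. One simplification to note: this sketch forwards the leading bytes of a multi-byte sequence before it has seen the whole sequence, so a sequence truncated at end of input is passed through and only flagged afterwards.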

Similarly, if the input contains nothing but valid UTF-8, the reader wrapper simply forwards all operations to the held Reader and has no influence on behavior.

If invalid UTF-8 occurs in the input, errorInfo().message() will reflect the nature of the UTF-8 error.
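As a rough illustration of what "the nature of the UTF-8 error" can mean, the hypothetical helper below (not part of BDE; the message strings are invented for this sketch) scans a byte string and names the first problem it finds together with its zero-based byte offset. The actual text produced by errorInfo().message() is not reproduced here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: build a message such as "truncated sequence at offset 4".
static std::string errorAt(const char *what, std::size_t offset)
{
    return std::string(what) + " at offset " + std::to_string(offset);
}

// Hypothetical checker: return an empty string if 'input' is well-formed
// UTF-8; otherwise describe the first error found.
std::string describeFirstUtf8Error(const std::string& input)
{
    const std::size_t n = input.size();
    std::size_t       i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(input[i]);
        if (lead < 0x80) {                      // ASCII: one byte
            ++i;
            continue;
        }
        std::size_t   len = 0;
        unsigned char lo  = 0x80;               // allowed range of the
        unsigned char hi  = 0xBF;               // second byte
        if (lead >= 0xC2 && lead <= 0xDF) {
            len = 2;
        }
        else if (lead >= 0xE0 && lead <= 0xEF) {
            len = 3;
            if (lead == 0xE0) lo = 0xA0;        // reject overlong forms
            if (lead == 0xED) hi = 0x9F;        // reject surrogates
        }
        else if (lead >= 0xF0 && lead <= 0xF4) {
            len = 4;
            if (lead == 0xF0) lo = 0x90;        // reject overlong forms
            if (lead == 0xF4) hi = 0x8F;        // reject code points > U+10FFFF
        }
        else {
            return errorAt("invalid lead byte", i);
        }
        for (std::size_t k = 1; k < len; ++k) {
            if (i + k >= n) {
                return errorAt("truncated sequence", i);
            }
            const unsigned char c = static_cast<unsigned char>(input[i + k]);
            if ((c & 0xC0) != 0x80) {
                return errorAt("expected continuation byte", i + k);
            }
            if (c < lo || c > hi) {
                return errorAt("overlong, surrogate, or out-of-range sequence", i);
            }
            lo = 0x80;                          // later bytes: full range
            hi = 0xBF;
        }
        i += len;
    }
    return std::string();
}
```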

Usage

This section illustrates intended use of this component.

Example 1: Routine Parsing

First, we define a utility function that advances the reader past any white-space nodes.

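#include <balxml_reader.h>
#include <bsl_cstring.h>

int advancePastWhiteSpace(balxml::Reader& reader)
{
    static const char whiteSpace[] = "\n\r\t ";
    const char *value = 0;
    int type = 0;
    int rc = 0;
    do {
        rc = reader.advanceToNextNode();
        value = reader.nodeValue();
        type = reader.nodeType();
        // Keep advancing while the read succeeded and the node is either a
        // white-space node or a text node made up only of white space.
    } while (0 == rc &&
             (type == balxml::Reader::e_NODE_TYPE_WHITESPACE ||
              (type == balxml::Reader::e_NODE_TYPE_TEXT &&
               value &&
               bsl::strlen(value) == bsl::strspn(value, whiteSpace))));
    return rc;
}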
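The loop condition relies on a compact idiom: a C string is all white space exactly when strspn over the white-space set covers its full strlen. The stand-alone sketch below uses a hypothetical name, isAllWhiteSpace, and unlike the bare expression it also guards against a null pointer; it shows the edge cases, including that an empty string counts as white space.

```cpp
#include <cstring>

// Hypothetical helper: return true if 'value' is non-null and contains only
// the characters "\n\r\t ". An empty string qualifies because strspn and
// strlen both return 0.
bool isAllWhiteSpace(const char *value)
{
    static const char whiteSpace[] = "\n\r\t ";
    return value != nullptr &&
           std::strlen(value) == std::strspn(value, whiteSpace);
}
```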

Then, in main, we parse an XML string using the UTF-8 reader wrapper:

The following string describes XML for a very simple user directory. The top-level element contains one XML namespace attribute and one embedded entry describing a user. The person's name contains some non-ASCII UTF-8.

static const char TEST_XML_STRING[] =
"<?xml version='1.0' encoding='UTF-8'?>\n"
"<directory-entry xmlns:dir='http://bloomberg.com/schemas/directory'>\n"
" <name>John Smith\xe7\x8f\x8f</name>\n"
" <phone dir:phonetype='cell'>212-318-2000</phone>\n"
" <address/>\n"
"</directory-entry>\n";

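To see which character the escaped bytes \xe7\x8f\x8f in the name encode, the three-byte UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx can be decoded by hand. The function below is a hypothetical helper that assumes a well-formed three-byte sequence and performs no validation.

```cpp
#include <cstdint>

// Hypothetical helper: decode a well-formed three-byte UTF-8 sequence into
// its code point by taking the low 4 bits of the lead byte and the low 6
// bits of each continuation byte. No validation is performed.
std::uint32_t decodeThreeByteUtf8(unsigned char b0,
                                  unsigned char b1,
                                  unsigned char b2)
{
    return (static_cast<std::uint32_t>(b0 & 0x0F) << 12)
         | (static_cast<std::uint32_t>(b1 & 0x3F) << 6)
         |  static_cast<std::uint32_t>(b2 & 0x3F);
}
```

For the bytes in the name: 0xE7 & 0x0F = 0x7 and 0x8F & 0x3F = 0x0F, so the code point is (0x7 << 12) | (0x0F << 6) | 0x0F = 0x7000 + 0x3C0 + 0xF = U+73CF, which lies in the CJK Unified Ideographs block (U+4E00 to U+9FFF).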
In order to read the XML, we first construct a balxml::NamespaceRegistry object, a balxml::PrefixStack object that uses it, a balxml::MiniReader object to do the actual parsing, and a balxml::Utf8ReaderWrapper object that wraps the mini reader.

balxml::NamespaceRegistry namespaces;
balxml::PrefixStack prefixStack(&namespaces);
balxml::MiniReader miniReader;
balxml::Utf8ReaderWrapper reader(&miniReader);
assert(!reader.isOpen());

The reader uses a balxml::PrefixStack to manage namespace prefixes, so we need to set it before we call open.

reader.setPrefixStack(&prefixStack);
assert(reader.prefixStack());
assert(reader.prefixStack() == &prefixStack);

Now we call the open method to set up the reader for parsing, using the data contained in the XML string.

reader.open(TEST_XML_STRING, sizeof(TEST_XML_STRING) -1, 0, "UTF-8");
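The length argument is sizeof(TEST_XML_STRING) - 1 because sizeof on a string-literal array counts the terminating null character, which is not part of the XML. A tiny check on a hypothetical stand-in array, kDoc (not from the original), makes the arithmetic explicit.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for TEST_XML_STRING.
static const char kDoc[] = "<address/>\n";

// sizeof includes the trailing '\0'; subtracting 1 gives the content length.
const std::size_t kDocSize    = sizeof(kDoc);        // 12
const std::size_t kContentLen = sizeof(kDoc) - 1;    // 11
```

The result equals strlen only because the literal contains no embedded null characters.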

Confirm that the reader has opened properly.

assert( reader.isOpen());
assert(!bsl::strncmp(reader.documentEncoding(), "UTF-8", 5));
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_NONE);
assert(!reader.nodeName());
assert(!reader.nodeHasValue());
assert(!reader.nodeValue());
assert(!reader.nodeDepth());
assert(!reader.numAttributes());
assert(!reader.isEmptyElement());

Advance through all the nodes and assert all information contained at each node is correct.

Assert that the next node is the XML declaration.

int rc = advancePastWhiteSpace(reader);
assert( 0 == rc);
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_XML_DECLARATION);
assert(!bsl::strcmp(reader.nodeName(), "xml"));
assert( reader.nodeHasValue());
assert(!bsl::strcmp(reader.nodeValue(), "version='1.0' encoding='UTF-8'"));
assert( reader.nodeDepth() == 1);
assert(!reader.numAttributes());
assert(!reader.isEmptyElement());

Advance to the top-level element, which has one attribute: an XML namespace declaration. Assert that the namespace information has been added correctly to the prefix stack.

rc = advancePastWhiteSpace(reader);
assert( 0 == rc);
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_ELEMENT);
assert(!bsl::strcmp(reader.nodeName(), "directory-entry"));
assert(!reader.nodeHasValue());
assert( reader.nodeDepth() == 1);
assert( reader.numAttributes() == 1);
assert(!reader.isEmptyElement());
assert(!bsl::strcmp(prefixStack.lookupNamespacePrefix("dir"), "dir"));
assert(prefixStack.lookupNamespaceId("dir") == 0);
assert(!bsl::strcmp(prefixStack.lookupNamespaceUri("dir"),
"http://bloomberg.com/schemas/directory"));
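The prefix-stack lookups asserted above can be modeled with a toy data structure. ToyNamespaceRegistry and ToyPrefixStack are hypothetical and much simpler than balxml::NamespaceRegistry and balxml::PrefixStack: the registry assigns integer IDs to distinct URIs in first-seen order, and the stack records (prefix, ID) bindings so that the most recent binding of a prefix wins. That the toy gives ID 0 to the first URI matches the value asserted above, but that is a property of this sketch, not a statement about how BDE numbers namespaces.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy registry: assigns IDs 0, 1, 2, ... to distinct URIs in
// the order they are first seen.
class ToyNamespaceRegistry {
  public:
    int lookupOrRegister(const std::string& uri)
    {
        for (std::size_t i = 0; i < d_uris.size(); ++i) {
            if (d_uris[i] == uri) {
                return static_cast<int>(i);
            }
        }
        d_uris.push_back(uri);
        return static_cast<int>(d_uris.size() - 1);
    }

    const std::string& lookup(int id) const { return d_uris[id]; }

  private:
    std::vector<std::string> d_uris;
};

// Hypothetical toy prefix stack: (prefix, id) bindings searched newest-first,
// so a later binding of the same prefix shadows an earlier one.
class ToyPrefixStack {
  public:
    explicit ToyPrefixStack(ToyNamespaceRegistry *registry)
    : d_registry(registry) {}

    int pushPrefix(const std::string& prefix, const std::string& uri)
    {
        const int id = d_registry->lookupOrRegister(uri);
        d_bindings.emplace_back(prefix, id);
        return id;
    }

    void popPrefix() { d_bindings.pop_back(); }

    // Return the ID bound to 'prefix', or -1 if it is unbound.
    int lookupNamespaceId(const std::string& prefix) const
    {
        for (auto it = d_bindings.rbegin(); it != d_bindings.rend(); ++it) {
            if (it->first == prefix) {
                return it->second;
            }
        }
        return -1;
    }

    // Return the URI bound to 'prefix', or an empty string if it is unbound.
    std::string lookupNamespaceUri(const std::string& prefix) const
    {
        const int id = lookupNamespaceId(prefix);
        return id < 0 ? std::string() : d_registry->lookup(id);
    }

  private:
    ToyNamespaceRegistry                    *d_registry;   // not owned
    std::vector<std::pair<std::string, int>> d_bindings;
};
```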

The XML being read contains one entry describing a user. Advance to the user's name and assert that all information can be read correctly.

rc = advancePastWhiteSpace(reader);
assert( 0 == rc);
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_ELEMENT);
assert(!bsl::strcmp(reader.nodeName(), "name"));
assert(!reader.nodeHasValue());
assert( reader.nodeDepth() == 2);
assert( reader.numAttributes() == 0);
assert(!reader.isEmptyElement());
rc = reader.advanceToNextNode();
assert( 0 == rc);
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_TEXT);
assert( reader.nodeHasValue());
assert(!bsl::strcmp(reader.nodeValue(), "John Smith\xe7\x8f\x8f"));
assert( reader.nodeDepth() == 3);
assert( reader.numAttributes() == 0);
assert(!reader.isEmptyElement());
rc = reader.advanceToNextNode();
assert( 0 == rc);
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_END_ELEMENT);
assert(!bsl::strcmp(reader.nodeName(), "name"));
assert(!reader.nodeHasValue());
assert( reader.nodeDepth() == 2);
assert( reader.numAttributes() == 0);
assert(!reader.isEmptyElement());

Advance to the user's phone number and assert all information can be read correctly.

rc = advancePastWhiteSpace(reader);
assert( 0 == rc);
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_ELEMENT);
assert(!bsl::strcmp(reader.nodeName(), "phone"));
assert(!reader.nodeHasValue());
assert( reader.nodeDepth() == 2);
assert( reader.numAttributes() == 1);
assert(!reader.isEmptyElement());

The phone node has one attribute. Look it up, assert that the balxml::ElementAttribute contains valid information, and check that its prefix resolves to the correct namespace URI through the prefix stack.

balxml::ElementAttribute elemAttr;
rc = reader.lookupAttribute(&elemAttr, 0);
assert( 0 == rc);
assert(!elemAttr.isNull());
assert(!bsl::strcmp(elemAttr.qualifiedName(), "dir:phonetype"));
assert(!bsl::strcmp(elemAttr.value(), "cell"));
assert(!bsl::strcmp(elemAttr.prefix(), "dir"));
assert(!bsl::strcmp(elemAttr.localName(), "phonetype"));
assert(!bsl::strcmp(elemAttr.namespaceUri(),
"http://bloomberg.com/schemas/directory"));
assert( elemAttr.namespaceId() == 0);
assert(!bsl::strcmp(prefixStack.lookupNamespaceUri(elemAttr.prefix()),
elemAttr.namespaceUri()));
rc = advancePastWhiteSpace(reader);
assert( 0 == rc);
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_TEXT);
assert( reader.nodeHasValue());
assert(!bsl::strcmp(reader.nodeValue(), "212-318-2000"));
assert( reader.nodeDepth() == 3);
assert( reader.numAttributes() == 0);
assert(!reader.isEmptyElement());
rc = advancePastWhiteSpace(reader);
assert( 0 == rc);
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_END_ELEMENT);
assert(!bsl::strcmp(reader.nodeName(), "phone"));
assert(!reader.nodeHasValue());
assert( reader.nodeDepth() == 2);
assert( reader.numAttributes() == 0);
assert(!reader.isEmptyElement());
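The attribute accessors follow the usual XML Namespaces convention: a qualified name such as dir:phonetype is a prefix and a local name separated by a colon, and the prefix is then resolved to a namespace URI through the prefix stack. The hypothetical helper below (QName and splitQualifiedName are not balxml names) shows only the splitting step.

```cpp
#include <string>

// Hypothetical result type: the two halves of a qualified name.
struct QName {
    std::string prefix;      // empty if the name has no prefix
    std::string localName;
};

// Hypothetical helper: split "prefix:local" at the first ':'. A name with no
// ':' is returned as a local name with an empty prefix.
QName splitQualifiedName(const std::string& qname)
{
    const std::string::size_type colon = qname.find(':');
    if (colon == std::string::npos) {
        return QName{std::string(), qname};
    }
    return QName{qname.substr(0, colon), qname.substr(colon + 1)};
}
```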

Advance to the user's address and assert all information can be read correctly.

rc = advancePastWhiteSpace(reader);
assert( 0 == rc);
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_ELEMENT);
assert(!bsl::strcmp(reader.nodeName(), "address"));
assert(!reader.nodeHasValue());
assert( reader.nodeDepth() == 2);
assert( reader.numAttributes() == 0);
assert( reader.isEmptyElement());

Advance to the end element.

rc = advancePastWhiteSpace(reader);
assert( 0 == rc);
assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_END_ELEMENT);
assert(!bsl::strcmp(reader.nodeName(), "directory-entry"));
assert(!reader.nodeHasValue());
assert( reader.nodeDepth() == 1);
assert( reader.numAttributes() == 0);
assert(!reader.isEmptyElement());

Close the reader.

reader.close();
assert(!reader.isOpen());
return 0;