Component balxml_utf8readerwrapper
[Package balxml]

Provide wrapper for Reader to check input UTF-8 validity.

Namespaces

namespace  balxml

Detailed Description

Outline
Purpose:
Provide wrapper for Reader to check input UTF-8 validity.
Classes:
balxml::Utf8ReaderWrapper Wrap a Reader, check UTF-8 input.
See also:
Component balxml_reader Component balxml_errorinfo bdlde_utf8streambufinputwrapper
Description:
This component supplies a mechanism, balxml::Utf8ReaderWrapper, which holds another object of type balxml::Reader and forwards operations to the held object. The held reader operates on a bsl::streambuf that is in fact a bdlde::Utf8CheckingInStreamBufWrapper contained in the wrapper; that stream buffer in turn holds another bsl::streambuf and forwards actions to it.
The bdlde::Utf8CheckingInStreamBufWrapper detects invalid UTF-8. If the input contains nothing but valid UTF-8, it simply forwards all operations to the bsl::streambuf it holds and has no influence on behavior.
Similarly, if the input contains nothing but valid UTF-8, the reader wrapper simply forwards all operations to the held Reader and has no influence on behavior.
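The forwarding arrangement described above can be illustrated with standard library types only. The following is a minimal sketch, not the component's implementation: ForwardingInStreamBuf and readAllThroughWrapper are hypothetical names, std::streambuf stands in for bsl::streambuf, and no UTF-8 checking is performed. It shows only that a stream buffer holding another stream buffer can pass input through unchanged.

```cpp
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical input-only stream buffer that forwards every read to a held
// stream buffer, one character at a time.
class ForwardingInStreamBuf : public std::streambuf {
    std::streambuf *d_held_p;   // held stream buffer (not owned)
    char            d_current;  // one-character get area

  protected:
    int_type underflow() override
    {
        // Pull the next character from the held buffer.
        const int_type c = d_held_p->sbumpc();
        if (traits_type::eq_int_type(c, traits_type::eof())) {
            return traits_type::eof();
        }
        // Expose it through a one-character get area.
        d_current = traits_type::to_char_type(c);
        setg(&d_current, &d_current, &d_current + 1);
        return c;
    }

  public:
    explicit ForwardingInStreamBuf(std::streambuf *held)
    : d_held_p(held)
    , d_current(0)
    {
    }
};

// Read 'input' through the forwarding wrapper and return what comes out.
std::string readAllThroughWrapper(const std::string& input)
{
    std::stringbuf        source(input);
    ForwardingInStreamBuf wrapper(&source);

    std::string result;
    const int   eof = std::char_traits<char>::eof();
    for (int c = wrapper.sbumpc(); c != eof; c = wrapper.sbumpc()) {
        result.push_back(static_cast<char>(c));
    }
    return result;
}
```

For valid input the wrapper is transparent: the bytes read through it are the bytes of the held buffer.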
If invalid UTF-8 occurs in the input, errorInfo().message() will reflect the nature of the UTF-8 error.
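To make "invalid UTF-8" concrete, the following is a minimal standalone sketch of a validity check. It is an illustration under stated assumptions, not the algorithm used by this component: firstInvalidUtf8Offset is a hypothetical function, and the exact error categories and messages reported through errorInfo() are not reproduced here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: return the offset of the first byte sequence in 's'
// that is not well-formed UTF-8, or -1 if the whole string is valid.
long firstInvalidUtf8Offset(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t       i = 0;

    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t         len;
        unsigned long       cp;     // decoded code point
        unsigned long       minCp;  // smallest value allowed for this length

        if (b0 < 0x80) {                    // single-byte (ASCII)
            ++i;
            continue;
        }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; minCp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; minCp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; minCp = 0x10000; }
        else {
            return static_cast<long>(i);    // stray continuation or bad lead
        }

        if (i + len > n) {
            return static_cast<long>(i);    // truncated sequence
        }
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) {
                return static_cast<long>(i);  // missing continuation byte
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong encodings, surrogates, and out-of-range values.
        if (cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return static_cast<long>(i);
        }
        i += len;
    }
    return -1;
}
```

With this sketch, "John Smith\xe7\x8f\x8f" is accepted, while a string such as "ab\xff" is rejected at offset 2.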
Usage:
This section illustrates intended use of this component.
Example 1: Routine Parsing:
Utility function to skip past white space.
  int advancePastWhiteSpace(balxml::Reader& reader)
  {
      static const char whiteSpace[] = "\n\r\t ";
      const char *value = 0;
      int         type = 0;
      int         rc = 0;

      do {
          rc    = reader.advanceToNextNode();
          value = reader.nodeValue();
          type  = reader.nodeType();
      } while ((0 == rc && type == balxml::Reader::e_NODE_TYPE_WHITESPACE) ||
               (type == balxml::Reader::e_NODE_TYPE_TEXT &&
                bsl::strlen(value) == bsl::strspn(value, whiteSpace)));

      assert( reader.nodeType() != balxml::Reader::e_NODE_TYPE_WHITESPACE);

      return rc;
  }
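The loop condition above treats a text node as white space when its whole value consists of white-space characters. That test can be isolated as a small sketch using only the standard library; isAllWhiteSpace is a hypothetical helper name, not part of balxml.

```cpp
#include <cstring>

// Hypothetical helper: return true if every character of the null-terminated
// string 'value' is a newline, carriage return, tab, or space.
bool isAllWhiteSpace(const char *value)
{
    static const char whiteSpace[] = "\n\r\t ";

    // 'strspn' returns the length of the leading run of characters drawn
    // from 'whiteSpace'; the string is all white space exactly when that
    // run covers the whole string.
    return std::strlen(value) == std::strspn(value, whiteSpace);
}
```

Note that an empty string satisfies the test, since both lengths are zero.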
Then, in main, we parse an XML string using the UTF-8 reader wrapper:
The following string describes XML for a very simple user directory. The top-level element contains one XML namespace attribute, with one embedded entry describing a user. The person's name contains some non-ASCII UTF-8.
  static const char TEST_XML_STRING[] =
     "<?xml version='1.0' encoding='UTF-8'?>\n"
     "<directory-entry xmlns:dir='http://bloomberg.com/schemas/directory'>\n"
     "    <name>John Smith\xe7\x8f\x8f</name>\n"
     "    <phone dir:phonetype='cell'>212-318-2000</phone>\n"
     "    <address/>\n"
     "</directory-entry>\n";
In order to read the XML, we first need to construct a balxml::NamespaceRegistry object, a balxml::PrefixStack object, a balxml::MiniReader object, and a Utf8ReaderWrapper object that wraps the MiniReader.
  balxml::NamespaceRegistry namespaces;
  balxml::PrefixStack prefixStack(&namespaces);
  balxml::MiniReader miniReader;
  balxml::Utf8ReaderWrapper reader(&miniReader);

  assert(!reader.isOpen());
The reader uses a balxml::PrefixStack to manage namespace prefixes so we need to set it before we call open.
  reader.setPrefixStack(&prefixStack);
  assert(reader.prefixStack());
  assert(reader.prefixStack() == &prefixStack);
Now we call the open method to set up the reader for parsing using the data contained in the XML string.
  reader.open(TEST_XML_STRING, sizeof(TEST_XML_STRING) -1, 0, "UTF-8");
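The length passed to open is sizeof(TEST_XML_STRING) - 1 because sizeof applied to a character array initialized from a string literal counts the terminating null character, which is not part of the XML. A minimal sketch of that arithmetic, using a hypothetical helper named literalLength:

```cpp
#include <cstddef>

// Hypothetical helper: return the number of characters in a character array
// initialized from a string literal, excluding the terminating null.
template <std::size_t N>
std::size_t literalLength(const char (&)[N])
{
    // 'N' is the array size, which includes the trailing '\0'.
    return N - 1;
}
```

For an array declared as static const char s[] = "abc", sizeof(s) is 4 and literalLength(s) is 3.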
Confirm that the reader has opened properly:
  assert( reader.isOpen());
  assert(!bsl::strncmp(reader.documentEncoding(), "UTF-8", 5));
  assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_NONE);
  assert(!reader.nodeName());
  assert(!reader.nodeHasValue());
  assert(!reader.nodeValue());
  assert(!reader.nodeDepth());
  assert(!reader.numAttributes());
  assert(!reader.isEmptyElement());
Advance through all the nodes and assert that all information contained at each node is correct.
Assert that the next node is the XML declaration.
  int rc = advancePastWhiteSpace(reader);
  assert( 0 == rc);
  assert( reader.nodeType() ==
                            balxml::Reader::e_NODE_TYPE_XML_DECLARATION);
  assert(!bsl::strcmp(reader.nodeName(), "xml"));
  assert( reader.nodeHasValue());
  assert(!bsl::strcmp(reader.nodeValue(), "version='1.0' encoding='UTF-8'"));
  assert( reader.nodeDepth() == 1);
  assert(!reader.numAttributes());
  assert(!reader.isEmptyElement());
  assert( 0 == rc);
  assert( reader.nodeDepth() == 1);
Advance to the top-level element, which has one attribute, the XML namespace declaration. Assert that the namespace information has been added correctly to the prefix stack.
  rc = advancePastWhiteSpace(reader);
  assert( 0 == rc);
  assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_ELEMENT);
  assert(!bsl::strcmp(reader.nodeName(), "directory-entry"));
  assert(!reader.nodeHasValue());
  assert( reader.nodeDepth() == 1);
  assert( reader.numAttributes() == 1);
  assert(!reader.isEmptyElement());

  assert(!bsl::strcmp(prefixStack.lookupNamespacePrefix("dir"), "dir"));
  assert(prefixStack.lookupNamespaceId("dir") == 0);
  assert(!bsl::strcmp(prefixStack.lookupNamespaceUri("dir"),
                      "http://bloomberg.com/schemas/directory"));
The XML being read contains one entry describing a user. Advance to the user's name and assert that all information can be read correctly.
  rc = advancePastWhiteSpace(reader);
  assert( 0 == rc);
  assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_ELEMENT);
  assert(!bsl::strcmp(reader.nodeName(), "name"));
  assert(!reader.nodeHasValue());
  assert( reader.nodeDepth() == 2);
  assert( reader.numAttributes() == 0);
  assert(!reader.isEmptyElement());

  rc = reader.advanceToNextNode();
  assert( 0 == rc);
  assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_TEXT);
  assert( reader.nodeHasValue());
  assert(!bsl::strcmp(reader.nodeValue(), "John Smith\xe7\x8f\x8f"));
  assert( reader.nodeDepth() == 3);
  assert( reader.numAttributes() == 0);
  assert(!reader.isEmptyElement());

  rc = reader.advanceToNextNode();
  assert( 0 == rc);
  assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_END_ELEMENT);
  assert(!bsl::strcmp(reader.nodeName(), "name"));
  assert(!reader.nodeHasValue());
  assert( reader.nodeDepth() == 2);
  assert( reader.numAttributes() == 0);
  assert(!reader.isEmptyElement());
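The name text checked above ends with the three bytes \xe7\x8f\x8f, which together encode a single non-ASCII character. As a minimal sketch of how a three-byte UTF-8 sequence maps to a code point, the hypothetical helper decodeThreeByteUtf8 below assumes the caller has already established that the sequence is well formed; it is for illustration only and is not part of balxml.

```cpp
// Hypothetical helper: decode the well-formed three-byte UTF-8 sequence
// starting at 'p' and return its code point.  The behavior is meaningful
// only if 'p' refers to a valid three-byte sequence.
unsigned long decodeThreeByteUtf8(const char *p)
{
    const unsigned long b0 = static_cast<unsigned char>(p[0]);
    const unsigned long b1 = static_cast<unsigned char>(p[1]);
    const unsigned long b2 = static_cast<unsigned char>(p[2]);

    // Lead byte '1110xxxx' carries 4 bits; each continuation byte
    // '10xxxxxx' carries 6 bits.
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F);
}
```

Applied to the bytes \xe7\x8f\x8f, this yields the code point U+73CF.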
Advance to the user's phone number and assert all information can be read correctly.
  rc = advancePastWhiteSpace(reader);
  assert( 0 == rc);
  assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_ELEMENT);
  assert(!bsl::strcmp(reader.nodeName(), "phone"));
  assert(!reader.nodeHasValue());
  assert( reader.nodeDepth() == 2);
  assert( reader.numAttributes() == 1);
  assert(!reader.isEmptyElement());
The phone node has one attribute. Look it up and assert that the balxml::ElementAttribute contains valid information and that the prefix returns the correct namespace URI from the prefix stack.
  balxml::ElementAttribute elemAttr;

  rc = reader.lookupAttribute(&elemAttr, 0);
  assert( 0 == rc);
  assert(!elemAttr.isNull());
  assert(!bsl::strcmp(elemAttr.qualifiedName(), "dir:phonetype"));
  assert(!bsl::strcmp(elemAttr.value(), "cell"));
  assert(!bsl::strcmp(elemAttr.prefix(), "dir"));
  assert(!bsl::strcmp(elemAttr.localName(), "phonetype"));
  assert(!bsl::strcmp(elemAttr.namespaceUri(),
                      "http://bloomberg.com/schemas/directory"));
  assert( elemAttr.namespaceId() == 0);

  assert(!bsl::strcmp(prefixStack.lookupNamespaceUri(elemAttr.prefix()),
                      elemAttr.namespaceUri()));

  rc = advancePastWhiteSpace(reader);
  assert( 0 == rc);
  assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_TEXT);
  assert( reader.nodeHasValue());
  assert(!bsl::strcmp(reader.nodeValue(), "212-318-2000"));
  assert( reader.nodeDepth() == 3);
  assert( reader.numAttributes() == 0);
  assert(!reader.isEmptyElement());

  rc = advancePastWhiteSpace(reader);
  assert( 0 == rc);
  assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_END_ELEMENT);
  assert(!bsl::strcmp(reader.nodeName(), "phone"));
  assert(!reader.nodeHasValue());
  assert( reader.nodeDepth() == 2);
  assert( reader.numAttributes() == 0);
  assert(!reader.isEmptyElement());
Advance to the user's address and assert all information can be read correctly.
  rc = advancePastWhiteSpace(reader);
  assert( 0 == rc);
  assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_ELEMENT);
  assert(!bsl::strcmp(reader.nodeName(), "address"));
  assert(!reader.nodeHasValue());
  assert( reader.nodeDepth() == 2);
  assert( reader.numAttributes() == 0);
  assert( reader.isEmptyElement());
Advance to the end element.
  rc = advancePastWhiteSpace(reader);
  assert( 0 == rc);
  assert( reader.nodeType() == balxml::Reader::e_NODE_TYPE_END_ELEMENT);
  assert(!bsl::strcmp(reader.nodeName(), "directory-entry"));
  assert(!reader.nodeHasValue());
  assert( reader.nodeDepth() == 1);
  assert( reader.numAttributes() == 0);
  assert(!reader.isEmptyElement());
Close the reader.
  reader.close();
  assert(!reader.isOpen());

  return 0;