Provide common reader protocol for parsing XML documents.

Namespace: balxml
Class: balxml::Reader (protocol for fast, forward-only access to XML data stream)
This component supplies an abstract class, balxml::Reader, that defines an interface for accessing a forward-only, read-only stream of XML data. The balxml::Reader interface is somewhat similar to the Microsoft XmlReader interface. That interface provides a simpler and more flexible programming model than the quasi-standard SAX/SAX2 model, and a (potentially) more memory-efficient programming model than DOM. Access to the data is cursor-like: the reader goes forward through the document stream and stops at each node along the way. A "node" is an XML syntactic construct such as the start of an element, the end of an element, or element text. (See the balxml::Reader::NodeType enumeration for a complete list.) Note that, unlike in the Microsoft interface, an element attribute is not considered a node in this interface. Instead, it is considered an attribute of a start-element node. In the documentation below, the "current node" refers to the node on which the reader is currently positioned. The client code advances through all of the nodes in the XML document by calling the advanceToNextNode function repeatedly. It processes each node in the order it appears in the XML document.

balxml::Reader supplies accessors that query a node's attributes, such as the node's type, name, value, and element attributes. Note that each call to advanceToNextNode invalidates strings and data structures returned when the balxml::Reader accessors were called for the prior node. For example, the pointer returned from nodeName for one node will not be valid once the reader has advanced to the next node. Because this interface provides so little prior context, derived-class implementations can be very efficient in their use of memory.

The type of the current node, returned by nodeType, is a value of the balxml::Reader::NodeType enumeration.

A qualified name consists of an optional prefix and a local name, separated by a colon when the prefix is present. For each qualified name, the balxml::Reader interface provides access to the entire qualified name. It also provides separate access to the prefix, the local name, the namespace URI, and the namespace ID.

If no base URI is given, nodeBaseUri returns an empty string.

Note that the encoding of the document (returned by the documentEncoding method) can differ from the encoding of the strings returned from the balxml::Reader accessors. All strings returned by these accessors are UTF-8, regardless of the encoding used in the original document.

If the document does not contain encoding information, the balxml::Reader::open method allows clients to specify an encoding to use. The encoding passed to balxml::Reader::open takes effect only when there is no encoding information in the original document; the encoding information obtained from the original document trumps all. Suppose the document provides no encoding and the client has not provided one via the balxml::Reader::open method. In that case, a derived-class implementation should set the encoding to UTF-8. (See the balxml::Reader::open method for more details.)

Regarding thread safety: the balxml::Reader class is abstract and cannot be instantiated. There is no guarantee that any specific derived class will provide a thread-safe implementation.

The following usage example shows how a client uses the protocol. The string below describes XML for a very simple user directory:

    const char TEST_XML_STRING[] =
       "<?xml version='1.0' encoding='UTF-8'?>\n"
       "<directory-entry xmlns:dir='http://bloomberg.com/schemas/directory'>\n"
       " <name>John Smith</name>\n"
       " <phone dir:phonetype='cell'>212-318-2000</phone>\n"
       " <address/>\n"
       "</directory-entry>\n";
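The test document uses the qualified attribute name dir:phonetype, whose prefix dir is bound by the xmlns:dir declaration. The short sketch below roughly illustrates the split of a qualified name into prefix and local name. It is written in standard C++ so that it runs without BDE. splitQName is a hypothetical helper, not part of balxml. A real reader also resolves the prefix to a namespace URI and ID, which this sketch does not attempt.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper, not part of balxml: split an XML qualified name such
// as "dir:phonetype" into (prefix, local name).  A name without a colon is
// returned with an empty prefix.
std::pair<std::string, std::string> splitQName(const std::string& qname)
{
    const std::string::size_type colon = qname.find(':');
    if (std::string::npos == colon) {
        return std::make_pair(std::string(), qname);  // unprefixed name
    }
    return std::make_pair(qname.substr(0, colon),      // prefix
                          qname.substr(colon + 1));    // local name
}
```

For example, splitQName("dir:phonetype") yields the prefix "dir" and the local name "phonetype". splitQName("name") yields an empty prefix and the local name "name".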
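Suppose we need to extract the user's name and cell phone number from this entry. To read the XML, we first construct three objects:

- a balxml::NamespaceRegistry object;
- a balxml::PrefixStack object;
- a TestReader object, where TestReader is an implementation of balxml::Reader (an implementation is shown later in this document).

    balxml::NamespaceRegistry namespaces;
    balxml::PrefixStack       prefixStack(&namespaces);
    TestReader                testReader;
    balxml::Reader&           reader = testReader;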
The reader uses a balxml::PrefixStack to manage namespace prefixes. Installing a stack for an open reader leads to undefined behavior, so we first ensure that our reader is not open:

    assert(false == reader.isOpen());

    reader.setPrefixStack(&prefixStack);

    assert(&prefixStack == reader.prefixStack());
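The prefix stack installed above lets a reader map a prefix such as dir back to its namespace URI, with inner declarations overriding outer ones. TestReader's adjustPrefixStack comment describes this behavior. The following is a simplified, hypothetical model of that scoping in standard C++. MiniPrefixStack is an illustration only and does not reflect the interface or the implementation of balxml::PrefixStack.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of namespace-prefix scoping; it is NOT the
// balxml::PrefixStack interface.  Declarations are pushed when an element
// starts and popped when it ends, and lookups search from the most recent
// binding backward, so an inner declaration hides an outer one.
class MiniPrefixStack {
    std::vector<std::pair<std::string, std::string> > d_bindings;
        // (prefix, namespace URI) pairs, innermost last

  public:
    void push(const std::string& prefix, const std::string& uri)
    {
        d_bindings.push_back(std::make_pair(prefix, uri));
    }

    void pop(std::size_t count)
        // Remove the 'count' most recently pushed bindings.
    {
        assert(count <= d_bindings.size());
        d_bindings.resize(d_bindings.size() - count);
    }

    const std::string *lookup(const std::string& prefix) const
        // Return the URI bound to 'prefix' by the innermost declaration, or
        // a null pointer if 'prefix' is not bound.
    {
        for (std::size_t i = d_bindings.size(); i > 0; --i) {
            if (d_bindings[i - 1].first == prefix) {
                return &d_bindings[i - 1].second;
            }
        }
        return 0;
    }
};
```

Searching from the end of the vector makes "innermost wins" fall out naturally. Popping a whole element's declarations at once restores the outer bindings.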
Next, we call the open method to set up the reader for parsing the data contained in the XML string:

    reader.open(TEST_XML_STRING, sizeof(TEST_XML_STRING) - 1, 0, "UTF-8");
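TEST_XML_STRING declares encoding='UTF-8' in its prolog. The "UTF-8" argument passed to open above would therefore take effect only for a document without such a declaration. The precedence rule from the encoding discussion can be sketched as a small function. chooseEncoding is a hypothetical helper, not a balxml function. A null or empty string stands for "not provided".

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of the encoding-selection rule described for
// balxml::Reader::open: an encoding declared in the document wins; otherwise
// the encoding requested by the client is used; otherwise UTF-8.  A null or
// empty string means "not provided".
std::string chooseEncoding(const char *declaredInDocument,
                           const char *requestedByClient)
{
    if (declaredInDocument && '\0' != declaredInDocument[0]) {
        return declaredInDocument;       // the document trumps all
    }
    if (requestedByClient && '\0' != requestedByClient[0]) {
        return requestedByClient;        // client-supplied fallback
    }
    return "UTF-8";                      // default when neither is given
}
```

This only decides the document encoding reported by documentEncoding. Regardless of this choice, the strings returned by the accessors are UTF-8.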
Confirm that the reader has opened properly:

    assert(true == reader.isOpen());
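Then, we iterate through the nodes to find the elements that interest us. First, we advance to the name element; the text node that follows it holds the user's name:

    int         rc = 0;
    bsl::string name;
    bsl::string number;

    do {
        rc = reader.advanceToNextNode();
        assert(0 == rc);
    } while (bsl::strcmp(reader.nodeName(), "name"));

    rc = reader.advanceToNextNode();

    assert(0                                == rc);
    assert(3                                == reader.nodeDepth());
    assert(balxml::Reader::e_NODE_TYPE_TEXT == reader.nodeType());
    assert(true                             == reader.nodeHasValue());

    // Copy the value: the pointer returned by 'nodeValue' is invalidated by
    // the next call to 'advanceToNextNode'.
    name.assign(reader.nodeValue());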
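The example copies the text with name.assign(reader.nodeValue()) rather than keeping the pointer. It does so because the next call to advanceToNextNode invalidates strings returned for the prior node. The toy cursor below shows why an owning copy is needed. It is a hypothetical standard-C++ stand-in, not a balxml class, and it reuses one buffer for every node. It follows the same convention as TestReader: advancing returns 0 on success and 1 when no nodes remain.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy cursor (not balxml) that stores the current node's value
// in a single reused buffer, so a pointer obtained from 'nodeValue()' may
// dangle or point at different content after the next advance.
class ToyCursor {
    std::vector<std::string> d_values;  // node values, in document order
    std::size_t              d_next;    // index of the next node to visit
    std::string              d_buffer;  // storage reused for every node

  public:
    explicit ToyCursor(const std::vector<std::string>& values)
    : d_values(values)
    , d_next(0)
    {
    }

    int advanceToNextNode()
        // Move to the next node; return 0 on success and 1 if there are no
        // more nodes.
    {
        if (d_next == d_values.size()) {
            return 1;
        }
        d_buffer = d_values[d_next++];  // overwrites the previous value
        return 0;
    }

    const char *nodeValue() const
        // Return the current value; invalidated by 'advanceToNextNode'.
    {
        return d_buffer.c_str();
    }
};

std::vector<std::string> collectValues(ToyCursor& cursor)
    // Visit every remaining node of 'cursor' and return an owning copy of
    // each value, taken before the cursor advances again.
{
    std::vector<std::string> result;
    while (0 == cursor.advanceToNextNode()) {
        result.push_back(cursor.nodeValue());  // copy, like 'name.assign'
    }
    return result;
}
```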
Next, we advance to the user's phone number. The phone element carries a single attribute, dir:phonetype. We read the number only if that attribute's value is "cell":

    do {
        rc = reader.advanceToNextNode();
        assert(0 == rc);
    } while (bsl::strcmp(reader.nodeName(), "phone"));

    assert(false == reader.isEmptyElement());
    assert(1     == reader.numAttributes());

    balxml::ElementAttribute elemAttr;

    rc = reader.lookupAttribute(&elemAttr, 0);
    assert(0     == rc);
    assert(false == elemAttr.isNull());

    if (!bsl::strcmp(elemAttr.value(), "cell")) {
        rc = reader.advanceToNextNode();

        assert(0                                == rc);
        assert(balxml::Reader::e_NODE_TYPE_TEXT == reader.nodeType());
        assert(true                             == reader.nodeHasValue());

        number.assign(reader.nodeValue());
    }
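Both searches above loop on bsl::strcmp(reader.nodeName(), ...). That loop would also stop on a matching end-element node. It would also pass a null pointer to strcmp for nodes without a name, such as the text nodes in TestReader. Neither case arises with this document. The helper below guards against both cases. It is hypothetical and written as a template so it can be exercised without BDE. It assumes READER provides advanceToNextNode(), nodeType(), nodeName(), and an e_NODE_TYPE_ELEMENT enumerator, as balxml::Reader does.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper (not part of balxml): advance 'reader' until it is
// positioned on a start-element node named 'name'.  Return 0 on success, or
// the first non-zero result of 'advanceToNextNode()' (e.g., 1 at the end of
// the document).  Nodes with a null name and end-element nodes are skipped.
template <class READER>
int advanceToElement(READER& reader, const char *name)
{
    for (;;) {
        const int rc = reader.advanceToNextNode();
        if (0 != rc) {
            return rc;                                            // RETURN
        }
        const char *current = reader.nodeName();
        if (READER::e_NODE_TYPE_ELEMENT == reader.nodeType()
         && current
         && 0 == std::strcmp(current, name)) {
            return 0;                                             // RETURN
        }
    }
}
```

With such a helper, each search reduces to rc = advanceToElement(reader, "phone"); assert(0 == rc);. Unlike the bare loops, it also stops when the document ends instead of spinning.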
Now, verify the extracted data:

    assert("John Smith"   == name);
    assert("212-318-2000" == number);
The second example shows how the TestReader used above might be implemented. We have to implement all of the pure virtual functions of the balxml::Reader protocol. To keep the example short and readable, we stub some methods. Moreover, the methods used in the first example have fake implementations. Our implementation therefore does not actually parse the given XML fragment; instead, it iterates through a predefined ("supposititious") XML structure.

First, we define a TestNode struct capable of describing an XML node. We also define fakeDocument, an array of such nodes that describes the user directory XML above:

    struct TestNode {
        // A struct that contains information capable of describing an XML
        // node.

        // TYPES
        struct Attribute {
            // This struct represents the qualified name and value of an XML
            // attribute.

            const char *d_qname;  // qualified name of the attribute
            const char *d_value;  // value of the attribute
        };

        enum { k_NUM_ATTRIBUTES = 5 };

        // DATA
        balxml::Reader::NodeType  d_type;         // type of the node

        const char               *d_qname;        // qualified name of the node

        const char               *d_nodeValue;    // value of the XML node (if
                                                  // it's null, 'nodeHasValue()'
                                                  // returns 'false')

        int                       d_depthChange;  // adjustment for the depth
                                                  // level of 'TestReader';
                                                  // valid values are -1, 0,
                                                  // or 1

        bool                      d_isEmpty;      // flag indicating whether
                                                  // the element is empty

        Attribute d_attributes[k_NUM_ATTRIBUTES]; // array of attributes
                                                  // (unused entries have a
                                                  // null 'd_qname')
    };

    static const TestNode fakeDocument[] = {
        // 'fakeDocument' is an array of 'TestNode' objects that will be used
        // by 'TestReader' to traverse and describe the user directory XML
        // above.

        { balxml::Reader::e_NODE_TYPE_NONE,
          0,                 0,                                 0, false, {} },

        { balxml::Reader::e_NODE_TYPE_XML_DECLARATION,
          "xml",             "version='1.0' encoding='UTF-8'", +1, false, {} },

        { balxml::Reader::e_NODE_TYPE_ELEMENT,
          "directory-entry", 0,                                 0, false,
          { {"xmlns:dir", "http://bloomberg.com/schemas/directory"} } },

        { balxml::Reader::e_NODE_TYPE_ELEMENT,
          "name",            0,                                +1, false, {} },

        { balxml::Reader::e_NODE_TYPE_TEXT,
          0,                 "John Smith",                     +1, false, {} },

        { balxml::Reader::e_NODE_TYPE_END_ELEMENT,
          "name",            0,                                -1, false, {} },

        { balxml::Reader::e_NODE_TYPE_ELEMENT,
          "phone",           0,                                 0, false,
          { {"dir:phonetype", "cell"} } },

        { balxml::Reader::e_NODE_TYPE_TEXT,
          0,                 "212-318-2000",                   +1, false, {} },

        { balxml::Reader::e_NODE_TYPE_END_ELEMENT,
          "phone",           0,                                -1, false, {} },

        { balxml::Reader::e_NODE_TYPE_ELEMENT,
          "address",         0,                                 0, true,  {} },

        { balxml::Reader::e_NODE_TYPE_END_ELEMENT,
          "directory-entry", 0,                                -1, false, {} },

        { balxml::Reader::e_NODE_TYPE_NONE,
          0,                 0,                                 0, false, {} },
    };
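The d_depthChange column determines what nodeDepth reports. TestReader starts at depth 0 after open and adds each node's adjustment when it moves onto that node. As a result, the text node inside name is reported at depth 3, which is what the first example asserts. The standalone replay below checks that arithmetic. replayDepths and k_DEPTH_CHANGES are illustrative names, not part of the example. Note that these depths follow this fake document's own conventions (for instance, the XML declaration and directory-entry share depth 1) rather than those of a real parser.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Replay 'd_depthChange' values the way 'TestReader::advanceToNextNode'
// applies them: start at depth 0 and add each node's adjustment when the
// reader moves onto that node.  Returns the depth reported on each node.
std::vector<int> replayDepths(const int *changes, std::size_t count)
{
    std::vector<int> depths;
    int depth = 0;
    for (std::size_t i = 0; i < count; ++i) {
        depth += changes[i];
        depths.push_back(depth);
    }
    return depths;
}

// 'fakeDocument' without its two 'e_NODE_TYPE_NONE' sentinels, in order:
// xml declaration, <directory-entry>, <name>, "John Smith", </name>,
// <phone>, "212-318-2000", </phone>, <address/>, </directory-entry>
const int k_DEPTH_CHANGES[] = { +1, 0, +1, +1, -1, 0, +1, -1, 0, -1 };
```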
Next, we create a class that implements the balxml::Reader interface. Note that the documentation for the class methods is omitted to reduce the length of the usage example. If necessary, it can be found in the balxml::Reader class declaration.

    // ================
    // class TestReader
    // ================

    class TestReader : public balxml::Reader {
      private:
        // DATA
        balxml::ErrorInfo    d_errorInfo;    // current error information

        balxml::PrefixStack *d_prefixes;     // prefix stack (held, not owned)

        XmlResolverFunctor   d_resolver;     // place holder, not actually used

        bool                 d_isOpen;       // flag indicating whether the
                                             // reader is open

        bsl::string          d_encoding;     // document encoding

        int                  d_nodeDepth;    // level of the current node

        const TestNode      *d_currentNode;  // node being handled (held, not
                                             // owned)

        // PRIVATE MANIPULATORS
        void setEncoding(const char *encoding);
        void adjustPrefixStack();

      public:
        // CREATORS
        TestReader();
        virtual ~TestReader();

        // MANIPULATORS
        virtual void setResolver(XmlResolverFunctor resolver);

        virtual void setPrefixStack(balxml::PrefixStack *prefixes);

        virtual int open(const char *filename, const char *encoding = 0);
        virtual int open(const char *buffer,
                         size_t      size,
                         const char *url = 0,
                         const char *encoding = 0);
        virtual int open(bsl::streambuf *stream,
                         const char     *url = 0,
                         const char     *encoding = 0);

        virtual void close();

        virtual int advanceToNextNode();

        virtual int lookupAttribute(balxml::ElementAttribute *attribute,
                                    int                       index) const;
        virtual int lookupAttribute(balxml::ElementAttribute *attribute,
                                    const char               *qname) const;
        virtual int lookupAttribute(
                                balxml::ElementAttribute *attribute,
                                const char               *localName,
                                const char               *namespaceUri) const;
        virtual int lookupAttribute(
                                balxml::ElementAttribute *attribute,
                                const char               *localName,
                                int                       namespaceId) const;

        virtual void setOptions(unsigned int flags);

        // ACCESSORS
        virtual const char *documentEncoding() const;
        virtual XmlResolverFunctor resolver() const;
        virtual bool isOpen() const;
        virtual const balxml::ErrorInfo& errorInfo() const;
        virtual int getLineNumber() const;
        virtual int getColumnNumber() const;
        virtual balxml::PrefixStack *prefixStack() const;
        virtual NodeType nodeType() const;
        virtual const char *nodeName() const;
        virtual const char *nodeLocalName() const;
        virtual const char *nodePrefix() const;
        virtual int nodeNamespaceId() const;
        virtual const char *nodeNamespaceUri() const;
        virtual const char *nodeBaseUri() const;
        virtual bool nodeHasValue() const;
        virtual const char *nodeValue() const;
        virtual int nodeDepth() const;
        virtual int numAttributes() const;
        virtual bool isEmptyElement() const;
        virtual unsigned int options() const;
    };

    // ----------------
    // class TestReader
    // ----------------

    // PRIVATE MANIPULATORS
    inline
    void TestReader::setEncoding(const char *encoding)
    {
        // A real implementation gives precedence to an encoding declared in
        // the document; this fake reader never parses the document, so it
        // uses the client-supplied encoding or falls back to UTF-8.

        d_encoding = (0 == encoding || '\0' == encoding[0])
                     ? "UTF-8"
                     : encoding;
    }

    inline
    void TestReader::adjustPrefixStack()
    {
        // Each time this object reads an 'e_NODE_TYPE_ELEMENT' node, it must
        // push a namespace prefix onto the prefix stack to handle in-scope
        // namespace calculations that happen inside XML documents where inner
        // namespaces can override outer ones.

        if (balxml::Reader::e_NODE_TYPE_ELEMENT == d_currentNode->d_type) {
            for (int ii = 0; ii < TestNode::k_NUM_ATTRIBUTES; ++ii) {
                const char *prefix = d_currentNode->d_attributes[ii].d_qname;

                if (!prefix || bsl::strncmp("xmlns", prefix, 5)) {
                    continue;
                }

                if (':' == prefix[5]) {
                    // "xmlns:<prefix>" declares a prefixed namespace.

                    d_prefixes->pushPrefix(
                                    prefix + 6,
                                    d_currentNode->d_attributes[ii].d_value);
                }
                else if ('\0' == prefix[5]) {
                    // "xmlns" alone declares the default namespace.

                    d_prefixes->pushPrefix(
                                    "",
                                    d_currentNode->d_attributes[ii].d_value);
                }
            }
        }
        else if (balxml::Reader::e_NODE_TYPE_NONE == d_currentNode->d_type) {
            d_prefixes->reset();
        }
    }

    // PUBLIC CREATORS
    TestReader::TestReader()
    : d_errorInfo()
    , d_prefixes(0)
    , d_resolver()
    , d_isOpen(false)
    , d_encoding()
    , d_nodeDepth(0)
    , d_currentNode(0)
    {
    }

    TestReader::~TestReader()
    {
    }

    // MANIPULATORS
    void TestReader::setResolver(XmlResolverFunctor resolver)
    {
        d_resolver = resolver;
    }

    void TestReader::setPrefixStack(balxml::PrefixStack *prefixes)
    {
        assert(!d_isOpen);

        d_prefixes = prefixes;
    }

    int TestReader::open(const char * /* filename */,
                         const char * /* encoding */)
    {
        return -1;  // STUB
    }

    int TestReader::open(const char * /* buffer */,
                         size_t       /* size */,
                         const char * /* url */,
                         const char  *encoding)
    {
        if (d_isOpen) {
            return -1;  // already open: report failure        // RETURN
        }
        d_isOpen      = true;
        d_nodeDepth   = 0;
        d_currentNode = fakeDocument;

        setEncoding(encoding);
        return 0;
    }

    int TestReader::open(bsl::streambuf * /* stream */,
                         const char     * /* url */,
                         const char     * /* encoding */)
    {
        return -1;  // STUB
    }

    void TestReader::close()
    {
        if (d_prefixes) {
            d_prefixes->reset();
        }

        d_isOpen = false;
        d_encoding.clear();
        d_nodeDepth   = 0;
        d_currentNode = 0;
    }

    int TestReader::advanceToNextNode()
    {
        if (!d_currentNode) {
            return -1;                                            // RETURN
        }

        const TestNode *nextNode = d_currentNode + 1;

        if (balxml::Reader::e_NODE_TYPE_NONE == nextNode->d_type) {
            // The document ends when the type of the next node is
            // 'e_NODE_TYPE_NONE'.

            if (d_prefixes) {
                d_prefixes->reset();
            }
            return 1;                                             // RETURN
        }

        d_currentNode = nextNode;

        if (d_prefixes && 1 == d_nodeDepth) {
            // A 'TestReader' only recognizes namespace URIs that have the
            // prefix "xmlns:" on the top-level element.  A 'TestReader' adds
            // such URIs to its prefix stack.  It treats namespace URI
            // declarations on any other elements like normal attributes, and
            // resets its prefix stack once the top-level element closes.

            adjustPrefixStack();
        }

        d_nodeDepth += d_currentNode->d_depthChange;

        return 0;
    }

    int TestReader::lookupAttribute(balxml::ElementAttribute *attribute,
                                    int                       index) const
    {
        if (!d_currentNode
         || index < 0
         || index >= TestNode::k_NUM_ATTRIBUTES) {
            return 1;                                             // RETURN
        }

        const char *qname = d_currentNode->d_attributes[index].d_qname;

        if (0 == qname || '\0' == qname[0]) {
            // Unused attribute slots have a null 'd_qname'.

            return 1;                                             // RETURN
        }

        attribute->reset(d_prefixes,
                         qname,
                         d_currentNode->d_attributes[index].d_value);
        return 0;
    }

    int TestReader::lookupAttribute(
                       balxml::ElementAttribute * /* attribute */,
                       const char               * /* qname */) const
    {
        return -1;  // STUB
    }

    int TestReader::lookupAttribute(
                       balxml::ElementAttribute * /* attribute */,
                       const char               * /* localName */,
                       const char               * /* namespaceUri */) const
    {
        return -1;  // STUB
    }

    int TestReader::lookupAttribute(
                       balxml::ElementAttribute * /* attribute */,
                       const char               * /* localName */,
                       int                        /* namespaceId */) const
    {
        return -1;  // STUB
    }

    void TestReader::setOptions(unsigned int /* flags */)
    {
        return;  // STUB
    }

    // ACCESSORS
    const char *TestReader::documentEncoding() const
    {
        return d_encoding.c_str();
    }

    TestReader::XmlResolverFunctor TestReader::resolver() const
    {
        return d_resolver;
    }

    bool TestReader::isOpen() const
    {
        return d_isOpen;
    }

    const balxml::ErrorInfo& TestReader::errorInfo() const
    {
        return d_errorInfo;
    }

    int TestReader::getLineNumber() const
    {
        return 0;  // STUB
    }

    int TestReader::getColumnNumber() const
    {
        return 0;  // STUB
    }

    balxml::PrefixStack *TestReader::prefixStack() const
    {
        return d_prefixes;
    }

    TestReader::NodeType TestReader::nodeType() const
    {
        if (!d_currentNode || !d_isOpen) {
            return e_NODE_TYPE_NONE;                              // RETURN
        }

        return d_currentNode->d_type;
    }

    const char *TestReader::nodeName() const
    {
        if (!d_currentNode || !d_isOpen) {
            return 0;                                             // RETURN
        }

        return d_currentNode->d_qname;
    }

    const char *TestReader::nodeLocalName() const
    {
        if (!d_currentNode || !d_isOpen) {
            return 0;                                             // RETURN
        }

        // This simple 'TestReader' does not understand XML that contains
        // qualified node names.  This means the local name of a node is
        // always equal to its qualified name, so this function simply
        // returns 'd_qname'.

        return d_currentNode->d_qname;
    }

    const char *TestReader::nodePrefix() const
    {
        return "";  // STUB
    }

    int TestReader::nodeNamespaceId() const
    {
        return -1;  // STUB
    }

    const char *TestReader::nodeNamespaceUri() const
    {
        return "";  // STUB
    }

    const char *TestReader::nodeBaseUri() const
    {
        return "";  // STUB
    }

    bool TestReader::nodeHasValue() const
    {
        if (!d_currentNode || !d_isOpen) {
            return false;                                         // RETURN
        }

        if (0 == d_currentNode->d_nodeValue) {
            return false;                                         // RETURN
        }

        return ('\0' != d_currentNode->d_nodeValue[0]);
    }

    const char *TestReader::nodeValue() const
    {
        if (!d_currentNode || !d_isOpen) {
            return 0;                                             // RETURN
        }

        return d_currentNode->d_nodeValue;
    }

    int TestReader::nodeDepth() const
    {
        return d_nodeDepth;
    }

    int TestReader::numAttributes() const
    {
        if (!d_currentNode) {
            return 0;                                             // RETURN
        }

        // Attributes occupy the leading slots; the first null 'd_qname'
        // marks the end of the list.

        for (int index = 0; index < TestNode::k_NUM_ATTRIBUTES; ++index) {
            if (0 == d_currentNode->d_attributes[index].d_qname) {
                return index;                                     // RETURN
            }
        }

        return TestNode::k_NUM_ATTRIBUTES;
    }

    bool TestReader::isEmptyElement() const
    {
        return d_currentNode && d_currentNode->d_isEmpty;
    }

    unsigned int TestReader::options() const
    {
        return 0;
    }
Finally, our implementation of balxml::Reader is complete. We may use this implementation as the TestReader in the first example.