Provide a common reader protocol for parsing and validating XML.
Detailed Description
- Purpose:
- Provide a common reader protocol for parsing and validating XML.
-
- Classes:
-
- See also:
- Component balxml_reader
-
- Description:
- This component provides an abstract class, balxml::ValidatingReader, for an XML reader that validates data against a DTD and/or XML Schemas (XSD). balxml::ValidatingReader inherits from the balxml::Reader interface and is therefore fully compliant with it. In addition, balxml::ValidatingReader provides methods to control validation. The enableValidation method specifies what type of validation the reader should perform. Setting validationFlag to false produces a non-validating reader. Setting it to true forces the reader to validate the input XML data against XSD schemas.
-
- Schema Location and obtaining Schemas:
- In validating mode the reader must be able to obtain external XSD schemas. balxml::ValidatingReader requires that all schema sources be represented as bsl::streambuf objects. According to the W3C standard, information about external XSD schemas can be defined in three places:
-
In an instance document, the attribute xsi:schemaLocation provides hints from the author to a processor regarding the location of schema documents. The schemaLocation attribute value consists of one or more pairs of URI references, separated by white space. The first member of each pair is a namespace name, and the second member of the pair is a hint describing where to find an appropriate schema document for that namespace. The presence of these hints does not require the processor to obtain or use the cited schema documents, and the processor is free to use other schemas obtained by any suitable means. For example, XercesC has a property, XercesSchemaExternalSchemaLocation, that informs the parser about available schemas in exactly the same format as the schemaLocation attribute in the document instance.
- Example:
<purchaseReport
xmlns="http://www.example.com/Report"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.example.com/Report
http://www.example.com/Report.xsd"
period="P3M" periodEnding="1999-12-31">
-
In a schema, the
include
element has a required schemaLocation
attribute, and it contains a URI reference which must identify a schema document.
-
Also in a schema, the import element has optional namespace and schemaLocation attributes. If present, the schemaLocation attribute is understood in a way which parallels the interpretation of xsi:schemaLocation in the instance-document case above. Specifically, it provides a hint from the author to a processor regarding the location of a schema document that the author warrants supplies the required components for the namespace identified by the namespace attribute.
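The pair structure of the xsi:schemaLocation value can be illustrated with a short, self-contained sketch. The helper parseSchemaLocation is hypothetical and is not part of balxml; it only shows how a white-space separated value breaks down into (namespace, location hint) pairs, assuming standard C++ streams.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of balxml): split an xsi:schemaLocation
// value into (namespace name, location hint) pairs.
std::vector<std::pair<std::string, std::string> >
parseSchemaLocation(const std::string& value)
{
    std::vector<std::pair<std::string, std::string> > pairs;
    std::istringstream in(value);
    std::string ns;
    std::string hint;

    // operator>> skips any run of white space, including newlines.  A
    // trailing token without a partner fails the second extraction and is
    // therefore ignored.
    while (in >> ns >> hint) {
        pairs.push_back(std::make_pair(ns, hint));
    }
    return pairs;
}
```

Applied to the attribute value in the purchaseReport example above, the helper yields a single pair: the namespace http://www.example.com/Report and the hint http://www.example.com/Report.xsd.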
- In all of these cases, given the URI reference that identifies a schema and an optional namespace, the processor (parser) must obtain a bsl::streambuf object for the schema. For this purpose, the balxml::ValidatingReader interface defines a two-level schema resolution process:
-
The reader (parser) must look up the schema in its internal cache. If the schema is found, it must be used.
-
Otherwise, the reader must use the associated resolver to obtain the schema (see balxml::Reader::XmlResolverFunctor).
- Both the schema cache and the resolver should be set up before the open method is called.
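The cache-first, resolver-second lookup can be sketched as follows. This is a minimal illustration under stated assumptions, not the balxml implementation: SchemaSource, SchemaResolver, setResolver, and find are hypothetical names, the resolver signature is invented for the sketch (the real type is balxml::Reader::XmlResolverFunctor), and std:: types stand in for their bsl:: counterparts.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical stand-in for a resolver callback.  The signature is an
// assumption made for this sketch only.
typedef std::function<std::streambuf *(const std::string& location,
                                        const std::string& namespaceUri)>
        SchemaResolver;

// Hypothetical class illustrating the two-level lookup.
class SchemaSource {
    std::map<std::string, std::streambuf *> d_cache;     // schemas, not owned
    SchemaResolver                          d_resolver;  // may be empty

  public:
    // Add the specified 'schema' to the cache under 'location'.
    void addSchema(const std::string& location, std::streambuf *schema)
    {
        d_cache[location] = schema;
    }

    // Clear the cache, removing all schemas.
    void removeSchemas() { d_cache.clear(); }

    void setResolver(const SchemaResolver& resolver) { d_resolver = resolver; }

    // Return the schema for 'location', or 0 if it cannot be obtained.
    std::streambuf *find(const std::string& location,
                         const std::string& namespaceUri) const
    {
        std::map<std::string, std::streambuf *>::const_iterator it =
                                                       d_cache.find(location);
        if (it != d_cache.end()) {
            return it->second;                          // level 1: cache
        }
        if (d_resolver) {
            return d_resolver(location, namespaceUri);  // level 2: resolver
        }
        return 0;
    }
};
```

Because the cache is consulted first, a schema registered with addSchema takes precedence over whatever the resolver would return for the same location.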
-
- Schema Cache:
balxml::ValidatingReader provides two abstract methods to maintain the schema cache:
-
addSchema: add a schema to the cache
-
removeSchemas: clear the cache and remove all schemas
-
- Thread Safety:
- This component does not provide any functions that present a thread safety issue, since the balxml::ValidatingReader class is abstract and cannot be instantiated. There is no guarantee that any specific derived class will provide a thread-safe implementation.
-
- Usage:
- In this example, we will create a validating parser that parses a document and validates it against a schema. The following string describes an XSD schema for the documents we are going to parse:
const char TEST_XSD_STRING[] =
"<?xml version='1.0' encoding='UTF-8'?>"
"<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
" xmlns='http://bloomberg.com/schemas/directory'"
" targetNamespace='http://bloomberg.com/schemas/directory'"
" elementFormDefault='qualified'"
" attributeFormDefault='qualified' >"
" "
"<xsd:complexType name='entryType'>"
" <xsd:sequence>"
" <xsd:element name='name' type='xsd:string'/>"
" <xsd:element name='phone'>"
" <xsd:complexType>"
" <xsd:simpleContent>"
" <xsd:extension base='xsd:string'>"
" <xsd:attribute name='phonetype' type='xsd:string'/>"
" </xsd:extension>"
" </xsd:simpleContent>"
" </xsd:complexType>"
" </xsd:element>"
" <xsd:element name='address' type='xsd:string'/>"
" </xsd:sequence>"
"</xsd:complexType>"
" "
"<xsd:element name='directory-entry' type='entryType'/>"
"</xsd:schema>";
The following string describes valid XML that conforms to the schema. The top-level element contains one XML namespace attribute, with one embedded entry describing a user:
const char TEST_GOOD_XML_STRING[] =
"<?xml version='1.0' encoding='UTF-8'?>\n"
"<directory-entry xmlns:dir='http://bloomberg.com/schemas/directory'\n"
" xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'\n"
" xsi:schemaLocation='http://bloomberg.com/schemas/directory \n"
" aaa.xsd' >\n"
" <name>John Smith</name>\n"
" <phone dir:phonetype='cell'>212-318-2000</phone>\n"
" <address/>\n"
"</directory-entry>\n";
The following string describes invalid XML. More specifically, the XML document is well-formed but does not conform to our schema:
const char TEST_BAD_XML_STRING[] =
"<?xml version='1.0' encoding='UTF-8'?>\n"
"<directory-entry xmlns:dir='http://bloomberg.com/schemas/directory'\n"
" xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'\n"
" xsi:schemaLocation='http://bloomberg.com/schemas/directory \n"
" aaa.xsd' >\n"
" <name>John Smith</name>\n"
" <phone dir:phonetype='cell'>212-318-2000</phone>\n"
"</directory-entry>\n";
Now we define a parse method for parsing an XML document and validating it against an XSD schema.
In order to read the XML, we first need to construct a balxml::NamespaceRegistry object, a balxml::PrefixStack object, and a TestReader object, where TestReader is a derived implementation of balxml::ValidatingReader.
The reader uses a balxml::PrefixStack to manage namespace prefixes, so we need to set it before we call open.
Set up validation.
Now we call the open method to set up the reader for parsing using the data contained in the XML string:
int rc = reader->open(xmlData, bsl::strlen(xmlData), 0, "UTF-8");
ASSERT(rc == 0);
Confirm that the reader has opened properly.
Do the actual document reading, processing the current node at each step.
Clean up and close the reader.
The main program parses an XML string using the TestReader:
int usageExample()
{
    a_xercesc::Reader reader;
    int rc = parse(&reader, TEST_GOOD_XML_STRING, TEST_XSD_STRING);
    // Normal end of data
    ASSERT(rc == 1);
    rc = parse(&reader, TEST_BAD_XML_STRING, TEST_XSD_STRING);
    // Parser error - document validation failed
    ASSERT(rc == -1);
    return 0;
}
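The sequence of steps described for parse (set up validation, open, read nodes, close) can be sketched in a self-contained form. StubReader below is a hypothetical stand-in and is not balxml::ValidatingReader: its method names follow those mentioned in the text where possible, its open takes only the data and its length, advanceToNextNode, isOpen, and close are assumed names, and its "validation" is reduced to checking for an address element. The namespace registry and prefix stack setup is omitted. The return codes follow the usage above: 1 for normal end of data and -1 for a validation failure.

```cpp
#include <cassert>
#include <cstring>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical stand-in for a concrete validating reader.  It only mimics
// the call sequence; it does not parse XML or evaluate the schema.
class StubReader {
    bool        d_validate;
    bool        d_open;
    bool        d_hasSchema;
    std::string d_data;
    int         d_nodesLeft;

  public:
    StubReader()
    : d_validate(false), d_open(false), d_hasSchema(false), d_nodesLeft(0) {}

    void enableValidation(bool flag) { d_validate = flag; }
    bool validationFlag() const { return d_validate; }
    void removeSchemas() { d_hasSchema = false; }
    void addSchema(const char *, std::streambuf *schema)
    {
        d_hasSchema = (schema != 0);
    }

    int open(const char *data, std::size_t size)
    {
        d_data.assign(data, size);
        d_nodesLeft = 3;   // pretend the document has three nodes
        d_open = true;
        return 0;
    }
    bool isOpen() const { return d_open; }

    // Return 0 on success, 1 at end of data, and -1 on a validation error.
    int advanceToNextNode()
    {
        if (d_validate && d_hasSchema
            && d_data.find("<address") == std::string::npos) {
            return -1;
        }
        return d_nodesLeft-- > 0 ? 0 : 1;
    }
    void close() { d_open = false; }
};

int parse(StubReader *reader, const char *xmlData, const char *xsdSchema)
{
    // Set up validation: start from an empty cache, then register the
    // schema under the location named in xsi:schemaLocation.
    reader->removeSchemas();
    reader->enableValidation(true);
    std::istringstream schemaStream(xsdSchema);
    reader->addSchema("aaa.xsd", schemaStream.rdbuf());

    int rc = reader->open(xmlData, std::strlen(xmlData));
    if (rc != 0) {
        return rc;
    }

    // Read nodes until end of data (1) or an error (-1).
    while ((rc = reader->advanceToNextNode()) == 0) {
        // process current node here
    }

    reader->close();
    return rc;
}
```

The schema is registered under "aaa.xsd" because that is the location hint both test documents carry in xsi:schemaLocation, so a cache lookup for that location succeeds without involving a resolver.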