Frequently Asked Questions about SAX2

MSXML 5.0 SDK

Microsoft XML Core Services (MSXML) 5.0 for Microsoft Office - SAX2 Developer's Guide

The following are some frequently asked questions about the Simple API for XML (SAX).

What are the system requirements for SAX2?

Can I pass a BSTR instead of a URL to the SAXXMLReader?

Does SAX support validation?

When would I want to use standalone SAX validation (i.e. the MXValidator CoClass) instead of the integrated validation feature provided on the SAX reader?

Why does the MXValidator CoClass contain methods and properties that are not implemented?

Why is white space reported as characters()? Why isn't ignorableWhitespace called?

How do I get XML header information?

How can I use SAXXMLWriter from scripts?

Can I continue parsing if my Visual Basic program breaks because the XML file has errors?

Can I use the same instance of SAXXMLReader to parse XML files sequentially?

How can I write from SAXXMLWriter in a memory buffer in non-Unicode encoding?

How can I reset SAXXMLWriter to create a new string?

How do I avoid appending a new XML document to the previous one?

How can I tell if attribute values have an entity reference?

Can I find the order of element attributes?

How do I handle errors with SAX?

What are the system requirements for SAX2?

The system requirements for SAX2 are the same as those listed in Installing and Registering the MSXML 5.0 SDK.

Does SAX support validation?

Yes. With MSXML 5.0, SAX supports validation against XSD schemas but does not support validation using Document Type Definition (DTD) files.

To validate documents when using SAX, you set the validation flag on the SAXXMLReader through the putFeature method.

The feature name is "schema-validation", and you set its value to True. In Visual Basic, a method called as a statement takes its arguments without parentheses:

   oReader.putFeature "schema-validation", True

This feature is read-only during parsing and read/write otherwise.

For more information, see Validate Documents Using SAX.

When would I want to use standalone SAX validation (i.e. the MXValidator CoClass) instead of the integrated validation feature provided on the SAX reader?

SAXXMLReader and MXValidator behave in very similar ways when used for validating SAX document streams. One difference is that the SAX reader might provide some additional controls that MXValidator does not. Another is that, unlike the SAX reader, which allows you to toggle the error-reporting mode, MXValidator always operates in "exhaustive-errors" mode and reports all errors.

MXValidator is most useful when the data to be parsed is not in XML form, such as comma-delimited text, a Microsoft Word document, or a C-style data structure. The SAX reader cannot read any of these formats. If you need to parse non-XML data, you can implement your own data transformer using SAX interfaces, and connect to MXValidator to verify the data stream. For an example, see the putProperty Method (MXValidator) sample.

Why does the MXValidator CoClass contain methods and properties that are not implemented?

MXValidator inherits from the IMXFilter interface, which is intended to provide a generic front end for SAX event filtering. For MXValidator, only the ability to get and put features and properties is needed. Currently, only the contentHandler and errorHandler properties are implemented, but other SAX filters might be developed in the future.

Can I pass a BSTR, instead of a URL, to the SAXXMLReader?

Yes. You can pass a VARIANT containing a BSTR to ISAXXMLReader::parse(VARIANT). In this case, the string is parsed as UTF-16.
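
The idea of parsing from an in-memory string rather than a URL can be sketched outside MSXML. The following is only an analogy, written with Python's standard-library xml.sax module, not the MSXML API; the ElementCollector class, the element_names function, and the sample document are hypothetical names chosen for illustration.

```python
import xml.sax


class ElementCollector(xml.sax.ContentHandler):
    """Hypothetical handler that records element names in document order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def element_names(xml_text):
    """Parse XML held in a string; no file or URL is involved."""
    handler = ElementCollector()
    # Encode explicitly so the parser receives bytes in a known encoding.
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.names


names = element_names("<order><item/><item/></order>")
```

With MSXML, the equivalent step is the parse call on the reader described above; the handler wiring differs.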

Why is white space reported as characters()? Why isn't ignorableWhitespace called?

White space can occur in several places. For example, an element without character data contains only child elements and the white space between them. To report that white space as ignorable, the parser must know that the element cannot contain character data. The SAX parser is a nonvalidating parser and cannot make that distinction, so ignorableWhitespace() is never called. Nonvalidating parsers treat all white space between elements as characters.
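
The same behavior can be observed in other nonvalidating SAX parsers. The sketch below is an analogy using Python's standard-library xml.sax module (which is backed by a nonvalidating parser), not MSXML; the WhitespaceProbe class is a hypothetical name. It records which callback receives the white space between elements.

```python
import xml.sax


class WhitespaceProbe(xml.sax.ContentHandler):
    """Hypothetical handler that records which callback receives white space."""

    def __init__(self):
        super().__init__()
        self.character_chunks = []
        self.ignorable_chunks = []

    def characters(self, content):
        self.character_chunks.append(content)

    def ignorableWhitespace(self, whitespace):
        self.ignorable_chunks.append(whitespace)


probe = WhitespaceProbe()
# The only text in this document is the white space between the elements.
xml.sax.parseString(b"<root>\n  <child/>\n</root>", probe)

# The parser may deliver the text in several chunks, so join them first.
whitespace_text = "".join(probe.character_chunks)
```

After the parse, all of the white space has arrived through characters() and ignorable_chunks is still empty. An application that wants to drop such text can test each chunk with str.isspace() inside characters().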

How do I get XML header information?

The XML header contains version and encoding information, for example, <?xml version="1.0" encoding="UTF-8"?>. To get XML header information, call ISAXXMLReader::getProperty([in] const wchar_t * pwchName, [out, retval] VARIANT * pvarValue) and pass one of the following three property names:

"xmldecl-encoding"
"xmldecl-version"
"xmldecl-standalone"

Note   The "xmldecl-encoding", "xmldecl-version", and "xmldecl-standalone" properties provide information about the presence and content of the XML header. The information is available only while SAXXMLReader reads and parses the XML document. After processing, control returns to the application, and this information is no longer available.

XML header information was designed for low-level reader and parser use, not for applications.

To get the processing instruction, implement a ContentHandler that supports ISAXContentHandler and handles the processingInstruction event.
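
As an illustration of handling the processingInstruction event, here is a minimal sketch using Python's standard-library xml.sax module. It is an analogy, not the MSXML ISAXContentHandler interface; the PICollector class and the sample document are hypothetical.

```python
import xml.sax


class PICollector(xml.sax.ContentHandler):
    """Hypothetical handler that stores each processing instruction it sees."""

    def __init__(self):
        super().__init__()
        self.instructions = []

    def processingInstruction(self, target, data):
        self.instructions.append((target, data))


DOC = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<?xml-stylesheet type="text/xsl" href="style.xsl"?>'
    b"<root/>"
)

collector = PICollector()
xml.sax.parseString(DOC, collector)
```

In this Python parser, only the xml-stylesheet instruction reaches the handler; the XML declaration itself is not delivered as a processing instruction. How MSXML treats the declaration should be confirmed against its own reference.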

How can I use SAXXMLWriter from scripts?

SAXXMLReader implements IVBSAXXMLReader, which is accessible from scripts. You can call handler events directly from SAXXMLWriter and generate XML without the reader.
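
Generating XML by calling handler events directly, with no reader involved, can be sketched as follows. This is an analogy in Python, where the standard-library XMLGenerator class plays the role of the writer; it is not the MSXML writer, and the element and attribute names are made up.

```python
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

out = io.StringIO()
writer = XMLGenerator(out, encoding="utf-8")

# Fire the content-handler events by hand instead of through a parser.
writer.startDocument()
writer.startElement("book", AttributesImpl({"id": "b1"}))
writer.characters("SAX & XML")  # the writer escapes "&" as "&amp;"
writer.endElement("book")
writer.endDocument()

xml_text = out.getvalue()
```

The writer takes care of escaping and of the XML declaration, so the calling code only supplies the sequence of events.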

Can I continue parsing if my Visual Basic program breaks because the XML file has errors?

All parser errors are fatal (E_FAIL). However, the classification of parsing errors in SAX is independent of the classification of errors in Microsoft® Visual Basic®. In SAX, a fatal parsing error means only that parsing cannot continue; in Visual Basic, a fatal error means that the application cannot continue. To keep your application running after a parsing error, trap the error with the On Error statement in Visual Basic.

Can I use the same instance of SAXXMLReader to parse XML files sequentially?

You can use the same instance of SAXXMLReader to parse two XML files sequentially, but not in different threads. The MSXML implementation does not support multithreaded use: AddRef/Release are not thread-safe, and there is no locking on any of the API entry points.

However, you can use two instances of SAXXMLReader in two threads, and parse two different XML files, as long as nothing gets shared.
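
Sequential reuse of one reader instance can be illustrated with the following sketch. It uses Python's standard-library xml.sax module as an analogy, not SAXXMLReader; the RootNameHandler class and the two sample documents are hypothetical.

```python
import io
import xml.sax


class RootNameHandler(xml.sax.ContentHandler):
    """Hypothetical handler that records the root element of each document."""

    def __init__(self):
        super().__init__()
        self.roots = []
        self._depth = 0

    def startElement(self, name, attrs):
        if self._depth == 0:
            self.roots.append(name)
        self._depth += 1

    def endElement(self, name):
        self._depth -= 1


reader = xml.sax.make_parser()
handler = RootNameHandler()
reader.setContentHandler(handler)

# One reader, two documents, parsed one after the other on a single thread.
for document in (b"<first><a/></first>", b"<second/>"):
    reader.parse(io.BytesIO(document))
```

The reader is never shared between threads here. For concurrent parsing, each thread would create its own reader and handler, as the answer above advises.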

How can I write from SAXXMLWriter in a memory buffer in non-Unicode encoding?

If the "memory buffer" is a Visual Basic string, you cannot, because a Visual Basic string is always in Unicode format. However, you can provide an IStream/ISequentialStream object that writes to a memory buffer. The XML is generated the same way as for output to a file.
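
The stream-based approach can be sketched by analogy in Python: a byte stream takes the place of IStream, and the standard-library XMLGenerator takes the place of the writer. None of this is MSXML API, and the ISO-8859-1 encoding and the sample text are arbitrary choices for the example.

```python
import io
from xml.sax.saxutils import XMLGenerator

# A byte buffer in memory; a text string could not hold a non-Unicode encoding.
buffer = io.BytesIO()
writer = XMLGenerator(buffer, encoding="iso-8859-1")

writer.startDocument()
writer.startElement("name", {})
writer.characters("caf\u00e9")  # "e" with acute accent, one byte in ISO-8859-1
writer.endElement("name")
writer.endDocument()

encoded = buffer.getvalue()
```

The buffer now holds bytes in the requested encoding, and the XML declaration names that encoding so a later reader can decode it.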

How can I reset SAXXMLWriter to create a new string?

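This question typically comes up in the following scenario:

  1. Generate an XML document.
  2. Use a string object from the writer.
  3. Generate another XML document.

Unless the writer is reset between steps 2 and 3, the second document is appended to the first. The next answer describes how to reset it.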

How do I avoid appending a new XML document to the previous one?

To reset the XML writer so that it creates a new string, reset the output property. Internally, this calls the flush method of IMXXMLWriter.

How can I tell if attribute values have an entity reference?

There is no indication of whether attribute values have an entity reference.

Can I find the order of element attributes?

The order of attributes is not significant in XML and is therefore not exposed. Enumerating the attributes might follow their original order, but this is not guaranteed.

How do I handle errors with SAX?

ISAXErrorHandler/IVBSAXErrorHandler provides the basic interface for handling parsing errors. Currently, all errors are fatal.

In C++, a fatal error causes the parse or parseURL method to return an HRESULT other than S_OK.

In Visual Basic, the On Error statement handles exceptions.
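
The pattern of trapping a fatal parse error so the application keeps running can be sketched by analogy in Python, where the error surfaces as an exception. The code uses the standard-library xml.sax module, not ISAXErrorHandler; the RecordingErrorHandler class and the try_parse function are hypothetical names.

```python
import xml.sax


class RecordingErrorHandler(xml.sax.ErrorHandler):
    """Hypothetical error handler that logs fatal errors before re-raising."""

    def __init__(self):
        self.fatal_messages = []

    def fatalError(self, exception):
        self.fatal_messages.append(exception.getMessage())
        raise exception  # parsing of this document cannot continue


def try_parse(xml_bytes, error_handler):
    """Return None on success, or a short description of the fatal error."""
    try:
        xml.sax.parseString(xml_bytes, xml.sax.ContentHandler(), error_handler)
    except xml.sax.SAXParseException as error:
        return "line %d: %s" % (error.getLineNumber(), error.getMessage())
    return None


errors = RecordingErrorHandler()
ok_result = try_parse(b"<a><b/></a>", errors)   # well-formed
bad_result = try_parse(b"<a><b></a>", errors)   # <b> is never closed
```

Parsing of the malformed document stops at the error, but the program itself continues and can report the problem or move on to the next file.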
