SAX to DOM Implementation Notes
You should be aware of the following considerations.
Reading and Modifying the DOMDocument Object
When a DOMDocument object is built from SAX events, the document is locked and cannot be modified until the endDocument method is called. The document can, however, be read any time after the startDocument method is called. The following provides an overview of how the document is locked as it is built.
When the startDocument method is called, a write-lock is imposed on the document, existing data in the document is removed, and the lock is then downgraded to a read-lock.
The read-lock is held on the document until one of the following conditions is met:
- The endDocument method is called.
- Any method on the errorHandler interface is called.
- The MXXMLWriter object is released and destructed.
- There is an error (for example, a call to startElement fails).
Any attempt to modify the document while the read-lock is held raises an error if the method is invoked from the current thread, or blocks if it is invoked from a different thread.
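The lock lifecycle described above can be pictured as a small state machine. The following Python sketch is only a toy model for illustration; the class `DocumentLockModel`, the `Lock` enum, and all method names are hypothetical and are not part of MSXML. It models only the same-thread case, where a modification attempt raises an error.

```python
from enum import Enum


class Lock(Enum):
    NONE = "none"
    WRITE = "write"
    READ = "read"


class DocumentLockModel:
    """Toy model of the locking steps described above (not MSXML code)."""

    def __init__(self):
        self.lock = Lock.NONE
        self.nodes = ["stale content"]  # pretend the document already holds data
        self.history = []               # every lock state entered, in order

    def _set(self, lock):
        self.lock = lock
        self.history.append(lock)

    def start_document(self):
        self._set(Lock.WRITE)  # a write-lock is imposed
        self.nodes.clear()     # existing data is removed
        self._set(Lock.READ)   # the lock is downgraded to a read-lock

    def release(self):
        # Stands for any of the four release conditions: endDocument,
        # an errorHandler call, release of the writer, or an error.
        self._set(Lock.NONE)

    def modify(self, node):
        # Same-thread behavior only: a locked document rejects the change.
        if self.lock is not Lock.NONE:
            raise RuntimeError("document is locked while it is being built")
        self.nodes.append(node)


doc = DocumentLockModel()
doc.start_document()
try:
    doc.modify("new node")
except RuntimeError as exc:
    print("rejected:", exc)
doc.release()
doc.modify("new node")  # succeeds once the read-lock is gone
```

The model deliberately leaves out the cross-thread case, in which the caller blocks instead of receiving an error.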
Note For MSXML 5.0, the new parser will always be used for a DOMDocument object connected to receive the output of SAX events. In this scenario, you should be aware that the new parser does not support DTD validation. For example, if you do not first set the validateOnParse property to False on the DOMDocument object, and then try to invoke the SAX lexical handler (ISAXLexicalHandler) by calling startDTD, it will raise an error.
Supported Methods
When a DOMDocument object is set as writer output, only the following methods are supported.
get/set output
get/set indent
flush()
Allowed Sequence of Handler Callbacks
The following sequence of handler callbacks can be invoked.
startDocument
    (comment | processingInstruction)*
    (startDTD DTD_CONTENT endDTD)?
    (comment | processingInstruction)*
    (startElement ELEMENT_CONTENT endElement)
    (comment | processingInstruction)*
endDocument

DTD_CONTENT := elementDecl | attributeDecl | internalEntityDecl | externalEntityDecl
    | unparsedEntityDecl | notationDecl
    | startEntity ELEMENT_CONTENT* endEntity
    | processingInstruction | comment

ELEMENT_CONTENT := characters | ignorableWhitespace
    | startCDATA characters* endCDATA
    | startElement ELEMENT_CONTENT* endElement
    | skippedEntity
    | startEntity ELEMENT_CONTENT* endEntity
    | processingInstruction | comment
When building a DOMDocument object, a startDocument event and its corresponding endDocument event must both occur. If the callback sequence is not valid (for example, endElement is called without closing an open CDATA section), errorHandler aborts the parse and builds a parseError object from the provided information.
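To make the ordering rules more concrete, the following Python sketch checks a list of callback names against a simplified subset of the grammar: document boundaries, a single root element, element nesting, and CDATA sections. It is an illustration only. The function `check_sequence` and its error strings are hypothetical, and the sketch does not reproduce how MSXML validates callbacks or what its parseError object contains. DTD and entity events are not checked.

```python
def check_sequence(events):
    """Return None if the callback names pass these simplified checks,
    otherwise a short description of the first problem found."""
    if not events or events[0] != "startDocument":
        return "startDocument must come first"
    if events[-1] != "endDocument":
        return "endDocument must come last"

    depth = 0         # number of currently open elements
    roots = 0         # number of top-level elements seen so far
    in_cdata = False  # True between startCDATA and endCDATA

    for name in events[1:-1]:
        if name in ("startDocument", "endDocument"):
            return name + " is only allowed at the document boundary"
        if in_cdata:
            # Only characters events may appear inside a CDATA section.
            if name == "endCDATA":
                in_cdata = False
            elif name != "characters":
                return name + " inside an open CDATA section"
            continue
        if name == "startElement":
            if depth == 0:
                roots += 1
                if roots > 1:
                    return "more than one root element"
            depth += 1
        elif name == "endElement":
            if depth == 0:
                return "endElement without a matching startElement"
            depth -= 1
        elif name == "startCDATA":
            if depth == 0:
                return "startCDATA outside the root element"
            in_cdata = True
        elif name == "endCDATA":
            return "endCDATA without a matching startCDATA"
        elif name == "characters" and depth == 0:
            return "characters outside the root element"

    if in_cdata:
        return "unclosed CDATA section"
    if depth != 0:
        return "unclosed element"
    if roots != 1:
        return "exactly one root element is required"
    return None


bad = ["startDocument", "startElement", "startCDATA", "characters",
       "endElement", "endDocument"]
print(check_sequence(bad))
```

The `bad` sequence mirrors the example in the text: endElement arrives while a CDATA section is still open, so the check reports a problem at that event.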
indent Property
When the indent property is set, each characters event is scanned to see whether it consists purely of white space. If so, it is treated just like a white space event from the old parser. That is, when the only event separating other (non-characters) events is a characters event containing only white space, no text node is built for the white space, but the white space is remembered (as hints).
For example:
startElement("","","a",attrs);
characters(" ");
endElement("","","a");
normally builds a DOM that looks like this:
Element: nodeName="a"
    |
    TextNode " "
which would be output like this:
<a> </a>
If the indent property is set, then the Element node does not have a text child and the output looks like this:
<a> </a>
Note The indenting is due to special hints that are stored internally. They are not exposed in any manner, except as indenting when saving.
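The white space rule can be sketched as a small tree builder. The Python below is a toy model under stated assumptions: the function `build`, the dictionary layout, and the `hints` list are hypothetical stand-ins, since the real hints are internal to MSXML and not exposed. A characters event is treated as a hint only when indent is on, the text is pure XML white space, and the event is not adjacent to another characters event.

```python
XML_WHITESPACE = " \t\r\n"


def build(events, indent=False):
    """Build a nested dict tree from (kind, value) event tuples.

    Toy model only: 'hints' stands in for the internal, unexposed
    white space hints described above.
    """
    root = {"name": "#document", "children": [], "hints": []}
    stack = [root]
    for i, (kind, value) in enumerate(events):
        if kind == "startElement":
            node = {"name": value, "children": [], "hints": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif kind == "endElement":
            stack.pop()
        elif kind == "characters":
            prev_kind = events[i - 1][0] if i > 0 else None
            next_kind = events[i + 1][0] if i + 1 < len(events) else None
            # The event must be the only thing between non-characters events.
            alone = prev_kind != "characters" and next_kind != "characters"
            if indent and alone and value.strip(XML_WHITESPACE) == "":
                stack[-1]["hints"].append(value)     # remembered, no text node
            else:
                stack[-1]["children"].append(value)  # ordinary text node
    return root["children"][0]


events = [("startElement", "a"), ("characters", " "), ("endElement", "a")]
print(build(events, indent=False))
print(build(events, indent=True))
```

With indent off, the element keeps a text child containing the single space. With indent on, the element has no children and the space is kept only in the model's hint list.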
See Also
MXXMLWriter CoClass | output Property | DOMDocument