Modeling Documents as Node Trees

MSXML 5.0 SDK

Microsoft XML Core Services (MSXML) 5.0 for Microsoft Office - DOM Developer's Guide

The DOM provides an interface for loading, accessing, manipulating, and serializing XML documents. It represents a complete XML document in memory, which gives you random access to the contents of the entire document. Applications can therefore rely on the logic in the MSXML parser to handle XML-based information instead of writing custom code to read and process XML.
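The four operations named above (load, access, manipulate, serialize) can be sketched in a few lines. This sketch does not use MSXML. It assumes Python's standard `xml.dom.minidom` module as a stand-in DOM implementation, and the one-book catalog string is a made-up reduction of the sample used later in this topic.

```python
from xml.dom import minidom

# Load: parse the whole document into an in-memory node tree.
# (Assumption: minidom stands in for MSXML; the input is a trimmed sample.)
doc = minidom.parseString(
    '<catalog><book id="bk101"><price>44.95</price></book></catalog>'
)

# Access: any node is reachable without re-reading the source text.
price = doc.getElementsByTagName("price")[0]
old_price = price.firstChild.data

# Manipulate: change the text node in place.
price.firstChild.data = "39.95"

# Serialize: write the modified tree back out as XML text.
serialized = doc.documentElement.toxml()

print(old_price)
print(serialized)
```

The point of the sketch is that the application never scans or tokenizes the XML text itself. It only navigates and edits the node tree that the parser built.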

When the MSXML parser loads an XML document into a DOM, it reads the document from start to finish and builds a logical model of nodes from the structures and content it finds. The document itself is treated as a single node that contains all of the other nodes, including a node that represents the root element. The root element, in turn, contains all of the element, attribute, and text nodes in the document.
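The containment relationship described above can be checked directly. The following is a minimal sketch that assumes Python's `xml.dom.minidom` rather than MSXML; the helper `count_descendants` is a hypothetical name introduced only for this illustration.

```python
from xml.dom import Node, minidom

# Assumption: minidom stands in for MSXML; the input is a trimmed sample.
doc = minidom.parseString(
    "<catalog><book id='bk101'><title>XML Developer's Guide</title></book></catalog>"
)

# The document itself is a node, and the root element is one of its children.
is_document_node = doc.nodeType == Node.DOCUMENT_NODE
root = doc.documentElement
root_name = root.nodeName
root_parent_is_document = root.parentNode is doc


def count_descendants(node):
    """Count every node reachable through childNodes below `node`."""
    return sum(1 + count_descendants(child) for child in node.childNodes)


# catalog, book, title, and the title's text node.
# Attributes hang off their element and are not part of childNodes.
descendants = count_descendants(doc)

print(root_name, descendants)
```

Note that the `id` attribute is not counted: in the DOM, attribute nodes are reached through their owning element rather than through the child list.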

Example

The following XML document has a simple multi-tier structure.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="show_book.xsl"?>
<!DOCTYPE catalog [ 
<!NOTATION XLS PUBLIC "http://www.microsoft.com/office/excel/">
<!ELEMENT COLLECTION    (DATE? , BOOK+) >
<!ATTLIST COLLECTION
    xmlns:dt CDATA #FIXED "urn:schemas-microsoft-com:datatypes">
<!ELEMENT BOOK (TITLE, AUTHOR, PUBLISHER) >
<!ELEMENT DATE          (#PCDATA) >
<!ELEMENT TITLE         (#PCDATA)  >
<!ELEMENT AUTHOR        (#PCDATA)  >
<!ELEMENT PUBLISHER     (#PCDATA)  >
]>
<!--catalog last updated 2000-11-01-->
<catalog xmlns="http://www.example.com/catalog/">
  <book id="bk101">
     <author>&#71;ambardella, Matthew</author>
     <title>XML Developer's &#x47;uide</title>
     <genre>Computer</genre>
     <price>44.95</price>
     <publish_date>2000-10-01</publish_date>
     <description><![CDATA[An in-depth look at creating applications with XML, using <, >,]]> and &amp;.</description>
  </book>
  <book id="bk109">
     <author>Kress, Peter</author>
     <title>Paradox Lost</title>
     <genre>Science Fiction</genre>
     <price>6.95</price>
     <publish_date>2000-11-02</publish_date>
     <description>After an inadvertent trip through a Heisenberg Uncertainty Device, James Salway discovers the problems of being quantum.</description>
  </book>
</catalog>

After MSXML parses this document, the top two levels of the resulting node structure are organized as follows.

The topmost node is the document itself, which contains all of the other nodes. Immediately within the document are nodes representing the XML declaration, the style sheet processing instruction, the DOCTYPE declaration, the comment, and the root element for the document, in this case, catalog.
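To see the top level of such a tree, you can list the children of the document node. The sketch below assumes Python's `xml.dom.minidom`, not MSXML, and uses a shortened version of the sample with an empty DOCTYPE. One known difference: minidom does not expose the XML declaration as a node, so it does not appear in the list, whereas the description above includes it for MSXML.

```python
from xml.dom import Node, minidom

# Assumption: a shortened form of the sample document, parsed with minidom.
SOURCE = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="show_book.xsl"?>
<!DOCTYPE catalog>
<!--catalog last updated 2000-11-01-->
<catalog><book id="bk101"/></catalog>"""

# Readable labels for the node types expected at the top level.
NODE_TYPE_NAMES = {
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.DOCUMENT_TYPE_NODE: "document type",
    Node.COMMENT_NODE: "comment",
    Node.ELEMENT_NODE: "element",
}

doc = minidom.parseString(SOURCE)

# Each entry pairs the kind of node with its nodeName, in document order.
top_level = [(NODE_TYPE_NAMES[n.nodeType], n.nodeName) for n in doc.childNodes]

for kind, name in top_level:
    print(f"{kind}: {name}")
```

The root element is simply the last sibling in this list; `doc.documentElement` is a shortcut to it.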

The catalog element contains the real content of the document. Its structure is described below.

This part of the DOM contains element, attribute, text, and CDATA section nodes. (The parser converts the character references and built-in entity references to ordinary text, but the CDATA section keeps its own node.)
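The node types listed above, and the way references are resolved, can be observed on a fragment of the first book entry. This is a sketch under the assumption that Python's `xml.dom.minidom` behaves like MSXML on these points; the description text is shortened for the example.

```python
from xml.dom import Node, minidom

# Assumption: a trimmed copy of the first book element, parsed with minidom.
SOURCE = (
    '<book id="bk101">'
    "<author>&#71;ambardella, Matthew</author>"
    "<description><![CDATA[using <, >,]]> and &amp;.</description>"
    "</book>"
)

book = minidom.parseString(SOURCE).documentElement

# Attribute node: reached through its element, not through childNodes.
id_attr = book.getAttributeNode("id")

# Text node: the character reference &#71; has already become "G".
author_text = book.getElementsByTagName("author")[0].firstChild

# The description holds a CDATA section node followed by a text node
# in which &amp; has been replaced by a plain ampersand.
description = book.getElementsByTagName("description")[0]
desc_children = [(n.nodeType, n.data) for n in description.childNodes]

print(id_attr.value)
print(author_text.data)
print(desc_children)
```

The CDATA section stays a separate node because its content was never subject to reference expansion, while the trailing text is an ordinary text node.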