How the XML Extractor Application Works

MSXML 5.0 SDK

Microsoft XML Core Services (MSXML) 5.0 for Microsoft Office - SAX2 Developer's Guide

The XML Extractor application uses SAX to read the sample XML file (invoices.xml) and processes it one invoice at a time. This approach conserves memory. For each invoice, the application creates and loads a small DOMDocument object, uses it to generate the intended output (an HTML file containing the patient's bill), and then discards it.

First, the SAX reader (SAXXMLReader50) connects to a content handler (IVBSAXContentHandler) implemented in the MyExtractor class module. The content handler, which performs the extraction work, connects to a SAX filter (IVBSAXXMLFilter) that lets through only the events that match the following XPath expression:

/invoice//*

This expression loosely translates to "match using the descendant-or-self axis to select the contents of all top-level <invoice> elements in the file."
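
MSXML is not available outside Windows, so the following is only a minimal sketch of the same filtering idea using Python's standard xml.sax package, where xml.sax.saxutils.XMLFilterBase plays roughly the role of IVBSAXXMLFilter. It does not evaluate XPath: it approximates the loose reading of /invoice//* above by passing through every outermost <invoice> element together with its descendants and dropping all other events. The sample XML and the class names (InvoiceFilter, ElementRecorder) are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase

# Hypothetical input; the real invoices.xml layout is not shown here.
SAMPLE = b"""<invoices>
  <header>ignored</header>
  <invoice number="12"><patient>Ann</patient><total>40</total></invoice>
  <invoice number="13"><patient>Bob</patient><total>75</total></invoice>
</invoices>"""


class InvoiceFilter(XMLFilterBase):
    """Forwards only events that occur inside <invoice> elements.

    A rough analogue of /invoice//*: it matches by element name and
    nesting depth and does not evaluate XPath.
    """

    def __init__(self, parent):
        super().__init__(parent)
        self._depth = 0  # greater than 0 while inside an <invoice>

    def startElement(self, name, attrs):
        if self._depth or name == "invoice":
            self._depth += 1
            super().startElement(name, attrs)

    def endElement(self, name):
        if self._depth:
            super().endElement(name)
            self._depth -= 1

    def characters(self, content):
        if self._depth:
            super().characters(content)


class ElementRecorder(ContentHandler):
    """Downstream handler that records the element names it receives."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


recorder = ElementRecorder()
flt = InvoiceFilter(xml.sax.make_parser())
flt.setContentHandler(recorder)
flt.parse(io.BytesIO(SAMPLE))
print(recorder.names)
# ['invoice', 'patient', 'total', 'invoice', 'patient', 'total']
```

The <invoices> wrapper and the <header> element never reach the downstream handler; only invoice content does.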

With this selection criterion, the IVBSAXContentHandler implementation can simulate a filtered stream of multiple documents. It does this by firing the SAX startDocument() and endDocument() events for each <invoice> element in the sample XML file. The output of the SAX reader is connected to the SAX writer (MXXMLWriter), which is configured as a DOMDocument builder that generates a separate tree for each invoice.

Of the three component interfaces used, only the SAX filter is custom-built for this application. Each time an <invoice> element ends and the corresponding DOMDocument tree is available in the DOM builder (MXXMLWriter), the filter also calls an application method (processInvoice) to process that bill.
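
The per-invoice document stream can be sketched in Python as follows, again only as an analogy under stated assumptions. A hypothetical InvoiceSplitter filter swallows the real document boundaries and instead fires startDocument() and endDocument() around each <invoice>; a hypothetical DomBuilder handler stands in for MXXMLWriter configured as a DOM builder, creating a fresh xml.dom.minidom tree on every startDocument(). After each invoice's endDocument(), the filter calls a callback (process_invoice, a hypothetical counterpart of processInvoice). For brevity, this sketch puts the splitting and the callback in one class; how the original sample divides them between MyExtractor and the filter is not shown here.

```python
import io
import xml.sax
from xml.dom import minidom
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase

# Hypothetical input; the real invoices.xml layout is not shown here.
SAMPLE = b"""<invoices>
  <invoice number="12"><patient>Ann</patient><total>40</total></invoice>
  <invoice number="13"><patient>Bob</patient><total>75</total></invoice>
</invoices>"""


class DomBuilder(ContentHandler):
    """Stand-in for a DOM-building writer: one new tree per startDocument."""

    def __init__(self):
        super().__init__()
        self.document = None
        self._stack = []

    def startDocument(self):
        # Replacing the previous tree drops this handler's reference to it.
        self.document = minidom.Document()
        self._stack = [self.document]

    def startElement(self, name, attrs):
        element = self.document.createElement(name)
        for attr_name, value in attrs.items():
            element.setAttribute(attr_name, value)
        self._stack[-1].appendChild(element)
        self._stack.append(element)

    def endElement(self, name):
        self._stack.pop()

    def characters(self, content):
        if len(self._stack) > 1:  # ignore text outside the root element
            self._stack[-1].appendChild(self.document.createTextNode(content))


class InvoiceSplitter(XMLFilterBase):
    """Turns each <invoice> into its own startDocument/endDocument pair."""

    def __init__(self, parent, on_invoice):
        super().__init__(parent)
        self._on_invoice = on_invoice  # called after each invoice's endDocument
        self._depth = 0

    def startDocument(self):
        pass  # swallow the real document boundaries

    def endDocument(self):
        pass

    def startElement(self, name, attrs):
        if self._depth == 0 and name == "invoice":
            super().startDocument()
        if self._depth or name == "invoice":
            self._depth += 1
            super().startElement(name, attrs)

    def endElement(self, name):
        if self._depth:
            super().endElement(name)
            self._depth -= 1
            if self._depth == 0:
                super().endDocument()
                self._on_invoice()

    def characters(self, content):
        if self._depth:
            super().characters(content)


builder = DomBuilder()
totals = {}


def process_invoice():
    """Hypothetical counterpart of processInvoice: reads the finished tree."""
    invoice = builder.document.documentElement
    total = invoice.getElementsByTagName("total")[0]
    totals[invoice.getAttribute("number")] = "".join(
        node.data for node in total.childNodes)


splitter = InvoiceSplitter(xml.sax.make_parser(), process_invoice)
splitter.setContentHandler(builder)
splitter.parse(io.BytesIO(SAMPLE))
print(totals)  # {'12': '40', '13': '75'}
```

Only one invoice tree is held by the builder at a time, which is the memory-saving property the overview describes; the callback here keeps just the extracted strings, not the trees.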

As you review the sample code, note that the SAX filter is unaware of what the output implementation (in this sample, MXXMLWriter) does. Only the SAX reader and its content handler direct the output. As a result, you can change the application (for example, by connecting its output to another custom writer or to another filter) without changing the filter code it currently uses.
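
The decoupling described above can be illustrated with a short Python sketch (an analogy, not the sample's code). The hypothetical InvoiceFilter forwards events to whatever handler is attached to it, so the same unchanged class can feed the standard library's xml.sax.saxutils.XMLGenerator directly, or feed a second, equally hypothetical filter (RedactPatientFilter, which masks <patient> text) that in turn feeds the writer. The sample data is also hypothetical.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

# Hypothetical input; the real invoices.xml layout is not shown here.
SAMPLE = b"""<invoices>
  <header>ignored</header>
  <invoice number="12"><patient>Ann</patient><total>40</total></invoice>
</invoices>"""


class InvoiceFilter(XMLFilterBase):
    """Forwards only events inside <invoice>; knows nothing about its output."""

    def __init__(self, parent):
        super().__init__(parent)
        self._depth = 0

    def startElement(self, name, attrs):
        if self._depth or name == "invoice":
            self._depth += 1
            super().startElement(name, attrs)

    def endElement(self, name):
        if self._depth:
            super().endElement(name)
            self._depth -= 1

    def characters(self, content):
        if self._depth:
            super().characters(content)


class RedactPatientFilter(XMLFilterBase):
    """Hypothetical extra stage: replaces the text of <patient> with ***."""

    def __init__(self, parent):
        super().__init__(parent)
        self._in_patient = False

    def startElement(self, name, attrs):
        super().startElement(name, attrs)
        if name == "patient":
            self._in_patient = True
            super().characters("***")

    def endElement(self, name):
        if name == "patient":
            self._in_patient = False
        super().endElement(name)

    def characters(self, content):
        if not self._in_patient:
            super().characters(content)


def run(last_stage):
    """Attaches a writer to the last stage of a pipeline and parses SAMPLE."""
    out = io.StringIO()
    last_stage.setContentHandler(XMLGenerator(out))
    last_stage.parse(io.BytesIO(SAMPLE))
    return out.getvalue()


# The InvoiceFilter class is identical in both pipelines.
plain = run(InvoiceFilter(xml.sax.make_parser()))
redacted = run(RedactPatientFilter(InvoiceFilter(xml.sax.make_parser())))
print(plain)
print(redacted)
```

Changing the output (a different writer, or an extra filter stage) only changes how the pipeline is assembled in run and the lines that call it.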

Processing each DOM document is fairly simple. The transformNode method applies the XSLT style sheet (invoice.xsl) to each DOMDocument tree generated for an <invoice> element. This produces the HTML version of the invoice, which is displayed in the invoice preview window. The application also saves the HTML version of each invoice to its own file. Each file is named according to the pattern invoicexx.html, where xx is the value of the number attribute of the corresponding <invoice> element.
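
The per-invoice output step can be sketched in Python as well, with one plainly stated substitution: Python's standard library has no XSLT processor, so this sketch does not reproduce transformNode or invoice.xsl. Instead, a hypothetical render_invoice function builds a small HTML page directly from the DOM tree. What the sketch does mirror is the file naming described above, invoice followed by the number attribute and .html. The helper names (text_of, render_invoice, save_invoice) and the sample invoice are hypothetical.

```python
import html
import os
import tempfile
from xml.dom import minidom


def text_of(element):
    """Concatenates the direct text children of a DOM element."""
    return "".join(node.data for node in element.childNodes
                   if node.nodeType == node.TEXT_NODE)


def render_invoice(doc):
    """Hypothetical stand-in for transformNode + invoice.xsl.

    Returns (file name, HTML string) for one per-invoice DOM tree.
    """
    invoice = doc.documentElement
    number = invoice.getAttribute("number")
    rows = "".join(
        f"<tr><th>{html.escape(child.tagName)}</th>"
        f"<td>{html.escape(text_of(child))}</td></tr>"
        for child in invoice.childNodes
        if child.nodeType == child.ELEMENT_NODE
    )
    page = (f"<html><body><h1>Invoice {html.escape(number)}</h1>"
            f"<table>{rows}</table></body></html>")
    # Naming pattern from the text: invoicexx.html, xx = number attribute.
    return f"invoice{number}.html", page


def save_invoice(doc, out_dir):
    """Writes one invoice's HTML to its own file and returns the path."""
    file_name, page = render_invoice(doc)
    path = os.path.join(out_dir, file_name)
    with open(path, "w", encoding="utf-8") as f:
        f.write(page)
    return path


# Hypothetical single-invoice tree, as the DOM builder might produce it.
doc = minidom.parseString(
    '<invoice number="12"><patient>Ann</patient><total>40</total></invoice>')
with tempfile.TemporaryDirectory() as out_dir:
    saved_name = os.path.basename(save_invoice(doc, out_dir))
print(saved_name)  # invoice12.html
```

In a real port, render_invoice would be replaced by an actual XSLT transform of the tree with invoice.xsl; the naming and one-file-per-invoice logic would stay the same.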

See Also

Extract Data From a Large Document | Overview of the XML Extractor Application | Application Forms (XML Extractor) | Sample Files (XML Extractor) | MyExtractor Class (XML Extractor) | Run the Application (XML Extractor)