Parsers and Parser Types
From a developer's point of view, XML parsers are the fundamental XML component, a bridge between XML documents, seen as a long chain of bytes, and applications that process that XML. Almost all XML applications are built on top of parsers.
The parser is responsible for handling XML syntax and, if desired, checking the contents of the document against constraints established in a document type definition (DTD) or schema; the application must understand how to process or display the information. The application is insulated from the details of the XML document, allowing document creators to take advantage of those details without worrying about the application. Document creators must still present correctly structured information to an application, but they can do so using generic tools instead of creating new interfaces for every different transaction.
There are two basic kinds of XML parsers defined in the XML 1.0 specification. Microsoft XML Core Services (MSXML) 5.0 for Microsoft Office can operate in either validating or nonvalidating mode.
- Nonvalidating parsers: Check document syntax and report all violations of well-formedness constraints. Nonvalidating parsers can also add information to the document based on declarations in the DTD. MSXML does read the DTD, including external resources, and acts on that information.
- Validating parsers: Perform the same functions as nonvalidating parsers, but also compare the structures of documents to rules in the DTD.
The needs of your application will determine whether you use validating or nonvalidating mode. If you're building a generic XML application that doesn't expect to see a particular set of document structures, a nonvalidating parser will allow you to work with documents that weren't built to be valid. By default, Microsoft Internet Explorer uses nonvalidating mode to accommodate the largest possible number of documents.
On the other hand, if you're creating systems for exchanging information with other organizations, and have had to agree on the format and content of those exchanges, DTDs (or schemas) and validating parsers are a good choice. They provide an extra layer of processing that lets you trim a lot of structure-checking code from your programs, while ensuring that document structures conform to the formats you agreed on prior to the transaction.
In many ways, DTDs and schemas are like contracts, and validating parsers enforce the terms of those contracts.
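The choice between the two modes comes down to a single switch on the document object. The helper below is a minimal sketch, not part of MSXML: the name `loadXml` and the shape of its return value are hypothetical, and it takes the document object as a parameter so that the caller decides how to create it (for example, with `new ActiveXObject("Msxml2.DOMDocument.5.0")`). It assumes the MSXML `async`, `validateOnParse`, `load`, and `parseError` members.

```javascript
// Hypothetical helper: load a document in validating or nonvalidating mode.
// "doc" is assumed to be an MSXML DOMDocument object created by the caller.
function loadXml(doc, url, validate) {
    doc.async = false;               // load synchronously so the result is known on return
    doc.validateOnParse = validate;  // true: also check against the DTD; false: syntax only
    var ok = doc.load(url);          // load is assumed to return true on success
    return {
        ok: ok,
        mode: validate ? "validating" : "nonvalidating",
        reason: ok ? "" : doc.parseError.reason
    };
}
```

A generic viewer might call `loadXml(doc, url, false)`, while an application that exchanges documents under an agreed DTD would pass `true`.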
Testing for well-formedness with Internet Explorer 5.5
You can open any document labeled as XML in Internet Explorer. If the document is well-formed, Internet Explorer displays it according to the style sheet specified within the document or, if the document does not specify one, using its default style sheet for XML. If the document is not well-formed, Internet Explorer reports an error. If a document has multiple errors, you might have to fix one error, reopen the document, and then fix the next error.
Note: To label a document as XML, assign it a file extension of .xml if you are opening it from a local file system or file server, or a Multipurpose Internet Mail Extensions (MIME) type of text/xml or application/xml if you are opening it over a Web connection.
Testing for validity with Internet Explorer 5.5
To use Internet Explorer to validate an XML document, you must set a switch in the parser before loading the document.
Microsoft offers an XML validator that sets these switches and provides an interface. This validator allows you to enter an XML document either as a URL or through a form. The validator parses the document using the MSXML parser in validating mode and reports any errors.
The core of the validator is this snippet of Microsoft JScript® code.
xmldoc = new ActiveXObject("Msxml2.DOMDocument.5.0");
xmldoc.validateOnParse = true;
xmldoc.load(url);
The code appears as follows in Microsoft Visual Basic® Scripting Edition (VBScript).
Set xmldoc = CreateObject("Msxml2.DOMDocument.5.0")
xmldoc.validateOnParse = True
xmldoc.load(url)
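Neither snippet shows how the errors are reported. One way to do that, sketched here under stated assumptions, is to read the document's `parseError` object once `load` has returned. `describeParseError` is a hypothetical name; the properties it reads (`errorCode`, `reason`, `line`, `linepos`) are assumed to be those of the MSXML `parseError` object, and the load is assumed to have been synchronous (`async` set to `false`).

```javascript
// Hypothetical helper: turn an MSXML-style parseError object into a one-line message.
function describeParseError(parseError) {
    // An errorCode of 0 is assumed to mean the document loaded without errors.
    if (parseError.errorCode === 0) {
        return "No errors";
    }
    // The reason text may end with a line break, so trim trailing whitespace.
    var reason = String(parseError.reason).replace(/\s+$/, "");
    return "Error at line " + parseError.line +
           ", position " + parseError.linepos + ": " + reason;
}
```

After `xmldoc.load(url)`, a caller could display `describeParseError(xmldoc.parseError)` to the user.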