Starter Kit

MSXML 5.0 SDK

Microsoft XML Core Services (MSXML) 5.0 for Microsoft Office - XML Developer's Guide

On the surface, XML looks like HTML. Both are derived from the Standard Generalized Markup Language (SGML). Tools that generate HTML can often be reused to generate XML.

XML differs from HTML in two key areas: syntax and semantics.

XML Syntax for Well-formed Documents

Both HTML and XML use < and > to delimit the tags that create element and attribute structures, and & to introduce character and entity references. While HTML browsers accept or silently ignore malformed markup, XML parsers and the applications built on them are less forgiving. Errors in XML syntax halt document processing, and users or applications receive error messages, not a best-guess interpretation of the document structure.
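
The difference in error handling can be seen with a minimal sketch in Python's standard library, used here only for illustration and not as the MSXML API. The names MANGLED, TagCollector, html_events, and xml_error are hypothetical. The lenient HTML tokenizer reports whatever tags it finds, while the XML parser stops with an error at the first violation.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical sample: <b> is never closed before </p>.
MANGLED = "<p>Unclosed paragraph <b>bold</p>"


class TagCollector(HTMLParser):
    """Records start and end tags without checking that they nest."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def html_events(markup):
    """Return the tag events a lenient HTML tokenizer reports."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.events


def xml_error(markup):
    """Return the ParseError raised by the XML parser, or None if well-formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        return exc
    return None


if __name__ == "__main__":
    print("HTML events:", html_events(MANGLED))
    print("XML error:  ", xml_error(MANGLED))
```

The HTML side keeps going and leaves any repair to the consumer; the XML side returns an error object that carries the line and column of the problem.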

XML documents must be well-formed. That is, they must follow rules for identifying document parts and creating nested element structures. These rules include:

  • An XML document can have only one document element. The document element is a single element that contains all the content considered to be part of the document itself. This document element, also called the root element, is the first element to appear after the document prolog. For more information, see Elements.
  • All XML elements must be closed. While end tags may be optional with certain HTML elements, every element in XML must have either a matching end tag or, if it has no content, the empty-element form (for example, <br/>). For more information, see Elements.
  • XML elements cannot overlap. If the start tag for an element appears within another element, the element must also end within that same containing element. For example, the following HTML code suggests a combination of bold and italic by overlapping the structures.
    <b>This is bold text. <i>This is bold italic text.</b> This is italic text.</i>

    In some HTML browsers, this text appears as follows.

    This is bold text. This is bold italic text. This is italic text.

    In an XML parser, however, all processing halts as soon as </b> is encountered because the XML parser is looking for </i>, and will not accept </b>. To achieve the same formatting in XML, use the following syntax.

    <b>This is bold text.</b> <i><b>This is bold italic text.</b> This is italic text.</i>

    This extra work for XML document creators pays off in interoperability. Because XML processors need far less "guessing" code, they fit more easily into small-scale environments such as embedded systems. Structural ambiguities are eliminated from XML documents: all XML parsers see the same nested element structures.

  • All attribute values must be quoted, whether or not they contain spaces. You can use either single or double quotation marks. For more information, see Attributes.
  • You cannot use the characters < or & literally within the text of your documents. Use the built-in entities &lt; and &amp; instead. The > character may appear literally except in the sequence ]]>, but it is commonly escaped as &gt; for consistency. For more information, see Character and Entity References.
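
The rules above can be checked mechanically. The following is a minimal sketch that uses Python's xml.etree.ElementTree rather than MSXML; the sample strings and the names SAMPLES, is_well_formed, and check_samples are assumptions made for illustration. Each sample breaks one rule, except the last, which follows all of them.

```python
import xml.etree.ElementTree as ET

# Hypothetical samples: one violation per rule, then a well-formed document.
SAMPLES = {
    "two document elements": "<a/><b/>",
    "missing end tag": "<doc><p>text</doc>",
    "overlapping elements": "<doc><b>bold <i>both</b> italic</i></doc>",
    "unquoted attribute value": "<doc id=1/>",
    "unescaped ampersand": "<doc>AT&T</doc>",
    "well-formed": (
        "<doc id='1'><b>bold</b> <i><b>both</b> italic</i> "
        "AT&amp;T, 1 &lt; 2</doc>"
    ),
}


def is_well_formed(text):
    """Return True if the parser accepts the text, False on any syntax error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def check_samples(samples=SAMPLES):
    """Map each sample label to the parser's verdict."""
    return {label: is_well_formed(text) for label, text in samples.items()}


if __name__ == "__main__":
    for label, ok in check_samples().items():
        print(f"{label:26} {'accepted' if ok else 'rejected'}")
```

A check like this only establishes well-formedness. It says nothing about whether the document is valid against a schema or DTD.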

XML Semantics

Although XML is unforgiving about syntax, it offers developers more options for defining meaning in XML documents. HTML is basically one vocabulary with a few variations; <b> always means the same thing to an HTML processor. With XML, you can create your own markup vocabulary or choose from markup vocabularies appropriate to your industry or project type. Schemas and document type definitions (DTDs) let you describe these vocabularies, but you can also create documents using vocabularies without formal definitions. Namespaces help you identify the vocabulary you are using.
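
As a rough illustration of how namespaces identify a vocabulary, the sketch below parses a small document that mixes two made-up vocabularies and reports which one each element belongs to. The namespace URIs, the document, and the function vocabulary_of are hypothetical; the parser is Python's xml.etree.ElementTree, not MSXML.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing two vocabularies. The URIs are placeholders.
DOC = """<order xmlns="urn:example:orders" xmlns:fmt="urn:example:formatting">
  <item>Widget</item>
  <fmt:b>Rush delivery</fmt:b>
</order>"""


def vocabulary_of(element):
    """Split an ElementTree tag of the form '{uri}local' into (uri, local)."""
    tag = element.tag
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag  # element is in no namespace


def element_names(xml_text):
    """List (namespace URI, local name) for every element in document order."""
    root = ET.fromstring(xml_text)
    return [vocabulary_of(el) for el in root.iter()]


if __name__ == "__main__":
    for uri, local in element_names(DOC):
        print(f"{local:6} belongs to {uri}")
```

Here the b element carries no built-in meaning. An application decides what it means by looking at the namespace it belongs to.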

This approach requires architectures different from those used by browsers. HTML browsers have a built-in understanding of what each element means and how to present it; developers cannot count on XML applications to have that understanding of their markup. Browsers can still present XML, but they require a style sheet to format the document to your specifications. These style sheets are written in cascading style sheets (CSS) or XSL Transformations (XSLT). Some browsers, including Internet Explorer 5.0 and later, include a default style sheet, but it is designed more for diagnostics than for presenting information to end users.

XML applications can also bring their own logic to XML vocabularies, rather than relying on style sheets. This logic may take the form of simple scripts or binding to particular presentation modes, or it may involve writing an entire application from scratch. These applications can take advantage of their built-in knowledge of the labeled structures contained in XML documents to process the information in those documents, present them to users, connect them with other data sources, or redirect them to other appropriate consumers.
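
A small sketch of application logic built on labeled structures follows. The invoice vocabulary (invoice, line, quantity, unitPrice, currency) and the function invoice_total are invented for this example, and Python's standard library stands in for whatever parser the application actually uses.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical invoice vocabulary. Element and attribute names are assumptions.
INVOICE = """<invoice currency="USD">
  <line sku="A-100" quantity="2" unitPrice="9.50"/>
  <line sku="B-200" quantity="1" unitPrice="20.00"/>
</invoice>"""


def invoice_total(xml_text):
    """Return (currency, total) computed from the line elements of an invoice."""
    root = ET.fromstring(xml_text)
    total = Decimal("0")
    for line in root.findall("line"):
        # The application knows what these labels mean; no style sheet is involved.
        total += Decimal(line.get("quantity")) * Decimal(line.get("unitPrice"))
    return root.get("currency"), total


if __name__ == "__main__":
    currency, total = invoice_total(INVOICE)
    print(f"Total due: {total} {currency}")
```

The same parsed structure could just as well be handed to a presentation layer, joined with another data source, or forwarded to another consumer.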

In This Section