About the HTML Source for Office Documents
Microsoft Office supports Hypertext Markup Language (HTML) as a native file format. Using HTML, Office documents and data can be stored, distributed, and presented in a format that can be viewed in most Web browsers, while retaining the rich content and functionality of Office documents stored in the traditional companion binary file formats. HTML elements, widely used in Web pages, focus primarily on the presentation of content, that is, on how information is displayed. Although HTML can display a wide variety of content, it has no efficient way to describe the underlying data.
Because HTML alone cannot define every element of an Office document, the Office applications combine HTML with Extensible Markup Language (XML), Vector Markup Language (VML), and cascading style sheets (CSS). This combination allows an Office application to preserve all the Office-specific content and information about a document when the document is saved as HTML, and to interpret that content and information the next time it opens the HTML file.
Extensible Markup Language (XML)
Office uses Extensible Markup Language (XML) to support its wide variety of functions and features. The XML standard enables the creation of an extended set of elements to define and describe data, objects, and properties, separating the data from the presentation and overcoming HTML's inability to describe these objects. XML is a subset of Standard Generalized Markup Language (SGML), and although SGML could be used, XML provides the necessary functionality with much less complexity. Its structured data descriptions are what make it possible to open HTML documents in a Web browser or Office application yet retain the properties, options, and settings that are used only when editing and saving the document. The XML standard requires that the rules (or schema) for using these extended elements be specified, enabling the documents to be parsed by Web browsers, document viewers, and editors that support XML and that can act on those rules.
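As a rough illustration of how namespaced XML elements can carry document properties separately from presentation, the following Python sketch parses a small property block. The fragment, its element names, and its values are assumptions made for illustration, and read_properties is a hypothetical helper, not part of any Office API.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: the element names and values are assumptions
# modeled on the kind of property block an Office application might save.
FRAGMENT = """<o:DocumentProperties xmlns:o="urn:schemas-microsoft-com:office:office">
  <o:Author>Example Author</o:Author>
  <o:Revision>3</o:Revision>
</o:DocumentProperties>"""


def read_properties(xml_text):
    """Return each child element of the root as a {local-name: text} entry."""
    root = ET.fromstring(xml_text)
    properties = {}
    for child in root:
        # ElementTree stores tags as "{namespace}LocalName"; keep the local part.
        local_name = child.tag.split("}", 1)[-1]
        properties[local_name] = (child.text or "").strip()
    return properties


properties = read_properties(FRAGMENT)
```

A browser that does not understand these elements can ignore them when displaying the page, while an application that knows the namespace can read the values back.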
For graphics and shapes, an XML-based language called Vector Markup Language (VML) is used to define and describe the vectors that make up those objects within a document. In Web browsers that support VML, the definitions are used to render the graphics and shapes from vectors. Using VML instead of bitmap graphics files results in smaller file sizes and shorter document download times. In addition, VML objects can be manipulated in script to perform dynamic image transformations that are not possible when using traditional bitmap graphics.
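To give a sense of why a vector description can be smaller than a bitmap, the sketch below serializes a single VML rectangle and compares its length with the size of an uncompressed bitmap of the same dimensions. The helper names vml_rect and raw_bitmap_bytes are hypothetical, the shape and its attribute values are chosen only for illustration, and treating one point as one pixel is a simplifying assumption.

```python
import xml.etree.ElementTree as ET

VML_NS = "urn:schemas-microsoft-com:vml"
# Ask ElementTree to write the conventional "v" prefix for the VML namespace.
ET.register_namespace("v", VML_NS)


def vml_rect(width_pt, height_pt, fill):
    """Serialize a VML rectangle with the given size and fill color."""
    rect = ET.Element(
        f"{{{VML_NS}}}rect",
        {"style": f"width:{width_pt}pt;height:{height_pt}pt", "fillcolor": fill},
    )
    return ET.tostring(rect, encoding="unicode")


def raw_bitmap_bytes(width_px, height_px, bytes_per_pixel=3):
    """Size of an uncompressed RGB bitmap, ignoring any file header."""
    return width_px * height_px * bytes_per_pixel


markup = vml_rect(200, 100, "red")
# Assumption for the comparison: one point is treated as one pixel.
bitmap_size = raw_bitmap_bytes(200, 100)
```

Compressed bitmap formats narrow the gap, but the length of the vector description does not grow with the dimensions of the shape.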
Although HTML contains a number of text formatting elements, cascading style sheets (CSS) are required to display the full range of common Office text formats. CSS specifies text formatting and styles that the standard HTML text formatting elements, such as <b> for bold or <em> for emphasis, cannot describe. CSS also works much like styles in Microsoft Word: a style can be defined once and used many times in an HTML document. In addition, CSS provides greater control over the positioning of elements such as paragraphs and images within a document.
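The define-once, use-many-times behavior described above can be sketched in a few lines of Python that generate an HTML page. The class name Callout, the CSS property values, and the render function are all hypothetical and are not taken from any Office output.

```python
from html import escape


def render(paragraphs, class_name="Callout"):
    """Build an HTML page in which one CSS rule styles every paragraph."""
    # The style rule appears once, in the document head.
    style = f"p.{class_name} {{ font-family: Georgia; margin-left: 36pt; }}"
    # Each paragraph refers to the rule by class name instead of repeating it.
    body = "\n".join(
        f'<p class="{class_name}">{escape(text)}</p>' for text in paragraphs
    )
    return (
        f"<html><head><style>\n{style}\n</style></head>\n"
        f"<body>\n{body}\n</body></html>"
    )


page = render(["First paragraph", "Second paragraph", "Third paragraph"])
```

Changing the single rule in the head changes the appearance of every paragraph that references it, which is the same relationship a Word style has to the paragraphs that use it.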