Controlling White Space with the DOM

MSXML 5.0 SDK

Microsoft XML Core Services (MSXML) 5.0 for Microsoft Office - XSLT Developer's Guide

When a text file is opened with the xmlDoc.load method or the xmlDoc.loadXML method (where xmlDoc is an XML DOM document), the parser strips most white space from the file unless specifically directed otherwise. By setting a flag, the parser notes within each node whether one or more spaces, tabs, newlines, or carriage returns follow the node in the text. This method is efficient, reducing both the size of each XML document and the number of calculations required to redisplay the XML in a browser. However, because the stripped characters themselves are discarded, an XML document stored in this manner can lose formatting information in its content. Tabs, in particular, can be lost, because in the default mode they are treated as nothing more than white space.

XSLT uses the DOM, not the source document, to guide the transformation. Because the white space has already been stripped to process the XML into the DOM, white space characters are lost even before the transformation takes place. Most of the XSLT-related methods for specifying white space in the source data document or style sheets are applied too late to make a difference in formatting.

The preserveWhiteSpace property tells the XML parser whether to preserve white space from the source file when it builds the XML DOM. The default value of this property is False: extraneous white space characters are stripped, and the file is reduced to the smallest possible stream. If you want the XML document to retain extraneous white space when converted to a DOM document, you must set preserveWhiteSpace to True before loading the file into the DOM.
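
The ordering requirement described above can be sketched as a small helper. This is only an illustration: loadPreservingWhiteSpace and its createDocument parameter are hypothetical names introduced here, not part of MSXML. The helper assumes that, when no factory is passed, it runs in Internet Explorer, where ActiveXObject is available.

```javascript
// Sketch: set preserveWhiteSpace BEFORE loading, so that the parser
// keeps white space while it builds the DOM.
// createDocument is a hypothetical factory parameter (not part of MSXML);
// when omitted, an Msxml2.DOMDocument is created, which requires
// Internet Explorer.
function loadPreservingWhiteSpace(xmlText, createDocument) {
  var doc = createDocument
    ? createDocument()
    : new ActiveXObject('Msxml2.DOMDocument');
  doc.preserveWhiteSpace = true;  // must come before the load call
  var ok = doc.loadXML(xmlText);  // loadXML returns false on a parse error
  if (!ok) {
    throw new Error('Parse error: ' + doc.parseError.reason);
  }
  return doc;
}
```

Passing the factory as a parameter is a design choice made for this sketch only; it lets the ordering be checked with a stand-in object outside the browser.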

If you set the preserveWhiteSpace property from True to False and then back to True for a given DOM document, the spaces will not reappear. Setting the property to False removes the white space from the DOM, which cannot reconstruct it.

If you are working with XML as a data format streamed to some other process, disable preserveWhiteSpace by setting it to False or by leaving it at its default. If retaining positional information is important (for example, in conversions to non-XML formats such as tab-separated data), set preserveWhiteSpace to True. Be aware that this option increases the number of characters and places more demands on the browser.

Example

This example shows how white space can be programmatically controlled, using the DOM. To run the example, view the HTML file in Internet Explorer.

In the first block of results, the content of each original <whitespace> element appears on a separate line. In the second block, where the preserveWhiteSpace property is set to False, all the content appears on a single line. In addition, the first block is indented, while the second block is not. The indents and line breaks in the first block are a result of newline and tab characters in the source document between the boundaries of the <whitespace> elements. This white space is affected by the preserveWhiteSpace property.

The white space within the <whitespace> elements is not affected by the value of the preserveWhiteSpace property. To strip white space from the text content of elements, use the XPath normalize-space() function.
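
To make the effect of normalize-space() concrete, the following is a plain JavaScript approximation of its XPath 1.0 behavior: leading and trailing white space is removed, and each internal run of white space is replaced by a single space. The function name normalizeSpace is a hypothetical helper for illustration, not an MSXML API.

```javascript
// Approximation of XPath 1.0 normalize-space() for a string argument.
// normalizeSpace is a hypothetical helper name, not part of MSXML.
function normalizeSpace(text) {
  return text
    .replace(/[ \t\r\n]+/g, ' ')  // collapse each white space run to one space
    .replace(/^ | $/g, '');       // drop the leading or trailing space, if any
}
```

For example, applying it to the content of the Tabs element from striporpres.xml would reduce the two tab characters between the brackets to a single space.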

The example uses the following three files.

striporpres.xml
The XML source document. Each of the <whitespace> elements contains different kinds of white space, including tabs, spaces, and newlines.
striporpres.xsl
An XSLT style sheet that makes invisible white space visible for demonstration purposes. The striporpres.xsl style sheet consists of two template rules. The first rule applies to the source document root node, instantiating an HTML <pre> element node in the result tree. The second rule transforms each <whitespace> element, and its output is placed inside the <pre> element. As in the <xsl:strip-space> and <xsl:preserve-space> example, the XPath translate() function is used to make the invisible white space visible.
striporpres.htm
An HTML file that contains Microsoft® JScript®. The preserveStripPreserve JScript function is called when the file is loaded. This function performs the following tasks:
  • Sets the preserveWhiteSpace property of the source DOMDocument object to True.
  • Loads striporpres.xml and striporpres.xsl into separate DOMDocument objects.
  • Resets the preserveWhiteSpace property of the source DOMDocument object to False and reloads striporpres.xml.

The script also uses the transformNode method to apply the striporpres.xsl style sheet to the striporpres.xml document after each load, and assigns the concatenated result string to the innerHTML property of the testResults <div> element. This property is used to display the results in Internet Explorer.

XML File (striporpres.xml)

There is no <?xml-stylesheet?> processing instruction. Instead, a style sheet is applied programmatically to the document using the transformNode method. Unlike a pure XSLT solution, this technique allows you to set and reset the preserveWhiteSpace property.

Copy this text to a file. Then, find each tabhere and replace it with a tab character (that is, press the TAB key).

<?xml version='1.0'?>
<whitespaceTest>
    <whitespace>Tabs[tabheretabhere]</whitespace>
    <whitespace>Spaces[  ]</whitespace>
    <whitespace>Newlines[

]</whitespace>
</whitespaceTest> 

XSLT File (striporpres.xsl)

<?xml version='1.0'?>
<xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
   <pre>
      <xsl:apply-templates />
   </pre>
</xsl:template>

<xsl:template match="whitespace">
   <!-- Use the translate() XPath function to make white space visible:
        space becomes '-', newline 'N', carriage return 'R', tab 'T'. -->
   <xsl:value-of select="translate(.,' &#10;&#13;&#9;','-NRT')"/>
</xsl:template>

</xsl:stylesheet>
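
The character mapping performed by the translate() call in the style sheet can be approximated in plain JavaScript. The sketch below is an illustration only: the translate function defined here is a hypothetical stand-in for the XPath function, and it works on UTF-16 code units, so it assumes all characters are in the Basic Multilingual Plane.

```javascript
// Approximation of XPath 1.0 translate(text, from, to):
// each character of text that occurs in `from` is replaced by the
// character at the same position in `to`; if `to` has no character at
// that position, the character is removed.
function translate(text, from, to) {
  var out = '';
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    var pos = from.indexOf(ch);   // first occurrence wins, as in XPath
    if (pos === -1) {
      out += ch;                  // not listed in `from`: keep unchanged
    } else if (pos < to.length) {
      out += to.charAt(pos);      // mapped character
    }                             // otherwise: character is dropped
  }
  return out;
}

// Same mapping as the style sheet: space, LF, CR, tab to '-', 'N', 'R', 'T'.
var visible = translate('Spaces[  ]', ' \n\r\t', '-NRT');  // 'Spaces[--]'
```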

HTML File (striporpres.htm)

<html>
<head>
<title>Demo: Controlling White Space Output via the DOM</title>
<script language='JScript'>
<!--
function preserveStripPreserve() {
// Associate the result tree object with any element(s) whose
// id attribute is "testResults."
var objResTree = document.all['testResults'];
// Create two MSXML DOMDocument objects.
var objSrcTree = new ActiveXObject('Msxml2.DOMDocument');
var objXSLT = new ActiveXObject('Msxml2.DOMDocument');
// Load synchronously so that each document is fully parsed
// before it is transformed.
objSrcTree.async = false;
objXSLT.async = false;
// Set preserveWhiteSpace before loading, then load the two
// DOMDocuments with the XML document and the XSLT style sheet.
objSrcTree.preserveWhiteSpace = true;
objSrcTree.load('stripOrPres.xml');
objXSLT.load('stripOrPres.xsl');
// Use the transformNode method to apply the XSLT to the XML.
var strResult = objSrcTree.transformNode(objXSLT);
// Now reset preserveWhiteSpace to false, and re-load the source...
objSrcTree.preserveWhiteSpace = false;
objSrcTree.load('stripOrPres.xml');
// ...and rerun the transform. Note that this transformation's
// result is concatenated to the one created when
// preserveWhiteSpace was true.
strResult = strResult + objSrcTree.transformNode(objXSLT);
// Assign the resulting string to the result tree object's
// innerHTML property.
objResTree.innerHTML = strResult;
return true;
}
-->
</script>
</head>
<body onload='preserveStripPreserve()'>
<div id='testResults'></div>
</body>
</html>

Formatted Output

    Tabs[TT]
    Spaces[--]
    Newlines[NN]
Tabs[TT]Spaces[--]Newlines[NN]

Processor Output

No XSLT processor output stream is available in Internet Explorer for this example.

See Also

Example of <xsl:preserve-space> and <xsl:strip-space> | Stripping White Space Using normalize-space()