Controlling White Space with the DOM
When a text file is opened with the xmlDoc.load method or the xmlDoc.loadXML method (where xmlDoc is an XML DOM document), the parser strips most white space from the file unless specifically directed otherwise. By setting a flag, the parser notes within each node whether one or more spaces, tabs, newlines, or carriage returns follow the node in the text. This approach is efficient, reducing both the size of each XML file and the number of calculations required to redisplay the XML in a browser. However, because the white space characters themselves are discarded, an XML document stored in this manner can lose formatting information in its content. Tabs, in particular, can be lost, because in the default mode they are not recognized as anything but white space.
XSLT uses the DOM, not the source document, to guide the transformation. Because the white space has already been stripped to process the XML into the DOM, white space characters are lost even before the transformation takes place. Most of the XSLT-related methods for specifying white space in the source data document or style sheets are applied too late to make a difference in formatting.
The preserveWhiteSpace property tells the XML parser whether to retain white space from the source file when it builds the XML DOM. The default value of this property is False: extraneous white space characters are stripped, and the file is reduced to the smallest possible stream. If you want the XML document to retain extraneous white space when converted to a DOM document, you must set preserveWhiteSpace to True before loading the file into the DOM.
If you set the preserveWhiteSpace property from True to False and then back to True for a given DOM document, the white space will not reappear. Setting the property to False removes the white space from the DOM, which cannot reconstruct it.
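The ordering requirement described above (set the property first, then load) can be sketched as a small helper. This is only an illustration: loadXmlDocument and its createDocument parameter are hypothetical names, not part of MSXML, and the sketch assumes the factory returns an MSXML-style DOMDocument object that exposes async, preserveWhiteSpace, load, and parseError.

```javascript
// Hypothetical helper (not part of MSXML): loads an XML file into a new
// DOM document with preserveWhiteSpace decided before parsing begins.
// createDocument is an assumed factory that returns an MSXML-style
// DOMDocument object.
function loadXmlDocument(createDocument, url, keepWhiteSpace) {
  var doc = createDocument();
  doc.async = false;                        // wait for load() to finish
  doc.preserveWhiteSpace = keepWhiteSpace;  // must be set before load()
  if (!doc.load(url)) {
    throw new Error('Could not load ' + url + ': ' + doc.parseError.reason);
  }
  return doc;
}

// Usage in Internet Explorer, where ActiveXObject is available.
if (typeof ActiveXObject !== 'undefined') {
  var preservedDoc = loadXmlDocument(function () {
    return new ActiveXObject('Msxml2.DOMDocument');
  }, 'striporpres.xml', true);
}
```

Because stripped white space cannot be reconstructed, a helper like this creates and loads a fresh document for each setting instead of toggling the property on a document that is already loaded.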
If you are working with XML as a data format streamed to some other process, disable preserveWhiteSpace by setting it, or allowing it to default, to False. If retaining positional information is important (for example, in conversions to non-XML formats like tab-separated data), set preserveWhiteSpace to True. Be aware that this option increases the number of characters stored in the DOM and places more demands on the browser.
Example
This example shows how white space can be programmatically controlled, using the DOM. To run the example, view the HTML file in Internet Explorer.
In the first block of results, the content of each original <whitespace> element appears on a separate line. In the second block, where the preserveWhiteSpace property is set to False, all the content appears on a single line. In addition, the first block is indented, while the second block is not. The indents and line breaks in the first block are a result of newline and tab characters in the source document between the boundaries of the <whitespace> elements. This white space is affected by the preserveWhiteSpace property.
The white space within the <whitespace> elements is not affected by the value of the preserveWhiteSpace property. To strip or collapse white space within the text content of elements, use the XPath normalize-space() function.
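To make the effect of normalize-space() concrete, the following sketch emulates it in script. It is an approximation written for illustration, not the XPath processor's implementation: the function name normalizeSpace is hypothetical, and it treats the four XML white space characters (space, tab, carriage return, newline) the way XPath 1.0 describes, by trimming both ends and collapsing each interior run to a single space.

```javascript
// Illustrative emulation of the XPath normalize-space() function.
// normalizeSpace is a hypothetical name used only in this sketch.
function normalizeSpace(text) {
  return text
    .replace(/^[ \t\r\n]+|[ \t\r\n]+$/g, '')  // strip leading and trailing white space
    .replace(/[ \t\r\n]+/g, ' ');             // collapse interior runs to one space
}

var sample = '\n   first   second\t\tthird \n';
var normalized = normalizeSpace(sample);
// normalized is 'first second third'
```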
The example uses the following three files.
- striporpres.xml: The XML source document. Each of the <whitespace> elements contains a different kind of white space: tabs, spaces, or newlines.
- striporpres.xsl: An XSLT style sheet that makes invisible white space visible for demonstration purposes. The striporpres.xsl style sheet consists of two template rules. The first rule applies to the source document root node, instantiating an HTML <pre> element node in the result tree. The transformed <whitespace> elements, which are handled by the second template rule, are placed in this node. As in the <xsl:strip-space> and <xsl:preserve-space> example, the XPath translate() function is used to make the invisible white space visible.
- striporpres.htm: An HTML file that contains Microsoft® JScript®. The preserveStripPreserve JScript function is called when the file is loaded. This function performs the following tasks:
  - Loads striporpres.xml and striporpres.xsl into separate DOMDocument objects.
  - Alternately sets the preserveWhiteSpace property of the DOMDocument object to True or False.
  The script also uses the transformNode method to apply the striporpres.xsl style sheet to the striporpres.xml document, and assigns the resulting string to the innerHTML property. This property is used to display the results in Internet Explorer.
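The translate() step in the style sheet can be pictured with a short script emulation. This is a sketch for illustration only: xpathTranslate is a hypothetical name, it works on UTF-16 code units and not full Unicode characters, and it assumes the mapping implied by the '-NRT' argument and the formatted output (space to "-", newline to "N", carriage return to "R", tab to "T").

```javascript
// Illustrative emulation of the XPath translate() function.
// xpathTranslate is a hypothetical name used only in this sketch.
function xpathTranslate(text, from, to) {
  var out = '';
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    var pos = from.indexOf(ch);   // the first occurrence in "from" wins
    if (pos === -1) {
      out += ch;                  // not listed in "from": copy unchanged
    } else if (pos < to.length) {
      out += to.charAt(pos);      // replace with the character at the same position
    }
    // otherwise "to" has no character at that position, so ch is dropped
  }
  return out;
}

var visibleTabs = xpathTranslate('Tabs[\t\t]', ' \n\r\t', '-NRT');
// visibleTabs is 'Tabs[TT]'
var visibleSpaces = xpathTranslate('Spaces[  ]', ' \n\r\t', '-NRT');
// visibleSpaces is 'Spaces[--]'
```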
XML File (striporpres.xml)
There is no <?xml-stylesheet?> processing instruction. Instead, a style sheet is applied programmatically to the document using the transformNode method. Unlike a pure XSLT solution, this technique allows you to set and reset the preserveWhiteSpace property.
Copy this text to a file. Then, find each tabhere and replace it with a tab character (that is, press the TAB key).
<?xml version='1.0'?>
<whitespaceTest>
   <whitespace>Tabs[tabheretabhere]</whitespace>
   <whitespace>Spaces[  ]</whitespace>
   <whitespace>Newlines[

]</whitespace>
</whitespaceTest>
XSLT File (striporpres.xsl)
<?xml version='1.0'?>
<xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
   <pre>
      <xsl:apply-templates />
   </pre>
</xsl:template>

<xsl:template match="whitespace">
   <!-- Use the translate() XPath function to convert each white space
        character to a visible one: space to "-", newline to "N",
        carriage return to "R", and tab to "T". -->
   <xsl:value-of select="translate(.,' &#10;&#13;&#9;','-NRT')"/>
</xsl:template>

</xsl:stylesheet>
HTML File (striporpres.htm)
<html>
<head>
<title>Demo: Controlling White Space Output via the DOM</title>
<script language='JScript'>
<!--
function preserveStripPreserve()
{
   // Associate the result tree object with any element(s) whose
   // id attribute is "testResults."
   var objResTree = document.all['testResults'];
   // Declare two new Msxml DOMDocument objects.
   var objSrcTree = new ActiveXObject('Msxml2.DOMDocument');
   var objXSLT = new ActiveXObject('Msxml2.DOMDocument');
   // Load synchronously so that each document is fully parsed
   // before it is transformed.
   objSrcTree.async = false;
   objXSLT.async = false;
   // Load the two DOMDocuments with the XML document and the
   // XSLT style sheet. preserveWhiteSpace must be set before load.
   objSrcTree.preserveWhiteSpace = true;
   objSrcTree.load('stripOrPres.xml');
   objXSLT.load('stripOrPres.xsl');
   // Use the transformNode method to apply the XSLT to the XML.
   var strResult = objSrcTree.transformNode(objXSLT);
   // Now reset preserveWhiteSpace to false, and re-load the source...
   objSrcTree.preserveWhiteSpace = false;
   objSrcTree.load('stripOrPres.xml');
   // ...and rerun the transform. Note that this transformation's
   // result is concatenated to the one created when
   // preserveWhiteSpace was true.
   strResult = strResult + objSrcTree.transformNode(objXSLT);
   // Assign the resulting string to the result tree object's
   // innerHTML property.
   objResTree.innerHTML = strResult;
   return true;
}
-->
</script>
</head>
<body onload='preserveStripPreserve()'>
<div id='testResults'></div>
</body>
</html>
Formatted Output
   Tabs[TT]
   Spaces[--]
   Newlines[NN]
Tabs[TT]Spaces[--]Newlines[NN]
Processor Output
No XSLT processor output stream is available in Internet Explorer for this example.
See Also
Example of <xsl:preserve-space> and <xsl:strip-space> | Stripping White Space Using normalize-space()