How White Space is Handled in XML Files
White space refers to characters in an XML document, such as spaces and tabs, that are not visible when the document is displayed in a text editor such as Notepad or in an application such as Microsoft® Internet Explorer.
Character | Entity representation |
---|---|
Space | &#32; (hexadecimal &#x20;) |
Carriage return | &#13; (hexadecimal &#x0D;) |
Tab | &#9; (hexadecimal &#x09;) |
Line feed (newline) | &#10; (hexadecimal &#x0A;) |
White space includes the tab, space, and line break characters. Any series of white space characters between elements is considered a single white space node. For example,
<books> <book>
Suppose that between the above elements is a line break, a tab, and some space characters. Together, these characters form a single white space node. If this white space node is stripped, the following results:
<books><book>
If the white space node is normalized, the following single space results:
<books> <book>
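The difference between stripping and normalizing can be sketched outside XSLT. The following Python example is a minimal illustration, not the MSXML implementation: it uses the standard xml.dom.minidom module, and the sample input and the helper names strip_ws_nodes and normalize_ws_nodes are assumptions made for this sketch.

```python
from xml.dom import minidom

# Hypothetical input: a line break, a tab, and two spaces between the tags.
SOURCE = "<books>\n\t  <book/></books>"

XML_WS = " \t\r\n"  # the four XML white space characters


def is_ws_only_text(node):
    """True for a text node that contains nothing but XML white space."""
    return node.nodeType == node.TEXT_NODE and node.data.strip(XML_WS) == ""


def strip_ws_nodes(element):
    """Remove every white-space-only text node below element, in place."""
    for child in list(element.childNodes):
        if is_ws_only_text(child):
            element.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_ws_nodes(child)


def normalize_ws_nodes(element):
    """Replace every white-space-only text node with a single space."""
    for child in list(element.childNodes):
        if is_ws_only_text(child):
            child.data = " "
        elif child.nodeType == child.ELEMENT_NODE:
            normalize_ws_nodes(child)


stripped = minidom.parseString(SOURCE)
strip_ws_nodes(stripped.documentElement)
stripped_xml = stripped.documentElement.toxml()

normalized = minidom.parseString(SOURCE)
normalize_ws_nodes(normalized.documentElement)
normalized_xml = normalized.documentElement.toxml()

print(repr(stripped_xml))
print(repr(normalized_xml))
```

In both cases only the white space node between the tags changes; text content inside elements is left alone.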
If you want the processor to produce human-readable output such as a standard HTML listing, you should preserve the white space nodes that are contained in the XML input file and the XSLT file.
To control white space, use the following XSLT constructs:
<xsl:strip-space> or <xsl:preserve-space>
A non-breaking space (&#160;, hexadecimal &#xA0;, or &nbsp; in HTML) is not considered white space, because it is not an allowable place to break to a new line.
In HTML, a non-breaking space is useful for formatting. For example, if a table cell is completely empty, the inner cell border is not drawn. If you want the border drawn even though the cell has no content, use a non-breaking space in the XSLT, as follows:
<td>&#160;</td>
An alternative to using &#160; or &#xA0; in XSLT is to define the nbsp entity at the beginning of the XSLT file, as follows:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [<!ENTITY nbsp "&#160;">]>
This allows you to use &nbsp; throughout the document. It is recognized the same way as the HTML &nbsp; entity.
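As a quick check of how such a declaration behaves, the following Python sketch parses a small document whose internal DTD subset declares nbsp, using the standard xml.etree.ElementTree module. The row and td element names and the is_xml_white_space helper are hypothetical, chosen only for illustration; MSXML itself is not involved.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD subset declares nbsp as character 160.
DOC = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<!DOCTYPE row [<!ENTITY nbsp "&#160;">]>\n'
    b"<row><td>&nbsp;</td></row>"
)

XML_WS = " \t\r\n"  # the four XML white space characters


def is_xml_white_space(text):
    """True if text consists only of space, tab, CR, or LF characters."""
    return text.strip(XML_WS) == ""


# The parser expands the declared entity into the character U+00A0.
cell_text = ET.fromstring(DOC).find("td").text

print(repr(cell_text))
print(is_xml_white_space(cell_text))
```

Because the cell text is U+00A0 rather than one of the four XML white space characters, it is ordinary content and is not a candidate for white space stripping.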
You can control white space output in an XML document by using the Document Object Model (DOM) or XSLT features.
Any XML document exercises some control over white space. The parser handles white space in an XML document according to the xml:space attribute and the default white space rules, before the document is processed by the XSLT processor.
The XML 1.0 specification imposes the following white space rules.
- The parser normalizes operating-system-specific line endings into a single line feed character (hexadecimal 0x0A, decimal 10). This is necessary because operating systems represent line breaks in different ways: for example, as line feeds, as carriage returns, or as carriage return/line feed pairs.
- The parser normalizes the values of attributes (other than CDATA-type attributes) by replacing multiple consecutive occurrences of white space with a single space. For example, an attribute value such as "text    text" (with four intervening spaces embedded in the value) is passed from the parser as "text text". The multiple spaces in the original value are replaced with a single space.
- If an xml:space attribute in the source XML document or style sheet conflicts with explicit XSLT white space handling, the behavior associated with that xml:space attribute always takes precedence.
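The first two rules can be observed with any conforming parser. The sketch below uses Python's standard xml.etree.ElementTree module (backed by the Expat parser) rather than MSXML, so it illustrates the XML 1.0 behavior and is not a statement about MSXML internals. The note and item elements and the code and label attributes are hypothetical. An attribute with no declaration is treated as CDATA, which is why only the declared NMTOKENS attribute has its spaces collapsed.

```python
import xml.etree.ElementTree as ET

# Newline normalization: CR LF pairs and lone CRs reach the application as LF.
text = ET.fromstring("<note>one\r\ntwo\rthree</note>").text

# Attribute-value normalization: "code" is declared NMTOKENS (a non-CDATA type);
# "label" has no declaration, so the parser treats it as CDATA.
DOC = (
    "<!DOCTYPE item [<!ATTLIST item code NMTOKENS #IMPLIED>]>"
    '<item code="text    text" label="text    text"/>'
)
item = ET.fromstring(DOC)
code = item.get("code")    # runs of spaces collapsed to one space
label = item.get("label")  # the four spaces are kept

print(repr(text))
print(repr(code))
print(repr(label))
```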
According to the XSLT specification, the XSLT processor merges adjacent text nodes into a single text node. If a text node (following any merging that occurs) consists of white space only, the containing element is compared to the list of elements in any <xsl:strip-space> elements in the style sheet. If the containing element appears in such a list, the text node with white space only is stripped from the source tree before the transformation processes it.
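The rule just described can be approximated in a few lines. The following Python sketch, built on the standard xml.dom.minidom module, is a simplified illustration under stated assumptions: the function name apply_strip_space and the sample document are hypothetical, only exact element names are matched (no "*" wildcard), and xml:space and <xsl:preserve-space> are ignored.

```python
from xml.dom import minidom

XML_WS = " \t\r\n"  # the four XML white space characters


def apply_strip_space(element, strip_names):
    """Merge adjacent text nodes, then drop white-space-only text nodes
    whose containing element is named in strip_names."""
    element.normalize()  # minidom merges adjacent text nodes in the subtree
    _strip(element, strip_names)


def _strip(element, strip_names):
    for child in list(element.childNodes):
        if child.nodeType == child.TEXT_NODE:
            if element.tagName in strip_names and child.data.strip(XML_WS) == "":
                element.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            _strip(child, strip_names)


# Hypothetical source document with indentation between elements.
doc = minidom.parseString(
    "<books>\n  <book>\n    <title> XML </title>\n  </book>\n</books>"
)

# Comparable in spirit to <xsl:strip-space elements="books"/>.
apply_strip_space(doc.documentElement, {"books"})
result = doc.documentElement.toxml()

print(repr(result))
```

Only the white space directly inside books is removed; the indentation inside book and the spaces inside title remain, because neither element is in the strip list.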
This applies only to insignificant white space, that is, white space between elements rather than within them. Use the XSLT normalize-space() function to normalize white space within elements.
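For readers who want to see what normalize-space() does to a string, the following Python function mirrors the XPath definition: leading and trailing white space is removed, and each internal run of white space becomes a single space. It is an equivalent written for illustration, not code taken from any XSLT processor, and the name normalize_space is an assumption. The character class is spelled out because XML white space covers only space, tab, carriage return, and line feed.

```python
import re

# Runs of XML white space only: space, tab, carriage return, line feed.
_WS_RUN = re.compile(r"[ \t\r\n]+")


def normalize_space(value):
    """Collapse each run of XML white space to one space and trim both ends."""
    return _WS_RUN.sub(" ", value).strip(" ")


collapsed = normalize_space("  How \n\t white   space ")
with_nbsp = normalize_space("a\u00a0 b")  # U+00A0 is not XML white space

print(repr(collapsed))
print(repr(with_nbsp))
```

The non-breaking space in the second call is left in place, which is consistent with its not being treated as white space.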
White space is handled by the built-in parser of MSXML, as well as by its built-in XSLT processor.
See Also
White Space (XML Developer's Guide) | Controlling White Space with XSLT | <xsl:preserve-space> Element | <xsl:strip-space> Element