Generating Well-Formed HTML Using XSLT
Well-formed HTML conforms to the rules of XML. The same HTML tags apply, but the stricter XML syntax is required. For example, <BR> is not a well-formed HTML tag, but <BR/> is. <H1>...</h1> is not well-formed, but <H1>...</H1> or <h1>...</h1> is. An XSLT style sheet is itself XML, so any HTML within it must be well-formed; otherwise the style sheet itself cannot be parsed. The following are some basic rules to follow as you write or convert to well-formed HTML.
All tags must be closed
HTML allows certain end tags to be optional, such as those for <P>, <LI>, <TR>, and <TD>. XML requires all tags to be closed explicitly.
HTML | Well-formed HTML |
---|---|
<P>This is an HTML paragraph. <P>or two. | <P>This is an HTML paragraph.</P> <P>or two.</P> |
Leaf nodes (elements with no content) must also be closed by placing a forward slash (/) just before the closing angle bracket of the tag: <BR/>, <HR/>, <INPUT/>, and <IMG/>.
HTML | Well-formed HTML |
---|---|
<IMG src="sample.gif" width="10" height="20"> | <IMG src="sample.gif" width="10" height="20"/> |
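One way to check the closing rules above is to hand the markup to any XML parser. The following is a minimal sketch that uses the xml.etree.ElementTree module from the Python standard library; the helper name is_well_formed is a hypothetical name chosen for illustration. Each fragment is wrapped in a single <BODY> element, an assumption made here so that only the missing end tags are being tested.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Optional end tags are accepted in HTML but rejected by an XML parser.
html_style = "<BODY><P>This is an HTML paragraph.<P>or two.</BODY>"
closed = "<BODY><P>This is an HTML paragraph.</P><P>or two.</P></BODY>"

# Leaf nodes need the trailing slash.
open_img = '<BODY><IMG src="sample.gif" width="10" height="20"></BODY>'
closed_img = '<BODY><IMG src="sample.gif" width="10" height="20"/></BODY>'

for sample in (html_style, closed, open_img, closed_img):
    print(is_well_formed(sample), sample)
```

Only the second and fourth samples are accepted by the parser.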
No overlapping tags
XML does not allow start and end tags to overlap. It enforces a strict hierarchy within the document: an element that starts inside another element must also end inside it.
HTML | Well-formed HTML |
---|---|
<B>Bold <I>Bold and Italic</B> Italic</I> | <B>Bold </B> <I><B>Bold and Italic </B> Italic</I> |
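The strict hierarchy can be made visible by parsing both forms. The sketch below, again using Python's xml.etree.ElementTree, reports where the parser rejects the overlapping markup and then prints the element tree of the nested form. The outline helper and the enclosing <P> element are assumptions added for illustration; the wrapper only supplies a single root for the fragment.

```python
import xml.etree.ElementTree as ET

overlapping = "<B>Bold <I>Bold and Italic</B> Italic</I>"
nested = "<P><B>Bold </B><I><B>Bold and Italic</B> Italic</I></P>"

try:
    ET.fromstring(overlapping)
    overlap_error = None
except ET.ParseError as err:
    # err.position is a (line, column) tuple locating the offending tag.
    overlap_error = err
    print("rejected:", err)


def outline(element, depth=0):
    """Yield (depth, tag) pairs that show the parent/child hierarchy."""
    yield depth, element.tag
    for child in element:
        yield from outline(child, depth + 1)


tree = list(outline(ET.fromstring(nested)))
for depth, tag in tree:
    print("  " * depth + tag)
```

In the nested form, the inner <B> is a child of <I>, so every element ends before its parent does.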
Case matters
XML is case-sensitive, so the start and end tags of an element must match exactly. Choose a consistent case for start and end tags. The examples in this SDK generally use uppercase for HTML elements.
HTML | Well-formed HTML |
---|---|
<B><i>Hello!</I></b> | <B> <I> Hello!</I> </B> |
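Case sensitivity affects more than tag matching: element names are also compared case-sensitively when the parsed tree is queried, as they are in XPath name tests. The following sketch, which assumes Python's xml.etree.ElementTree as a stand-in XML parser, shows both effects.

```python
import xml.etree.ElementTree as ET

# Start and end tags must match exactly, including case.
try:
    ET.fromstring("<B><i>Hello!</I></b>")
    mixed_case_ok = True
except ET.ParseError:
    mixed_case_ok = False

root = ET.fromstring("<B><I>Hello!</I></B>")

# Lookups are case-sensitive too: "I" is found, "i" is not.
upper = root.find("I")
lower = root.find("i")
print(mixed_case_ok, upper is not None, lower is not None)
```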
Quote your attributes
All attribute values must be enclosed in quotation marks, either single or double.
HTML | Well-formed HTML |
---|---|
<IMG src=sample.gif width=10 height=20> | <IMG src="sample.gif" width="10" height="20"/> |
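As a quick check of the quoting rule, the sketch below parses the same <IMG> element with double-quoted, single-quoted, and unquoted attribute values. It assumes Python's xml.etree.ElementTree as the XML parser; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

quoted_double = '<IMG src="sample.gif" width="10" height="20"/>'
quoted_single = "<IMG src='sample.gif' width='10' height='20'/>"
unquoted = "<IMG src=sample.gif width=10 height=20/>"

# Both quoting styles yield the same attribute dictionary.
attrs_double = ET.fromstring(quoted_double).attrib
attrs_single = ET.fromstring(quoted_single).attrib
print(attrs_double)

# Unquoted values are rejected outright.
try:
    ET.fromstring(unquoted)
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False
print("unquoted accepted:", unquoted_ok)
```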
Use a single root
Shortcuts that eliminate the <HTML> element as the single top-level element are not allowed.
HTML | Well-formed HTML |
---|---|
<TITLE>Shortcut markup</TITLE> <BODY> <P>Amazing that this HTML works.</P> </BODY> | <HTML> <HEAD> <TITLE>Clean markup</TITLE> </HEAD> <BODY> <P>Not nearly so amazing that this well-formed HTML works.</P> </BODY> </HTML> |
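The single-root rule can be confirmed the same way. In the sketch below, which assumes Python's xml.etree.ElementTree as the XML parser, the shortcut markup is rejected because a second top-level element follows the first, while the clean markup yields one <HTML> root with <HEAD> and <BODY> as its children.

```python
import xml.etree.ElementTree as ET

shortcut = (
    "<TITLE>Shortcut markup</TITLE>"
    "<BODY><P>Amazing that this HTML works.</P></BODY>"
)
clean = (
    "<HTML><HEAD><TITLE>Clean markup</TITLE></HEAD>"
    "<BODY><P>Not nearly so amazing that this well-formed HTML works.</P></BODY>"
    "</HTML>"
)

try:
    ET.fromstring(shortcut)
    shortcut_error = ""
except ET.ParseError as err:
    # The parser stops at the second top-level element.
    shortcut_error = str(err)
print(shortcut_error)

root = ET.fromstring(clean)
child_tags = [child.tag for child in root]
print(root.tag, child_tags)
```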
Use fewer named entities
XML defines only a minimal set of built-in named entities. These are as follows:
- &lt; (<)
- &gt; (>)
- &amp; (&)
- &quot; (")
- &apos; (')
Therefore, you should avoid using other named HTML entities. When in doubt, always use the numeric character reference for the character of interest. For example, for non-breaking spaces, use &#160; or &#xA0; instead of &nbsp;. For emphatic dashes, use &#8212; instead of &mdash;. Also, in Internet Explorer, the numeric character reference &#151; is treated as the named entity &mdash;, but it will not be resolved to &#8212; in MSXML.
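When converting existing HTML, the replacement of named entities can be automated. The following is a minimal sketch, not a complete converter: it uses the name2codepoint table from Python's html.entities module to rewrite named entities as decimal character references and leaves the five XML built-ins alone. The function name to_numeric_refs and the sample markup are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The only named entities that XML defines by itself.
XML_BUILT_INS = {"lt", "gt", "amp", "quot", "apos"}


def to_numeric_refs(markup):
    """Replace named HTML entities with numeric character references.

    The five XML built-ins and any unknown names are left untouched.
    """
    def replace(match):
        name = match.group(1)
        if name in XML_BUILT_INS or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, markup)


source = "<P>Fish&nbsp;&amp;&nbsp;chips &mdash; &lt;fresh&gt;</P>"
converted = to_numeric_refs(source)
print(converted)

# The converted markup parses; the references resolve to real characters.
text = ET.fromstring(converted).text
```

Parsing the unconverted source with the same parser fails on &nbsp; because no DTD declares it.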
Escape script blocks
Script blocks in HTML can contain characters that an XML parser treats as markup and therefore cannot parse as text, namely < and &. These must be escaped in well-formed HTML by using character entities, or by enclosing the script block in a CDATA section.
In addition, single-line (//) comments in Microsoft® JScript® (compatible with the ECMA 262 language specification) terminate at the end of the line, so it is important to preserve the white space within script blocks containing comments. By default, the xml:space attribute value normalizes white space by compressing adjacent white space characters into a single space. This destroys the new line that terminates the JScript comment. Any JScript following the comment is then treated as part of the comment and ignored, often resulting in script errors. The CDATA notation also ensures that the white space is preserved.
The following HTML script block contains both an unparsable character (<) and JScript comments. The well-formed script block uses CDATA to encapsulate the script.
HTML:

    <SCRIPT>
    // checks a number against 7
    function lessThanSeven(n) {
      return n < 7;
    }
    </SCRIPT>

Well-formed HTML:

    <SCRIPT><![CDATA[
    // checks a number against 7
    function lessThanSeven(n) {
      return n < 7;
    }
    ]]></SCRIPT>
Not all scripts will fail if they are not escaped in this way. However, it is highly recommended that you escape them as a matter of habit. This ensures not only that the script works if it currently contains comments or characters that require escaping, but also that it continues to work if such characters or comments are added in the future.
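Both escaping options can be compared side by side. The sketch below assumes Python's xml.etree.ElementTree as the XML parser and the escape function from xml.sax.saxutils; the script text is a multi-line version of the example above with a hypothetical function name. It shows that the unescaped block is rejected and that both well-formed variants return the original script, line breaks included.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

script = (
    "\n// checks a number against 7\n"
    "function lessThanSeven(n) {\n"
    "  return n < 7;\n"
    "}\n"
)

# Option 1: wrap the script in a CDATA section, leaving the source untouched.
as_cdata = "<SCRIPT><![CDATA[" + script + "]]></SCRIPT>"

# Option 2: escape <, >, and & as character entities.
as_escaped = "<SCRIPT>" + escape(script) + "</SCRIPT>"

# Unescaped, the bare < makes the block unparsable.
try:
    ET.fromstring("<SCRIPT>" + script + "</SCRIPT>")
    raw_ok = True
except ET.ParseError:
    raw_ok = False

# Both well-formed variants round-trip to the original script text,
# including the line break that ends the // comment.
from_cdata = ET.fromstring(as_cdata).text
from_escaped = ET.fromstring(as_escaped).text
print(raw_ok, from_cdata == script, from_escaped == script)
```

The CDATA form keeps the script readable in the style sheet, but it cannot be used as-is if the script itself contains the sequence ]]>, which ends the section.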