Character and Entity References

MSXML 5.0 SDK

Microsoft XML Core Services (MSXML) 5.0 for Microsoft Office - XML Developer's Guide

Character and entity references provide ways to include information in XML documents by reference rather than by typing characters into the document directly. This can be useful in cases in which:

  • Characters cannot be entered directly into a document because they would be interpreted as markup.
  • Characters cannot be entered directly into a document because of input device limitations.
  • Characters cannot be transported reliably through a processor limited to one-byte characters.
  • A character string or document fragment appears repeatedly and can be abbreviated.

For these purposes, XML provides several syntactic constructs that start with an ampersand (&) and end with a semicolon (;).

Character references insert a Unicode character identified by the number of its code point. The code point can be written in either decimal or hexadecimal notation.

&#value;
    Syntax used for decimal references.
&#xvalue;
    Syntax used for hexadecimal references.

For example, to insert the euro symbol, a character still missing from many keyboards, you can insert &#8364; (decimal) or &#x20AC; (hexadecimal) into a document.
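As a quick check of the two notations, the following minimal sketch parses a document that contains both a decimal and a hexadecimal reference to the euro sign. It uses Python's standard xml.etree.ElementTree module rather than MSXML, purely for illustration; the element name price is an assumption made up for this example.

```python
import xml.etree.ElementTree as ET

# Decimal (&#8364;) and hexadecimal (&#x20AC;) references to the same code point.
doc = "<price>&#8364; &#x20AC;</price>"
root = ET.fromstring(doc)

# The parser resolves both references to U+20AC, the euro sign.
text = root.text
print(text == "\u20ac \u20ac")  # True
print(hex(ord(text[0])))        # 0x20ac
```

Any conforming XML parser should resolve both forms to the same character, so the choice between decimal and hexadecimal is a matter of convenience.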

The following table lists the five built-in entities for the characters used for XML markup.

Entity  Entity Reference  Meaning
lt      &lt;              < (less than)
gt      &gt;              > (greater than)
amp     &amp;             & (ampersand)
apos    &apos;            ' (apostrophe or single quote)
quot    &quot;            " (double quote)

Where a character might cause the XML parser to misinterpret the document structure, use the entity reference instead of typing the character. The &apos; and &quot; entity references are most commonly used in attribute values, where a literal quote would otherwise end the value.

To write Me&You, for example, use Me&amp;You. For a<b, use a&lt;b. For b>c, use b&gt;c.
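The substitutions above can be sketched in code. The example below is a minimal illustration in Python, not MSXML: it uses the standard xml.sax.saxutils.escape function to produce the references and xml.etree.ElementTree to confirm that a parser turns them back into the literal characters. The element name note, the attribute name text, and the sample strings are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() replaces &, <, and > by default.
samples = ["Me&You", "a<b", "b>c"]
escaped = [escape(s) for s in samples]
print(escaped)  # ['Me&amp;You', 'a&lt;b', 'b&gt;c']

# Quotes are escaped only when requested through the optional mapping.
quote_map = {'"': "&quot;", "'": "&apos;"}
attr_value = escape('He said "it\'s fine"', quote_map)
print(attr_value)  # He said &quot;it&apos;s fine&quot;

# Round trip: the parser turns the references back into literal characters.
elem = ET.fromstring('<note text="{}">{}</note>'.format(attr_value, escaped[0]))
print(elem.get("text"))  # He said "it's fine"
print(elem.text)         # Me&You
```

Escaping the ampersand first matters: if it were replaced last, the ampersands introduced by the other references would themselves be escaped. The library function handles this ordering internally.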

You can also define your own entities, much as HTML defines its own set of named entities. Note that &apos; is not a recognized entity in HTML 4; when transforming to HTML, use a numeric character reference (&#...;) instead, such as &#39; for the apostrophe.
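HTML escaping routines often sidestep &apos; by emitting a numeric reference for the apostrophe. As a small illustration, assuming Python's standard html module stands in for whatever serializer produces the HTML output, html.escape behaves this way:

```python
import html

# With quote=True (the default), both quote characters are escaped.
# The apostrophe becomes the numeric reference &#x27; rather than &apos;.
escaped_html = html.escape("it's \"quoted\"")
print(escaped_html)  # it&#x27;s &quot;quoted&quot;
```

The hexadecimal reference &#x27; and the decimal reference &#39; identify the same character, so either is safe in HTML output.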

If you're working with a document type definition (DTD) that has defined entities, you can reference them in document content by using the following syntax.

&entityName;
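The following minimal sketch shows a DTD-defined entity being declared and referenced. It is written in Python with the standard xml.etree.ElementTree module rather than MSXML, and the entity name company, its replacement text, and the memo element are hypothetical names chosen for the example. The entity is declared in the document's internal DTD subset so that the example is self-contained.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares a hypothetical entity named "company".
doc = """<!DOCTYPE memo [
  <!ENTITY company "Contoso Ltd.">
]>
<memo>Prepared by &company; staff.</memo>"""

# The parser substitutes the replacement text wherever &company; appears.
root = ET.fromstring(doc)
print(root.text)  # Prepared by Contoso Ltd. staff.
```

Entities declared in an external DTD are handled differently by different parsers, and some do not load external subsets by default, so this sketch covers only the internal-subset case.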