About XML
Extensible Markup Language (XML) is a markup language defined by the XML Working Group of the World Wide Web Consortium (W3C). XML is similar to Hypertext Markup Language (HTML) in that XML is a tag-based language specifically designed for delivering information on the Web. However, XML is different from HTML in that the tags that XML uses are not predefined. Instead, the W3C XML recommendation specifies a set of rules that must be followed so that you can create your own meaningful set of tags.
You can create your own tags to use within an XML document by following a few simple rules:
- An XML document can contain only one root element. The root element of an XML document is a single element that contains all of the content that is considered to be part of the document itself. The root element is the first element to appear after the document's prolog section. The root element is also known as the document element.
- All XML elements must contain end tags. Although end tags are optional with certain HTML document elements, all elements in an XML document must have an end tag.
- Element start and end tag names must be identical. XML is case-sensitive, so the name of an end tag must exactly match the name of its accompanying start tag.
- XML elements cannot overlap. If the start tag for an element appears within another element, it must end within the same containing element.
- All attribute values must use quotation marks. Attribute values must be enclosed in either single or double quotation marks.
- You cannot use the following characters directly within the text of an XML document: < > &. These are special characters that have a specific meaning for XML parsers. If you need to use these characters in the text of your XML document, use the predefined entity references (&lt; for <, &gt; for >, and &amp; for &) or numeric character references.
Following these rules ensures that your XML document is well formed, which means that it adheres to the XML syntax set forth by the W3C recommendation. A well-formed XML document is additionally considered valid if it conforms to an XML Schema that constrains the structure and the type of data that can be used in the document.
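The rules above can be checked mechanically. The following is a minimal sketch that assumes Python and its standard xml.etree.ElementTree module, neither of which is part of this topic; the sample strings and the is_well_formed helper are hypothetical names chosen for illustration. It tests only whether a document is well formed, not whether it is valid against a schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses without an XML syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples: one well-formed document, then one per broken rule.
samples = {
    "well formed": "<Employees><Employee id='1'>Ann</Employee></Employees>",
    "two root elements": "<Employee/><Employee/>",
    "missing end tag": "<Employees><Employee></Employees>",
    "case mismatch": "<Employees></employees>",
    "overlapping elements": "<a><b></a></b>",
    "unquoted attribute": "<Employee id=1/>",
    "unescaped ampersand": "<Name>Doyle & Sons</Name>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

Only the first sample should be reported as well formed; each of the others breaks exactly one of the rules listed above.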
XML documents consist of two primary parts: a prolog and a root element. XML documents can also contain comments.
Prolog
The prolog is the first section of an XML document. The prolog contains the XML declaration, which states that the document is an XML document; processing instructions, which provide information that XML parsers use to determine how to handle the document; and schema declarations, which determine the XML Schemas that should be used to verify that the document is valid. The following is an example of a prolog in an XML document:
<?xml version="1.0" encoding="UTF-8"?>
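As an illustration of where the XML declaration comes from in practice, the following sketch assumes Python 3.8 or later and its standard xml.etree.ElementTree module (an assumption, not something this topic prescribes). It serializes an empty Employees root element and asks the serializer to emit the declaration.

```python
import xml.etree.ElementTree as ET

# Build a root element named after the example used in this topic.
root = ET.Element("Employees")

# xml_declaration=True asks the serializer to write the XML declaration
# ahead of the root element (parameter available from Python 3.8).
document = ET.tostring(root, encoding="UTF-8", xml_declaration=True)

print(document.decode("UTF-8"))
```

The serializer may choose a different quotation style than the example above; single and double quotation marks are both allowed around attribute values.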
Root element
The root element is the main section of an XML document. The root element contains the document's data, along with the information that describes the structure of the data. The following is an example of the root element section in an XML document:
<Employees>
...
</Employees>
Information in the root element is stored in two types of XML constructs: elements and attributes. All the elements and attributes used in an XML document are nested within the root element.
Elements
Elements are the primary building blocks of an XML document. They are used to represent both the structure of the XML document and the data that is contained in the XML document. Elements contain a start tag, content, and an end tag. Because XML is case-sensitive, the start and end tags must match exactly. The following is an example of a simple Employee element that describes the name of an employee. The Employee element is nested within a root element named Employees:
<Employees>
  <Employee>
    <Name>Patricia Doyle</Name>
  </Employee>
</Employees>
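To show how an application might read this structure, the following minimal sketch parses the example above. It assumes Python's standard xml.etree.ElementTree module, which this topic does not itself mention; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

xml_text = """
<Employees>
  <Employee>
    <Name>Patricia Doyle</Name>
  </Employee>
</Employees>
"""

# fromstring returns the root element of the parsed document.
root = ET.fromstring(xml_text)

# iter("Name") walks every Name element nested anywhere under the root.
names = [name.text for name in root.iter("Name")]

print(root.tag)
print(names)
```

Tag names are matched case-sensitively here as well, so searching for "name" instead of "Name" would find nothing.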
Elements can contain text, other elements, character references, or character data sections. Elements that have no content are called empty elements. The start and end tags of an empty element can be combined into one tag, as shown in the following example:
<Name/>
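The kinds of element content described above can be compared side by side. This sketch assumes Python's standard xml.etree.ElementTree and xml.sax.saxutils modules (an assumption made for illustration); the "Doyle & Sons" text is hypothetical sample data.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Text content: escape() replaces &, < and > with entity references
# so the text can be embedded safely between tags.
raw_text = "Doyle & Sons <Ltd>"
escaped_text = escape(raw_text)
text_element = ET.fromstring(f"<Name>{escaped_text}</Name>")

# A numeric character reference (&#38; stands for the ampersand).
char_ref_element = ET.fromstring("<Name>Doyle &#38; Sons</Name>")

# A character data (CDATA) section: its content is taken literally.
cdata_element = ET.fromstring("<Name><![CDATA[Doyle & Sons <Ltd>]]></Name>")

# An empty element, in combined and in start/end tag form.
empty_short = ET.fromstring("<Name/>")
empty_long = ET.fromstring("<Name></Name>")

print(escaped_text)
print(text_element.text, char_ref_element.text, cdata_element.text)
print(empty_short.text, empty_long.text)
```

In this parser both forms of the empty element produce an element with no text and no children, which reflects that the two notations are equivalent.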
Attributes
Attributes are XML constructs that use a name-value pair that is associated with a particular element. They contain information about the element's content that is not necessarily intended to be displayed but is used instead to describe some property of the element. Attribute values are enclosed in either single or double quotation marks, separated from the name of the attribute by an equal sign, and enclosed within the element's start tag. The following is an example of an EmployeeNumber attribute that is associated with a Name element:
<Employees>
  <Employee>
    <Name EmployeeNumber="10101">Patricia Doyle</Name>
  </Employee>
</Employees>
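Attribute values are read separately from element content. The following sketch, which again assumes Python's standard xml.etree.ElementTree module rather than anything defined by this topic, retrieves the EmployeeNumber attribute from the example above.

```python
import xml.etree.ElementTree as ET

xml_text = """
<Employees>
  <Employee>
    <Name EmployeeNumber="10101">Patricia Doyle</Name>
  </Employee>
</Employees>
"""

root = ET.fromstring(xml_text)

# find() takes a path relative to the root element.
name = root.find("Employee/Name")

# get() returns the attribute value as a string, or None if it is absent.
employee_number = name.get("EmployeeNumber")
missing = name.get("Department")

print(name.text, employee_number, missing)
```

The value comes back as the string "10101", not as a number; any conversion to a numeric type is left to the application.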
Comments
XML documents can also contain comments. Comments are not processed by the XML parser but are used to provide meaningful documentation in the XML source of the document. Comments begin with <!-- and end with -->. The text between these characters is ignored by the XML parser. The following is an example of a comment in an XML document:
<!-- This XML document contains employee information. -->
<Employees>
  <Employee>
    <Name EmployeeNumber="10101">Patricia Doyle</Name>
  </Employee>
</Employees>
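One way to observe that comments do not become part of the document's data is to parse the example and list the elements that result. This sketch assumes Python's standard xml.etree.ElementTree module with its default parser settings; other parsers, or this one configured differently, may preserve comments.

```python
import xml.etree.ElementTree as ET

xml_text = """<!-- This XML document contains employee information. -->
<Employees>
  <Employee>
    <Name EmployeeNumber="10101">Patricia Doyle</Name>
  </Employee>
</Employees>"""

root = ET.fromstring(xml_text)

# iter() visits the root and every nested element in document order.
# With the default parser the comment is skipped, so only elements appear.
tags = [element.tag for element in root.iter()]

print(tags)
```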