Introduction to DTD Syntax

MSXML 5.0 SDK

Microsoft XML Core Services (MSXML) 5.0 for Microsoft Office - XML Schemas

DTDs use a specialized, non-XML vocabulary with its own grammar for declaring the markup rules that define the structure of a specific type of XML document.

DTD declaration statements can be added internally as a section within the <!DOCTYPE> declaration of the XML document. Alternatively, you can use a resource URI to point to an external DTD file.
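
As a rough sketch of the external option, the DOCTYPE declaration can name a separate file with the SYSTEM keyword instead of carrying the declarations inline. The file name books.dtd is an assumption made for illustration; it is not part of the sample.

```xml
<?xml version="1.0"?>
<!-- books.dtd is a hypothetical file name, resolved relative to this document -->
<!DOCTYPE catalog SYSTEM "books.dtd">
<catalog>
  <!-- one or more <book> elements, as required by the DTD -->
</catalog>
```

In this arrangement the external file would hold only the declaration statements themselves, without the surrounding <!DOCTYPE catalog [ ... ]> wrapper.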

For example, the following is an internal DTD that could be added to the sample XML file (Books.xml) to describe and validate its contents.

Internal DTD

<!DOCTYPE catalog [ 
<!ELEMENT catalog    (book+) >
<!ELEMENT book       (author, title, genre, price, publish_date, description) >
<!ATTLIST book       id ID #REQUIRED >
<!ELEMENT author         (#PCDATA)   >
<!ELEMENT title          (#PCDATA)   >
<!ELEMENT genre          (#PCDATA)   >
<!ELEMENT price          (#PCDATA)   >
<!ELEMENT publish_date   (#PCDATA)   >
<!ELEMENT description    (#PCDATA)   >
]>

The first line identifies "catalog" as the document type (DOCTYPE), which also happens to be the name of the root element. After this, various elements are defined for the "catalog" document type.

A pair of opening ([) and closing (]) square brackets encloses this DTD as an internal section within the DOCTYPE declaration. This section needs to be inserted at the top of the Books.xml sample file so that the XML parser can read it as a set of directives for validating and parsing the remaining XML portion of the document. The DTD declaration statements are made within these brackets:

  • The first ELEMENT statement declares the document element, in this case the <catalog> element. It states that <catalog> must include one or more <book> elements as children, as indicated by a trailing + sign.
  • The second ELEMENT statement declares the <book> element and restricts its contents to the following six child elements, which must appear in this order: <author>, <title>, <genre>, <price>, <publish_date>, and <description>. The <catalog> and <book> elements both exemplify what is known as a structured-element content model, in which an element permits only certain specified elements as its contents.
  • Next, an ATTLIST statement declares a required id attribute for use with the <book> element.
  • The remaining ELEMENT statements in the DTD declare that each of the six child elements of <book> allows only text as content. This is specified by the #PCDATA keyword, which means parsed character data. In contrast to <catalog> and <book>, elements that allow either text or a combination of text and other markup have a mixed content model.
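
To see how these declarations fit together, the following sketch places the internal DTD ahead of a one-book catalog that should satisfy each rule described above. The element values and the id value bk101 are illustrative assumptions, not a quotation of the sample file.

```xml
<?xml version="1.0"?>
<!DOCTYPE catalog [
<!ELEMENT catalog    (book+) >
<!ELEMENT book       (author, title, genre, price, publish_date, description) >
<!ATTLIST book       id ID #REQUIRED >
<!ELEMENT author         (#PCDATA)   >
<!ELEMENT title          (#PCDATA)   >
<!ELEMENT genre          (#PCDATA)   >
<!ELEMENT price          (#PCDATA)   >
<!ELEMENT publish_date   (#PCDATA)   >
<!ELEMENT description    (#PCDATA)   >
]>
<catalog>
  <!-- id is required; an ID value must be an XML name, so it cannot start with a digit -->
  <book id="bk101">
    <!-- the six children appear in the declared order and hold text only -->
    <author>Example Author</author>
    <title>Example Title</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
    <description>Plain text, because the element is declared as (#PCDATA).</description>
  </book>
</catalog>
```

A validating parser would be expected to reject this document if the id attribute were removed, if <title> appeared before <author>, or if <catalog> contained no <book> element at all. An element that also admits child markup alongside its text could be declared along the lines of <!ELEMENT description (#PCDATA | emphasis)* >, where emphasis is a hypothetical element that would need its own declaration.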

See Also

What is a DTD? | Authoring DTDs