Introduction to DTD Syntax
DTDs use a specialized non-XML vocabulary with its own grammar for writing and declaring the markup rules that define a specific type of XML document structure.
DTD declaration statements can be added internally as a section within the <!DOCTYPE> declaration of the XML document. Alternatively, you can use a resource URI to point to an external DTD file.
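As a minimal sketch of the external alternative, the DOCTYPE declaration can name the DTD file with the SYSTEM keyword followed by a URI. The file name books.dtd is an assumption made for illustration. Such a file would hold only the declaration statements themselves, without a surrounding DOCTYPE wrapper.

```xml
<?xml version="1.0"?>
<!-- books.dtd is a hypothetical file name; the URI is resolved
     relative to the location of this document. -->
<!DOCTYPE catalog SYSTEM "books.dtd">
<catalog>
  <!-- one or more book elements go here -->
</catalog>
```

Keeping the DTD in a separate file lets several XML documents share one set of rules, while an internal DTD keeps the rules and the data together in a single file.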
For example, the following is an internal DTD that could be added to the sample XML file (books.xml) to describe and validate its contents.
Internal DTD
<!DOCTYPE catalog [
  <!ELEMENT catalog (book+) >
  <!ELEMENT book (author, title, genre, price, publish_date, description) >
  <!ATTLIST book id ID #REQUIRED >
  <!ELEMENT author (#PCDATA) >
  <!ELEMENT title (#PCDATA) >
  <!ELEMENT genre (#PCDATA) >
  <!ELEMENT price (#PCDATA) >
  <!ELEMENT publish_date (#PCDATA) >
  <!ELEMENT description (#PCDATA) >
]>
The first line identifies "catalog" as the document type (DOCTYPE). For the document to be valid, this name must match the name of the root element. After this, various elements are defined for the "catalog" document type.
A pair of open ([) and close (]) brackets encloses this DTD as an internal section within the DOCTYPE statement. This section needs to be inserted at the top of the books.xml sample file, before the root element, so that the XML parser can read it as a set of directives to use when validating and parsing the remaining XML portion of the document. The DTD declaration statements are made within these brackets:
- The first ELEMENT statement declares the document element, in this case the <catalog> element. It states that <catalog> must include one or more <book> elements as children, as indicated by the trailing + sign.
- The second ELEMENT statement declares the <book> element, and that its contents are restricted to the following six child elements (which must be used in this order): <author>, <title>, <genre>, <price>, <publish_date>, and <description>. The <catalog> and <book> elements both exemplify what is known as a structured-element content model, where these elements only permit certain specified elements as their contents.
- Next, an ATTLIST statement declares a required id attribute for use with the <book> element.
- The remaining ELEMENT statements in the DTD declare that for each of the six element children for <book>, only text is allowed as content. This is specified by the #PCDATA keyword, which means parsed character data. By contrast, elements that allow either text or a combination of text and other markup have a mixed content model.
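Putting the pieces together, the following sketch shows where the internal DTD sits in a complete document: after the XML declaration and before the root element. The book entry and all of its values are placeholders invented for illustration and are not taken from the actual sample file.

```xml
<?xml version="1.0"?>
<!DOCTYPE catalog [
  <!ELEMENT catalog (book+) >
  <!ELEMENT book (author, title, genre, price, publish_date, description) >
  <!ATTLIST book id ID #REQUIRED >
  <!ELEMENT author (#PCDATA) >
  <!ELEMENT title (#PCDATA) >
  <!ELEMENT genre (#PCDATA) >
  <!ELEMENT price (#PCDATA) >
  <!ELEMENT publish_date (#PCDATA) >
  <!ELEMENT description (#PCDATA) >
]>
<catalog>
  <!-- Placeholder data: every value below is illustrative only. -->
  <!-- The id value must be unique in the document and must be a
       valid XML name, so it cannot start with a digit. -->
  <book id="bk001">
    <author>Example Author</author>
    <title>Example Title</title>
    <genre>Example Genre</genre>
    <price>9.99</price>
    <publish_date>2000-01-01</publish_date>
    <description>Text-only content, as required by #PCDATA.</description>
  </book>
</catalog>
```

A validating parser would be expected to report an error if, for example, the id attribute were omitted from <book>, or if <title> appeared before <author>, because the declarations require the attribute and fix the order of the child elements.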
See Also
What is a DTD? | Authoring DTDs