Implementing DTDs

MSXML 5.0 SDK

Microsoft XML Core Services (MSXML) 5.0 for Microsoft Office - XML Schemas

A DTD can be implemented as an internal subset within the DOCTYPE declaration, or as a single external DTD file. In some cases, a DTD can become quite large. When this happens, additional steps are required to keep the DTD maintainable.
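As a minimal sketch of the two basic forms, the following shows the same declarations first inside the DOCTYPE declaration and then in an external file. The element names (note, body) and the file name note.dtd are hypothetical and chosen only for illustration.

```xml
<?xml version="1.0"?>
<!-- Form 1: declarations placed inside the DOCTYPE declaration -->
<!DOCTYPE note [
  <!ELEMENT note (body)>
  <!ELEMENT body (#PCDATA)>
]>
<note><body>Hello</body></note>
```

```xml
<?xml version="1.0"?>
<!-- Form 2: the same declarations kept in an external file (hypothetical note.dtd) -->
<!DOCTYPE note SYSTEM "note.dtd">
<note><body>Hello</body></note>
```

The external form is usually the easier one to share across many documents, because the declarations live in one place.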

The following sections describe the primary ways to implement maintainable and reusable DTDs: using parameter entities and modularizing a DTD.

Using Parameter Entities

Parameter entities are entities that are both declared and referenced within the DTD itself. Though not required, parameter entities can be helpful if you are working on a large DTD that contains a content structure you want to reuse elsewhere in the same DTD.

For example, if you were writing a DTD to generate various e-business documents to be sent to customers in e-mail message format, you might declare a parameter entity such as the one in the following example.

<!ENTITY % email "from, to, subject, message" >
<!ELEMENT WelcomeMessage (%email;) >
<!ELEMENT MonthlyBilling (%email;) >
<!ELEMENT PaymentConfirmation (%email;) >

In this example, an ENTITY declaration first declares a parameter entity named "email", making its replacement text available for reuse in other markup declarations in the same DTD. The three ELEMENT declarations that follow apply the value of the entity by referencing it as "%email;". The result is a single, shared definition of how to mark up e-mail in documents created with the DTD that contains these declarations.
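To see the declarations in context, the following sketch places them in a complete external DTD and pairs it with a conforming document. XML permits parameter entity references inside markup declarations only in the external DTD subset, so the sketch assumes an external file. The file name ebusiness.dtd, the (#PCDATA) leaf declarations, and the sample addresses are assumptions that do not appear in the original example.

```xml
<!-- ebusiness.dtd (hypothetical file name): external DTD subset -->
<!ENTITY % email "from, to, subject, message" >

<!ELEMENT WelcomeMessage (%email;) >
<!ELEMENT MonthlyBilling (%email;) >
<!ELEMENT PaymentConfirmation (%email;) >

<!-- Assumed leaf declarations; the original example does not show them -->
<!ELEMENT from (#PCDATA) >
<!ELEMENT to (#PCDATA) >
<!ELEMENT subject (#PCDATA) >
<!ELEMENT message (#PCDATA) >
```

```xml
<?xml version="1.0"?>
<!DOCTYPE WelcomeMessage SYSTEM "ebusiness.dtd">
<WelcomeMessage>
  <from>support@example.com</from>
  <to>customer@example.com</to>
  <subject>Welcome</subject>
  <message>Thank you for signing up.</message>
</WelcomeMessage>
```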

If you later decided to redo or extend this DTD to support optional CC: and BCC: fields, you could revise the ENTITY declaration as follows:

<!ENTITY % email "from, to, cc?, bcc?, subject, message" >

Every element declaration that references this entity inherits the change automatically.
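To make the effect concrete, after the revision a parser reads the three element declarations roughly as if they had been written out in full, as sketched below. The cc and bcc leaf declarations are an assumption added for completeness. A document that actually uses those elements is valid only if they are declared somewhere in the DTD.

```xml
<!-- Effective declarations once %email; is expanded (illustration only) -->
<!ELEMENT WelcomeMessage (from, to, cc?, bcc?, subject, message) >
<!ELEMENT MonthlyBilling (from, to, cc?, bcc?, subject, message) >
<!ELEMENT PaymentConfirmation (from, to, cc?, bcc?, subject, message) >

<!-- Assumed declarations for the two new optional elements -->
<!ELEMENT cc (#PCDATA) >
<!ELEMENT bcc (#PCDATA) >
```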

Modularizing a DTD

Parameter entities allow you to reuse the value of a single declaration in other locations in the same DTD. In some cases, however, you might want to move a section of a large external DTD into a separate file that is linked to the main DTD (.dtd) file. This type of implementation is known as a modularized DTD.

Extensible HTML (XHTML) is one example of where a modular DTD implementation is helpful. XHTML defines HTML as an XML document type, which allows a document that conforms to XHTML to be processed as either HTML or XML. One of the key differences between XML and HTML is the support for predefined general entities. HTML defines a large number of these, while XML predefines only five (&amp;, &lt;, &gt;, &apos;, and &quot;) in addition to numeric character references. Therefore, if you are using the standard XHTML DTD or adapting it to your own DTD, you might notice that it uses external parameter entities to declare and include separate DTD modules, as shown in the following example:

<!ENTITY % HTMLlat1 PUBLIC 
   "-//W3C//ENTITIES Latin 1 for XHTML//EN"
   "xhtml-lat1.ent">
%HTMLlat1;

<!ENTITY % HTMLsymbol PUBLIC
   "-//W3C//ENTITIES Symbols for XHTML//EN"
   "xhtml-symbol.ent">
%HTMLsymbol;

<!ENTITY % HTMLspecial PUBLIC
   "-//W3C//ENTITIES Special for XHTML//EN"
   "xhtml-special.ent">
%HTMLspecial;

The system identifiers in these declarations refer to three separate files (xhtml-lat1.ent, xhtml-symbol.ent, and xhtml-special.ent) that are DTD modules included as part of the XHTML 1.0 strict DTD. Each declaration must be followed by a reference to the entity (for example, %HTMLlat1;) for the contents of the module to be included. Because each of these sets of definitions is a fairly long and coherent section of the DTD, it makes sense that they have been divided into separate files, one for each of the three categories of HTML character entities.
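As an illustration of what these modules provide, the Latin-1 module contains general entity declarations of the following form, which let an XHTML document use names such as &copy; and &nbsp; that XML does not predefine. The document text below is placeholder content.

```xml
<!-- Excerpt in the style of xhtml-lat1.ent -->
<!ENTITY nbsp   "&#160;"> <!-- no-break space -->
<!ENTITY copy   "&#169;"> <!-- copyright sign -->
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Entity example</title></head>
  <body>
    <!-- &copy; and &nbsp; resolve only because the DTD pulls in the entity module -->
    <p>&copy; Example Corp.&nbsp;All rights reserved.</p>
  </body>
</html>
```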

Modular DTDs are a good fit when a DTD would otherwise be too large to manage as a single file. If overused, however, modular DTDs can slow parsing and increase document load times, because each module is a separate external entity that the parser must resolve. Usually, a single layer of modularity (a main DTD module with direct links to all child modules) provides the right balance.
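A single layer of modularity might look like the following sketch, in which only the main DTD references modules and the modules themselves contain nothing but declarations. All file and element names here (main.dtd, address.mod, order.mod, invoice, address, order) are hypothetical.

```xml
<!-- main.dtd (hypothetical): the only file that references modules -->
<!ENTITY % addressModule SYSTEM "address.mod">
%addressModule;

<!ENTITY % orderModule SYSTEM "order.mod">
%orderModule;

<!ELEMENT invoice (address, order) >
```

```xml
<!-- address.mod (hypothetical): declarations only, no references to further modules -->
<!ELEMENT address (#PCDATA) >
```

```xml
<!-- order.mod (hypothetical): declarations only, no references to further modules -->
<!ELEMENT order (#PCDATA) >
```

Keeping module references out of the modules themselves means the full set of files can be read directly from the main DTD.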

See Also

Declaring DTDs