Content Model

MSXML 5.0 SDK

Microsoft XML Core Services (MSXML) 5.0 for Microsoft Office - XML Schemas

The content model describes the content structure of elements and attributes in the XML document. To describe the content structure, you can use the model, minOccurs, maxOccurs, order, content, minLength, maxLength, default, and type attributes.

This section first describes the content model of an element and then the content model of an attribute, introducing the attributes used to describe each.

Specifying the Content Model of an Element

When specifying the content model of an element:

  • Describe the child elements and attributes that can appear in the element.
  • Specify whether the element can include text and elements.
  • Specify the order in which the child elements can appear in an instance of the element.

Example

The following example shows an XDR schema, BookSchema.xml, that defines the content model for a book element and its child elements.

<s:Schema xmlns:s="urn:schemas-microsoft-com:xml-data">
  <s:ElementType name="title" content="textOnly"/>
  <s:ElementType name="authors" content="textOnly"/>
  <s:ElementType name="pages" content="textOnly"/>
  <s:ElementType name="book" order="seq" content="eltOnly">
    <s:element type="pages"/>
    <s:element type="title"/>
    <s:element type="authors"/>
  </s:ElementType>
</s:Schema>

The content model for the title, authors, and pages elements is straightforward. Because the content attribute of each is textOnly, these elements contain only text and no child elements.

The content model for the book element is more complex. The content attribute for the book element is eltOnly, which means that the book element can contain only the elements specified in the schema (pages, title, and authors). Furthermore, because the order attribute is set to seq, the child elements of each book element instance must appear in the order specified in the schema.

The following is a valid instance of the XML document.

<x:book xmlns:x="x-schema:BookSchema.xml">
  <x:pages>474</x:pages>
  <x:title>Applied XML: A Toolkit for Programmers</x:title>
  <x:authors>Alex Ceponkus and Faraz Hoodbhoy</x:authors>
</x:book>

Note: Elements with content="empty" and model="open" are not allowed.
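For contrast with the valid instance above, the following sketch shows an instance that would be expected to fail validation against BookSchema.xml as listed. The schema declares order="seq" with pages first, and this instance places title before pages. The book data is reused from the valid instance; nothing else is assumed.

```xml
<!-- Expected to be rejected: the children do not follow the declared sequence. -->
<x:book xmlns:x="x-schema:BookSchema.xml">
  <x:title>Applied XML: A Toolkit for Programmers</x:title>
  <x:pages>474</x:pages>
  <x:authors>Alex Ceponkus and Faraz Hoodbhoy</x:authors>
</x:book>
```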

The following paragraphs describe the attributes used by the content model.

The model Attribute (Open and Closed Content Models)

The model attribute of an ElementType can have the value open or closed. In an open content model, an element in an XML document can have additional child elements and attributes that are not declared in the XDR schema that the document references. Conversely, a closed model is one in which the document cannot include any information that does not follow the rules specified in the referenced schema.

By default, the content model for an XDR schema is open. This provides an XDR schema with the extensibility that is not present in a document type definition (DTD), which is a closed model. When using a DTD, a document cannot include any information that does not follow the rules specified in the referenced DTD.

For example, consider the following document fragment.

<x:book xmlns:x="x-schema:BookSchema.xml" 
        xmlns:y="urn:some-new-namespace">
  <x:pages>474</x:pages>
  <x:title y:id="123">Applied XML: A Toolkit for Programmers</x:title>
  <x:authors>Alex Ceponkus and Faraz Hoodbhoy</x:authors>
  <y:publisher>Wiley Computer Publishing</y:publisher>
</x:book>

This fragment refers to the BookSchema.xml schema presented earlier. Because this schema specifies an open content model (the default), this document fragment is valid even though it has additional elements and attributes not specified in BookSchema.xml (such as the id attribute in the title element and the publisher child element, both of which are defined in the urn:some-new-namespace namespace).

In an open content model, the following constraints apply:

  • Content that breaks the existing content model cannot be added or removed. For example, BookSchema.xml defines the book element as a sequence of three elements. Therefore, you must first provide that exact sequence of elements before adding any open content. Thus, you cannot remove the pages element, nor can you provide two title elements next to each other. Doing so will cause validation to fail.
  • Undeclared elements can be added as long as they are defined in a different namespace.
  • After satisfying the content model for the schema, other elements can be added. For example, an XML document will validate even if a second title element is added after the complete declared sequence of pages, title, and authors.

Specify a closed content model if you do not want the default value of open to be used. In this case, you use the model attribute for the ElementType, as in the following example.

<s:ElementType name="book" model="closed">

This indicates that a book element can contain only the content specified in the schema (the title, authors, and pages elements). With this setting, the extended elements in the preceding XML fragment would cause validation of the document to fail.

The content Attribute

When describing the content model, you use the content attribute to specify whether an element can contain only text (content="textOnly"), only elements (content="eltOnly"), a mixture of text and elements (content="mixed"), or nothing at all (content="empty"). The default value for this attribute is mixed.

If the ElementType has a data type specified (such as date, number, and so on), the element is assumed to contain only text (the textOnly value is implied for the content attribute).
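As a minimal sketch of this rule, the following schema declares two typed elements without a content attribute. The element names pubDate and price are hypothetical and are used only for illustration; the dt prefix is bound to the XDR data types namespace.

```xml
<s:Schema xmlns:s="urn:schemas-microsoft-com:xml-data"
          xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- No content attribute is given: the data type implies content="textOnly". -->
  <s:ElementType name="pubDate" dt:type="date"/>
  <s:ElementType name="price" dt:type="number"/>
</s:Schema>
```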

For example, the following XDR schema defines a book element that contains three child elements: title, author, and pages. These child elements can contain only text because the content attribute for each of these elements is textOnly. The book element can contain only elements and no text because its content attribute is eltOnly. This book element uses a closed content model (model="closed"). Thus, the book element can contain only these three child elements and no text or additional subelements and attributes.

<?xml version="1.0"?>
<s:Schema xmlns:s="urn:schemas-microsoft-com:xml-data">
  <s:ElementType name="title" content="textOnly" />
  <s:ElementType name="author" content="textOnly" />
  <s:ElementType name="pages" content="textOnly" />
  <s:ElementType name="book"  content="eltOnly" model="closed">
    <s:element type="title" />
    <s:element type="author" />
    <s:element type="pages" />
    
    <s:AttributeType name="copyright" />
    <s:attribute type="copyright" />
  </s:ElementType>
</s:Schema>

If the content attribute for an element is empty, that element cannot contain any text or child elements; however, it can have attributes. A mixed element, on the other hand, can contain text and child elements.
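The two cases can be sketched side by side. The names coverImage, src, summary, and em are hypothetical. The empty element is declared with model="closed" because, as noted earlier, content="empty" cannot be combined with model="open".

```xml
<s:Schema xmlns:s="urn:schemas-microsoft-com:xml-data">
  <s:AttributeType name="src"/>
  <s:ElementType name="em" content="textOnly"/>
  <!-- empty: no text and no child elements, but attributes are allowed -->
  <s:ElementType name="coverImage" content="empty" model="closed">
    <s:attribute type="src"/>
  </s:ElementType>
  <!-- mixed: text interleaved with the declared child element -->
  <s:ElementType name="summary" content="mixed">
    <s:element type="em"/>
  </s:ElementType>
</s:Schema>
```

Under these declarations, an instance such as <coverImage src="cover.png"/> carries only an attribute, while <summary>A <em>practical</em> guide.</summary> combines text with a child element.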

The minOccurs and maxOccurs Attributes

You can specify how many times a child element can appear within its parent element by using the minOccurs and maxOccurs attributes.

<element type="Item" maxOccurs="*" />

The maxOccurs attribute is a constraint that specifies the maximum number of times a child element can appear. Valid values for maxOccurs are "1" and "*"; the "*" value indicates that an unrestricted number of elements can appear. The default value for maxOccurs is "1"; however, when content="mixed", the default value is "*".

In a similar way, you can specify a minimum number of times a child element can appear with minOccurs. The valid values for minOccurs are "0" and "1". For example, to make a child element optional in the instance of a parent element, set minOccurs to "0". The default value for minOccurs is "1".

In the following schema, the author element sets maxOccurs to "*" while the pages element sets the minOccurs attribute to "0".

<?xml version="1.0"?>
<s:Schema xmlns:s="urn:schemas-microsoft-com:xml-data">
  <s:ElementType name="title" content="textOnly" />
  <s:ElementType name="author" content="textOnly" />
  <s:ElementType name="pages" content="textOnly" />
  <s:ElementType name="book"  content="eltOnly" model="closed">
    <s:element type="title" />
    <s:element type="author" maxOccurs="*" />
    <s:element type="pages" minOccurs="0" />
  </s:ElementType>
  <s:ElementType name="root" >
    <s:element type="book" />
  </s:ElementType>
</s:Schema>

Based on the preceding example schema, the following is a valid document instance.

<root>
  <book>
    <title>C Programming</title>
    <author>Author A</author>
    <author>Author B</author>
    <pages>300</pages>
  </book>
  <book>
    <title>Java Programming</title>
    <author>Author C</author>
  </book>
</root>
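For comparison, the following sketch shows two book elements that would be expected to fail validation against the same schema. The titles and author names are made up for illustration.

```xml
<root>
  <!-- author is missing: violates the default minOccurs="1" -->
  <book>
    <title>Python Programming</title>
    <pages>250</pages>
  </book>
  <!-- two title elements: violates the default maxOccurs="1" -->
  <book>
    <title>Rust Programming</title>
    <title>Second Edition</title>
    <author>Author D</author>
  </book>
</root>
```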

The minLength and maxLength Attributes

You can specify a data type for an element or attribute by using the urn:schemas-microsoft-com:datatypes namespace. The minLength and maxLength attributes defined in this namespace can be used to constrain the length of a string, number, bin.hex, or bin.base64 data type.

  • For string and number data types, maxLength specifies the maximum number of characters allowed, while minLength specifies the minimum number of characters.
  • For bin.hex and bin.base64, maxLength sets the maximum number of bytes of the binary object, while minLength sets the minimum number of bytes.

Regardless of the data type, both length attributes are inclusive. That is, the data can be as long as the length set by maxLength, but no longer, and as short as the length set by minLength, but no shorter.

The maxLength and minLength attributes are enforced at parse time and run-time. Valid parent elements are ElementType, AttributeType, and datatype.
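The difference between counting characters and counting bytes can be sketched with a binary type. The checksum element below is hypothetical, and the interpretation assumes that lengths for bin.hex are measured in decoded bytes, as described above.

```xml
<s:Schema xmlns:s="urn:schemas-microsoft-com:xml-data"
          xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- Hypothetical element: between 2 and 4 bytes of hex-encoded data. -->
  <s:ElementType name="checksum"
                 dt:type="bin.hex"
                 dt:minLength="2"
                 dt:maxLength="4"/>
</s:Schema>
```

Under that assumption, <checksum>0A1B2C3D</checksum> holds 4 bytes (8 hexadecimal digits) and falls within the limits, while 0A1B2C3D4E holds 5 bytes and exceeds maxLength.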

The following XDR schema shows an example of how to use these length attributes. This schema specifies the content model for the userID and password attributes and for the LoginInfo element. However, only the password attribute specifies a minimum and maximum length (6 to 8 characters) for its string value.

<?xml version="1.0"?>
<s:Schema xmlns:s="urn:schemas-microsoft-com:xml-data" 
        xmlns:dt="urn:schemas-microsoft-com:datatypes" >
  <s:AttributeType name="userID" 
             xmlns:dt="urn:schemas-microsoft-com:datatypes"
             dt:type="string" />
  <s:AttributeType name="password" 
             dt:type="string" 
             dt:minLength="6"
             dt:maxLength="8"/>
  <s:ElementType name="LoginInfo" >
    <s:attribute type="userID" />
    <s:attribute type="password" />
  </s:ElementType>
</s:Schema>

Using the preceding schema, these two documents are valid instances.

<LoginInfo userID="1" password="xyz123" />
<LoginInfo userID="2" password="" />

Although the password attribute value in the second instance is shorter than minLength, the instance is still valid because a password value specified as "" is treated as if the password attribute were not specified.
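By contrast, a non-empty value outside the declared range would be expected to fail validation. The following instance is a sketch with made-up values.

```xml
<!-- Expected to fail: "abc" has 3 characters, fewer than minLength="6". -->
<LoginInfo userID="3" password="abc" />
```

Similarly, a nine-character value such as "abcdefghi" would exceed maxLength="8".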

The order Attribute

When specifying the content model, you use the order attribute to specify how sequences of elements can appear in the document instance. You can assign the order attribute the value of seq, one, or many.

The seq value indicates that the enclosed elements must appear in the same order as they appear in the schema, as in the following example.

<ElementType name="PurchaseOrder" order="seq">
  <element type="PONumber" />
  <element type="PODate" />
  <element type="ShipAddress" />
</ElementType>

The one value specifies that only one of the child elements defined in an ElementType can appear. For example, to specify that an Item element can contain either a product element or a backOrderedProduct element, but not both, the schema can be specified as follows.

<ElementType name="Item" order="one">
   <element type="product" />
   <element type="backOrderedProduct" />
</ElementType>

The many value specifies that the child elements can appear in any order, and in any quantity.
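A declaration using many might look like the following sketch. The element names Notes, comment, and warning are hypothetical.

```xml
<ElementType name="Notes" content="eltOnly" order="many">
   <element type="comment" />
   <element type="warning" />
</ElementType>
```

Under this declaration, a Notes element that contains a warning element followed by two comment elements would be expected to validate, because neither the order nor the quantity of the children is constrained.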

The default value for the order attribute depends on the content model in use. When the content attribute is not set or set to the value of mixed, the default value for order is many. When the content attribute is set to eltOnly, the default value for order is seq. Unexpected problems can occur when the order attribute is not set and the default value is inserted.

For example, in this schema the order attribute is not specified.

<?xml version="1.0"?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
   <ElementType name="PONumber"/>
   <ElementType name="PODate"/>
   <ElementType name="ShipAddress"/>
   <ElementType name="product"/>
   <ElementType name="backOrderedProduct"/>
   <ElementType name="PurchaseOrder">
      <element type="PONumber" maxOccurs="1" />
      <element type="PODate" minOccurs="1" />
      <element type="ShipAddress" minOccurs="1" maxOccurs="*" />
      <element type="Item"/>
   </ElementType>
   <ElementType name="Item" content="eltOnly">
      <element type="product" />
      <element type="backOrderedProduct" />
   </ElementType>
</Schema>

The schema appears to the parser as follows.

<?xml version="1.0"?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
   <ElementType name="PONumber"/>
   <ElementType name="PODate"/>
   <ElementType name="ShipAddress"/>
   <ElementType name="product"/>
   <ElementType name="backOrderedProduct"/>
   <ElementType name="PurchaseOrder" content="mixed" order="many">
      <element type="PONumber" maxOccurs="1" />
      <element type="PODate" minOccurs="1" />
      <element type="ShipAddress" minOccurs="1" maxOccurs="*" />
      <element type="Item"/>
   </ElementType>
   <ElementType name="Item" content="eltOnly" order="seq">
      <element type="product" />
      <element type="backOrderedProduct" />
   </ElementType>
</Schema>

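The preceding schema does not produce the expected validation of the document. Because PurchaseOrder defaults to content="mixed" and order="many", its child elements can appear in any order and in any quantity, so multiple instances of an element such as PONumber can be entered without an error occurring, as shown in the following XML document example.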

<?xml version="1.0"?>
<PurchaseOrder xmlns="x-schema:PurchaseOrder.xml" schemaLocation="http://www.example.microsoft.com/someschema.xml">
<!-- The following lines would validate against the schema -->
  <PONumber>1234</PONumber>
  <PONumber>1235</PONumber>
  <ShipAddress>555 Nowhere Blvd</ShipAddress>
<!-- The following lines would not validate against the schema -->
   <Item>
      <backOrderedProduct>firstBOproduct</backOrderedProduct>
      <product>firstprod</product>
   </Item>
</PurchaseOrder>

The order attribute is valid for either an ElementType or group element.
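One way to avoid relying on the inserted defaults is to state content and order explicitly. The following sketch rewrites only the PurchaseOrder declaration from the schema above; choosing eltOnly and seq is an assumption about the intended constraints, not part of the original schema.

```xml
<ElementType name="PurchaseOrder" content="eltOnly" order="seq">
   <element type="PONumber" />
   <element type="PODate" />
   <element type="ShipAddress" maxOccurs="*" />
   <element type="Item" />
</ElementType>
```

With these attributes stated, a document that repeats PONumber or omits PODate would be expected to fail validation.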

The <group> Element

The group element enables you to specify constraints on a subset of child elements. This can be quite useful when specifying elements within a schema.

The group element accepts the order, minOccurs, and maxOccurs attributes.

For example, the following schema defines the Item element as containing a group element with two child elements, product and backOrderedProduct. Because this group element sets its order attribute to one, exactly one of these child elements can appear in the Item element: either a product element or a backOrderedProduct element, but not both.

<ElementType name="Item">
    <group order="one">
        <element type="product" />
        <element type="backOrderedProduct" />
    </group>
    <element type="quantity"/>
    <element type="price"/>
</ElementType>

Using the preceding schema, the following fragments are valid instances.

<Item>
  <product>CD</product>
  <quantity>100</quantity>
  <price>10</price>
</Item>
<Item>
  <product>FloppyDisk</product>
  <quantity>100</quantity>
  <price>1</price>
</Item>
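The minOccurs and maxOccurs attributes can also be placed on the group element itself. The following sketch makes a whole choice optional; the discount and coupon element names are hypothetical, and content and order are stated explicitly on Item rather than left to their defaults.

```xml
<ElementType name="Item" content="eltOnly" order="seq">
    <element type="product" />
    <!-- The discount-or-coupon choice as a whole is optional. -->
    <group order="one" minOccurs="0" maxOccurs="1">
        <element type="discount" />
        <element type="coupon" />
    </group>
    <element type="quantity" />
</ElementType>
```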

Specifying the Content Model of an Attribute

The AttributeType element specifies the type of attribute used within elements. Using this element, you can even specify whether the attribute is required for the element, as in the following.

<AttributeType name="shipTo" dt:type="idref" required="yes"/>

The attribute element specifies instances of an attribute defined within the AttributeType element. You use the attribute element within an ElementType element.

Attributes are more limited in some ways than elements. For example, attributes cannot contain child elements, and you cannot require attributes to appear in any particular order; nor can you pose alternatives, such as a "product" or a "backOrderedProduct". You can specify whether an attribute is required or optional, but an attribute can appear only once per element.

At the same time, attributes have the following capabilities that elements do not:

  • Attributes can limit their legal values to a small set of strings, as in the following example.
    <AttributeType name="priority" dt:type="enumeration" dt:values="high medium low" />
  • Attributes indicate a value to be inferred if the attribute is omitted from an element (that is, the attribute's default value), as in the following example.
    <AttributeType name="quantity" dt:type="int" />
    <attribute type="quantity" default="1"/>
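The two capabilities above can be combined in one schema, sketched here under the assumption of a hypothetical Task element that carries both attributes.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <AttributeType name="priority" dt:type="enumeration"
                 dt:values="high medium low" />
  <AttributeType name="quantity" dt:type="int" />
  <ElementType name="Task" content="empty" model="closed">
    <!-- priority must be one of the three listed strings when present -->
    <attribute type="priority" />
    <!-- quantity is treated as 1 when the attribute is omitted -->
    <attribute type="quantity" default="1" />
  </ElementType>
</Schema>
```

With this schema, <Task priority="high" /> would be read as having a quantity of 1, while <Task priority="urgent" /> would be expected to fail because urgent is not in the enumerated list.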

Although different element types can have attributes with the same name, these attributes are independent and unrelated.

Specifying the Default Value of an Attribute

To specify the default value of an attribute, use the default attribute. You can specify this attribute on either the AttributeType element or the attribute element in the schema.

For example, the following schema assigns the default value of Seattle to the City attribute.

<?xml version="1.0" ?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data" >
<ElementType name="Customer" >
    <AttributeType name="CustomerID" />
    <AttributeType name="ContactName" />
    <AttributeType name="City" default="Seattle" />

    <attribute type="CustomerID" />
    <attribute type="ContactName" />
    <attribute type="City"  />
</ElementType>
</Schema>

If you have a document instance that has a Customer element with a missing City attribute, the parser assumes the default value (Seattle) for the attribute and validates the document. For example, consider the following document instance.

<Customer CustomerID="ALFKI" ContactName="Maria Anders" City="London" />
<Customer CustomerID="ANATR" ContactName="Ana Trujillo" />

The customer ALFKI specifies a city (London), so the default value is ignored. On the other hand, the customer ANATR has no City attribute, so it receives the default value (Seattle).

The behavior is slightly different if the schema specifies both a default attribute and a required attribute. For example, the following AttributeType specifies the City attribute as required with a default value of Seattle.

<AttributeType name="City" default="Seattle" required="yes" />

In this case, every Customer element must include a City attribute, and the value of that attribute must be Seattle.
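As a sketch of the consequence, assume the Customer schema shown earlier with its City declaration replaced by the required form. The customer data below is made up for illustration.

```xml
<!-- Expected to fail: City is present but is not the required value Seattle. -->
<Customer CustomerID="AROUT" ContactName="Thomas Hardy" City="London" />
```

Under the same assumption, an instance that specifies City="Seattle" would be expected to validate, and an instance that omits City would be expected to fail because the attribute is required.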