Writing XML Using OPENXML

XML and Internet Support

XML and Internet Support

Writing XML Using OPENXML

OPENXML is a Transact-SQL keyword that provides a rowset over in-memory XML documents. OPENXML is a rowset provider similar to a table or a view. OPENXML allows access to XML data as if it is a relational rowset by providing a rowset view of the internal representation of an XML document. The records in the rowset can be stored in database tables (similar to the rowsets provided by tables and views).

OPENXML can be used in SELECT, and SELECT INTO statements wherever rowset providers, such as a table, a view or OPENROWSET can appear as the source. For information about the syntax of OPENXML, see OPENXML.

To write queries against an XML document using OPENXML, you must first call sp_xml_preparedocument, which parses the XML document and returns a handle to the parsed document that is ready for consumption. The parsed document is a tree representation of various nodes (elements, attributes, text, comment, and so on) in the XML document. The document handle is passed to OPENXML, which then provides a rowset view of the document based on the parameters passed to it.

The internal representation of an XML document must be removed from memory by calling sp_xml_removedocument system stored procedure to free the memory.

This illustration shows the process.

Note  To understand OPENXML, familiarity with XPath queries and understanding of XML is required. For more information about XPath support in Microsoft® SQL Server™ 2000, see Using XPath Queries.

OPENXML Parameters

The parameters to OPENXML include:

  • An XML document handle (idoc)

  • An XPath expression to identify the nodes to be mapped to rows (rowpattern)

  • A description of the rowset to be generated

  • Mapping between the rowset columns and the XML nodes
XML Document Handle (idoc)

The document handle is returned by the sp_xml_preparedocument stored procedure.

XPath Expression to Identify the Nodes to Be Processed (rowpattern)

The XPath expression specified as rowpattern identifies a set of nodes in the XML document. Each node identified by rowpattern corresponds to a single row in the rowset generated by OPENXML.

The nodes identified by the XPath expression can be any XML node (elements, attributes, processing instructions, and so on) in the XML document. If rowpattern identifies a set of elements in the XML document, there is one row in the rowset for each element node identified. For example, if rowpattern ends in an attribute, a row is created for each attribute node selected by rowpattern.

Description of the Rowset to Be Generated

A rowset schema must be provided to OPENXML to generate the rowset. You can specify the rowset schema by using the optional WITH clause. These options are available for specifying the rowset schema:

  • Specify the complete schema in the WITH clause.

    In specifying the rowset schema you specify the column names and their data types and their mapping to the XML document.

    You can specify the column pattern (using the ColPattern parameter in the SchemaDeclaration). The column pattern specified is used to map a rowset column to the XML node identified by rowpattern and also to determine the type of mapping.

    If ColPattern is not specified for a column, the rowset column maps to the XML node with same name based on the mapping specified by the flags parameter. However, if ColPattern is specified as part of schema specification in the WITH clause, it overwrites the mapping specified in the flags parameter.

  • Specify the name of an existing table in the WITH clause.

    You can simply specify an existing table name whose schema can be used by OPENXML to generate the rowset.

  • Do not specify the WITH clause.

    In this case, OPENXML returns a rowset in the edge table format. This is called an edge table because, in this table format, every edge in the parsed XML document tree maps to a row in the rowset.

    Edge tables represent the fine-grained XML document structure (for example, element/attribute names, the document hierarchy, the namespaces, processing instructions, and so on) in a single table. The edge table format allows you to get additional information that is not exposed through the metaproperties. For more information about metaproperties, see Specifying Metaproperties in OPENXML.

    The additional information provided by edge table allows you to store and query the data type of an element/attribute, the node type (element node, attribute node, or a value node), store and query information about the XML document structure, and to possibly build your own XML document management system.

    Using an edge table, you can write stored procedures that take XML documents as a BLOB input, produce the edge table, and then extract and analyze the document on its finest level (find the document hierarchy, element/attribute names, namespaces, processing instructions, and so on).

    The edge table also can serve as a storage format for XML documents when mapping to other relational formats does not make sense, and an ntext field is not providing enough structural information.

    Whenever you would use an XML parser to examine the XML document, you can use edge table to get the same information.

    This table describes the structure of the edge table.

    Column name Data type Description
    id Bigint Is the unique ID of the document node.

    The root element has an ID value 0. The negative ID values are reserved.

    parentid Bigint Identifies the parent of the node. The parent identified by this ID is not necessarily the parent element but it depends on the NodeType of the node whose parent is identified by this ID. For example, if the node is a text node, the parent of it may be an attribute node.

    If the node is at the top level in the XML document, its ParentID is NULL.

    nodetype Int Identifies the node type. Is an integer that corresponds to the XML DOM node type numbering (see DOM for node information).

    Some of the node types are:

    1 = Element node
    2 = Attribute node
    3 = Text node

    localname nvarchar Gives the local name of the element or attribute. Is NULL if the DOM object does not have a name.
    prefix nvarchar Is the namespace prefix of the node name.
    namespaceuri nvarchar Is the namespace URI of the node. If the value is NULL, no namespace is present.
    datatype nvarchar Is the actual data type of the element or attribute row and is NULL otherwise. The data type is inferred from the inline DTD or from the inline schema.
    prev Bigint Is the XML ID of the previous sibling element. Is NULL if there is no direct previous sibling.
    text Ntext Contains the attribute value or the element content in text form (or is NULL if the edge table entry does not need a value).

Mapping Between the Rowset Columns and the XML Nodes

In the OPENXML statement, you can optionally specify the type of mapping (attribute-centric, element-centric) between the rowset columns and the XML nodes identified by the rowpattern. This information is used in transformation between the XML nodes and the rowset columns.

There are two ways to specify the mapping (you can specify both):

  • Use the flags parameter.

    The mapping specified by the flags parameter assumes name correspondence where the XML nodes map to corresponding rowset columns with same name.

  • Use the ColPattern parameter.

    ColPattern, an XPath expression, is specified as part of SchemaDeclaration in the WITH clause. The mapping specified in ColPattern overwrites the mapping specified by flags parameter.

    ColPattern can be used to specify the special nature of the mapping (in case of attribute-centric and element-centric mapping) that overwrites or enhances the default mapping indicated by the flags.

    ColPattern is specified if:

    • The column name in the rowset is different from the element/attribute name to which it is mapped. In this case ColPattern is used to identify the XML element/attribute name to which the rowset column maps.

    • You want to map a metaproperty attribute to the column. In this case, ColPattern is used to identify the metaproperty to which the rowset column maps. For more information about using metaproperties , see Specifying Metaproperties in OPENXML.

Both the flags and ColPattern parameters are optional. If no mapping is specified, attribute-centric mapping (default value of flags parameter) is assumed by default.

Attribute-centric Mapping

If the flags parameter in OPENXML is set to attributes map to the columns in the rowset based on the name correspondence. Name correspondence means that XML attributes of a given name are stored in a column in the rowset with the same name.

If the column name is different from the attribute name to which it maps, ColPattern must be specified.

If XML attribute has a namespace qualifier, the column name in the rowset must have the qualifier as well.

Element-centric Mapping

Setting the flags parameter in OPENXML to 2 (XML_ELEMENTS) specifies the element-centric mapping. It is similar to attribute-centric mapping except for these differences:

  • The name correspondence of the mapping (for example, a column mapping to an XML element with the same name) chooses the noncomplex subelements, unless a column-level pattern is specified. In the retrieval case, if subelement is complex (contains further subelements), the column is set to NULL. Attribute values of the subelements are disregarded.

  • Multiple subelements with the same name overwrite each other in the order retrieved. Fusion on the parent appends subelement in case of name equivalence.

See Also

sp_xml_preparedocument

sp_xml_removedocument

OPENXML