Quick Guide to XPath expressions for use with the XML_QUERY activity
There are many XPath resources available on the web. To get you started, you could try the following:
- For one quick and easy introduction to XPath: XPath Tutorial
- Another XPath tutorial can be found here: XPath Tutorial
- XML in a Nutshell – A Desktop Quick Reference: Chapter 9 XPath
- For a description of the XPath standard: XML Path Language (XPath) Version 1.0
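This document is not intended to be a definitive description of, or reference to, XPath expression syntax. However, for readers who have not used XPath expressions before, the following brief overview and examples might help you get started with the XML_QUERY activity. Refer to the following headings:
- Introduction to XML Path Language (XPath)
- XPath Examples for use with the XML_QUERY Activity
- XML Namespaces and How They Affect XPath Expressions for the XML_QUERY Activity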
The examples provided later in this section will refer to the following simple example XML document:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Orders SYSTEM "http://www.lansa.com/schemas/tutorder.dtd" >
<Orders>
  <SalesOrder SONumber="12345">
    <Customer CustNumber="543">
      <CustName>ABC Industries</CustName>
      <Street>123 North St.</Street>
      <City>Bankstown</City>
      <State>NSW</State>
      <PostCode>2087</PostCode>
    </Customer>
    <OrderDate>2012-11-19</OrderDate>
    <Line LineNumber="1">
      <Part PartNumber="123">
        <Description>Gasket Paper</Description>
        <Price>9.95</Price>
      </Part>
      <Quantity>10</Quantity>
    </Line>
    <Line LineNumber="2">
      <Part PartNumber="456">
        <Description>Glue</Description>
        <Price>13.27</Price>
      </Part>
      <Quantity>5</Quantity>
    </Line>
  </SalesOrder>
</Orders>
Introduction to XML Path Language (XPath)
XPath is a syntax for constructing path expressions to select nodes in an XML document. To some extent, these path expressions look very similar to path expressions you use when working with the file system on your computer.
In general, XPath recognises seven types of nodes, viz. element, attribute, text, namespace, processing-instruction, comment, and document nodes. In the context of the XMLQueryService we are chiefly concerned with the element and attribute nodes and of course, the document node.
In the example XML document shown above, some of the element nodes are <Orders>, <SalesOrder>, <Customer>, <OrderDate>, <Line> and <Part>, while the attribute nodes include SONumber=, CustNumber=, LineNumber= and PartNumber=.
The following is an example XPath expression that will select the PartNumber= attribute of the first <Part> element in the second <Line> element in the first <SalesOrder> element of the example XML document:
/Orders/SalesOrder[1]/Line[2]/Part[1]/@PartNumber
Note that the selection of the <SalesOrder>, <Line> and <Part> elements in the above example is by ordinal index. In particular, the selection of the <Line> element does NOT refer to the value of the LineNumber= attribute (although that is possible too, as you will see later).
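If you want to experiment with ordinal selection outside LANSA Composer, the following minimal sketch uses Python's standard xml.etree.ElementTree module on a trimmed copy of the example document. It is an illustration only and makes no claim about how the XML_QUERY activity is implemented. ElementTree supports only a subset of XPath, so the path is written relative to the root <Orders> element and the attribute is read with get() rather than with a trailing @PartNumber step.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example document (only the nodes needed here).
ORDERS_XML = """<Orders>
  <SalesOrder SONumber="12345">
    <Line LineNumber="1"><Part PartNumber="123"/></Line>
    <Line LineNumber="2"><Part PartNumber="456"/></Line>
  </SalesOrder>
</Orders>"""

root = ET.fromstring(ORDERS_XML)  # root is the <Orders> element

# Counterpart of /Orders/SalesOrder[1]/Line[2]/Part[1]/@PartNumber:
# [1] and [2] are ordinal positions, counted from 1.
part_by_position = root.find("SalesOrder[1]/Line[2]/Part[1]")
number_by_position = part_by_position.get("PartNumber")

# Selecting the line by the value of its LineNumber= attribute instead.
part_by_attribute = root.find("SalesOrder[1]/Line[@LineNumber='2']/Part[1]")
number_by_attribute = part_by_attribute.get("PartNumber")

print(number_by_position, number_by_attribute)
```

In the example document both approaches reach the same part, but only because the LineNumber= values happen to match the positions of the <Line> elements.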
XPath provides a large number of built-in functions that can manipulate and compare values in a variety of ways for more advanced usage. For example, the following expression uses the contains() built-in function to select all <Part> elements (wherever they occur) whose <Description> element contains the string "Paper":
//Part[contains(Description, "Paper")]
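To see what a contains() test selects, you could try the following sketch in Python. Note the assumption: Python's standard xml.etree.ElementTree module does not implement the XPath contains() function, so the substring test is emulated here in ordinary Python code on a trimmed copy of the example document. It is not an XPath evaluation and not the XML_QUERY activity itself.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example document (only the nodes needed here).
ORDERS_XML = """<Orders>
  <SalesOrder SONumber="12345">
    <Line LineNumber="1">
      <Part PartNumber="123"><Description>Gasket Paper</Description></Part>
    </Line>
    <Line LineNumber="2">
      <Part PartNumber="456"><Description>Glue</Description></Part>
    </Line>
  </SalesOrder>
</Orders>"""

root = ET.fromstring(ORDERS_XML)

# Emulates //Part[contains(Description, "Paper")]:
# iter("Part") visits <Part> elements wherever they occur, and the
# substring test is applied to the text of the first <Description> child.
paper_parts = [
    part.get("PartNumber")
    for part in root.iter("Part")
    if "Paper" in (part.findtext("Description") or "")
]

print(paper_parts)
```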
In XPath, you select a node or set of nodes, by following a path or steps. Your XPath expression will often include one or more of the following:
Expression | Description
nodename | Selects all nodes with the specified name
/ | Selects from the root node
// | Selects nodes in the document from the current node that match the selection no matter where they are
. | Selects the current node
.. | Selects the parent of the current node
@nodename | Selects attributes with the specified name
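The path steps in the table above can be tried out with a short Python sketch. It assumes Python's standard xml.etree.ElementTree module, which accepts a subset of XPath: paths are relative to the element you search from (so // is written as .//), and attribute values are read with get() instead of a trailing @nodename step.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example document (only the nodes needed here).
ORDERS_XML = """<Orders>
  <SalesOrder SONumber="12345">
    <Line LineNumber="1"><Part PartNumber="123"/></Line>
    <Line LineNumber="2"><Part PartNumber="456"/></Line>
  </SalesOrder>
</Orders>"""

root = ET.fromstring(ORDERS_XML)

# nodename: child elements of the current node with that name.
orders = root.findall("SalesOrder")

# //: matching elements no matter where they are below the current node.
parts = root.findall(".//Part")

# . : the current node itself.
current = root.find(".")

# .. : the parent of each selected node (here, the <Line> of each <Part>).
parents = root.findall(".//Part/..")

# @nodename: used inside a predicate to test for an attribute,
# while get() reads the attribute value.
with_line_number = root.findall(".//*[@LineNumber]")
part_numbers = [part.get("PartNumber") for part in parts]

print(len(orders), part_numbers, [p.tag for p in parents])
```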
In XPath, a predicate is a sub-expression contained in square brackets that is used to select a specific node or a node that contains a specific value. The following are some examples of XPath expressions that use predicates:
Expression | Description
/Orders/SalesOrder[1] | Selects the first <SalesOrder> element that is a child of the <Orders> element.
/Orders/SalesOrder[last()] | Selects the last <SalesOrder> element that is a child of the <Orders> element. (In the particular instance of the example XML document shown, there is only one <SalesOrder> element and so the result will be the same.)
//Part[Price<=10.00] | Selects <Part> elements, wherever they occur, whose <Price> element has a value less than or equal to 10.00.
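The following Python sketch mirrors the three predicate examples above. It assumes the standard xml.etree.ElementTree module, which understands positional predicates such as [1] and [last()] but not numeric comparisons, so the Price<=10.00 test is emulated in Python code (using Decimal for exact decimal arithmetic, which is a choice made for this sketch).

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Trimmed copy of the example document (only the nodes needed here).
ORDERS_XML = """<Orders>
  <SalesOrder SONumber="12345">
    <Line LineNumber="1">
      <Part PartNumber="123"><Price>9.95</Price></Part>
    </Line>
    <Line LineNumber="2">
      <Part PartNumber="456"><Price>13.27</Price></Part>
    </Line>
  </SalesOrder>
</Orders>"""

root = ET.fromstring(ORDERS_XML)  # root is the <Orders> element

# Counterparts of /Orders/SalesOrder[1] and /Orders/SalesOrder[last()].
first_order = root.find("SalesOrder[1]")
last_order = root.find("SalesOrder[last()]")

# Emulates //Part[Price<=10.00] with a comparison done in Python.
cheap_parts = [
    part.get("PartNumber")
    for part in root.iter("Part")
    if Decimal(part.findtext("Price")) <= Decimal("10.00")
]

print(first_order is last_order, cheap_parts)
```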
There is much more to know about XPath expressions. If you would like more information, you could start by referring to some of the links provided above.
Important note: XML node names are case sensitive. Your XPath expressions must specify the correct case when specifying element and attribute names. For example, the expression '//salesorder' is NOT the same as '//SalesOrder'. When used with the example XML document shown above, the former expression will FAIL to select ANY nodes, while the latter will select all <SalesOrder> elements, wherever they occur in the document.
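Case sensitivity is easy to confirm for yourself. The following sketch uses Python's standard xml.etree.ElementTree module (an assumption of this illustration, not part of the XML_QUERY activity) to run both spellings against a trimmed copy of the example document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example document (only the nodes needed here).
ORDERS_XML = """<Orders>
  <SalesOrder SONumber="12345"/>
</Orders>"""

root = ET.fromstring(ORDERS_XML)

# Wrong case: no element is named "salesorder", so nothing is selected.
wrong_case = root.findall(".//salesorder")

# Correct case: the single <SalesOrder> element is selected.
right_case = root.findall(".//SalesOrder")

print(len(wrong_case), len(right_case))
```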
XPath Examples for use with the XML_QUERY Activity
The following examples use XPath expressions in the parameters of the XML_QUERY activity to select values from the example XML document shown above.
1. This example will select nothing because XML and XPath are case-sensitive and the wrong case is used to select the <SalesOrder> elements:
XML_QUERY XMLFILE('TUTorder.xml')
          QUERYNODES('//SALESORDER')
          QUERYNODESVALUE1('@SONumber')
2. These two examples use alternative implementations to select all <SalesOrder> elements and return the sales order number for each. Functionally, they are equivalent (when used with the example XML document):
XML_QUERY XMLFILE('TUTorder.xml')
          QUERYNODES('//SalesOrder/@SONumber')
XML_QUERY XMLFILE('TUTorder.xml')
          QUERYNODES('//SalesOrder')
          QUERYNODESVALUE1('@SONumber')
3. Selects all <SalesOrder> elements, and returns the customer number for each:
XML_QUERY XMLFILE('TUTorder.xml')
          QUERYNODES('//SalesOrder')
          QUERYNODESVALUE1('Customer/@CustNumber')
4. Selects <Customer> elements that have a value of '543' for their customer number and returns the sales order number of the parent <SalesOrder> element:
XML_QUERY XMLFILE('TUTorder.xml')
          QUERYNODES('//Customer[@CustNumber="543"]')
          QUERYNODESVALUE1('../@SONumber')
5. Selects all <Part> elements for the <SalesOrder> element(s) with the order number specified and returns the part number and quantity for each:
XML_QUERY XMLFILE('TUTorder.xml')
          QUERYNODES('//SalesOrder[@SONumber="12345"]/Line/Part')
          QUERYNODESVALUE1('@PartNumber')
          QUERYNODESVALUE2('../Quantity')
6. Selects all <Part> elements with a price greater than 2.99 and, for each, returns the order number, the part number, the price and the quantity, and calculates and returns the extended value (price * quantity):
XML_QUERY XMLFILE('TUTorder.xml')
          QUERYNODES('//Part[Price>2.99]')
          QUERYNODESVALUE1('../../@SONumber')
          QUERYNODESVALUE2('@PartNumber')
          QUERYNODESVALUE3('Price')
          QUERYNODESVALUE4('../Quantity')
          QUERYNODESVALUE5('Price*../Quantity')
NOTE: the above examples are used in the supplied processing sequence EXAMPLE_XML01. This is a working example that you may wish to use as a starting point for your exploration of the XML_QUERY activity.
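If you would like to check what examples 4 and 6 should return before running them in a processing sequence, the following Python sketch reproduces their logic on a trimmed copy of the example document. It assumes the standard xml.etree.ElementTree module and is not a description of how XML_QUERY works internally. ElementTree handles the parent step (..) of example 4 directly, but the price comparison and the price * quantity calculation of example 6 are done in Python, using Decimal so that the arithmetic is exact (XPath itself works with floating-point numbers).

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Trimmed copy of the example document (only the nodes needed here).
ORDERS_XML = """<Orders>
  <SalesOrder SONumber="12345">
    <Customer CustNumber="543"><CustName>ABC Industries</CustName></Customer>
    <Line LineNumber="1">
      <Part PartNumber="123"><Price>9.95</Price></Part>
      <Quantity>10</Quantity>
    </Line>
    <Line LineNumber="2">
      <Part PartNumber="456"><Price>13.27</Price></Part>
      <Quantity>5</Quantity>
    </Line>
  </SalesOrder>
</Orders>"""

root = ET.fromstring(ORDERS_XML)

# Example 4: find the matching <Customer>, then step up to its parent
# <SalesOrder> and read the order number.
parent_orders = root.findall(".//Customer[@CustNumber='543']/..")
order_numbers = [order.get("SONumber") for order in parent_orders]

# Example 6: parts priced above 2.99, with order number, part number,
# price, quantity and extended value (price * quantity).
rows = []
for order in root.iter("SalesOrder"):
    for line in order.findall("Line"):
        quantity = Decimal(line.findtext("Quantity"))
        for part in line.findall("Part"):
            price = Decimal(part.findtext("Price"))
            if price > Decimal("2.99"):
                rows.append((order.get("SONumber"), part.get("PartNumber"),
                             price, quantity, price * quantity))

print(order_numbers)
for row in rows:
    print(*row)
```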
XML Namespaces and How They Affect XPath Expressions for the XML_QUERY Activity
The examples used so far operate on an XML document that contains no explicit namespace declarations and does not make use of namespace prefixes. This is the simplest case, but frequently does not reflect the real world.
Consider this minor alteration to the example XML document that specifies a default namespace for the XML document:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Orders SYSTEM "http://www.lansa.com/schemas/tutorder.dtd" >
<Orders xmlns="urn:schemas-lansa-com:tutorder.dtd">
… etc …
</Orders>
Where a document makes use of a single default namespace like this, the easiest approach to formulating XPath expressions for use with it is usually to disregard the namespace. Since only one namespace is used and there are no namespace prefixes present on the node names, you can usually use the same expressions as you would use with the earlier example. Each of the following queries works successfully with the example document that declares the default namespace, providing the document is not loaded in namespace-aware mode:
XML_QUERY XMLFILE('TUTorderNamespaceDefault.xml')
          QUERYNODES('//SalesOrder/@SONumber')
XML_QUERY XMLFILE('TUTorderNamespaceDefault.xml')
          QUERYNODES('//SalesOrder')
          QUERYNODESVALUE1('Customer/@CustNumber')
XML_QUERY XMLFILE('TUTorderNamespaceDefault.xml')
          QUERYNODES('//Customer[@CustNumber="543"]')
          QUERYNODESVALUE1('../@SONumber')
XML_QUERY XMLFILE('TUTorderNamespaceDefault.xml')
          QUERYNODES('//SalesOrder[@SONumber="12345"]/Line/Part')
          QUERYNODESVALUE1('@PartNumber')
          QUERYNODESVALUE2('../Quantity')
XML_QUERY XMLFILE('TUTorderNamespaceDefault.xml')
          QUERYNODES('//Part[Price>2.99]')
          QUERYNODESVALUE1('../../@SONumber')
          QUERYNODESVALUE2('@PartNumber')
          QUERYNODESVALUE3('Price')
          QUERYNODESVALUE4('../Quantity')
          QUERYNODESVALUE5('Price*../Quantity')
However, in XML documents that use more than one namespace and/or implement namespace prefixes, things can get a little more complicated. Consider the following alternate example XML document and contrast it to the earlier example:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Orders SYSTEM "http://www.lansa.com/schemas/tutorder.dtd" >
<tut:Orders xmlns:tut="urn:schemas-lansa-com:tutorder.dtd">
  <tut:SalesOrder SONumber="12345">
    <tut:Customer CustNumber="543">
      <tut:CustName>ABC Industries</tut:CustName>
      <tut:Street>123 North St.</tut:Street>
      <tut:City>Bankstown</tut:City>
      <tut:State>NSW</tut:State>
      <tut:PostCode>2087</tut:PostCode>
    </tut:Customer>
    <tut:OrderDate>2012-11-19</tut:OrderDate>
    <tut:Line LineNumber="1">
      <tut:Part PartNumber="123">
        <tut:Description>Gasket Paper</tut:Description>
        <tut:Price>9.95</tut:Price>
      </tut:Part>
      <tut:Quantity>10</tut:Quantity>
    </tut:Line>
    <tut:Line LineNumber="2">
      <tut:Part PartNumber="456">
        <tut:Description>Glue</tut:Description>
        <tut:Price>13.27</tut:Price>
      </tut:Part>
      <tut:Quantity>5</tut:Quantity>
    </tut:Line>
  </tut:SalesOrder>
</tut:Orders>
This document contains a namespace declaration and uses the associated namespace prefix on the element names. The use of namespace features and especially of namespace prefixes can complicate the syntax of the XPath expressions necessary for a given query.
Again, you should refer to the many resources available on the web concerning XML namespaces and how they affect XPath.
The easiest approach to formulating XPath expressions for use with such an instance document is to disregard the namespace(s). If the document is loaded WITHOUT the namespace-aware option (the default mode), then you can use the same expressions as you would use with the earlier example. Each of the following queries works successfully with the namespace-prefixed version of the document as shown above (note that the namespace prefix is omitted entirely from the XPath expressions):
XML_QUERY XMLFILE('TUTorderNamespacePrefixed.xml')
          QUERYNODES('/Orders/SalesOrder/@SONumber')
XML_QUERY XMLFILE('TUTorderNamespacePrefixed.xml')
          QUERYNODES('/Orders/SalesOrder')
          QUERYNODESVALUE1('Customer/@CustNumber')
XML_QUERY XMLFILE('TUTorderNamespacePrefixed.xml')
          QUERYNODES('/Orders/SalesOrder/Customer[@CustNumber="543"]')
          QUERYNODESVALUE1('../@SONumber')
XML_QUERY XMLFILE('TUTorderNamespacePrefixed.xml')
          QUERYNODES('/Orders/SalesOrder[@SONumber="12345"]/Line/Part')
          QUERYNODESVALUE1('@PartNumber')
          QUERYNODESVALUE2('../Quantity')
XML_QUERY XMLFILE('TUTorderNamespacePrefixed.xml')
          QUERYNODES('/Orders/SalesOrder/Line/Part[Price>2.99]')
          QUERYNODESVALUE1('../../@SONumber')
          QUERYNODESVALUE2('@PartNumber')
          QUERYNODESVALUE3('Price')
          QUERYNODESVALUE4('../Quantity')
          QUERYNODESVALUE5('Price*../Quantity')
If, however, your document declares more than one namespace, and, especially where there would be a namespace collision without the use of the namespaces, it may be necessary to load the document in namespace-aware mode. This is done by specifying the *NAMESPACEAWARE keyword in the XMLOPTIONS parameter of the XML_QUERY activity. For example:
XML_QUERY XMLFILE('TUTorderNamespacePrefixed.xml')
          XMLOPTIONS('*NAMESPACEAWARE')
          QUERYNODES('//tut:SalesOrder/@SONumber')
However, once the document is loaded in namespace-aware mode, the example queries shown up to this point will no longer function because now the namespace forms a part of the identification of nodes in the XML document.
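The effect of namespace-aware processing can be seen with a small Python sketch. Python's standard xml.etree.ElementTree module (used here purely as an illustration) is always namespace-aware, so it behaves like the situation just described: an unqualified name no longer matches, while a name qualified with the namespace does. The prefix-to-URI mapping passed to findall() is specific to the Python API and is not an XML_QUERY parameter.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the namespace-prefixed example document.
NS_XML = """<tut:Orders xmlns:tut="urn:schemas-lansa-com:tutorder.dtd">
  <tut:SalesOrder SONumber="12345"/>
</tut:Orders>"""

root = ET.fromstring(NS_XML)

# The namespace is part of each element's identity: ElementTree stores
# the tag as "{namespace-uri}local-name".
root_tag = root.tag

# An unqualified name selects nothing in a namespace-aware parse.
unqualified = root.findall(".//SalesOrder")

# A qualified name selects the element. The mapping tells ElementTree
# which namespace URI the "tut" prefix in the path stands for.
namespaces = {"tut": "urn:schemas-lansa-com:tutorder.dtd"}
qualified = root.findall(".//tut:SalesOrder", namespaces)

print(root_tag, len(unqualified), len(qualified))
```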
There are a variety of ways to formulate your XPath expressions such that they will function in the way you require in namespace-aware mode and it is well beyond the scope of this document to attempt to cover all the options. However, here are a few examples that might help to get you started:
1. This example uses the local-name() XPath built-in function to select nodes based on their local name (the node name WITHOUT the namespace prefix):
XML_QUERY XMLFILE('TUTorderNamespacePrefixed.xml')
          XMLOPTIONS('*NAMESPACEAWARE')
          QUERYNODES('//*[local-name() = "SalesOrder"]')
          QUERYNODESVALUE1('@SONumber')
2. If multiple namespaces are used and 'SalesOrder' is ambiguous in this context, then you can extend the previous example to use the namespace-uri() XPath built-in function:
XML_QUERY XMLFILE('TUTorderNamespacePrefixed.xml')
          XMLOPTIONS('*NAMESPACEAWARE')
          QUERYNODES('//*[local-name() = "SalesOrder"
          and namespace-uri() = "urn:schemas-lansa-com:tutorder.dtd"]')
          QUERYNODESVALUE1('@SONumber')
3. Alternatively, if you know that all instances of the XML document will use the same namespace prefixes (which, you should understand, is NOT strictly necessary for them to be valid, even though it may commonly be the case in practice), then you can include the namespace prefixes in your XPath expressions (provided the document is loaded in namespace-aware mode):
XML_QUERY XMLFILE('TUTorderNamespacePrefixed.xml')
          XMLOPTIONS('*NAMESPACEAWARE')
          QUERYNODES('//tut:SalesOrder')
          QUERYNODESVALUE1('@SONumber')
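To illustrate what the local-name() and namespace-uri() tests in examples 1 and 2 compare, the following Python sketch splits each element name into its two parts. It assumes the standard xml.etree.ElementTree module, which does not implement these XPath functions; instead it stores every tag as "{namespace-uri}local-name", and the helper split_tag() (a hypothetical name introduced for this sketch) separates the two.

```python
import xml.etree.ElementTree as ET

TUT_URI = "urn:schemas-lansa-com:tutorder.dtd"

# Trimmed copy of the namespace-prefixed example document.
NS_XML = """<tut:Orders xmlns:tut="urn:schemas-lansa-com:tutorder.dtd">
  <tut:SalesOrder SONumber="12345">
    <tut:Customer CustNumber="543"/>
  </tut:SalesOrder>
</tut:Orders>"""


def split_tag(tag):
    """Return (namespace_uri, local_name) for an ElementTree tag."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag  # element is in no namespace


root = ET.fromstring(NS_XML)

# Emulates //*[local-name() = "SalesOrder"]
by_local_name = [
    elem for elem in root.iter() if split_tag(elem.tag)[1] == "SalesOrder"
]

# Emulates //*[local-name() = "SalesOrder" and namespace-uri() = "urn:..."]
by_name_and_uri = [
    elem for elem in root.iter()
    if split_tag(elem.tag) == (TUT_URI, "SalesOrder")
]

# The SONumber= attribute has no prefix, so it is read by its plain name.
print([elem.get("SONumber") for elem in by_name_and_uri])
```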
In summary, each of the following queries works successfully with the namespace-prefixed version of the document as shown above, providing the document is loaded in namespace-aware mode AND providing the actual namespace prefix used in the XML document matches that assumed in the queries:
XML_QUERY XMLFILE('TUTorderNamespacePrefixed.xml')
          XMLOPTIONS('*NAMESPACEAWARE')
          QUERYNODES('//tut:SalesOrder/@SONumber')
XML_QUERY XMLFILE('TUTorderNamespacePrefixed.xml')
          XMLOPTIONS('*NAMESPACEAWARE')
          QUERYNODES('//tut:SalesOrder')
          QUERYNODESVALUE1('tut:Customer/@CustNumber')
XML_QUERY XMLFILE('TUTorderNamespacePrefixed.xml')
          XMLOPTIONS('*NAMESPACEAWARE')
          QUERYNODES('//tut:Customer[@CustNumber="543"]')
          QUERYNODESVALUE1('../@SONumber')
XML_QUERY XMLFILE('TUTorderNamespacePrefixed.xml')
          XMLOPTIONS('*NAMESPACEAWARE')
          QUERYNODES('//tut:SalesOrder[@SONumber="12345"]/tut:Line/tut:Part')
          QUERYNODESVALUE1('@PartNumber')
          QUERYNODESVALUE2('../tut:Quantity')
XML_QUERY XMLFILE('TUTorderNamespacePrefixed.xml')
          XMLOPTIONS('*NAMESPACEAWARE')
          QUERYNODES('//tut:Part[tut:Price>2.99]')
          QUERYNODESVALUE1('../../@SONumber')
          QUERYNODESVALUE2('@PartNumber')
          QUERYNODESVALUE3('tut:Price')
          QUERYNODESVALUE4('../tut:Quantity')
          QUERYNODESVALUE5('tut:Price*../tut:Quantity')
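The reason a prefix is not strictly part of a document's validity is that the namespace URI, not the prefix, identifies a node. The following Python sketch shows this with two documents that differ only in prefix. It assumes the standard xml.etree.ElementTree module, where the caller supplies the prefix mapping for the query; the second document and its "ord" prefix are hypothetical, invented for this illustration. This describes the Python API only: with the XML_QUERY activity, the prefix in your expressions should match the prefix in the document, as stated above.

```python
import xml.etree.ElementTree as ET

TUT_URI = "urn:schemas-lansa-com:tutorder.dtd"

# Same namespace URI, different prefixes ("ord" is a hypothetical variant).
DOC_TUT = """<tut:Orders xmlns:tut="urn:schemas-lansa-com:tutorder.dtd">
  <tut:SalesOrder SONumber="12345"/>
</tut:Orders>"""
DOC_ORD = """<ord:Orders xmlns:ord="urn:schemas-lansa-com:tutorder.dtd">
  <ord:SalesOrder SONumber="12345"/>
</ord:Orders>"""

# In this Python API the "tut" prefix in the path is resolved through
# this mapping, independently of the prefix used inside the document.
NAMESPACES = {"tut": TUT_URI}


def order_numbers(xml_text):
    """Return the SONumber= value of every SalesOrder in the namespace."""
    root = ET.fromstring(xml_text)
    orders = root.findall(".//tut:SalesOrder", NAMESPACES)
    return [order.get("SONumber") for order in orders]


from_tut = order_numbers(DOC_TUT)
from_ord = order_numbers(DOC_ORD)
print(from_tut, from_ord)
```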