Quick Guide to XPath expressions for use with XMLQueryService

There are many XPath resources available on the web.  To get you started, you could try the following:

 

This document does not intend or purport to provide a definitive description of or reference to XPath expression syntax.  However, for those readers who have not used XPath expressions before, this section will give a brief overview and examples that might help you get started with the XMLQueryService.  Refer to the following topics in this section:

  • Example XML
  • Introduction to XML Path Language (XPath)
  • XPath Examples for use with XMLQueryService
  • XML Namespaces and How They Affect XPath Expressions for XMLQueryService

 

Example XML

The examples provided later in this section will refer to the following simple example XML document:

<?xml version="1.0" encoding="UTF-8"?> 
 
<!DOCTYPE Orders SYSTEM "http://www.lansa.com/schemas/tutorder.dtd"  > 
 
<Orders> 
 
      <SalesOrder SONumber="12345">

         <Customer CustNumber="543">
            <CustName>ABC Industries</CustName> 
            <Street>123 North St.</Street> 
            <City>Bankstown</City> 
            <State>NSW</State> 
            <PostCode>2087</PostCode> 
         </Customer> 
 
         <OrderDate>2012-11-19</OrderDate> 
 
         <Line LineNumber="1"> 
            <Part PartNumber="123"> 
               <Description>Gasket Paper</Description> 
               <Price>9.95</Price> 
            </Part> 
            <Quantity>10</Quantity> 
         </Line> 
 
         <Line LineNumber="2"> 
            <Part PartNumber="456"> 
               <Description>Glue</Description> 
               <Price>13.27</Price> 
            </Part> 
            <Quantity>5</Quantity> 
         </Line> 
 
      </SalesOrder> 
 
</Orders>
 

Introduction to XML Path Language (XPath)

XPath is a syntax for constructing path expressions to select nodes in an XML document.  To some extent, these path expressions look very similar to path expressions you use when working with the file system on your computer.

In general, XPath recognises seven types of nodes, viz. element, attribute, text, namespace, processing-instruction, comment, and document nodes.  In the context of the XMLQueryService we are chiefly concerned with the element and attribute nodes and of course, the document node. 

In the example XML document shown above, some of the elements are <Orders>, <SalesOrder>, <Customer>, <OrderDate>, <Line> and <Part>, while the attributes include SONumber=, CustNumber=, LineNumber= and PartNumber=.

The following is an example XPath expression that will select the PartNumber= attribute of the first <Part> element in the second <Line> element in the first <SalesOrder> element of the example XML document:

/Orders/SalesOrder[1]/Line[2]/Part[1]/@PartNumber

 

Note that the <SalesOrder>, <Line> and <Part> elements in the above example are selected by ordinal index.  In particular, the selection of the <Line> element does NOT refer to the value of the LineNumber= attribute (although that is possible too, as you will see later).
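The difference between ordinal selection and selection by attribute value can be tried outside the XMLQueryService.  The following is a minimal sketch only.  It assumes Python's standard xml.etree.ElementTree module, which is not part of LANSA Integrator and supports only a subset of XPath: paths are relative to the element being searched (here the root <Orders> element), and attribute steps such as /@PartNumber are not available, so the attribute is read with get().  The XML is a trimmed copy of the example document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example document (only the nodes this sketch needs).
ORDERS_XML = """
<Orders>
  <SalesOrder SONumber="12345">
    <Line LineNumber="1"><Part PartNumber="123"/></Line>
    <Line LineNumber="2"><Part PartNumber="456"/></Line>
  </SalesOrder>
</Orders>
""".strip()

root = ET.fromstring(ORDERS_XML)  # root is the <Orders> element

# Ordinal selection: the second <Line> child, whatever its LineNumber= value.
part_by_index = root.find("SalesOrder[1]/Line[2]/Part[1]")
part_number = part_by_index.get("PartNumber")

# Selection by attribute value: the <Line> whose LineNumber= is "2".
part_by_attribute = root.find("SalesOrder[1]/Line[@LineNumber='2']/Part[1]")

print(part_number)
```

In the example document the two forms happen to select the same <Part> element, because the LineNumber= values follow document order.  That would not hold for a document whose lines were numbered differently.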

XPath provides a large number of built-in functions that can manipulate and compare values in a variety of ways for more advanced usage.  For example, the following expression uses the contains built-in function to select all <Part> elements (wherever they occur) whose <Description> element contains the string "Paper":

//Part[contains(Description, "Paper")]
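To check what such an expression should return for the example document, the same test can be approximated in ordinary code.  The sketch below assumes Python's standard xml.etree.ElementTree module (not part of LANSA Integrator).  ElementTree does not implement XPath functions such as contains, so the substring test is written in Python; it is an emulation of the expression above, not an XPath evaluation.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example document (only the nodes this sketch needs).
ORDERS_XML = """
<Orders>
  <SalesOrder SONumber="12345">
    <Line LineNumber="1">
      <Part PartNumber="123"><Description>Gasket Paper</Description></Part>
    </Line>
    <Line LineNumber="2">
      <Part PartNumber="456"><Description>Glue</Description></Part>
    </Line>
  </SalesOrder>
</Orders>
""".strip()

root = ET.fromstring(ORDERS_XML)

# Emulates //Part[contains(Description, "Paper")]: visit every <Part>,
# wherever it occurs, and keep those whose <Description> text contains "Paper".
paper_parts = [
    part.get("PartNumber")
    for part in root.iter("Part")
    if "Paper" in part.findtext("Description", default="")
]

print(paper_parts)
```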

 

In XPath, you select a node or set of nodes by following a path made up of steps.  Your XPath expression will often include one or more of the following:

nodename

Selects all nodes with the specified name.

/

Selects from the root node

//

Selects nodes in the document from the current node that match the selection no matter where they are

.

Selects the current node

..

Selects the parent of the current node

@nodename

Selects attributes with the specified name

 

In XPath, a predicate is a sub-expression contained in square brackets that is used to select a specific node or a node that contains a specific value.  The following are some examples of XPath expressions that use predicates:

/Orders/SalesOrder[1]

Selects the first <SalesOrder> element that is a child of the <Orders> element.

/Orders/SalesOrder[last()]

Selects the last <SalesOrder> element that is a child of the <Orders> element.  (In the particular instance of the example XML document shown, there is only one <SalesOrder> element and so the result will be the same.)

//Part[Price<=10.00]

Selects <Part> elements, wherever they occur, whose <Price> element has a value less than or equal to 10.00.
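The three predicate examples above can be checked against the example document with a short sketch.  It assumes Python's standard xml.etree.ElementTree module, which is not part of LANSA Integrator.  ElementTree supports the positional predicates [1] and [last()] but not numeric comparisons such as Price<=10.00, so that comparison is written in Python here.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example document (only the nodes this sketch needs).
ORDERS_XML = """
<Orders>
  <SalesOrder SONumber="12345">
    <Line LineNumber="1">
      <Part PartNumber="123"><Price>9.95</Price></Part>
    </Line>
    <Line LineNumber="2">
      <Part PartNumber="456"><Price>13.27</Price></Part>
    </Line>
  </SalesOrder>
</Orders>
""".strip()

root = ET.fromstring(ORDERS_XML)  # root is the <Orders> element

# /Orders/SalesOrder[1] and /Orders/SalesOrder[last()], relative to <Orders>.
first_order = root.find("SalesOrder[1]")
last_order = root.find("SalesOrder[last()]")

# Emulates //Part[Price<=10.00]; the comparison itself is done in Python.
cheap_parts = [
    part.get("PartNumber")
    for part in root.iter("Part")
    if float(part.findtext("Price")) <= 10.00
]

print(first_order.get("SONumber"), last_order.get("SONumber"), cheap_parts)
```

Because the example document holds a single <SalesOrder> element, the first and last selections return the same sales order.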

 

There is much more to know about XPath expressions.  If you would like more information, you could start by referring to some of the links provided above.

 

Important note:  XML node names are case sensitive.  Your XPath expressions must specify the correct case when specifying element and attribute names.  For example, the expression '//salesorder' is NOT the same as '//SalesOrder'.  When used with the example XML document shown above, the former expression will FAIL to select ANY nodes, while the latter will select all <SalesOrder> elements, wherever they occur in the document.
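Case sensitivity is easy to confirm with any XPath-capable tool.  The following sketch assumes Python's standard xml.etree.ElementTree module (not part of LANSA Integrator) and a trimmed copy of the example document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example document.
ORDERS_XML = """
<Orders>
  <SalesOrder SONumber="12345">
    <OrderDate>2012-11-19</OrderDate>
  </SalesOrder>
</Orders>
""".strip()

root = ET.fromstring(ORDERS_XML)

# Wrong case: no element is named "salesorder", so nothing is selected.
wrong_case = root.findall(".//salesorder")

# Correct case: every <SalesOrder> element is selected.
right_case = root.findall(".//SalesOrder")

print(len(wrong_case), len(right_case))
```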

 

XPath Examples for use with XMLQueryService

The following examples use XPath expressions in the parameters of the QUERY command of the XMLQueryService to select values from the example XML document shown above.

1.  This example will select nothing because XML and XPath are case-sensitive and the wrong case is used to select the <SalesOrder> elements:

QUERY NODES(//SALESORDER) NODESVALUE1(@SONumber)

 

2.  These two examples use alternative implementations to select all <SalesOrder> elements and return the sales order number for each.  Functionally, they are equivalent (when used with the example XML document):

QUERY NODES(//SalesOrder/@SONumber)

 

QUERY NODES(//SalesOrder) NODESVALUE1(@SONumber)

 

3.  Selects all <SalesOrder> elements, and returns the customer number for each:

QUERY NODES(//SalesOrder) NODESVALUE1(Customer/@CustNumber)

 

4.  Selects <Customer> elements that have a value of '543' for their customer number and returns the sales order number of the parent <SalesOrder> element:

QUERY NODES(//Customer[@CustNumber="543"]) NODESVALUE1(../@SONumber)

 

5.  Selects all <Part> elements for the <SalesOrder> element(s) with the order number specified and returns the part number and quantity for each:

QUERY NODES(//SalesOrder[@SONumber="12345"]/Line/Part)

     NODESVALUE1(@PartNumber)

     NODESVALUE2(../Quantity)

 

6.  Selects all <Part> elements with a price greater than 2.99 and, for each, returns the order number, the part number, the price and the quantity, and calculates and returns the extended value (price * quantity):

QUERY NODES(//Part[Price>2.99])

     NODESVALUE1(../../@SONumber)

     NODESVALUE2(@PartNumber)

     NODESVALUE3(Price)

     NODESVALUE4(../Quantity)

     NODESVALUE5(Price*../Quantity)
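The pattern used in these examples is a node selection (NODES) followed by one expression per returned value (NODESVALUEn), each evaluated relative to a selected node.  The sketch below reproduces the values that example 6 is expected to yield for the example document.  It is an illustration only: it assumes Python's standard xml.etree.ElementTree and decimal modules, not the XMLQueryService itself, and the names parent_of and rows are hypothetical.  ElementTree elements hold no reference to their parent, so a child-to-parent map stands in for the ".." steps.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Trimmed copy of the example document (only the nodes this sketch needs).
ORDERS_XML = """
<Orders>
  <SalesOrder SONumber="12345">
    <Line LineNumber="1">
      <Part PartNumber="123"><Price>9.95</Price></Part>
      <Quantity>10</Quantity>
    </Line>
    <Line LineNumber="2">
      <Part PartNumber="456"><Price>13.27</Price></Part>
      <Quantity>5</Quantity>
    </Line>
  </SalesOrder>
</Orders>
""".strip()

root = ET.fromstring(ORDERS_XML)

# Child -> parent lookup, standing in for the ".." steps.
parent_of = {child: parent for parent in root.iter() for child in parent}

rows = []
for part in root.iter("Part"):                     # NODES(//Part[Price>2.99])
    price = Decimal(part.findtext("Price"))
    if price <= Decimal("2.99"):
        continue                                   # fails the Price>2.99 predicate
    line = parent_of[part]                         # ..
    order = parent_of[line]                        # ../..
    quantity = Decimal(line.findtext("Quantity"))  # ../Quantity
    rows.append((
        order.get("SONumber"),                     # NODESVALUE1(../../@SONumber)
        part.get("PartNumber"),                    # NODESVALUE2(@PartNumber)
        price,                                     # NODESVALUE3(Price)
        quantity,                                  # NODESVALUE4(../Quantity)
        price * quantity,                          # NODESVALUE5(Price*../Quantity)
    ))

for row in rows:
    print(row)
```

Decimal is used here only to keep the arithmetic exact in the sketch; how the XMLQueryService formats numeric results is not covered by this illustration.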

 

XML Namespaces and How They Affect XPath Expressions for XMLQueryService

The examples used so far operate on an XML document that contains no explicit namespace declarations and does not make use of namespace prefixes.  This is the simplest case, but frequently does not reflect the real world.

Consider this minor alteration to the example XML document that specifies a default namespace for the XML document:

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE Orders SYSTEM "http://www.lansa.com/schemas/tutorder.dtd"  >

<Orders xmlns="urn:schemas-lansa-com:tutorder.dtd">

     … etc …

</Orders>

 

Where a document makes use of a single default namespace like this, the easiest approach to formulating XPath expressions for use with it is usually to disregard the namespace.  Since only one namespace is used and there are no namespace prefixes present on the node names, you can usually use the same expressions as you would use with the earlier example.  Each of the following queries works successfully with the example document that declares the default namespace, provided the document is not loaded in namespace-aware mode:

QUERY NODES(//SalesOrder/@SONumber)

 

QUERY NODES(//SalesOrder) NODESVALUE1(Customer/@CustNumber)

 

QUERY NODES(//Customer[@CustNumber="543"])

     NODESVALUE1(../@SONumber)

 

QUERY NODES(//SalesOrder[@SONumber="12345"]/Line/Part)

     NODESVALUE1(@PartNumber)

     NODESVALUE2(../Quantity)

 

QUERY NODES(//Part[Price>2.99])

     NODESVALUE1(../../@SONumber)

     NODESVALUE2(@PartNumber)

     NODESVALUE3(Price)

     NODESVALUE4(../Quantity)

     NODESVALUE5(Price*../Quantity)

 

However, in XML documents that use more than one namespace and/or implement namespace prefixes, things can get a little more complicated.  Consider the following alternate example XML document and contrast it to the earlier example:

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE Orders SYSTEM "http://www.lansa.com/schemas/tutorder.dtd"  >

<tut:Orders xmlns:tut="urn:schemas-lansa-com:tutorder.dtd">

      <tut:SalesOrder SONumber="12345">

         <tut:Customer CustNumber="543">
            <tut:CustName>ABC Industries</tut:CustName>
            <tut:Street>123 North St.</tut:Street>
            <tut:City>Bankstown</tut:City>
            <tut:State>NSW</tut:State>
            <tut:PostCode>2087</tut:PostCode>
         </tut:Customer>

         <tut:OrderDate>2012-11-19</tut:OrderDate>

         <tut:Line LineNumber="1">
            <tut:Part PartNumber="123">
               <tut:Description>Gasket Paper</tut:Description>
               <tut:Price>9.95</tut:Price>
            </tut:Part>
            <tut:Quantity>10</tut:Quantity>
         </tut:Line>

         <tut:Line LineNumber="2">
            <tut:Part PartNumber="456">
               <tut:Description>Glue</tut:Description>
               <tut:Price>13.27</tut:Price>
            </tut:Part>
            <tut:Quantity>5</tut:Quantity>
         </tut:Line>

      </tut:SalesOrder>

</tut:Orders>

 

This document contains a namespace declaration and uses the associated namespace prefix on the element names.  The use of namespace features and especially of namespace prefixes can complicate the syntax of the XPath expressions necessary for a given query.

Again you should refer to the many resources available on the web concerning XML namespaces and how they affect XPath.  One such reference is:

The easiest approach to formulating XPath expressions for use with such an instance document is to disregard the namespace(s).  If the document is loaded WITHOUT the namespace-aware option (the default mode), then you can use nearly the same expressions as you would use with the earlier example.  Each of the following queries works successfully with the namespace-prefixed version of the document shown above (note that the namespace prefix is omitted entirely from the XPath expressions):

QUERY NODES(/Orders/SalesOrder/@SONumber)

 

QUERY NODES(/Orders/SalesOrder) NODESVALUE1(Customer/@CustNumber)

 

QUERY NODES(/Orders/SalesOrder/Customer[@CustNumber="543"])

     NODESVALUE1(../@SONumber)

 

QUERY NODES(/Orders/SalesOrder[@SONumber="12345"]/Line/Part)

     NODESVALUE1(@PartNumber)

     NODESVALUE2(../Quantity)

 

QUERY NODES(/Orders/SalesOrder/Line/Part[Price>2.99])

     NODESVALUE1(../../@SONumber)

     NODESVALUE2(@PartNumber)

     NODESVALUE3(Price)

     NODESVALUE4(../Quantity)

     NODESVALUE5(Price*../Quantity)

 

If, however, your document declares more than one namespace, and especially where node names would collide if their namespaces were disregarded, it may be necessary to load the document in namespace-aware mode.  This is done by specifying *YES for the NAMESPACEAWARE keyword on the LOAD command of the XMLQueryService.  For example:

LOAD FILE(salesorder.xml) NAMESPACEAWARE(*YES)

 

However, once the document is loaded in namespace-aware mode, the example queries shown up to this point will no longer function because now the namespace forms a part of the identification of nodes in the XML document.
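The effect of namespace awareness can be seen in any namespace-aware parser.  The sketch below assumes Python's standard xml.etree.ElementTree module (not part of LANSA Integrator), which always behaves in a namespace-aware manner and stores each element name as {namespace-uri}local-name.  It is offered as an analogy to a document loaded with NAMESPACEAWARE(*YES), not as a description of how the XMLQueryService is implemented.

```python
import xml.etree.ElementTree as ET

TUT_NS = "urn:schemas-lansa-com:tutorder.dtd"

# Trimmed copy of the namespace-prefixed example document.
PREFIXED_XML = """
<tut:Orders xmlns:tut="urn:schemas-lansa-com:tutorder.dtd">
  <tut:SalesOrder SONumber="12345">
    <tut:Customer CustNumber="543"/>
  </tut:SalesOrder>
</tut:Orders>
""".strip()

root = ET.fromstring(PREFIXED_XML)

# The namespace is part of the element's identity.
print(root.tag)  # the tag is stored as {namespace-uri}local-name

# An unqualified name no longer matches anything ...
plain_matches = root.findall(".//SalesOrder")

# ... whereas the namespace-qualified name does.
qualified_matches = root.findall(".//{%s}SalesOrder" % TUT_NS)

print(len(plain_matches), len(qualified_matches))
```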

There are a variety of ways to formulate your XPath expressions such that they will function in the way you require in namespace-aware mode and it is well beyond the scope of this document to attempt to cover all the options.  However, here are a few examples that might help to get you started:

1.  This example uses the local-name XPath built-in function to select nodes based on their local name (the node name WITHOUT the namespace prefix):

QUERY NODES(//*[local-name() = 'SalesOrder']) NODESVALUE1(@SONumber)

 

2.  If multiple namespaces are used and 'SalesOrder' is ambiguous in this context, then you can extend the previous example to use the namespace-uri XPath built-in function:

QUERY NODES(//*[local-name() = 'SalesOrder' and namespace-uri() = 'urn:schemas-lansa-com:tutorder.dtd'])

     NODESVALUE1(@SONumber)
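The distinction between matching on local name alone and matching on local name plus namespace URI can be illustrated as follows.  This is a sketch under stated assumptions: it uses Python's standard xml.etree.ElementTree module (not part of LANSA Integrator), which does not implement the local-name or namespace-uri functions, so a hypothetical helper split_tag emulates them.  The second namespace urn:example:other and its oth prefix are invented for the illustration.

```python
import xml.etree.ElementTree as ET

TUT_NS = "urn:schemas-lansa-com:tutorder.dtd"

# Hypothetical document in which 'SalesOrder' occurs in two namespaces.
TWO_NS_XML = """
<tut:Orders xmlns:tut="urn:schemas-lansa-com:tutorder.dtd"
            xmlns:oth="urn:example:other">
  <tut:SalesOrder SONumber="12345"/>
  <oth:SalesOrder SONumber="99999"/>
</tut:Orders>
""".strip()


def split_tag(tag):
    """Split ElementTree's '{uri}local' tag into (namespace_uri, local_name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


root = ET.fromstring(TWO_NS_XML)

# Emulates //*[local-name() = 'SalesOrder']: matches in every namespace.
by_local_name = [
    e.get("SONumber") for e in root.iter()
    if split_tag(e.tag)[1] == "SalesOrder"
]

# Emulates the extended form that also tests namespace-uri().
by_name_and_uri = [
    e.get("SONumber") for e in root.iter()
    if split_tag(e.tag) == (TUT_NS, "SalesOrder")
]

print(by_local_name, by_name_and_uri)
```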

 

3.  Alternatively, if you know that all instances of the XML document will use the same namespace prefixes (which, you should understand, is NOT strictly necessary for them to be valid, even though it may commonly be the case in practice), then you can include the namespace prefixes in your XPath expressions (provided the document is loaded in namespace-aware mode):

QUERY NODES(//tut:SalesOrder) NODESVALUE1(@SONumber)

 

In summary, each of the following queries works successfully with the namespace-prefixed version of the document shown above, provided the document is loaded in namespace-aware mode AND provided the actual namespace prefix used in the XML document matches that assumed in the queries:

QUERY NODES(//tut:SalesOrder/@SONumber)

 

QUERY NODES(//tut:SalesOrder) NODESVALUE1(tut:Customer/@CustNumber)

 

QUERY NODES(//tut:Customer[@CustNumber="543"])

     NODESVALUE1(../@SONumber)

 

QUERY NODES(//tut:SalesOrder[@SONumber="12345"]/tut:Line/tut:Part)

     NODESVALUE1(@PartNumber)

     NODESVALUE2(../tut:Quantity)

 

QUERY NODES(//tut:Part[tut:Price>2.99])

     NODESVALUE1(../../@SONumber)

     NODESVALUE2(@PartNumber)

     NODESVALUE3(tut:Price)

     NODESVALUE4(../tut:Quantity)

     NODESVALUE5(tut:Price*../tut:Quantity)
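For comparison, the first, second and fourth of the prefixed queries above can be approximated with Python's standard xml.etree.ElementTree module.  This is a sketch only and an assumption of this illustration, not part of LANSA Integrator.  One difference should be noted: in ElementTree the caller supplies the prefix-to-URI mapping (the NS dictionary below, a hypothetical name), so the tut prefix in the path is resolved through that mapping and not through the prefix written in the document.

```python
import xml.etree.ElementTree as ET

# Caller-supplied prefix mapping used to resolve "tut:" in the paths below.
NS = {"tut": "urn:schemas-lansa-com:tutorder.dtd"}

# Trimmed copy of the namespace-prefixed example document.
PREFIXED_XML = """
<tut:Orders xmlns:tut="urn:schemas-lansa-com:tutorder.dtd">
  <tut:SalesOrder SONumber="12345">
    <tut:Customer CustNumber="543"/>
    <tut:Line LineNumber="1"><tut:Part PartNumber="123"/></tut:Line>
    <tut:Line LineNumber="2"><tut:Part PartNumber="456"/></tut:Line>
  </tut:SalesOrder>
</tut:Orders>
""".strip()

root = ET.fromstring(PREFIXED_XML)

# //tut:SalesOrder with @SONumber and tut:Customer/@CustNumber for each.
orders = root.findall(".//tut:SalesOrder", NS)
so_numbers = [order.get("SONumber") for order in orders]
cust_numbers = [
    order.find("tut:Customer", NS).get("CustNumber") for order in orders
]

# //tut:SalesOrder[@SONumber="12345"]/tut:Line/tut:Part with @PartNumber.
parts = root.findall(
    ".//tut:SalesOrder[@SONumber='12345']/tut:Line/tut:Part", NS
)
part_numbers = [part.get("PartNumber") for part in parts]

print(so_numbers, cust_numbers, part_numbers)
```

Note that the attributes carry no prefix in the document and are therefore read without one, just as in the queries above.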