Processing Text Strings by Using String Functions

MSXML 5.0 SDK

Microsoft XML Core Services (MSXML) 5.0 for Microsoft Office - XPath Developer's Guide

The XPath string functions perform a wide variety of operations on text-string values.

The following table summarizes the string functions. In it, str represents a string passed as an argument; str*, zero or more additional strings, separated by commas; obj, an object of any type, such as a node-set or a number; and num, a number. A ? appended to an argument type means that the argument is optional. If a function's only argument is optional and is omitted in a given call, the function applies to the context node.

Format Description/Example
string(obj?) Converts the argument to a string value, which is then returned from the function.
concat(str, str, str*) Concatenates the various strings passed to it into a single string, which is returned from the function.
starts-with(str, str) Returns true if the first argument starts with the second; otherwise, returns false.
contains(str, str) Returns true if the first argument contains the second; otherwise, returns false.
substring(str, num, num?) Extracts a portion of the first argument, starting at the position given by the second argument (the first character is at position 1) and continuing for the number of characters given by the third argument. If the third argument is omitted, the function returns all characters from the starting position to the end of the first argument.
substring-before(str, str) Returns the portion of the first argument that precedes the first occurrence of the second argument, or an empty string if the first argument does not contain the second.
substring-after(str, str) Returns the portion of the first argument that follows the first occurrence of the second argument, or an empty string if the first argument does not contain the second.
string-length(str?) Returns the number of characters in the argument. If the argument is omitted, returns the number of characters in the string-value of the context node.
normalize-space(str?) Strips leading and trailing white space from the argument and replaces each run of white space within it with a single space. The function returns this normalized string. If the argument is omitted, the function normalizes the string-value of the context node.
translate(str, str, str) Returns the first argument, replacing each occurrence of a character that matches one of the characters in the second argument with the character in the corresponding position in the third argument.

string(obj?)

When the context node is the root node of our sample document, the following XPath expression:

string(*)

selects the document element and converts it to a string. The result is a single string containing a space-delimited list of the entire document's text contents, or "374 12500.26 512 17692 161 8349.72 465 15239.6".
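
As a further illustration of the conversion rules, the following stylesheet is a minimal sketch (it is not one of the SDK sample files) that applies string() to a number, a Boolean, and a node-set. It assumes the sales/region structure of the sample document; the values noted in the comments follow from the XPath 1.0 conversion rules.

```xml
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>

<xsl:template match="/">
    <!-- number: integers are written without a decimal point; gives 7 -->
    <xsl:value-of select="string(3 + 4)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Boolean: gives true -->
    <xsl:value-of select="string(1 = 1)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- node-set: only the first node in document order is converted; gives Northeast -->
    <xsl:value-of select="string(//region/@name)"/>
</xsl:template>

</xsl:stylesheet>
```

The last call shows that converting a node-set to a string does not join the values of all its nodes; only the first node contributes.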

concat(str, str, str*)

We could produce a list of the regions in the sample document, including a label which varies by the region's position in the document, using an XSLT template rule such as:

<xsl:template match="sales">
    <xsl:for-each select="region">
        <h3>
            <xsl:value-of select="concat('Region ', string(position()), ': ', @name)"/>
        </h3>
    </xsl:for-each>
</xsl:template>
Note   The above template rule explicitly converts the value of the position() function, normally a number, to a string. In practice this is not necessary, because the string functions coerce numbers to strings when needed. For more information about the position() function, see Processing Node-Sets by Using Node-Set Functions.

When viewed in Internet Explorer, this template rule displays:

Region 1: Northeast
Region 2: Southeast
Region 3: Southwest
Region 4: Northwest
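
Because concat() coerces its arguments to strings, the same labels can be produced without the explicit string() call. The following is a sketch of that variant, wrapped in a complete stylesheet so it can be tried against the sample document; it should display the same four headings.

```xml
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:template match="sales">
    <xsl:for-each select="region">
        <h3>
            <!-- position() returns a number; concat() converts it to a string -->
            <xsl:value-of select="concat('Region ', position(), ': ', @name)"/>
        </h3>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```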

starts-with(str, str)

This XPath location path:

//region[starts-with(@name, 'S')]

locates all <region> elements in the document whose name attributes start with the letter "S": in this case, the Southeast and Southwest regions.

contains(str, str)

The location path:

//region[contains(@name, 'east')]

locates all <region> elements in the document whose name attributes contain the string "east": in this case, the Northeast and Southeast regions.
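
Both starts-with() and contains() compare strings case-sensitively, so a test such as contains(@name, 'north') matches nothing in the sample document. One common workaround, sketched below, is to fold the attribute value to lowercase with translate() before testing it. The variable names lower and upper are illustrative, and the folding covers only the 26 unaccented Latin letters.

```xml
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<!-- illustrative helper variables; not part of the SDK samples -->
<xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
<xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

<xsl:template match="sales">
    <!-- lowercase each name, then look for 'north' regardless of the original case -->
    <xsl:for-each select="region[contains(translate(@name, $upper, $lower), 'north')]">
        <h3><xsl:value-of select="@name"/></h3>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

Against the sample document, this selects the Northeast and Northwest regions.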

substring(str, num, num?)

We could derive region name abbreviations for the four regions in our sample document using the following XSLT template rule:

<xsl:template match="region">
    <h3><xsl:value-of select="substring(@name, 1, 1)"/>
    <xsl:choose>
        <xsl:when test="contains(@name, 'west')">W</xsl:when>
        <xsl:otherwise>E</xsl:otherwise>
    </xsl:choose></h3>
</xsl:template>

The call to the substring() function returns the first letter in the <region> element's name attribute. The <xsl:choose> block then displays either a "W" or an "E" depending on whether the name contains the string "west" or not, respectively.

Internet Explorer displays the results of this template rule as:

NE
SE
SW
NW
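
Character positions in substring() start at 1, not 0, and the numeric arguments are rounded before use. The following sketch shows a few calls on literal strings; the last two cases are taken from the examples in the XPath 1.0 Recommendation.

```xml
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>

<xsl:template match="/">
    <!-- from position 6 to the end; gives east -->
    <xsl:value-of select="substring('Northeast', 6)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- five characters starting at position 1; gives North -->
    <xsl:value-of select="substring('Northeast', 1, 5)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- non-integer arguments are rounded; gives 234 -->
    <xsl:value-of select="substring('12345', 1.5, 2.6)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- position 0 lies before the string, so only two characters remain; gives 12 -->
    <xsl:value-of select="substring('12345', 0, 3)"/>
</xsl:template>

</xsl:stylesheet>
```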

substring-before(str, str)

To display whether a region lies in the northern or southern areas, you could use the substring-before() function as shown in the following sample.

Example

XML File (xpathfuncs.xml)

Change the href attribute in xpathfuncs.xml (shown in Sample XML Data File for XPath Functions) to reference funcsubstringbef.xsl.

XSLT File (funcsubstringbef.xsl)

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- suppress text nodes not covered in subsequent template rule -->
<xsl:template match="text()"/>

<xsl:template match="region">
    <h3>Region "<xsl:value-of select="@name"/>" is in the 
        <xsl:choose>
            <xsl:when test="contains(@name, 'west')">
                <xsl:value-of select="substring-before(@name, 'west')"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="substring-before(@name, 'east')"/>
            </xsl:otherwise>
        </xsl:choose>
    </h3>
</xsl:template>

</xsl:stylesheet>

Formatted Output

Region "Northeast" is in the North
Region "Southeast" is in the South
Region "Southwest" is in the South
Region "Northwest" is in the North

Processor Output

<?xml version="1.0" encoding="UTF-16"?><h3>Region "Northeast" is in the 
        North</h3><h3>Region "Southeast" is in the 
        South</h3><h3>Region "Southwest" is in the 
        South</h3><h3>Region "Northwest" is in the 
        North</h3>
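
The stylesheet above tests contains() before calling substring-before() because the function returns an empty string when the second argument does not occur in the first. The following sketch, using literal strings only, illustrates that behavior and the fact that the split happens at the first occurrence.

```xml
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>

<xsl:template match="/">
    <!-- the split is made at the first "/"; gives 1999 -->
    <xsl:value-of select="substring-before('1999/04/01', '/')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- 'west' does not occur, so the result is empty and its length is 0 -->
    <xsl:value-of select="string-length(substring-before('Northeast', 'west'))"/>
</xsl:template>

</xsl:stylesheet>
```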

substring-after(str, str)

You could invert the substring-before() example above to display whether a region is located in the eastern or western sales district. To do so, you could use an XSLT template rule such as the one shown in the sample below.

In this example, unlike the parallel one for substring-before(), the name of the district ("east" or "west") is not capitalized. If you wanted to capitalize the district name, you could use the translate() function for this purpose.

Example

XML File (xpathfuncs.xml)

Change the href attribute in xpathfuncs.xml (shown in Sample XML Data File for XPath Functions) to reference funcsubstringaft.xsl.

XSLT File (funcsubstringaft.xsl)

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- suppress text nodes not covered in subsequent template rule -->
<xsl:template match="text()"/>

<xsl:template match="region">
    <h3>Region "<xsl:value-of select="@name"/>" is in the 
        <xsl:choose>
            <xsl:when test="contains(@name, 'North')">
                <xsl:value-of select="substring-after(@name, 'North')"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="substring-after(@name, 'South')"/>
            </xsl:otherwise>
        </xsl:choose>ern district
    </h3>
</xsl:template>

</xsl:stylesheet>

Formatted Output

Region "Northeast" is in the eastern district
Region "Southeast" is in the eastern district
Region "Southwest" is in the western district
Region "Northwest" is in the western district

Processor Output

<?xml version="1.0" encoding="UTF-16"?><h3>Region "Northeast" is in the 
        eastern district
    </h3><h3>Region "Southeast" is in the 
        eastern district
    </h3><h3>Region "Southwest" is in the 
        western district
    </h3><h3>Region "Northwest" is in the 
        western district
    </h3>
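
One way to capitalize the district name with translate(), as suggested earlier in this section, is to translate only the first character and then append the rest of the string unchanged. The following is a sketch under that approach; the variable name district is illustrative and not part of the SDK sample.

```xml
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- suppress text nodes not covered in subsequent template rule -->
<xsl:template match="text()"/>

<xsl:template match="region">
    <!-- district holds "east" or "west" -->
    <xsl:variable name="district">
        <xsl:choose>
            <xsl:when test="contains(@name, 'North')">
                <xsl:value-of select="substring-after(@name, 'North')"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="substring-after(@name, 'South')"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:variable>
    <h3>
        <!-- uppercase the first letter only, then append the remainder -->
        <xsl:value-of select="concat(translate(substring($district, 1, 1), 'ew', 'EW'), substring($district, 2))"/>
    </h3>
</xsl:template>

</xsl:stylesheet>
```

For the four sample regions, the headings read East, East, West, and West.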

string-length(str?)

Unlike most XPath string functions, string-length() returns a number instead of a string. This number can be used as-is, or (as with other function references) passed as an argument to some other function.

In our sample XML document, notice that the values of some <amount> elements contain a decimal portion, while the others contain integer values. Using this observation and some of the other XPath string functions in conjunction with string-length(), we could select regions which failed to sell an amount of at least 10,000 with a template rule as shown in the XSLT sample file below.

First, the template rule defines a variable, amt_integ. This variable will contain either the value of the <amount> element, or that value less the decimal portion, depending on whether or not the value contains a decimal point. The template rule then tests this resulting "integer amount" to see if its string-length is less than 5—that is, if the value of the amt_integ variable is 9999 or less. If so, the region's name and the value of the <amount> element are displayed.

In this particular example, you could have used a much simpler and more direct numeric test against the value of the <amount> element. For example:

    <xsl:if test="amount &lt; 10000">

This direct numeric test would obviate the need to use the various string functions demonstrated in this example.

Example

XML File (xpathfuncs.xml)

Change the href attribute in xpathfuncs.xml (shown in Sample XML Data File for XPath Functions) to reference funcstringlen.xsl.

XSLT File (funcstringlen.xsl)

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:template match="region">
    <xsl:variable name="amt_integ">
        <xsl:choose>
            <xsl:when test="contains(amount, '.')">
                <xsl:value-of select="substring-before(amount, '.')"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="amount"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:variable>
    <xsl:if test="string-length($amt_integ)&lt;5">
        <h3>
            <xsl:value-of select="@name"/> Region's sales only
            <xsl:value-of select="amount"/> this quarter.
        </h3>
    </xsl:if>
</xsl:template>

</xsl:stylesheet>

Formatted Output

Southwest Region's sales only 8349.72 this quarter.

Processor Output

<?xml version="1.0" encoding="UTF-16"?><h3>Southwest Region's sales only
            8349.72 this quarter.
        </h3>
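
For comparison, the direct numeric test mentioned earlier in this section can be written as a complete stylesheet. This is a sketch rather than an SDK sample file; it relies on the <amount> element being converted to a number for the comparison, and it should select the same Southwest region.

```xml
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:template match="region">
    <!-- amount is converted to a number before the comparison -->
    <xsl:if test="amount &lt; 10000">
        <h3>
            <xsl:value-of select="@name"/> Region's sales only
            <xsl:value-of select="amount"/> this quarter.
        </h3>
    </xsl:if>
</xsl:template>

</xsl:stylesheet>
```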

normalize-space(str?)

This XPath string function is particularly useful when comparing two strings which possibly contain leading or trailing white space, especially newlines, which you might otherwise overlook in a source document.

For instance, consider the following two <sample> elements:

<sample>Now is the time...
</sample>

and:

<sample>Now is the time...</sample>

The content of the following <xsl:if> element will not be instantiated, because the test evaluates to false:

<xsl:if test="sample[1]=sample[2]">

The trailing newline in the first <sample> is part of its content, so its string-value does not equal that of the second <sample>. For an accurate comparison between the two elements' contents, use this instead:

<xsl:if test="normalize-space(sample[1])=normalize-space(sample[2])">
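
The effect of normalize-space() is easier to see on literal strings. In the following sketch, square brackets are added around the result only to make its boundaries visible; &#10; stands for a newline inside the string.

```xml
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>

<xsl:template match="/">
    <!-- leading and trailing white space is removed and inner runs collapse; gives [Now is the time...] -->
    <xsl:value-of select="concat('[', normalize-space('  Now   is the&#10;time...  '), ']')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- eight characters before normalization; gives 8 -->
    <xsl:value-of select="string-length('  a  b  ')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- "a b" after normalization; gives 3 -->
    <xsl:value-of select="string-length(normalize-space('  a  b  '))"/>
</xsl:template>

</xsl:stylesheet>
```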

translate(str, str, str)

To convert the names of the regions in our sample document to all uppercase, you could use the following:

translate(//region/@name, "abcdefghijklmnopqrstuvwxyz", 
"ABCDEFGHIJKLMNOPQRSTUVWXYZ")

The translate() function scans its first argument, looking for any of the individual characters in the second argument. For each matched character, the function substitutes the character from the third argument in the same position as the matched character. So each "a" becomes "A", each "b" becomes "B", and so on. Because translate() expects a string as its first argument, the node-set selected by //region/@name is first converted to a string, and that conversion uses only the first node in document order. The function therefore returns "NORTHEAST" for the sample document. To convert every region name, apply the function to each name attribute in turn.

Note   Any characters which do not appear in the second argument, such as the capital "N" and "S" in the various region names, are retained in the resulting string with no substitution made.
Also note that if the second argument contains more characters than the third, any character in the second argument that has no counterpart in the third is removed from the returned value. For instance:
translate("Internet", "nter", "NTE")
substitutes a capital "N", "T", and "E" for the corresponding lowercase characters, but replaces the lowercase "r" with nothing at all; that is, it removes the "r" from the result. The result is:
INTENET
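
Passing an empty string as the third argument turns translate() into a character filter. The following sketch uses literal strings only; the phone number is made-up sample data.

```xml
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>

<xsl:template match="/">
    <!-- every hyphen is removed; gives 18005550100 -->
    <xsl:value-of select="translate('1-800-555-0100', '-', '')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- every lowercase vowel is removed; gives Nrthst -->
    <xsl:value-of select="translate('Northeast', 'aeiou', '')"/>
</xsl:template>

</xsl:stylesheet>
```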

Example

XML File (xpathfuncs.xml)

Change the href attribute in xpathfuncs.xml (shown in Sample XML Data File for XPath Functions) to reference functranslate.xsl.

XSLT File (functranslate.xsl)

<?xml version='1.0'?>
<xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="sales">
   <xsl:value-of select='translate(//region/@name, 
"abcdefghijklmnopqrstuvwxyz", "ABCDEFGHIJKLMNOPQRSTUVWXYZ")'/>
</xsl:template>

</xsl:stylesheet>

Formatted Output

NORTHEAST

Processor Output

<?xml version="1.0" encoding="UTF-16"?>NORTHEAST
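
The output contains only the first region name because a node-set passed to translate() is converted to the string-value of its first node. To uppercase every region name, one option is to call translate() once per <region> element, as in the following sketch. The variable names lower and upper are illustrative and not part of the SDK sample.

```xml
<?xml version='1.0'?>
<xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- illustrative helper variables holding the two alphabets -->
<xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
<xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

<xsl:template match="sales">
    <xsl:for-each select="region">
        <!-- @name is a single attribute here, so each name is translated in turn -->
        <h3><xsl:value-of select="translate(@name, $lower, $upper)"/></h3>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

For the sample document, this produces the headings NORTHEAST, SOUTHEAST, SOUTHWEST, and NORTHWEST.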