Punctuation in XPath, part 4: predicates (“[…]”)

Posts in this series:
Punctuation in XPath, part 1: dot (“.”)
Punctuation in XPath, part 2: slash (“/”)
Punctuation in XPath, part 3: “@” and “..”
Punctuation in XPath, part 4: predicates (“[…]”)
Punctuation in XPath, part 5: “//”

We’ve already seen some examples of predicates that use square brackets (“[…]”). In this post, we’ll look at exactly how they work, using the following sample document:

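(: Listing reconstructed from the element names and ordering used in the tables in this post. :)
declare variable $doc := document {
  <people>
    <group>
      <person>Peter</person>
      <person>Paul</person>
      <person>Mary</person>
    </group>
    <group>
      <person>June</person>
      <person>Ward</person>
      <person>Beaver</person>
    </group>
  </people>
};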

Predicates are used to filter a sequence based on some test. Consider the following expression:

$doc/people/group/person[. eq 'June']

This expression selects all <person> elements and then filters out those elements whose string-value is not equal to “June”. The test expression . eq 'June' must return true for the node to be included in the final result.
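To see the filter at work, here is a minimal sketch that runs the predicate next to an equivalent FLWOR filter. It assumes the sample document has the two-group shape implied by the tables in this post; the variable names $filtered and $explicit are purely illustrative.

```xquery
xquery version "1.0";

(: Assumed shape of the sample document: two groups of three people. :)
declare variable $doc := document {
  <people>
    <group><person>Peter</person><person>Paul</person><person>Mary</person></group>
    <group><person>June</person><person>Ward</person><person>Beaver</person></group>
  </people>
};

(: Predicate form: keep only the <person> elements whose string-value is "June". :)
let $filtered := $doc/people/group/person[. eq 'June']

(: The same test spelled out as an explicit FLWOR filter. :)
let $explicit :=
  for $p in $doc/people/group/person
  where $p eq 'June'
  return $p

(: Expected result: 1, "June", true :)
return (count($filtered), string($filtered), $filtered is $explicit)
```

Both forms select the very same node, which is why the "is" comparison returns true.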

Positional Predicates

Predicates can also be used to select nodes at a particular position within the sequence. For example, this expression selects each first <person> child of its parent:

$doc/people/group/person[1]

In this case, since there are two <group> elements, we end up with two people in the result: Peter and June. As you can see, a number value in a predicate is treated differently than a boolean. If the test expression returns a number (as in the above case), then the predicate is interpreted like this:

$doc/people/group/person[position() eq 1]

However, you shouldn’t think of “[1]” merely as syntactic sugar for “[position() eq 1]”. Any expression that returns a number is evaluated this way. For example, the number could be returned by a function call or stored in a variable, as in this case:

$doc/people/group/person[$var]

If the value of “$var” is a number, then it is treated as a positional predicate. If it’s anything else, however, then it’s treated like a normal test expression, using the normal rules for converting values to a boolean. For example, an empty string or an empty sequence is converted to false.
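The difference between a numeric and a non-numeric predicate value is easy to check by trying a few values in turn. The sketch below assumes the sample document has the two-group shape implied by the tables in this post; the three test values are arbitrary picks for illustration.

```xquery
xquery version "1.0";

(: Assumed shape of the sample document: two groups of three people. :)
declare variable $doc := document {
  <people>
    <group><person>Peter</person><person>Paul</person><person>Mary</person></group>
    <group><person>June</person><person>Ward</person><person>Beaver</person></group>
  </people>
};

(: 2   is a number, so it acts as a positional predicate.
   "2" is a non-empty string, so it converts to true and keeps every node.
   ""  is an empty string, so it converts to false and keeps nothing. :)
for $var in (2, "2", "")
return concat("[", string-join($doc/people/group/person[$var], ", "), "]")

(: Expected result:
   "[Paul, Ward]"
   "[Peter, Paul, Mary, June, Ward, Beaver]"
   "[]" :)
```

Note that the string "2" is never read as a position. Only a value of numeric type triggers the positional interpretation.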

What if you only want the first <person> among all the <person> elements in the document, rather than every first child? In that case, you’d have to apply the predicate to the whole expression to its left ($doc/people/group/person), rather than just the last step (person). This can be done by using parentheses:

($doc/people/group/person)[1]

In this case, the predicate is no longer a part of the “person” axis step. Instead, it filters the entire expression to its left, returning only Peter.
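Putting the two forms side by side makes the difference concrete. This is a small sketch, again assuming the sample document has the two-group shape implied by the tables in this post; $per-step and $overall are illustrative names.

```xquery
xquery version "1.0";

(: Assumed shape of the sample document: two groups of three people. :)
declare variable $doc := document {
  <people>
    <group><person>Peter</person><person>Paul</person><person>Mary</person></group>
    <group><person>June</person><person>Ward</person><person>Beaver</person></group>
  </people>
};

(: Predicate inside the step: evaluated once per <group>. :)
let $per-step := $doc/people/group/person[1]

(: Predicate outside the parentheses: evaluated once, over all six people. :)
let $overall := ($doc/people/group/person)[1]

(: Expected result: "Peter, June", "Peter" :)
return (string-join($per-step, ", "), string-join($overall, ", "))
```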

Forward and Reverse Axes

Whenever a predicate is part of an axis step, it is treated specially depending on which axis is being used. In particular, what position() returns inside a predicate depends on whether a forward or reverse axis is being used. For forward axes, positions are assigned using document order. For reverse axes, positions are assigned using reverse document order. As you may recall from the last article, $doc/people/group/person is actually short for:

$doc/child::people/child::group/child::person

Since the child:: axis is one of the forward axes, that means that position() is assigned in document order. Putting it into the context of the document above, that means the context positions for elements returned by the last step (person) are assigned as follows:

Node                      Context position
<person>Peter</person>    1
<person>Paul</person>     2
<person>Mary</person>     3
<person>June</person>     1
<person>Ward</person>     2
<person>Beaver</person>   3

Hence $doc/people/group/person[1] returns both Peter and June, as we saw above. The “person[1]” step is evaluated twice (once for each <group>), which is why the numbering restarts for June in the above table.
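One way to watch the numbering restart is to ask for positions 1 through 3 in turn. The sketch below assumes the sample document has the two-group shape implied by the table above; the loop variable $n is illustrative.

```xquery
xquery version "1.0";

(: Assumed shape of the sample document: two groups of three people. :)
declare variable $doc := document {
  <people>
    <group><person>Peter</person><person>Paul</person><person>Mary</person></group>
    <group><person>June</person><person>Ward</person><person>Beaver</person></group>
  </people>
};

(: For each position, the step is evaluated once per <group>,
   so each position matches one person from each group. :)
for $n in 1 to 3
return string-join($doc/people/group/person[$n], ", ")

(: Expected result: "Peter, June", "Paul, Ward", "Mary, Beaver" :)
```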

Things are different if we use one of the five reverse axes (the other eight axes are all forward axes):

  1. parent::
  2. ancestor::
  3. ancestor-or-self::
  4. preceding::
  5. preceding-sibling::

In axis steps that use one of these axes, the context positions are assigned in reverse document order. Let’s start with a node deep within the document:

declare variable $beaver := $doc/people/group/person[. eq 'Beaver'];

Starting from <person>Beaver</person>, we can select some node sequences that come before it, using the reverse axes:

Expression                            What/“who” it selects
$beaver/preceding::person             Peter, Paul, Mary, June, and Ward
$beaver/preceding-sibling::person     June and Ward
$beaver/ancestor::*                   <people> and <group>

If you were to then add a positional predicate to the step, it would select the first one in reverse document order. In other words, “[1]” selects the last node in document order.

Expression                               What/“who” it selects
$beaver/preceding::person[1]             Ward
$beaver/preceding-sibling::person[1]     Ward
$beaver/ancestor::*[1]                   <group>

Taking the first example, using the “preceding” axis, here are the context positions as they’re assigned, working backwards from <person>Beaver</person>:

Node                      Context position
<person>Ward</person>     1
<person>June</person>     2
<person>Mary</person>     3
<person>Paul</person>     4
<person>Peter</person>    5

It’s easy to see from this that $beaver/preceding::person[1] returns Ward, $beaver/preceding::person[2] returns June, etc.
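The table above can be reproduced by stepping through positions 1 to 5 on the preceding axis. This is a sketch under the assumption that the sample document has the two-group shape the tables imply; the loop variable $n is illustrative.

```xquery
xquery version "1.0";

(: Assumed shape of the sample document: two groups of three people. :)
declare variable $doc := document {
  <people>
    <group><person>Peter</person><person>Paul</person><person>Mary</person></group>
    <group><person>June</person><person>Ward</person><person>Beaver</person></group>
  </people>
};

declare variable $beaver := $doc/people/group/person[. eq 'Beaver'];

(: preceding:: is a reverse axis, so position 1 is the nearest
   preceding <person> and position 5 is the farthest. :)
for $n in 1 to 5
return string($beaver/preceding::person[$n])

(: Expected result: "Ward", "June", "Mary", "Paul", "Peter" :)
```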

Now, here’s the surprising part: axis steps always return nodes in document order. What? Didn’t we just see an example of them being returned in reverse document order? Well, no. What we saw was the context positions being assigned in reverse document order. The node sequence that is actually returned will still always be in document order. To prove this, we can take the predicate outside the step (again, by adding parentheses):

Expression                                 What/“who” it selects
($beaver/preceding::person)[1]             Peter
($beaver/preceding-sibling::person)[1]     June
($beaver/ancestor::*)[1]                   <people>

In the above cases, the predicate is not a part of an axis step and so it doesn’t matter what expression is to the left. It is simply filtered in sequence order. In each case, the parenthesized expression returns a sequence of nodes in document order (because path expressions returning nodes always return nodes in document order).

This is true for axis steps in general, even if “/” isn’t used. If a context node is defined (as it normally is in XSLT), then (ancestor::*)[1] is a legal expression and always returns the outermost element ancestor of the context node (first in document order), whereas ancestor::*[1] always returns the parent element of the context node (first in reverse document order).
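A short sketch can confirm both halves of this: the step's result arrives in document order, while a predicate inside the step counts backwards. It assumes the sample document has the two-group shape implied by the tables in this post.

```xquery
xquery version "1.0";

(: Assumed shape of the sample document: two groups of three people. :)
declare variable $doc := document {
  <people>
    <group><person>Peter</person><person>Paul</person><person>Mary</person></group>
    <group><person>June</person><person>Ward</person><person>Beaver</person></group>
  </people>
};

declare variable $beaver := $doc/people/group/person[. eq 'Beaver'];

(
  (: The step itself returns its nodes in document order. :)
  string-join($beaver/preceding::person, ", "),

  (: Predicate inside the step: positions run in reverse document order. :)
  string($beaver/preceding::person[1]),
  name($beaver/ancestor::*[1]),

  (: Predicate outside the parentheses: positions follow the returned sequence. :)
  string(($beaver/preceding::person)[1]),
  name(($beaver/ancestor::*)[1])
)

(: Expected result:
   "Peter, Paul, Mary, June, Ward", "Ward", "group", "Peter", "people" :)
```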


To understand positional predicates, you need to be clear about what sequence of nodes is being filtered and how the context positions are assigned. In the general case, context positions are assigned according to the order of the sequence being filtered. The exception to this is when the predicate is part of an axis step that uses a reverse axis.

I’ll leave you with a teaser for the next and final post in this series (about what “//” means): what nodes does the following expression select, and why?


Ready for the last part of the series? Go to Punctuation in XPath, part 5: “//” to learn what double-slashes mean in XPath.
