I often see the text() node test being used in much of the XQuery code I come across. It is, of course, perfectly legal and valid to use, and sometimes essential. But most often I see text() being used where the string() function should be used instead, making the code a sitting duck, waiting to be broken. Moreover, text() is rarely needed in simple queries. In my experience, it’s used inappropriately far more than it’s used appropriately, so I feel perfectly confident positing this: text() is a code smell. If you have a habit of using text() a lot, then read on.
The simplest way to explain what I mean is through an example. Consider the following query, which is intended to output one <li> element per <item> element in the input:
And consider some sample input (at “/test.xml”):
Running the above query yields exactly the desired result:
So what’s the problem? text() did just fine. Is it essential here, though? If we were to leave it out, using:
Then the <item> elements themselves would get copied to the result, which is not what we want:
Good point. But now let me throw a wrench in your code. Let’s say test.xml now looks like this:
Here is the new output from the original query that uses text():
Uh-oh. Now it’s doing what the code instructed (output one <li> per text node), but it’s not doing what we wanted. The presence of the comment (<!–was the second–>) caused the first <item> to contain not one, but two text nodes. No need to fret, we just need to fix the code. Let’s move the text() part to the expression inside the <li> element constructor:
Now we get the original desired output. Problem solved, right? …Not so fast. I’ve got another wrench:
Here’s what our newly fixed query outputs for the above document:
Where did the word “second” go? Well, it’s not a text node child of <item>, so it didn’t get copied through. Only the text node children of <item> were copied through, just as your code instructed.
You may be thinking: give me a break! You changed the data, and your code broke — big deal. That doesn’t mean text() is a code smell. After all, I can fix it again like this (by using
//text() instead of
To which I would have one question: do you always want to have to set yourself up for chasing down subtle bugs when seemingly innocuous changes are made to your data? There’s a more robust way. It’s called the string() function:
The string() function converts its argument to a string. In the case of a node, it returns the string-value of the node. In the case of an element (or a document node), the string-value of the node is the concatenation of the string-values of all its descendant text nodes. Lo and behold, that exactly what we were looking for! Now it doesn’t matter if any inline markup constructs — sub-elements, comments, or processing instructions — are added. Our code will continue to work as intended.
If you call string() with no arguments, the context node is taken to be the implicit, default argument. In other words,
string() is short for
string(.). That means you can use it as a direct replacement for
text() in many cases, and it will give you the desired result without being so easily breakable:
As I mentioned before, the text() node test (remember it’s not a function even though it looks like one) has its perfectly legitimate uses. Out of curiosity, I did a search on the code base for a production application I worked on that uses XQuery and XSLT code, and I only found two instances of text() in all of the view-related code. They were both cases where using text() was essential (and string() couldn’t be used instead). In contrast, there were many uses of string(). I mention this not in order to suggest that my code is absolutely exemplary (Ha!), but that I’m at least practicing what I’m preaching. If you’ve got a different view, feel free to share it in the comments section below!
Read about how name() is a code smell as well. And even more smelly.