
Warning: text() is a code smell

01.10.2012
5 minute read

I often see the text() node test being used in much of the XQuery code I come across. It is, of course, perfectly legal and valid to use, and sometimes essential. But most often I see text() being used where the string() function should be used instead, making the code a sitting duck, waiting to be broken. Moreover, text() is rarely needed in simple queries. In my experience, it’s used inappropriately far more than it’s used appropriately, so I feel perfectly confident positing this: text() is a code smell. If you have a habit of using text() a lot, then read on.

The simplest way to explain what I mean is through an example. Consider the following query, which is intended to output one <li> element per <item> element in the input:

<ul>{
  for $item in doc("/test.xml")/list/item/text()
  return <li>{$item}</li>
}</ul>

And consider some sample input (at “/test.xml”):

xdmp:document-insert("/test.xml",
<list>
  <item>My first item</item>
  <item>My second item</item>
  <item>My third item</item>
</list>
)

Running the above query yields exactly the desired result:

<ul>
  <li>My first item</li>
  <li>My second item</li>
  <li>My third item</li>
</ul>

So what’s the problem? text() did just fine. Is it essential here, though? If we were to leave it out, using:

<ul>{
  for $item in doc("/test.xml")/list/item(:/text():)
  return <li>{$item}</li>
}</ul>

Then the <item> elements themselves would get copied to the result, which is not what we want:

<ul>
  <li>
    <item>My first item</item>
  </li>
  <li>
    <item>My second item</item>
  </li>
  <li>
    <item>My third item</item>
  </li>
</ul>

Good point. But now let me throw a wrench in your code. Let’s say test.xml now looks like this:

xdmp:document-insert("/test.xml",
<list>
  <item>My first<!--was the second--> item</item>
  <item>My second item</item>
  <item>My third item</item>
</list>
)

Here is the new output from the original query that uses text():

<ul>
  <li>My first</li>
  <li> item</li>
  <li>My second item</li>
  <li>My third item</li>
</ul>

Uh-oh. Now it’s doing what the code instructed (output one <li> per text node), but it’s not doing what we wanted. The presence of the comment (<!--was the second-->) caused the first <item> to contain not one, but two text nodes. No need to fret; we just need to fix the code. Let’s move the text() part to the expression inside the <li> element constructor:

<ul>{
  for $item in doc("/test.xml")/list/item
  return <li>{$item/text()}</li>
}</ul>

Now we get the original desired output. Problem solved, right? …Not so fast. I’ve got another wrench:

xdmp:document-insert("/test.xml",
<list>
  <item>My first item</item>
  <item>My <em>second</em> item</item>
  <item>My third item</item>
</list>
)

Here’s what our newly fixed query outputs for the above document:

<ul>
  <li>My first item</li>
  <li>My  item</li>
  <li>My third item</li>
</ul>

Where did the word “second” go? Well, it’s not a text node child of <item>, so it didn’t get copied through. Only the text node children of <item> were copied through, just as your code instructed.
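If you want to see exactly which nodes each path selects, you can evaluate them against that second <item> directly. This is a minimal sketch. It builds the element inline with a direct constructor instead of reading /test.xml, so it should run in any XQuery processor. $item is just an illustrative variable name:

```xquery
let $item := <item>My <em>second</em> item</item>
return (
  (: text node children only: "My " and " item" :)
  count($item/text()),
  (: every descendant text node, including "second" inside <em> :)
  count($item//text())
)
```

The query should return (2, 3). The word "second" is the third text node, and it is not a child of <item>.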

You may be thinking: give me a break! You changed the data and your code broke. Big deal. That doesn’t mean text() is a code smell. After all, I can fix it again like this (by using //text() instead of /text()):

<ul>{
  for $item in doc("/test.xml")/list/item
  return <li>{$item//text()}</li>
}</ul>

To which I would have one question: do you really want to set yourself up to chase down subtle bugs every time a seemingly innocuous change is made to your data? There’s a more robust way. It’s called the string() function:

<ul>{
  for $item in doc("/test.xml")/list/item
  return <li>{string($item)}</li>
}</ul>

The string() function converts its argument to a string. For a node, it returns the node’s string-value. For an element (or a document node), the string-value is the concatenation of the string-values of all its descendant text nodes, in document order. Lo and behold, that’s exactly what we were looking for! Now it doesn’t matter if inline markup constructs (sub-elements, comments, or processing instructions) are added. Our code will continue to work as intended.
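To check this against every variation of the data used in this post, here is a small sketch. For each <item>, it reports how many text node children the item has alongside the item's string-value. As before, the elements are inlined rather than read from /test.xml. The <check> element and its text-children attribute are made-up names, used only for illustration:

```xquery
for $item in (
  <item>My first item</item>,
  <item>My first<!--was the second--> item</item>,
  <item>My <em>second</em> item</item>
)
return
  (: The attribute counts text node children. The content is the
     string-value, which skips the comment and includes the <em> text. :)
  <check text-children="{count($item/text())}">{string($item)}</check>
```

The text node count varies (1, 2, 2). The string-values ("My first item", "My first item", "My second item") are exactly what each <li> should contain.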

If you call string() with no arguments, the context item is used as the implicit default argument. In other words, string() is short for string(.). That means you can use it as a direct replacement for text() in many cases, and it will give you the desired result without being so easy to break:

<ul>{
  for $item in doc("/test.xml")/list/item
  return <li>{$item/string()}</li>
}</ul>
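The same reasoning carries over to predicates, which are another place where text() tends to creep in. The sketch below compares a text()-based filter with a string()-based one. It uses the <em> version of the sample data, inlined as a hypothetical $list variable rather than read from /test.xml:

```xquery
let $list :=
  <list>
    <item>My first item</item>
    <item>My <em>second</em> item</item>
    <item>My third item</item>
  </list>
return (
  (: Compares each text node child separately. Neither "My " nor " item"
     equals the full phrase, so no item matches. :)
  count($list/item[text() = "My second item"]),
  (: string() here means string(.), the item's full string-value,
     so the second item matches. :)
  count($list/item[string() = "My second item"])
)
```

The query should return (0, 1).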

As I mentioned before, the text() node test (remember it’s not a function even though it looks like one) has its perfectly legitimate uses. Out of curiosity, I did a search on the code base for a production application I worked on that uses XQuery and XSLT code, and I only found two instances of text() in all of the view-related code. They were both cases where using text() was essential (and string() couldn’t be used instead). In contrast, there were many uses of string(). I mention this not in order to suggest that my code is absolutely exemplary (Ha!), but that I’m at least practicing what I’m preaching. If you’ve got a different view, feel free to share it in the comments section below!
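For contrast, here is one situation where text() is genuinely hard to replace: transforming text while preserving inline markup. This is only an illustration, not necessarily one of the two cases mentioned above, and local:shout is a hypothetical function name. The code has to act on each text node individually and leave the <em> element in place. string() won't do here, because it would flatten the markup away:

```xquery
(: Hypothetical helper: upper-cases every text node while leaving
   elements such as <em> (and their attributes) in place :)
declare function local:shout($node as node()) as node()
{
  typeswitch ($node)
    (: text() is the kind test that picks out text nodes here :)
    case text() return text { upper-case($node) }
    case element() return
      element { node-name($node) } {
        $node/@*,
        for $child in $node/node()
        return local:shout($child)
      }
    (: comments, processing instructions, etc. are returned unchanged :)
    default return $node
};

local:shout(<item>My <em>second</em> item</item>)
```

This should return <item>MY <em>SECOND</em> ITEM</item>. By contrast, upper-case(string($item)) would return the plain string "MY SECOND ITEM" and lose the <em> element.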

Additional Resources

Read about how name() is a code smell as well. And even more smelly.
