Progress Acquires MarkLogic! Learn More

When to be lazy, and when to be eager, when querying the database

Back to blog
2 minute read
Back to blog
2 minute read

Generally, optimizations in the evaluator try to be lazy whenever they can when evaluating functions. This allows the application to process data only when needed and limits memory use to only what is needed. But sometimes, you need to evaluate eagerly to improve performance. Here we discuss one use case when it is best to specify eager evaluation.

Let’s say you need to write a script that collects all the values from a range index and groups them into “sets” of 1000. We may end up with something like this:

xquery version "1.0-ml";
declare namespace sample = "";
let $groupsize := 1000
let $values := cts:element-values(xs:QName("sample:value"))
let $count := fn:count($values)

let $groups :=
    for $i in (0 to (xs:int($count div $groupsize) + 1))
    let $group := $values[(($i * $groupsize) + 1) to (($i + 1) * $groupsize)]
    return fn:string-join($group, "|")

return (fn:count($groups), $groups)

This query performs fine on a small set of values (5000), but when we increase the number of values pulled from the range index, we see that this call

let $group := $values[(($i * $groupsize) + 1) to (($i + 1) * $groupsize)]

quickly becomes the long pole in the tent. In fact, for a sample size of 50,000 values (50 groups), 91% of the execution time is taken by this one call, 2.3 seconds for just 50 calls. Increasing the sample size to values above 1,000,000 and it’s clear that this query will no longer run in a reasonable amount of time. So what’s going on here?

As it turns out, our cts:element-values() call is returning a stream, which is loaded lazily as needed. Therefore, fifty sequence accesses are actually fifty stream accesses, each time streaming the results from the first item (grabbing items at the end of the stream takes longer).

Because we are iterating through the groups, this is a situation where we want eager evaluation, ensuring that you get all of your data back in-memory all at once before you begin to iterate through the data. The function cts:element-values() has an “eager” options flag. Another easy “trick” that will reliably ensure you’re working with the full in-memory sequence is to simply drop into a “sub-flwor” statement to generate a sequence from the return value:

let $values := for $i in cts:element-values(xs:QName("sample:value")) return $i

Now, $values holds the entire resulting dataset and now accessing subsequences within it is much faster, particularly values at the end of the sequence.


Additional Resources

Share this article

Read More

Related Posts

Like what you just read, here are a few more articles for you to check out or you can visit our blog overview page to see more.


Poker Fun with XQuery

In this post, we dive into building a full five-card draw poker game with a configurable number of players. Written in XQuery 1.0, along with MarkLogic extensions to the language, this game provides examples of some great programming capabilities, including usage of maps, recursions, random numbers, and side effects. Hopefully, we will show those new to XQuery a look at the language that they may not get to see in other tutorials or examples.

All Blog Articles

Protecting passwords in ml-gradle projects

If you are getting involved in a project using ml-gradle, this tip should come in handy if you are not allowed to put passwords (especially the admin password!) in plain text. Without this restriction, you may have multiple passwords in your file if there are multiple MarkLogic users that you need to configure. Instead of storing these passwords in, you can retrieve them from a location where they’re encrypted using a Gradle credentials plugin.

All Blog Articles

Getting Started with Apache Nifi: Migrating from Relational to MarkLogic

Apache NiFi introduces a code-free approach of migrating content directly from a relational database system into MarkLogic. Here we walk you through getting started with migrating data from a relational database into MarkLogic

All Blog Articles

Sign up for a Demo

Don’t waste time stitching together components. MarkLogic combines the power of a multi-model database, search, and semantic AI technology in a single platform with mastering, metadata management, government-grade security and more.

Request a Demo