Progress Acquires MarkLogic! Learn More

Lazy Evaluation Magic

Back to blog
3 minute read
Back to blog
3 minute read

Laziness is rarely ever viewed as a good quality, but when used appropriately, lazy evaluation can be a really great technique!

Lazy evaluation is the ability to process data only when needed. The example presented here shows how a call to cts:search() can be processed but actually evaluated at a later point when it is directly bound into an expression. This is particularly useful when we want to process a very long sequence (e.g. the search matches millions of documents) and we don’t really want to access all members of the sequence. Lazy Evaluation can be very useful however it is not always desired. For example, if we need to read out the contents of a range index, we may explicitly want to read all the contents of the index into memory. See When to be lazy, and when to be eager, when querying the database for an example of lazy evolution in action.

We recently implemented a common request for one of our customers: to produce a sequence of search results for “child” documents grouped by their “parent” document, and return only the most relevant hit for each parent. To clarify further, let’s imagine that the parent documents are chapters with tables of contents and the child documents are sections with the actual content. We want to search all sections and receive a list of chapters whose children match the query. This list is in descending relevance order. Each chapter with a hit should only appear once in the list.

Child documents can be identified by an element is-chapter containing a Boolean value set to false. Each child document has a parent chapter whose id is stored in the element chapter-id.

To help you visualize this, let’s look at some sample XML.

Here is a sample of a chapter document:

<document id="0001">

And here is a sample section document. Note that it has an element chapter-id which identifies the chapter to which it belongs:

<document id="0001-0001">

And here is an XQuery implementation of this requirement that makes use of lazy evaluation:

let $qtxt := "foo"
let $query := cts:and-query((
                  cts:element-range-query(xs:QName("is-chapter"),"=", "false"),
(:extract the parent doc id for every hit:)
let $p := function($result as node()) { $result/chapter-id/fn:string()  }                        
let $options :=  (cts:score-order("descending"), "unfiltered" )                     
    fn:distinct-values( fn:map($p, cts:search(/document, $query, $options)  ) )[1 to 10]

In the and-query we are performing a word search in all “non-chapter” docs (is-chapter = false). We then set up the final call of the query which returns a list of parents of the matched children in most relevant order.

Profiling demonstrates how lightning-fast the query can be executed, and how an increased amount of distinct values (the chapter-id's)can be found with remarkably little overhead even though the actual search query could match many documents.

In order to get the first 10 distinct values in our test case, for example, it needed to look at the first 12 search hits (the inline function $p above is called 12 times). To retrieve the first 20 distinct values, it needed to examine the first 31 search hits.

But shouldn’t there be massive performance penalty for retrieving all search results and then extracting the distinct values? Luckily, lazy evaluation and some useful features of the MarkLogic Search API can prevent that.

The call to cts:search returns an iterator and the profiler shows that this function is called just once. The call to fn:map also returns an iterator (although a profiler bug shows this as being invoked many times, it is only invoked once). The function fn:distinct-values takes advantage of these iterators and uses them to fill up the sequence of unique values. Thanks to these iterators, the query does not need to initially retrieve all search results, and in turn is super-efficient!

In summary, lazy evolution can be a very useful technique when selectively processing large result sets!!!

Share this article

Read More

Related Posts

Like what you just read, here are a few more articles for you to check out or you can visit our blog overview page to see more.

Developer Insights

Multi-Model Search using Semantics and Optic API

The MarkLogic Optic API makes your searches smarter by incorporating semantic information about the world around you and this tutorial shows you just how to do it.

All Blog Articles
Developer Insights

Create Custom Steps Without Writing Code with Pipes

Are you someone who’s more comfortable working in Graphical User Interface (GUI) than writing code? Do you want to have a visual representation of your data transformation pipelines? What if there was a way to empower users to visually enrich content and drive data pipelines without writing code? With the community tool Pipes for MarkLogic […]

All Blog Articles
Developer Insights

Part 3: What’s New with JavaScript in MarkLogic 10?

Rest and Spread Properties in MarkLogic 10 In this last blog of the series, we’ll review over the new object rest and spread properties in MarkLogic 10. As mentioned previously, other newly introduced features of MarkLogic 10 include: The addition of JavaScript Modules, also known as MJS (discussed in detail in the first blog in this […]

All Blog Articles

Sign up for a Demo

Don’t waste time stitching together components. MarkLogic combines the power of a multi-model database, search, and semantic AI technology in a single platform with mastering, metadata management, government-grade security and more.

Request a Demo