Data Platform

ProgressBlogs Lazy Evaluation Magic

Lazy Evaluation Magic

by Joe Crean

Posted on October 23, 2017 0 Comments

Laziness is rarely ever viewed as a good quality, but when used appropriately, lazy evaluation can be a really great technique!

Lazy evaluation is the ability to process data only when needed. The example presented here shows how a call to cts:search() can be processed but actually evaluated at a later point when it is directly bound into an expression. This is particularly useful when we want to process a very long sequence (e.g. the search matches millions of documents) and we don’t really want to access all members of the sequence. Lazy Evaluation can be very useful however it is not always desired. For example, if we need to read out the contents of a range index, we may explicitly want to read all the contents of the index into memory. See When to be lazy, and when to be eager, when querying the database for an example of lazy evolution in action.

We recently implemented a common request for one of our customers: to produce a sequence of search results for “child” documents grouped by their “parent” document, and return only the most relevant hit for each parent. To clarify further, let’s imagine that the parent documents are chapters with tables of contents and the child documents are sections with the actual content. We want to search all sections and receive a list of chapters whose children match the query. This list is in descending relevance order. Each chapter with a hit should only appear once in the list.

Child documents can be identified by an element is-chapter containing a Boolean value set to false. Each child document has a parent chapter whose id is stored in the element chapter-id.

To help you visualize this, let’s look at some sample XML.

Here is a sample of a chapter document:

<document id="0001">
    <is-chapter>true</is-chapter>
    <content>...details....</content>
</document>

And here is a sample section document. Note that it has an element chapter-id which identifies the chapter to which it belongs:

<document id="0001-0001">
    <chapter-id>0001</chapter-id>
    <is-chapter>false</is-chapter>
    <content>...details...</content>
</document>

And here is an XQuery implementation of this requirement that makes use of lazy evaluation:

let $qtxt := "foo"
let $query := cts:and-query((
                  cts:element-range-query(xs:QName("is-chapter"),"=", "false"),
                  cts:word-query($qtxt)
                ))
(:extract the parent doc id for every hit:)
let $p := function($result as node()) { $result/chapter-id/fn:string()  }                        
let $options :=  (cts:score-order("descending"), "unfiltered" )                     
return
    fn:distinct-values( fn:map($p, cts:search(/document, $query, $options)  ) )[1 to 10]

In the and-query we are performing a word search in all “non-chapter” docs (is-chapter = false). We then set up the final call of the query which returns a list of parents of the matched children in most relevant order.

Profiling demonstrates how lightning-fast the query can be executed, and how an increased amount of distinct values (the chapter-id's)can be found with remarkably little overhead even though the actual search query could match many documents.

In order to get the first 10 distinct values in our test case, for example, it needed to look at the first 12 search hits (the inline function $p above is called 12 times). To retrieve the first 20 distinct values, it needed to examine the first 31 search hits.

But shouldn’t there be massive performance penalty for retrieving all search results and then extracting the distinct values? Luckily, lazy evaluation and some useful features of the MarkLogic Search API can prevent that.

The call to cts:search returns an iterator and the profiler shows that this function is called just once. The call to fn:map also returns an iterator (although a profiler bug shows this as being invoked many times, it is only invoked once). The function fn:distinct-values takes advantage of these iterators and uses them to fill up the sequence of unique values. Thanks to these iterators, the query does not need to initially retrieve all search results, and in turn is super-efficient!

In summary, lazy evolution can be a very useful technique when selectively processing large result sets!!!

MarkLogic

Joe Crean

View all posts from Joe Crean on the Progress blog. Connect with us about all things application development and deployment, data integration and digital business.

Comments

Comments are disabled in preview mode.

Topics

More From Progress

Shadow Analytics: Why You Can’t Afford to Leave It Unchecked

Then, Now and Beyond: The Future of Back Office Software

2022 Progress Data Connectivity Report

Subscribe to get all the news, info and tutorials you need to build better business apps and sites

Country/Territory

Blog

MarkLogic

Semaphore

OpenEdge

DataDirect

Sitefinity

Telerik

Kendo UI

Corticon

DataDirect

MOVEit

Chef

Flowmon

Kemp LoadMaster

WhatsUp Gold

Telerik

Kendo UI

Fiddler

Test Studio

MOVEit

WS_FTP

Lazy Evaluation Magic

Joe Crean

Comments

Topics

Sitefinity Training and Certification Now Available.

More From Progress

Latest Stories in Your Inbox

Latest Stories
in Your Inbox