When to be lazy, and when to be eager, when querying the database

by Bradley Mann

Posted on January 07, 2014

Generally, the evaluator's optimizations are lazy wherever they can be when evaluating functions. This lets the application process data only when it is needed and keeps memory use down to what is actually required. But sometimes you need to evaluate eagerly to improve performance. Here we discuss one use case where it is best to specify eager evaluation.

Let’s say you need to write a script that collects all the values from a range index and groups them into “sets” of 1000. We may end up with something like this:

xquery version "1.0-ml";
declare namespace sample = "http://marklogic.com/sample";
let $groupsize := 1000
let $values := cts:element-values(xs:QName("sample:value"))
let $count := fn:count($values)

let $groups :=
    for $i in (0 to (xs:integer(fn:ceiling($count div $groupsize)) - 1))
    let $group := $values[(($i * $groupsize) + 1) to (($i + 1) * $groupsize)]
    return fn:string-join($group, "|")

return (fn:count($groups), $groups)

This query performs fine on a small set of values (5000), but when we increase the number of values pulled from the range index, we see that this call

let $group := $values[(($i * $groupsize) + 1) to (($i + 1) * $groupsize)]

quickly becomes the long pole in the tent. In fact, for a sample size of 50,000 values (50 groups), this one expression accounts for 91% of the execution time: 2.3 seconds for just 50 evaluations. Increase the sample size above 1,000,000 values and it’s clear that this query will no longer run in a reasonable amount of time. So what’s going on here?

As it turns out, our cts:element-values() call returns a stream, which is loaded lazily as needed. Fifty sequence accesses are therefore actually fifty stream accesses, and each one streams the results starting from the first item, so grabbing items near the end of the stream takes longer.
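
To get a feel for how quickly that adds up, here is a rough back-of-the-envelope sketch. It assumes each access re-reads the stream from the first item through to the last item of the requested group; that is a simplification of the behavior described above, not a measurement of what the server does internally. The function name local:items-streamed is made up for this illustration.

```xquery
xquery version "1.0-ml";

(: Rough cost model, not a measurement. Assumes every positional access
   re-reads the stream from item 1 up to the last item of the group. :)
declare function local:items-streamed(
  $groups as xs:integer,
  $groupsize as xs:integer
) as xs:integer
{
  (: Group $i ends at item $i * $groupsize, so that many items are read. :)
  fn:sum(for $i in 1 to $groups return $i * $groupsize)
};

(
  local:items-streamed(50, 1000),    (: 1275000 :)
  local:items-streamed(1000, 1000)   (: 500500000 :)
)
```

Under this model, 50,000 values cost about 1.3 million item reads, while 1,000,000 values cost about 500 million: twenty times the data, but roughly four hundred times the work.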

Because we are iterating through the groups, this is a situation where we want eager evaluation, which ensures that all of the data is in memory before we begin to iterate through it. The function cts:element-values() accepts an “eager” option. Another easy “trick” that reliably ensures you’re working with the full in-memory sequence is to drop into a “sub-FLWOR” expression that generates a sequence from the return value:

let $values := for $i in cts:element-values(xs:QName("sample:value")) return $i

Now $values holds the entire resulting dataset, and accessing subsequences within it is much faster, particularly for values at the end of the sequence.
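
For comparison, here is a sketch of the same script using the “eager” option instead of the sub-FLWOR. It assumes the same sample:value range index as above; the empty sequence in the second argument simply means no starting value. The loop bound is computed with fn:ceiling so that no empty trailing groups are produced. Whether the option performs as well as the sub-FLWOR for your data is worth profiling rather than taking on faith.

```xquery
xquery version "1.0-ml";
declare namespace sample = "http://marklogic.com/sample";

let $groupsize := 1000
(: "eager" asks the server to do most of the index work up front
   instead of streaming values lazily as they are requested. :)
let $values := cts:element-values(xs:QName("sample:value"), (), "eager")
let $count := fn:count($values)

let $groups :=
    (: ceiling($count div $groupsize) groups, numbered from 0 :)
    for $i in (0 to (xs:integer(fn:ceiling($count div $groupsize)) - 1))
    let $group := $values[(($i * $groupsize) + 1) to (($i + 1) * $groupsize)]
    return fn:string-join($group, "|")

return (fn:count($groups), $groups)
```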

Additional Resources

Take the MarkLogic University Performance Workshop to better understand query evaluation and performance hits
Read another discussion of lazy evaluation in MarkLogic
