Early in my MarkLogic learning path, one tool I found useful for understanding how MarkLogic evaluates queries was the xdmp:query-trace() function, which helped me understand why a query runs fast or slow. As an example, in my last post on Good XML design and performance, I claimed that the following query would run fast by leveraging MarkLogic’s Universal Index:
//group[@type eq 'widget']
It sure seems like it should be fast, and based on what I had read about MarkLogic internals, it certainly sounded like it would be. But just to be extra paranoid, I ran a test in Query Console. The first step was to generate some sample data, using the following query:
for $n in (1 to 300)
return
  xdmp:document-insert(
    concat("/group", $n, ".xml"),
    document {
      let $pos := ($n mod 3) + 1
      let $type := ("widget", "person", "place")[$pos]
      return <group type="{$type}">stuff</group>
    }
  )
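To confirm that the load worked before querying, a quick count per type can be run as a separate request once the insert has committed. This is only a minimal sketch; it assumes the database holds no other <group> documents besides the ones created here.

```xquery
xquery version "1.0-ml";

(: Count the sample documents for each @type value.
   Assumes the insert above has already committed in an earlier request. :)
for $type in ("widget", "person", "place")
return fn:concat($type, ": ", fn:count(//group[@type eq $type]))
```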
A third of the documents will contain a <group> with type="widget", a third with type="person", and a third with type="place". After loading the documents, I ran my test query in conjunction with xdmp:query-trace():
xdmp:query-trace(true()), //group[@type eq 'widget']
Passing true() to xdmp:query-trace() tells the server to output information to the error log about how it plans to run any searchable expressions it encounters in the following code, specifically which constraints are used and how many fragments are selected from the index for filtering. What I wanted to make sure of was that MarkLogic would retrieve only the documents I was interested in. If it selected 300 fragments (all the docs I loaded), that would mean it had to look in each one before filtering out two-thirds of them (the ones whose @type value is something other than "widget"). Instead, the number I wanted to see was 100 (just the "widget" ones). Looking in the error log, this is what I saw (not including the timestamp and line number info):
Analyzing path: fn:collection()/descendant::group[@type eq "widget"]
Step 1 is searchable: fn:collection()
Step 2 is searchable: descendant::group[@type eq "widget"]
Path is fully searchable.
Gathering constraints.
Comparison contributed hash value constraint: group/@type = "widget"
Step 2 predicate 1 contributed 1 constraint: @type eq "widget"
Comparison contributed hash value constraint: group/@type = "widget"
Step 2 predicate 1 contributed 1 constraint: @type eq "widget"
Step 2 contributed 2 constraints: descendant::group[@type eq "widget"]
Executing search.
Selected 100 fragments to filter
Fortunately, I could tell from the output that the index magic was indeed doing its job, since it selected only 100 fragments (documents), i.e. the ones that contain "widget". And I could see that the XPath predicate, @type eq 'widget', was successfully interpreted as a constraint that can be resolved from the index. Yay! I can write with confidence!
Being paranoid, I wanted to do another test, so I used the following query to generate some more sample data (very similar to the one above):
for $n in (1 to 300)
return
  xdmp:document-insert(
    concat("/logfile", $n, ".xml"),
    document {
      let $pos := ($n mod 3) + 1
      let $host := concat("host", $pos)
      return <logfile host="{$host}"/>
    }
  )
Here’s the test query:
xdmp:query-trace(true()), //logfile[@host eq 'host1']
And here’s the line I saw (and was hoping to see) at the end of the Error Log:
Selected 100 fragments to filter
Because of the small data set, the two examples here are fast regardless of what constraint is used (resolvable from the index or not). But when I’m dealing with millions of documents, I want to make sure that I’m effectively using the index. Using a small test data set with xdmp:query-trace() is one way to find out whether the index is being leveraged effectively, and thus whether my queries will scale.
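The same check can also be made without reading the error log. The sketch below is not part of the original tests: it compares xdmp:estimate(), which reports how many fragments the indexes select for a searchable expression, with fn:count(), which counts what is left after filtering. The <check> element and its attribute names are made up for illustration. Because each sample document holds exactly one <logfile> element, the two numbers should agree when the predicate is resolved from the index.

```xquery
xquery version "1.0-ml";

(: Fragments selected from the index, before any filtering. :)
let $selected := xdmp:estimate(//logfile[@host eq 'host1'])
(: Elements actually returned once the selected fragments are filtered. :)
let $returned := fn:count(//logfile[@host eq 'host1'])
return
  (: <check> is a hypothetical wrapper element, used only to display the result. :)
  <check selected="{$selected}"
         returned="{$returned}"
         resolved-from-index="{$selected eq $returned}"/>
```

If the selected count comes back much larger than the returned count, the server is opening fragments only to discard them, which is the pattern that stops scaling on large data sets.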
Experimenting with xdmp:query-trace() (and the related xdmp:plan() function) is a great way to learn from the “bottom up”. For “top-down” learning, I highly recommend Jason Hunter’s paper “Inside MarkLogic”.
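As a rough sketch of how xdmp:plan() might be used for the same experiment, assuming the <group> sample documents from earlier are loaded: it takes the expression as its argument and returns the plan as an XML element in the Query Console result, rather than writing lines to the error log.

```xquery
xquery version "1.0-ml";

(: Return the query plan for the path expression as XML,
   instead of tracing it to the error log. :)
xdmp:plan(//group[@type eq 'widget'])
```

The returned plan can be inspected for the constraints the optimizer derived from the path.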
What about you? What functions or tools have you found helpful for learning MarkLogic? Feel free to comment below.