
“Fragment”-ed thoughts


If you’ve been working with MarkLogic for any amount of time, you’ve probably come across the term “fragment.” For example, when running a query trace on your code, the result says things like “Selected 5 fragments to filter.” What exactly is a fragment? And why is it called that?

The simple answer is that, unless you’ve specifically enabled a feature called “fragmenting”, a fragment is the same thing as a document, the basic unit of storage in MarkLogic. In this case, when you see xdmp:plan() report that your query “selected 100 fragments,” it means that 100 candidate documents were found to match your query. (I say “candidate” because the final result set, unless you’re running an unfiltered search, may exclude some of those candidates after the filtering process.) So if you’re not using fragmenting, then you can just think “document” when you see the word “fragment”.
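
To see the gap between candidates and final results for yourself, you can compare an index-only estimate with a filtered count. This is only a sketch: the word query for “fragment” is a made-up example, and the numbers you get will depend entirely on your own database and index settings.

```xquery
xquery version "1.0-ml";

(: Hypothetical query; substitute any cts:query of your own. :)
let $query := cts:word-query("fragment")
return (
  (: Candidate fragments selected from the indexes alone (no filtering). :)
  xdmp:estimate(cts:search(doc(), $query)),
  (: Results that survive filtering, which is the default for cts:search. :)
  count(cts:search(doc(), $query, "filtered"))
)
```

If the two numbers differ, the difference is the set of candidates that filtering threw out.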

The general and consistent advice I’ve heard is to avoid the fragmenting feature except when absolutely necessary. (It can occasionally be useful for very large content, e.g. a book broken up into chapters.) It’s much simpler to let fragmenting occur at the document level; in other words, to not break your documents into fragments. Under this recommended setup, you have one document fragment per document. Sometimes this means chopping up larger documents into smaller ones as a pre-processing step. With MarkLogic, it’s generally better to have a large number of small documents than a small number of large ones. The “Documents are Like Rows” section of Jason Hunter’s Inside MarkLogic paper gives two reasons for this:

“First, locks are managed at the document level. A separate document for each item avoids lock contention. Second, all index, retrieval, and update actions happen at the fragment level. When finding an item, retrieving an item, or updating an item, that means it’s best to have each item in its own fragment. The easiest way to accomplish that is to put them in separate documents.”
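
As a rough illustration of that pre-processing step, here is a sketch that splits one larger document into one document per item. The <catalog> structure and the /items/ URI scheme are invented for the example; your own data and naming will differ.

```xquery
xquery version "1.0-ml";

(: Hypothetical input; in practice this would come from your source data. :)
let $catalog :=
  <catalog>
    <item id="1"><name>Widget</name></item>
    <item id="2"><name>Gadget</name></item>
  </catalog>
for $item in $catalog/item
(: One document, and therefore one fragment, per item. :)
return xdmp:document-insert(concat("/items/", $item/@id, ".xml"), $item)
```

Each item can then be locked, indexed, retrieved, and updated independently of the others.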

As it turns out, “document fragment” is just one kind of fragment—the kind you’ll most often be encountering. For example, try running the following query:
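
A query of this kind can be as small as the sketch below. I’m assuming here that the expression of interest is simply collection() with no arguments, which selects every document in the database.

```xquery
xquery version "1.0-ml";

(: collection() with no arguments selects every document in the database. :)
xdmp:plan(collection())
```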


xdmp:plan is a pseudo-function (really a special form) that does not evaluate the expression you pass to it. Instead, it checks to see if the expression is searchable and, if so, constructs a query plan against index terms, runs the (unfiltered) query, and shows you an XML representation of the plan and how many fragments were selected. If you run the above query against a database with 100 documents (with fragmenting not enabled), then you’ll see this in the output:

<qry:query-plan xmlns:qry="http://marklogic.com/cts/query">
  ...
  <qry:info-trace>Selected 100 fragments</qry:info-trace>
  <qry:result estimate="100"/>
</qry:query-plan>

The estimate is the same number you get when calling xdmp:estimate() (another special form) against the same expression.
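
For comparison, here is a sketch of the corresponding estimate, again assuming the expression is just collection(). With fragmenting disabled, it should report one fragment per document.

```xquery
xquery version "1.0-ml";

(: Resolved from the indexes alone; no documents are read from disk. :)
xdmp:estimate(collection())
```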

As I alluded to above, there are other kinds of fragments: document properties and document locks. These also have their own XML representation, just like a normal document fragment. They are also associated with the document to which they apply by having the same URI. The difference is that they are accessed using different APIs. Whereas collection() and doc() return document fragments, they do not return document properties or locks. For those, you need to call other functions. For example, the following query will tell you how many document properties fragments are in your database:
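
One way to write such a count is sketched below. It assumes that xdmp:document-properties() with no arguments (all properties fragments in the database) is a searchable expression that xdmp:estimate() accepts; if it isn’t in your version, wrapping it in count() is a slower fallback.

```xquery
xquery version "1.0-ml";

(: No arguments: every properties fragment in the database. :)
xdmp:estimate(xdmp:document-properties())
```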


Whereas this query will tell you how many document locks are currently in the database:
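
The lock version follows the same pattern. As before, this is a sketch that assumes the no-argument form of xdmp:document-locks() can be estimated from the indexes.

```xquery
xquery version "1.0-ml";

(: No arguments: every lock fragment in the database. :)
xdmp:estimate(xdmp:document-locks())
```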


And this query will return the given document, its properties fragment, and its lock fragment (if the document is currently locked):

let $uri := "/testDir/test.xml" return
<result>{ doc($uri), xdmp:document-properties($uri), xdmp:document-locks($uri) }</result>

Here’s the result I get from my database:

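<result>
  <test>This is my document.</test>
  <prop:properties xmlns:prop="http://marklogic.com/xdmp/property">
    ...
  </prop:properties>
  <lock:lock xmlns:lock="http://marklogic.com/xdmp/lock">
    ...
        <lock:owner>Evan is editing this document</lock:owner>
        <sec:user-id xmlns:sec="http://marklogic.com/xdmp/security">7071164303237443533</sec:user-id>
    ...
  </lock:lock>
</result>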

The first child of <result> is a copy of my document itself (the document fragment). The second child is a copy of the properties fragment. And the third is a copy of the lock fragment (which I previously acquired for the heck of it using xdmp:lock-acquire()). As you can see, all three of these are represented using XML. This means you can process them using the same functions and operators as you’d use on regular documents. Moreover, all three fragments have the same URI (“/testDir/test.xml”). This may seem strange, but it works out; the way you access the three kinds of fragments is different, so there’s no conflict.
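
To make that concrete, here is a sketch of acquiring a lock and then reading its owner back with ordinary XPath. The scope, depth, and timeout values are arbitrary choices of mine for illustration. The two statements are separated by a semicolon so that they run as separate transactions, since the lock isn’t visible to the statement that creates it.

```xquery
xquery version "1.0-ml";

(: Exclusive lock on this one document (depth "0"), timing out after
   300 seconds. All values besides the URI and owner text are arbitrary. :)
xdmp:lock-acquire("/testDir/test.xml", "exclusive", "0",
  "Evan is editing this document", xs:unsignedLong(300));

xquery version "1.0-ml";
declare namespace lock = "http://marklogic.com/xdmp/lock";

(: The lock fragment is plain XML, so XPath works on it as usual. :)
xdmp:document-locks("/testDir/test.xml")//lock:owner/string()
```

When you’re done, xdmp:lock-release("/testDir/test.xml") removes the lock again.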

There are of course many other functions for accessing and manipulating properties and lock fragments. The point here is that they exist, they’re stored as fragments, and they are accessed using different queries and functions than normal document fragments.

What about directories? Are they represented as fragments? Well, yes and no. There’s no separate type of “directory fragment.” However, directories are represented using none other than properties fragments! To prove this, all you have to do is get the properties document whose URI is a directory URI you know exists:
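
A sketch of that lookup, assuming a directory fragment exists at /testDir/ (for example, because the database is configured to create directories automatically):

```xquery
xquery version "1.0-ml";

(: Note the trailing slash: this is the directory's URI, not a document's. :)
xdmp:document-properties("/testDir/")
```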


The result contains an empty <prop:directory/> element, a flag indicating that this properties fragment actually represents a directory:

<prop:properties xmlns:prop="http://marklogic.com/xdmp/property">
  <prop:directory/>
</prop:properties>

If directories are just properties fragments (and not document fragments), does that mean you could create a regular document using the same URI as a directory? Yes. But it’s not recommended.

Update: This article glosses over the distinction between directories and directory fragments. (Truth be told, I didn’t realize the difference at the time.) A directory is just a series of one or more steps in a slash-separated document URI. Directories are always indexed to support directory-related functions such as xdmp:directory() and cts:directory-query(). Directory fragments (as described near the end of this article) are an additional feature used to support WebDAV. They aren’t needed for much of anything else, and they exist only if you create them or configure your database to create them for you. (They also hamper scalability.) Check out Michael Blakeley’s excellent blog article explaining the distinction and nuances of each.
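
As a small sketch of those two directory-related functions, the query below counts documents by directory. It reuses the /testDir/ URI from earlier and should work whether or not a directory fragment exists there.

```xquery
xquery version "1.0-ml";

(
  (: Documents directly inside /testDir/ ("1" = immediate children only). :)
  count(xdmp:directory("/testDir/", "1")),
  (: Documents anywhere under /testDir/, estimated from the indexes. :)
  xdmp:estimate(cts:search(doc(), cts:directory-query("/testDir/", "infinity")))
)
```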
