
Good XML design and performance

MarkLogic has always tried to ensure that well-designed XML performs well “as is” in MarkLogic Server. For example, if your schema uses descriptive and unique element names, your application code will be not only clean and readable but also fast. On the other hand, if your schema contains many generic element names (such as “item”) that are used in multiple ways, your code (in XQuery or XSLT) will be harder to read, and you may have to do some extra legwork to get the best performance.

Consider, for instance, a schema with many elements named <group> (or <section>, <item>, or some other generic name) that play very different roles, in this case indicated by the value of an attribute:

<doc>
  <group type="widget">
    <item type="sprocket">...</item>
    ...
  </group>
  <group type="employee">
    <item type="executive">...</item>
    ...
  </group>
  <group type="place">
    <item type="city">...</item>
    ...
  </group>
</doc>

Because MarkLogic indexes elements by name, it does not automatically distinguish among your <group> elements; they all share the same name. That said, some queries will still run as fast as possible, such as restricting your results to a particular attribute value with a simple XPath expression like //group[@type eq 'widget']. MarkLogic Server uses its Universal Index to avoid reading any document that does not contain a <group> element whose “type” attribute equals “widget”. So far, so good.
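To make the idea concrete, here is a deliberately simplified, hypothetical model in Python (standard library only) of an index that records two kinds of terms per document: element names, and element plus attribute-value combinations. It is not how MarkLogic’s Universal Index is actually implemented; `build_index`, the term tuples, and the sample documents are invented for illustration. It shows why the element name alone cannot separate the <group> elements, while an attribute-value term can narrow the candidates before any document is read.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical toy index, NOT MarkLogic's Universal Index. It records:
#   ("element", name)                 -> ids of docs containing that element
#   ("attr-value", elem, attr, value) -> ids of docs with that attribute value
def build_index(docs):
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        for elem in root.iter():
            index[("element", elem.tag)].add(doc_id)
            for attr, value in elem.attrib.items():
                index[("attr-value", elem.tag, attr, value)].add(doc_id)
    return index

docs = {
    "d1": '<doc><group type="widget"><item type="sprocket"/></group></doc>',
    "d2": '<doc><group type="employee"><item type="executive"/></group></doc>',
    "d3": '<doc><group type="place"><item type="city"/></group></doc>',
}

index = build_index(docs)

# The element-name term alone cannot tell the <group> elements apart ...
print(sorted(index[("element", "group")]))  # ['d1', 'd2', 'd3']
# ... but an element+attribute-value term narrows the candidate documents
# for //group[@type eq 'widget'] without opening any of them.
print(sorted(index[("attr-value", "group", "type", "widget")]))  # ['d1']
```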

But there are still a few issues here. For one thing, your code will not be very readable. This expression:

//group[@type eq 'widget']/item[@type eq 'sprocket']

is pretty noisy compared to, for example:

//widgets/sprocket

which is what your code would look like if you used more descriptive element names.
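As a rough illustration of the readability difference, the following sketch runs both path expressions against two small, hypothetical documents holding the same data, using Python’s `xml.etree.ElementTree` as a stand-in for XQuery or XSLT. ElementTree supports only a subset of XPath, so `=` replaces `eq` and `.//` (search the descendants of the root element) stands in for `//`.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents carrying the same data: one with generic
# element names, one with descriptive names.
generic = ET.fromstring(
    '<doc>'
    '<group type="widget"><item type="sprocket">S1</item></group>'
    '<group type="employee"><item type="executive">E1</item></group>'
    '</doc>'
)
descriptive = ET.fromstring(
    '<doc>'
    '<widgets><sprocket>S1</sprocket></widgets>'
    '<employees><executive>E1</executive></employees>'
    '</doc>'
)

# Same result, but the generic design needs an attribute predicate at
# every step of the path.
noisy = generic.findall(".//group[@type='widget']/item[@type='sprocket']")
clean = descriptive.findall(".//widgets/sprocket")

print([e.text for e in noisy])  # ['S1']
print([e.text for e in clean])  # ['S1']
```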

The other issue is that you may run into problems when you start doing more advanced things, such as word search within subsets of your documents. Specifically, restricting your search results to all <group> elements except widget groups will be challenging. (Fields can help with the converse, that is, including only particular kinds of groups, but then you may have to enumerate every kind of group you want results for.)
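The following is a hypothetical sketch of why exclusion by element name is easy with descriptive names but not with generic ones. `searchable_words` is an invented helper that collects words while pruning any subtree whose element name is on an exclusion list; it is loosely inspired by the idea of excluding elements from a search scope and is not MarkLogic’s field configuration or API.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper (not a MarkLogic API): gather words from text content,
# skipping every subtree whose root element name is in excluded_names.
def searchable_words(root, excluded_names):
    words = []

    def walk(elem):
        if elem.tag in excluded_names:
            return  # prune this element and everything beneath it
        if elem.text:
            words.extend(elem.text.split())
        for child in elem:
            walk(child)
            # A child's tail text belongs to the parent, so keep it even
            # when the child itself was excluded.
            if child.tail:
                words.extend(child.tail.split())

    walk(root)
    return words

descriptive = ET.fromstring(
    "<doc><widgets><sprocket>gear teeth</sprocket></widgets>"
    "<employees><executive>Ada</executive></employees></doc>"
)
generic = ET.fromstring(
    '<doc><group type="widget"><item type="sprocket">gear teeth</item></group>'
    '<group type="employee"><item type="executive">Ada</item></group></doc>'
)

print(searchable_words(descriptive, {"widgets"}))  # ['Ada']
# With generic names, a name-only exclusion list cannot single out widget
# groups: excluding "group" removes every group, not just the widget ones.
print(searchable_words(generic, {"group"}))  # []
```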

Another issue with the above design is that, despite its potential benefit of being data-driven and extensible, it does not let you apply schema constraints that are specific to particular classes of <group> elements (at least not in W3C XML Schema 1.0). You can’t, for example, restrict the content of <group> elements to <sprocket> and <gear> elements only when their type attribute is “widget”. If you want different content models, you need different element names. Starting off with generic <group> elements can lead you down a slippery slope: you’ll find yourself using other generic names like “item”, and even then you won’t be able to effectively restrict the “type” values to only the applicable ones.
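As a loose analogy (not a schema validator), the following hypothetical Python sketch keys allowed child elements by element name, much as an XML Schema 1.0 element declaration gives each element name one content model. `ALLOWED_CHILDREN` and `violations` are invented for illustration. With descriptive names, each kind of container gets its own rule; with generic names, the table could hold only a single "group" entry, so one rule would have to cover every group regardless of its type attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical content models keyed by element name (illustration only).
ALLOWED_CHILDREN = {
    "widgets": {"sprocket", "gear"},
    "employees": {"executive"},
    "places": {"city"},
}

def violations(root):
    """Return (parent, child) name pairs where the child is not allowed."""
    bad = []
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            continue  # no content model declared for this element name
        for child in parent:
            if child.tag not in allowed:
                bad.append((parent.tag, child.tag))
    return bad

doc = ET.fromstring(
    "<doc><widgets><sprocket/><city/></widgets>"
    "<places><city/></places></doc>"
)
print(violations(doc))  # [('widgets', 'city')]
```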

Here’s what an arguably better (and more readable) schema design would look like:

<doc>
  <widgets>
    <sprocket>...</sprocket>
    ...
  </widgets>
  <employees>
    <executive>...</executive>
    ...
  </employees>
  <places>
    <city>...</city>
    ...
  </places>
</doc>

To conclude, there are lots of good reasons to use descriptive, unique element names whenever possible, and doing so plays nicely with human readers, XQuery, XSLT, XML Schemas, and MarkLogic Server.
