How to Model and Manage Entities With UML

With its flexible data model and ability to store multiple schemas simultaneously, MarkLogic is an excellent database for data integration. But organizations need more than the ability to store multiple schemas simultaneously. They also need to query against fixed, predictable aspects of the data that represent real-world things, or entities. Examples of entities include Customers and Orders, Trades and Counterparties, or Providers and Outcomes.

The problem with traditional, relational databases is that they have a static data model and can store only one schema. This makes managing entities extremely challenging. The context and meaning of data are trapped in database queries, application code, outdated application specs, and entity-relationship diagrams (ERDs); that is, everywhere except in the database. This means it is nearly impossible to make sense of your data, and you can find yourself asking:

What data accurately represents our Customer? What are the defining properties? How is it related to other entities? Which systems can generate customers? How are customers represented to applications? Which customers do not adhere to the business rules?

What you need is a better way to manage entities and the messy, changing data from which they are derived. The right place to begin is with a conceptual data model, which is really just a visual catalog that captures the shared understanding of your entities and relationships to make data easier to govern and program. In MarkLogic’s Entity Services approach, the model comes first. When data is ingested into MarkLogic, it is validated and reshaped in accordance with the model, adhering to the rules and policies the model intends. Data owners define the model; developers automatically generate data transformations, validation rules, index configuration, and SQL views, all of which reduce errors and help accommodate inevitable change.

Model-Driven Approach to Data Management

Entity Services is a watershed, model-driven approach to managing data in MarkLogic. When I first played with it, I was surprised by how little input I had to provide to reap a treasure chest of outputs.

In this post, I am going to introduce the concept of using Unified Modeling Language (UML) notation to visually depict the model and explain how to transform the UML model to MarkLogic’s Entity Services model descriptor format. In a subsequent post, I will use the model and MarkLogic’s Entity Services library to generate several desirable, and in some cases essential, artifacts: ingestion conversion code, version management code, a Template-Driven Extraction (TDE) definition, an XML schema, query options, Gradle database configuration, and test data.

Model Defines the Design

Building the right model up front is more than a means to generate code. The act of composing the model forces the design team to think through the data conceptually. Although a NoSQL database lets you delay some of the data modeling work, it is still essential for the design team to understand the structure of the data up front.

One reason for this is simple architectural discipline: NoSQL embraces numerous varieties of data, but it is as important as ever to understand and rigorously model the data’s structure. Further, during implementation, the design team frequently refers back to the model because it is the clearest expression of the data structure, particularly the relationships between entities. MarkLogic aids this process by offering several strategies to relate entities: containment, reference, semantic association, and others. The design team consults the model to determine, for a specific relationship, which strategy to use.

A model in this sense is a more fundamental deliverable than the XML or JSON model descriptor document that is the input to the Entity Services code generation facility.

For this reason, I like to begin with a visual class model drawn using UML. UML is an expressive notation, popular among architects, and considered very powerful for depicting data models. UML maps well to MarkLogic Entity Services: classes map to entities, attributes map to properties, and extra bits needed by Entity Services can be specified in UML using stereotypes in a custom profile.

The next figure shows the end-to-end flow. You begin by building the model in the third-party modeling tool of your choice (for example, MagicDraw or Visual Paradigm). UML is a visual notation; you draw the model in the tool as boxes and arrows. You then export the model to UML’s standard externalizable format, XML Metadata Interchange (XMI). You pass this XMI model through an XQuery transformation module, running on MarkLogic itself, to transform it to MarkLogic’s Entity Services model descriptor format. Once you have that descriptor, you can generate all the other artifacts.
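To make the class-to-entity mapping concrete, here is a minimal Python sketch of the idea. It is not the XQuery module described above, which runs on MarkLogic itself; it is only an illustration under stated assumptions. The XMI fragment is hand-written and heavily simplified, the `datatype` attribute on `ownedAttribute` is a hypothetical shortcut (real tool exports reference types differently), and the function name `xmi_to_descriptor` is my own. The output follows the general shape of an Entity Services descriptor (an `info` block plus `definitions` with `properties`), but treat the details as illustrative rather than complete.

```python
import json
import xml.etree.ElementTree as ET

XMI_NS = "http://www.omg.org/spec/XMI/20131001"
UML_NS = "http://www.omg.org/spec/UML/20131001"

# Hand-written, simplified XMI; the "datatype" attribute is a hypothetical
# stand-in for the type references a real modeling tool would export.
SAMPLE_XMI = """
<xmi:XMI xmlns:xmi="http://www.omg.org/spec/XMI/20131001"
         xmlns:uml="http://www.omg.org/spec/UML/20131001">
  <uml:Model name="Sales">
    <packagedElement xmi:type="uml:Class" name="Customer">
      <ownedAttribute name="customerId" datatype="string"/>
      <ownedAttribute name="name" datatype="string"/>
    </packagedElement>
    <packagedElement xmi:type="uml:Class" name="Order">
      <ownedAttribute name="orderId" datatype="string"/>
      <ownedAttribute name="total" datatype="decimal"/>
    </packagedElement>
  </uml:Model>
</xmi:XMI>
"""


def xmi_to_descriptor(xmi_text, version="0.0.1"):
    """Map UML classes to entities and UML attributes to properties."""
    root = ET.fromstring(xmi_text.strip())
    model = root.find(f"{{{UML_NS}}}Model")
    definitions = {}
    for element in model.iter("packagedElement"):
        # Only classes become entities; other packaged elements are skipped.
        if element.get(f"{{{XMI_NS}}}type") != "uml:Class":
            continue
        properties = {
            attr.get("name"): {"datatype": attr.get("datatype", "string")}
            for attr in element.findall("ownedAttribute")
        }
        definitions[element.get("name")] = {"properties": properties}
    return {
        "info": {"title": model.get("name"), "version": version},
        "definitions": definitions,
    }


if __name__ == "__main__":
    print(json.dumps(xmi_to_descriptor(SAMPLE_XMI), indent=2))
```

The sketch deliberately ignores stereotypes and class relationships, which is where the real transformation does most of its work.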


In my next post, I’ll explain by example the design and use of this workflow. As you will see, this solution is a combination of MarkLogic’s out-of-the-box Entity Services capabilities, an additional open-source toolkit, and the third-party UML editor. The secret sauce is UML class relationships.