Design Patterns: The Triple Provenance Pattern

MarkLogic design patterns are reusable solutions for many of the commonly occurring problems encountered when designing MarkLogic applications. These patterns may be unique to applications on MarkLogic or may be industry patterns that have MarkLogic-specific considerations. Unlike recipes, MarkLogic design patterns are generally more abstract and applicable in multiple scenarios.

Triple Provenance with Document Annotations Design Pattern


Semantics applications often need to capture provenance information at the triple level.

Using the Envelope Pattern, annotate the JSON or XML serialization of triples with provenance details.


When building applications that leverage data from disparate sources, especially in a semantics context, it is common to want to capture provenance information, such as source and last updated time. With RDF alone, reification (i.e. statements about statements, see Reification on Semantic web) is a technique that can be used, but it results in a significant expansion in the number of triples needed and can greatly complicate SPARQL queries.
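To make the cost of reification concrete, here is a minimal sketch in plain JavaScript (no MarkLogic APIs). It assumes triples are represented as simple three-element arrays. The rdf:Statement, rdf:subject, rdf:predicate, and rdf:object terms are the standard RDF reification vocabulary; the statement IRI and the example.org provenance predicates are hypothetical names chosen for illustration.

```javascript
// Standard RDF namespace, which defines the reification vocabulary.
const RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
// Hypothetical namespace for provenance predicates (an assumption for this sketch).
const EX = "http://example.org/predicates/";

// Describe one triple as an rdf:Statement, then attach provenance to that statement.
function reify(statementIri, triple, provenance) {
  const reified = [
    [statementIri, RDF + "type", RDF + "Statement"],
    [statementIri, RDF + "subject", triple.subject],
    [statementIri, RDF + "predicate", triple.predicate],
    [statementIri, RDF + "object", triple.object],
  ];
  // One extra triple per provenance detail.
  for (const [name, value] of Object.entries(provenance)) {
    reified.push([statementIri, EX + name, value]);
  }
  return reified;
}

const reified = reify(
  "http://example.org/statements/1",
  {
    subject: "http://example.org/subjects/ABC",
    predicate: "http://example.org/predicates/systemOwner",
    object: "Joe Smith",
  },
  { source: "DB1", updateTime: "2017-05-17T14:17:38.786Z" }
);
console.log(reified.length); // 6
```

In this sketch, one original fact with two provenance details becomes six triples, and any query for the provenance has to join through the statement node.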

A solution that can provide provenance details for a triple without the added complexity of reification would be ideal. Fortunately, triples stored on documents in MarkLogic can take advantage of their serialization as JSON and XML to provide an additional level of context. This is achieved through additional metadata on the document, specifically on the triple objects.


This pattern is applicable when you need to capture triple-level provenance details. It requires that triples be persisted on documents rather than as MarkLogic Managed Triples. It is suitable for cases where the provenance details do not need to be returned directly as part of a SPARQL query and it is acceptable to retrieve them from the document instead.


The participants involved in implementing this pattern are as follows:

  • Update code for annotating triples
  • Retrieval code for getting provenance details for a triple

Examples of each can be found under Sample Code below.


The retrieval code must be aware of how the update code has persisted the provenance details.


This pattern enables the persistence of provenance details for a given triple by storing annotations on triples serialized in JSON or XML. Retrieval of provenance details is facilitated by using JavaScript or XQuery to path into documents, identify the matching target triple, and return the annotations.

To take advantage of this pattern, you cannot use Managed Triples, and you need to add provenance annotations during document insertion or update, or prior to ingestion. You must also be able to identify the document where the triple resides.

This can be achieved through use of an identity triple that links the subject IRI to the URI of the document:

const subject = sem.iri("http://marklogic.com/resources/myEntity");
const uri = sem.iri("/content/myEntity.json");
sem.triple(subject,sem.iri("http://www.w3.org/2000/01/rdf-schema#isDefinedBy"), uri);

A trade-off of using this pattern is that you cannot use pure SPARQL to retrieve the provenance details.


If you are implementing this pattern, it is important that there is a consistent process for adding and retrieving provenance details.
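One way to keep the two sides consistent is to have them share a single definition of the annotation fields. The following is a minimal sketch in plain JavaScript that works on already-parsed objects, so it can be exercised outside MarkLogic. It assumes the JSON triple shape used elsewhere in this article (a "triple" object with "subject", "predicate", and an "object" carrying a "value") and a document with a top-level "triples" array. The helper names annotate and readProvenance are hypothetical.

```javascript
// Field names shared by the update code and the retrieval code.
const PROVENANCE_FIELDS = ["source", "timestamp"];

// Update side: return a copy of a serialized triple with provenance properties added.
function annotate(serialized, provenance) {
  const triple = { ...serialized.triple };
  for (const field of PROVENANCE_FIELDS) {
    triple[field] = provenance[field];
  }
  return { triple };
}

// Retrieval side: find the matching triple in a parsed document
// and read back exactly the same fields. Returns null when nothing matches.
function readProvenance(doc, predicate, value) {
  const match = doc.triples
    .map((entry) => entry.triple)
    .find((t) => t.predicate === predicate && t.object.value === value);
  if (!match) return null;
  return Object.fromEntries(PROVENANCE_FIELDS.map((f) => [f, match[f]]));
}

// Usage with hypothetical data.
const serialized = {
  triple: {
    subject: "http://example.org/subjects/ABC",
    predicate: "http://example.org/predicates/systemOwner",
    object: { datatype: "http://www.w3.org/2001/XMLSchema#string", value: "Joe Smith" },
  },
};
const doc = {
  triples: [annotate(serialized, { source: "DB1", timestamp: "2017-05-17T14:17:38.786Z" })],
};
console.log(readProvenance(doc, "http://example.org/predicates/systemOwner", "Joe Smith"));
```

Because both functions iterate over the same PROVENANCE_FIELDS list, adding a new provenance detail is a one-line change that the writer and the reader pick up together.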

If your application uses Template Driven Extraction (TDE), you can wrap elements/properties you would like to annotate with provenance details like this:

{
    "metadata": [
        {
            "propertyWrapper": {
                "systemOwner": "Joe Smith",
                "source": "DB1",
                "updateTime": "2017-05-17T14:17:38.786Z"
            }
        },
        {
            "propertyWrapper": {
                "id": "ABC",
                "source": "DB1",
                "updateTime": "2017-05-17T14:17:38.786Z"
            }
        }
    ]
}

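If the source data arrives as a flat record, the wrapped structure can be produced before ingestion. This is a minimal sketch in plain JavaScript, assuming every property in the record shares the same source and update time. The "metadata", "propertyWrapper", "source", and "updateTime" names mirror the sample above; the function names wrapProperty and wrapRecord are hypothetical.

```javascript
// Wrap a single property value together with its provenance details.
function wrapProperty(name, value, source, updateTime) {
  return { propertyWrapper: { [name]: value, source, updateTime } };
}

// Build the "metadata" array from a flat record, one wrapper per property.
function wrapRecord(record, source, updateTime) {
  return {
    metadata: Object.entries(record).map(([name, value]) =>
      wrapProperty(name, value, source, updateTime)
    ),
  };
}

const wrapped = wrapRecord(
  { systemOwner: "Joe Smith", id: "ABC" },
  "DB1",
  "2017-05-17T14:17:38.786Z"
);
console.log(JSON.stringify(wrapped, null, 2));
```

Note that in this sketch a record property that is itself named "source" or "updateTime" would be overwritten by the provenance value, so a real implementation would need to guard against that collision.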
Here’s a sample TDE template:

{
 "template": {
 "context": "/metadata",
 "vars": [
 {
 "name": "prefix-subjects",
 "val": "'http://example.org/subjects'"
 },
 {
 "name": "prefix-predicates",
 "val": "'http://example.org/predicates'"
 },
 {
 "name": "doc-id",
 "val": "sem:iri($prefix-subjects || '/' || fn:root(.)/metadata/propertyWrapper/id/fn:string())"
 }
 ],
 "triples": [
 {
 "subject" : {"val" : "$doc-id"},
 "predicate" : {"val" : "sem:iri($prefix-predicates || '/id')"},
 "object" : { "val" : "propertyWrapper/id/fn:string()", "invalidValues" : "ignore"}
 },
 {
 "subject" : {"val" : "$doc-id"},
 "predicate" : {"val" : "sem:iri($prefix-predicates || '/systemOwner')"},
 "object" : { "val" : "propertyWrapper/systemOwner/fn:string()", "invalidValues" : "ignore"}
 }
 ]
 }
}

Sample Code

On XML documents this can be most easily achieved by using attributes on the triples:

declare namespace prov = "http://marklogic.com/designPatterns/prov";

(: Copy a triple's XML serialization, adding provenance as attributes on sem:triple. :)
declare function prov:add-provenance($triple as sem:triple, $source as xs:string, $timestamp as xs:dateTime) {
  element sem:triple {
    attribute source {$source},
    attribute updatedTime {$timestamp},
    document {  $triple  }/element()/node()
  }
};

let $triple := sem:triple(sem:iri("myIri"),sem:iri("myProperty"),"a value")
return prov:add-provenance($triple,"DB", fn:current-dateTime())

Here is the same approach in JSON. Instead of using attributes, we add properties to the triple object:

// Serialize the triple to a JSON object and add provenance properties to it.
function addProvenance(triple, source, timestamp) {
 const t = xdmp.toJSON(triple).toObject();
 t.triple.source = source;
 t.triple.timestamp = timestamp;
 return t;
}

const t = sem.triple(sem.iri("JSON Annotation"), sem.iri("testProp"), "value");
addProvenance(t, "myDB", new Date().toJSON());

Here is an example of how you might retrieve the provenance details:

// Find the matching triple in the document and return its provenance annotations.
function getProvenance(uri, predicate, value) {
 const triple = fn.head(cts.doc(uri).xpath(`/triples/triple[predicate = '${predicate}' and object/value eq '${value}']`));
 const result = {};
 result.source = triple.source;
 result.timestamp = triple.timestamp;
 return result;
}

Related Patterns

Envelope Pattern


With triples alone, it can be challenging to capture provenance details without introducing complexity that negatively impacts the usability of your triples and query performance. Through use of MarkLogic’s multi-model support, we are able to take advantage of embedding triples on documents with annotations that provide additional context and can be retrieved easily using a small amount of JavaScript or XQuery code.

For more information on additional ways to take advantage of embedding triples on documents, see the Semantics guide and the chapter on Unmanaged Triples.
