Built-in RDF Triple Store for More Connected Data

MarkLogic Semantics

As a multi-model database, MarkLogic combines the benefits of a document store and an RDF Triple Store. This approach is ideal for integrating and accessing all of your data. JSON and XML documents provide great flexibility for modeling entities, while RDF triples (the data format for semantic graph data) are ideal for storing relationships. MarkLogic Semantics is well suited to storing metadata, improving data integration, and building applications on that integrated, highly connected data. Popular use cases for MarkLogic Semantics include advanced search apps, recommendation engines, fraud detection, knowledge graphs, drug discovery, and network security.


Dean Allemang

It works, it just works.

“I’m excited because MarkLogic claims to have brought RDF, XML, and other data together, including their various indexes, and they deliver. It’s hard but they went ahead and did it, and my experience is that it works great.”

Semantic Data and Querying: RDF and SPARQL

Graph databases have risen quickly in popularity in recent years, and RDF Triple Stores, where semantic data is stored, are considered a type of graph database. When data takes on a graph structure in which entities (people, places, and things) and the relationships between them matter most, semantics is the better fit because it provides more context for your data.

The standard way to represent semantic data is with RDF (Resource Description Framework) triples, and the standard query language is SPARQL. Triples are subject-predicate-object statements about entities (people, places, or things) and their relationships. One example is “John lives in London.” Another is “London is in England.” Combining these two facts, a new fact can be inferred: “John lives in England.”
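The John and London example can be sketched in a few lines of plain Python. This is only an illustration of the subject-predicate-object idea and of rule-based inference, not how MarkLogic stores or reasons over triples; the predicate names `livesIn` and `isIn` are made up for this sketch.

```python
# Each fact is a (subject, predicate, object) tuple.
# "livesIn" and "isIn" are illustrative predicate names, not a standard vocabulary.
triples = {
    ("John", "livesIn", "London"),
    ("London", "isIn", "England"),
}

def infer_lives_in(facts):
    """Apply one rule until nothing new is derived:
    if X livesIn Y and Y isIn Z, then X livesIn Z."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for (x, p1, y) in list(inferred):
            if p1 != "livesIn":
                continue
            for (y2, p2, z) in list(inferred):
                new_fact = (x, "livesIn", z)
                if p2 == "isIn" and y2 == y and new_fact not in inferred:
                    inferred.add(new_fact)
                    changed = True
    return inferred

all_facts = infer_lives_in(triples)
print(("John", "livesIn", "England") in all_facts)
```

The two stored facts stay as they are; the third fact is derived by the rule rather than entered by hand.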

In this way, simple facts can all be linked together to form a graph of hundreds of billions of facts and relationships. Such knowledge graphs power applications you use every day, including Google’s search and LinkedIn’s “people you may know” feature.

Kurt Cagle

“RDF (and by extension SPARQL) becomes more important as the data models themselves become more complex, more associational, and more heterogeneous, simply because the variety of information will dominate over factors such as volume or velocity.”

Why RDF Triples?

It’s simple: triples add context to your data, which improves data integration.

Triple stores have an advantage over relational databases for many use cases involving relationships: you don’t need to worry about foreign keys, nested queries, and complex joins.


  • Triples are universally understood and can be easily searched and shared
  • Triples connect together to form graphs that are machine readable, and can even be used to infer new facts
  • Common standards are defined by W3C for RDF triples and the query language, SPARQL
  • Triple stores can scale to hundreds of billions of facts and relationships
  • Triple stores can leverage ontologies to organize and categorize data (ontologies are like taxonomies, but are richer and more useful)

New to the World of Data Integration? Start Here.

Download a free copy of our Data Hub Guide for Architects. This 75+ page eBook is the most authoritative guide to building and using data hubs in the industry, and is a must read for anyone architecting data integration solutions in the enterprise.


Michael Henry | KPMG


“Using RDF triples allows us to create real time connections between data, such as organization structures and relationships between documents and data… We have scaled our platform to more than 40,000 documents per hour with an inventory of 50 million data points in 8 million documents. We have yet to reach a MarkLogic limitation.”

The MarkLogic Advantage: Multi-Model Database for Documents, Data, and Triples

Documents + Data + Triples. MarkLogic natively stores documents, data, and triples together.

  • Fully Composable – MarkLogic is truly multi-model—you can write fully composable queries across all data types
  • Enterprise Ready – Full set of enterprise features, including ACID transactions, security, and disaster recovery
  • Sophisticated Indexing – Specialized triple index to improve query power and performance
  • Enormous Scale – MarkLogic stores hundreds of billions of triples

Why is the multi-model approach better? It unifies your data model, provides more flexibility, and enables better querying.


When to Use Semantics

Not every use case requires the MarkLogic Semantics option. Sometimes, the document model alone will work just fine. But when your data involves relationships that you need to store and query at scale, Semantics is a valuable addition. Here are some of the most popular use cases for MarkLogic Semantics.

Integrated Master Data

MarkLogic Semantics acts as the glue for master data, providing an ideal model for reference data and metadata (provenance, lineage, etc.). MarkLogic stores entity data such as Customers and Orders as documents, and can store the relationships between those entities as RDF Triples. You can also describe metadata such as when a document was created, or how it relates to other documents using an ontology. With MarkLogic’s multi-model capabilities, these semantic relationships can be stored inside the documents themselves, or as standalone RDF Triples.
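As a rough illustration of the two storage options described above, the sketch below models one customer document that carries a relationship inside it and one relationship kept on its own. The field name `triples`, the IRIs, and the helper `collect_triples` are hypothetical and chosen only for this example; they are not MarkLogic's actual serialization or API.

```python
import json

# A customer entity modeled as a JSON-style document. The "triples" field
# and the example.org IRIs are hypothetical, used only for illustration.
customer_doc = {
    "customerId": "C-100",
    "name": "Jane Smith",
    "triples": [
        {"subject": "http://example.org/customer/C-100",
         "predicate": "http://example.org/placed",
         "object": "http://example.org/order/O-555"},
    ],
}

# A relationship kept outside any entity document.
standalone_triples = [
    {"subject": "http://example.org/order/O-555",
     "predicate": "http://example.org/contains",
     "object": "http://example.org/product/P-9"},
]

def collect_triples(documents, standalone):
    """Gather embedded and standalone triples into one list of tuples."""
    found = []
    for doc in documents:
        for t in doc.get("triples", []):
            found.append((t["subject"], t["predicate"], t["object"]))
    for t in standalone:
        found.append((t["subject"], t["predicate"], t["object"]))
    return found

graph = collect_triples([customer_doc], standalone_triples)
print(json.dumps(graph, indent=2))
```

Either way, the relationships end up in one graph that can be followed from customer to order to product.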

MarkLogic Semantics can help deliver personalized, real-time recommendations and intelligently expand search queries. Graphs are all about highly connected data, and with MarkLogic Semantics, you can leverage those relationships to suggest related people, products, questions, or anything else that is in the graph to help improve the front-end user experience. You can also intelligently expand searches based on semantic ontologies. Even if a document doesn’t mention the keyword you searched for, you still get an expanded set of relevant results. It’s a smarter way to build search apps.
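Ontology-based search expansion can be sketched as follows. The tiny ontology, the sample documents, and the functions `expand_terms` and `search` are all assumptions made for this illustration; the predicate names only loosely echo SKOS-style `narrower` and `altLabel` links, and a real deployment would run the expansion inside the database.

```python
# Toy ontology as triples; the terms and predicates are illustrative only.
ontology = [
    ("cardiovascular disease", "narrower", "heart attack"),
    ("cardiovascular disease", "narrower", "stroke"),
    ("heart attack", "altLabel", "myocardial infarction"),
]

documents = {
    "doc1": "Patient admitted after a myocardial infarction.",
    "doc2": "Study of stroke recovery outcomes.",
    "doc3": "Annual report on hospital staffing.",
}

def expand_terms(term, triples):
    """Return the term plus everything reachable via narrower/altLabel links."""
    expanded = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p in ("narrower", "altLabel") and o not in expanded:
                expanded.add(o)
                frontier.append(o)
    return expanded

def search(term, docs, triples):
    """Match documents against the expanded term set (case-insensitive)."""
    terms = expand_terms(term, triples)
    return sorted(doc_id for doc_id, text in docs.items()
                  if any(t in text.lower() for t in terms))

results = search("cardiovascular disease", documents, ontology)
print(results)
```

Neither matching document contains the phrase "cardiovascular disease"; both are found only because the ontology links the search term to the words they do contain.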

MarkLogic Semantics makes it possible for financial services firms to examine relationships between parties and counterparties to uncover liability exposure or potential fraud. Or, for insurance companies, it makes it possible to uncover crime rings and fraudulent claims since there are usually connections that can be drawn between billing addresses, known associates, and historical records. Often these connections are lost in un-integrated or un-indexed data. Semantics brings it all to the surface — quickly.

Intelligence data can be significant in volume and complex in structure, and it streams in from multiple sources in different formats and types. To make sense of it all, it needs to be integrated. And, to analyze it all, you need to understand the relationships. MarkLogic Semantics enables you to connect data and visualize the relationships in order to draw conclusions. Whether it is the military tracking a person of interest or police forces tracking neighborhood wrongdoing, MarkLogic Semantics makes it easier than ever before to use your data more intelligently.

MarkLogic Semantics makes it possible to leverage the trillions of publicly available triples that describe all sorts of things about the world. These facts are freely available: see DBpedia, the CIA World Factbook, and GeoNames. Or, you can use your own. Either way, those triples can form the fabric of a knowledge graph to help improve search and discovery. For example, it may be helpful to surface facts about London when a user searches for London, or facts about who owns a company and what its subsidiaries are when a user searches for that company. There are limitless possibilities with the world of linked data.

MarkLogic Semantics helps manage IT assets, or really any assets, across large organizations. Most large organizations have hundreds, if not thousands, of IT assets. They are valuable but require lots of ongoing maintenance. Consider the racks and servers in a data center. With MarkLogic Semantics, you can store the data about them as triples and run a simple query such as “show me a list of all the Dell servers that are more than two years old” and get an instant result.
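The Dell server question can be sketched over a handful of triples in plain Python. The asset names, predicates, dates, and the helper functions are invented for this illustration; in practice the same question would be expressed as a SPARQL query against the triple store.

```python
from datetime import date

# Asset facts as (subject, predicate, object) triples; all values are made up.
assets = [
    ("server-01", "manufacturer", "Dell"),
    ("server-01", "purchased", date(2020, 3, 15)),
    ("server-02", "manufacturer", "Dell"),
    ("server-02", "purchased", date(2024, 1, 10)),
    ("server-03", "manufacturer", "HP"),
    ("server-03", "purchased", date(2019, 6, 1)),
]

def years_before(day, years):
    """Return the date `years` years before `day`, handling Feb 29."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:  # Feb 29 mapped into a non-leap year
        return day.replace(year=day.year - years, day=28)

def old_servers(triples, maker, min_age_years, today):
    """Servers made by `maker` and purchased more than `min_age_years` ago."""
    made_by = {s for s, p, o in triples if p == "manufacturer" and o == maker}
    cutoff = years_before(today, min_age_years)
    return sorted(s for s, p, o in triples
                  if p == "purchased" and s in made_by and o < cutoff)

# A fixed "today" keeps the example reproducible.
result = old_servers(assets, "Dell", 2, today=date(2025, 1, 1))
print(result)
```

The query joins two facts about each server (who made it and when it was purchased) without any foreign keys or table joins.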

Supported Semantic Features

  • Store and manage hundreds of billions of RDF triples
  • Query across documents, data, and triples
  • Triple index for sub-second search results
  • Triple cache for high performance across large clusters
  • Bulk-load triples via MarkLogic Content Pump (mlcp)
  • Provenance and reification by adding metadata
  • XQuery helper modules for serializations and transitive closures
  • Updates, aggregates via MarkLogic APIs
  • Graph traversal with property paths and transitive closures
  • Semantic inference using rule sets at query time
  • Supplied rule sets for RDFS, RDFS+, and OWL Horst
  • Support for user-defined rule sets
  • Ontology Driven Entity Extraction
  • Full support for SPARQL 1.1
  • SPARQL endpoint and graph store protocol support
  • SPARQL from server-side JavaScript, Node.js
  • Support for Jena and Sesame APIs
  • Full integration with semantics technology partners (Smartlogic, PoolParty, Cambridge Semantics)
  • MarkLogic enterprise features: ACID transactions, certified security (at document/triple level), high availability and disaster recovery, scalability and elasticity

Ontology Driven Entity Extraction

This unique MarkLogic Semantics feature improves search and classification by automatically identifying entities (people, places, and things) in free-flowing text. It can then return a list of those entities (extraction) or mark them up in the document (enrichment).

Entities are defined in a user-maintained dictionary, or you can build a dictionary automatically from a SKOS Ontology. If you need NLP (Natural Language Processing) to define an entity, you can use third-party tools such as Smartlogic, PoolParty, Expert System, NetOwl, or Calais.
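The difference between extraction and enrichment can be sketched with a small dictionary and a regular expression. The dictionary entries, the `<entity>` markup, and the functions `extract` and `enrich` are assumptions for this illustration and do not reflect MarkLogic's own entity extraction functions or output format.

```python
import re

# A user-maintained dictionary mapping entity names to types (illustrative).
dictionary = {
    "London": "place",
    "John Smith": "person",
    "MarkLogic": "organization",
}

def _pattern(entries):
    """Build one regex for all names, longest first so multiword names win."""
    names = sorted(entries, key=len, reverse=True)
    return re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")

def extract(text, entries):
    """Extraction: return the entities found, in order of appearance."""
    return [(m.group(1), entries[m.group(1)])
            for m in _pattern(entries).finditer(text)]

def enrich(text, entries):
    """Enrichment: wrap each entity in hypothetical <entity> markup."""
    return _pattern(entries).sub(
        lambda m: f'<entity type="{entries[m.group(1)]}">{m.group(1)}</entity>',
        text)

text = "John Smith moved to London."
print(extract(text, dictionary))
print(enrich(text, dictionary))
```

Extraction leaves the text untouched and returns a list; enrichment returns the same text with the entities tagged in place.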

Diagram of content being extracted based on entity rules
Diagram of content being enriched by entity rules
