Benefits of a Multi-Model Database
Multi-model databases provide an elegant solution to the challenge of managing heterogeneous data.
In contrast to polyglot persistence, where an application integrates multiple database models, a multi-model database naturally supports multiple data models in their native form using a single, integrated backend. Whereas polyglot persistence results in data silos and multiple interfaces that require complex integration workflows, a multi-model database facilitates integrated data and provides a unified interface for data consistency, security, and access.
Just imagine the simplicity your data architecture gains by modeling entities (such as a patient’s medical record) using the right mix of data models in a multi-model database, compared with polyglot persistence, where applications must orchestrate various services to maintain data consistency and security for end-user consumption.
MarkLogic Server is a multi-model database that combines the benefits of document, semantic graph, geospatial, and relational models into a single, scalable, high-performance operational database. It provides native storage for JSON, XML, text, RDF triples, geospatial data, and binaries (e.g., PDFs, images, videos) with a unified search and query interface. Hence, you get the flexibility to choose the right mix of data models for your use case without sacrificing data consistency. For example, you can build a reliable 360 view of your business entities (such as customers) to enable multiple use cases (such as loyalty programs and personalization).
This illustration shows a document describing a person, Jen, and some triples that describe facts and relationships about Jen. If you want to represent Jen in a relational view, you can do that too.
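To make the three representations concrete, here is a minimal Python sketch of the same entity as a document, as triples, and as a relational row. It uses plain Python structures rather than any MarkLogic API, and every field name, predicate, value, and the `to_row` helper is a hypothetical choice made only for this example.

```python
# Conceptual sketch only: plain Python structures, not MarkLogic APIs.
# All field names, predicates, values, and helpers are hypothetical.

# 1. Document view: one hierarchical record describing Jen.
jen_document = {
    "person": {
        "name": "Jen",
        "city": "Seattle",
        "skills": ["XQuery", "SPARQL"],
    }
}

# 2. Graph view: (subject, predicate, object) triples about Jen.
jen_triples = [
    ("Jen", "livesIn", "Seattle"),
    ("Jen", "worksWith", "Raj"),
]

# 3. Relational view: a flat row projected from the document.
def to_row(document):
    person = document["person"]
    return (person["name"], person["city"], len(person["skills"]))

print(to_row(jen_document))
```

The point of the sketch is that the row is derived from the document instead of being stored separately, so the document remains the single source of truth.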
The ability to store and query multiple data models in multi-model databases results in unprecedented flexibility, operational efficiency, and DevOps agility when integrating data from silos. For instance, relational databases require you to define one agreed-upon uber schema upfront and map all your source data using cumbersome ETL processes that are difficult to maintain as business needs evolve. In contrast, multi-model databases are schema-agnostic so that multiple schemas (or data models) can coexist and you can flexibly evolve the schema to promptly meet new business needs.
As a multi-model database, MarkLogic Server makes it easy to load any data as is and provides the flexibility to make iterative changes faster while preserving lineage, provenance, and other metadata. You can add data sources as needed, load structured and unstructured data, and make schema changes to enable new use cases without impacting existing applications or re-ingesting the source data. It is for this reason that MarkLogic Server is the foundation of the MarkLogic Data Hub Platform to integrate, curate, and manage multi-structured data. It makes it possible to integrate data of any type (such as IoT, clickstream, mainframe, and ERP data) from any source (such as Oracle, SQL Server, Teradata, and Hadoop).
A multi-model database supports multiple data models, indexes, and programming languages to enable multiple use cases while providing a unified data security, governance, and consistency model. With MarkLogic Server, you can expect the following:
The document database model is the most flexible of the NoSQL data models, and the most popular. Documents are ideal for handling varied and complex hierarchical data. Humans can read them, they closely map to the conceptual or business model of the data, and they avoid the impedance mismatch problem that relational databases have.
Whether it’s Java objects that represent business entities or free-flowing text from a “document” in the more traditional sense (Microsoft Word documents, PDFs, etc.), they are all naturally stored as JSON and XML documents with strong consistency in MarkLogic Server.
To securely access and share documents, MarkLogic Server provides a built-in search engine, document and element level security controls, redaction policies, and more. The search engine automatically indexes documents for full-text search on ingestion and gives you the flexibility to define additional indexes (e.g., range indexes, geospatial indexes) and customize relevance ranking. These and various other out-of-the-box features (such as facets and snippets) enable you to quickly build advanced search applications.
In summary, here are the main benefits of using the document database model:
Documents are fantastic for storing business entities, but when it comes to entity relationships, a semantic graph database model—another popular NoSQL model—is best. It’s designed to store and manage relationships among people, customers, providers, or any other entity of interest.
Additionally, MarkLogic Server provides a semantic graph data model in the form of a built-in RDF triple store, which stores and manages semantic data. We call this capability MarkLogic Semantics. Semantics enhances the document model by providing a smart way to connect and enrich JSON and XML documents. This facilitates data integration and enables more powerful querying to discover relationships and make inferences.
Semantics also provides context for your data by storing metadata (e.g., ontologies). For example, consider a product catalog that has information about parts, and one part is listed with a size of “42”. But where is the contextual information? What are the units of “42”? What is the tolerance? Who measured it? When was it measured? This contextual information is the semantic data, which can be stored as RDF triples in MarkLogic Server.
Similar to the document model, MarkLogic Server’s built-in search engine indexes RDF triples for fast execution of semantic searches using SPARQL queries. You can easily compose complex queries that combine semantic and document searches to discover insights.
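As a rough illustration of how triples can carry the context around a bare value like “42”, the sketch below stores a few facts about a part and looks one up with a simple pattern match. This is a tiny in-memory matcher written for this example, not a SPARQL engine or a MarkLogic API, and the part identifier, predicate names, and values (including the unit) are all hypothetical.

```python
# Conceptual sketch only: a tiny in-memory triple matcher, not a SPARQL engine.
# The part identifier, predicates, and values are hypothetical.

triples = [
    ("part:1001", "hasSize", "42"),
    ("part:1001", "sizeUnit", "mm"),
    ("part:1001", "sizeTolerance", "0.5"),
    ("part:1001", "measuredBy", "inspector:7"),
]

def match(pattern, data):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in data
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]

# Roughly what a SPARQL pattern such as
#   SELECT ?unit WHERE { part:1001 sizeUnit ?unit }
# would ask for.
units = [obj for _, _, obj in match(("part:1001", "sizeUnit", None), triples)]
print(units)
```

A real triple store would answer the same pattern from an index instead of scanning every triple, which is what makes semantic search fast at scale.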
The document data model provides the flexibility to store geospatial data. MarkLogic Server can natively store, manage, and search geospatial data, including points of interest, intersecting paths, and regions of interest. This enables you to answer the “where” question in the context of all your other data (entities, relationships, etc.).
MarkLogic Server’s built-in search engine indexes geospatial data to power location-based search queries and alerts for geospatial applications.
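To show what a location-based search query computes, here is a minimal Python sketch of a "points within a radius" search using the haversine great-circle distance. It scans a list linearly, whereas a database would answer the same question from a geospatial index. The place names, coordinates, and the `within_radius` helper are hypothetical sample data and names, not part of any MarkLogic API.

```python
import math

# Conceptual sketch only: a linear scan, where a database would use a
# geospatial index. Place names and coordinates are hypothetical.

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

places = [
    {"name": "clinic-a", "lat": 37.7749, "lon": -122.4194},
    {"name": "clinic-b", "lat": 37.8044, "lon": -122.2712},
    {"name": "clinic-c", "lat": 34.0522, "lon": -118.2437},
]

def within_radius(center_lat, center_lon, radius_km, docs):
    """Names of documents whose point lies within radius_km of the center."""
    return [
        d["name"] for d in docs
        if haversine_km(center_lat, center_lon, d["lat"], d["lon"]) <= radius_km
    ]

nearby = within_radius(37.7749, -122.4194, 25, places)
print(nearby)
```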
Relational data models are useful for a reason. Sometimes, it’s really convenient to have structured views of your data in a tabular form that you can query with good ol’ standard SQL. With MarkLogic, your developers will feel right at home.
MarkLogic Server supports standard SQL. It allows you to create relational views on top of your data for SQL analytics without compromising data security. The underlying data never changes — it’s still available in its original format in MarkLogic Server.
The underlying technology that makes this level of SQL support possible is unique to MarkLogic Server. It’s called Template Driven Extraction (TDE). It enables you to define a relational lens over your data (or entities) so you can query it using standard SQL. Hence, you can use familiar BI tools for operational analytics.
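The idea of a relational lens can be sketched in a few lines of Python: a template describes how to project rows out of documents, and the resulting rows are queried with standard SQL (here through the standard-library `sqlite3` module). The template shape below is a simplified, hypothetical stand-in and is not the actual TDE template format. The documents, view name, and column names are likewise invented for this example.

```python
import sqlite3

# Conceptual sketch only: imitates the idea of a relational lens over
# documents. The template shape is hypothetical, not the real TDE format.

documents = [
    {"customer": {"id": 1, "name": "Jen",
                  "orders": [{"total": 40.0}, {"total": 60.0}]}},
    {"customer": {"id": 2, "name": "Raj",
                  "orders": [{"total": 15.0}]}},
]

template = {
    # One row per order found in each document.
    "context": lambda doc: [(doc["customer"], o) for o in doc["customer"]["orders"]],
    "columns": {
        "customer_id": lambda c, o: c["id"],
        "customer_name": lambda c, o: c["name"],
        "total": lambda c, o: o["total"],
    },
}

def extract_rows(tmpl, docs):
    """Project each context item of each document into a flat row."""
    rows = []
    for doc in docs:
        for customer, order in tmpl["context"](doc):
            rows.append(tuple(col(customer, order)
                              for col in tmpl["columns"].values()))
    return rows

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id INTEGER, customer_name TEXT, total REAL)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 extract_rows(template, documents))

result = conn.execute(
    "SELECT customer_name, SUM(total) FROM orders "
    "GROUP BY customer_name ORDER BY customer_name"
).fetchall()
print(result)
```

Note that the source documents are never modified. The rows are a projection, which mirrors the claim above that the underlying data stays available in its original format.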
Multi-model databases provide a unified search interface to query multiple data models using integrated indexes. Typically, you have to choose and manage specific indexes for each data type. MarkLogic Server, on the other hand, has an integrated suite of indexes that allow fast data access immediately after data is loaded. A multi-model database works more like Google: Google doesn’t require web pages to fit a certain format; it just indexes them and makes them accessible via a unified search interface.
MarkLogic Server’s built-in search engine indexes all data types and delivers exceptional search performance. Hence, users can quickly search data across multiple data models with a single, composable query. For example, you can combine semantic and search queries to find patients who are uninsured and suffer from chronic illness.
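The patient example can be sketched as follows: a semantic lookup decides which conditions count as chronic, and a document filter checks the insurance field, with both combined in one expression. This is plain Python meant only to illustrate the shape of a combined query. The patient records, predicate names, and condition classifications are hypothetical, and no MarkLogic query API is used.

```python
# Conceptual sketch only: plain Python, not MarkLogic query APIs.
# Patient records, predicates, and condition names are hypothetical.

patients = [
    {"id": "p1", "name": "Ana", "insured": False, "conditions": ["diabetes"]},
    {"id": "p2", "name": "Ben", "insured": False, "conditions": ["flu"]},
    {"id": "p3", "name": "Cy", "insured": True, "conditions": ["asthma"]},
]

triples = [
    ("diabetes", "isA", "ChronicIllness"),
    ("asthma", "isA", "ChronicIllness"),
    ("flu", "isA", "AcuteIllness"),
]

# Semantic part: which conditions are classified as chronic?
chronic = {s for s, p, o in triples if p == "isA" and o == "ChronicIllness"}

# Document part: filter on a field of the record, combined with the
# semantic result in a single expression.
matches = [
    p["name"] for p in patients
    if not p["insured"] and chronic.intersection(p["conditions"])
]
print(matches)
```

Keeping the classification in triples means a newly classified condition changes the query result without touching any patient document.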
Multi-model databases provide industry-standard query languages and APIs to flexibly store and access data for all the supported data models. With MarkLogic Server, users can query data using Search, SQL, SPARQL, or the REST API. It also supports multiple programming languages and runtimes, such as JavaScript, Node.js, Java, and XQuery.
As a true multi-model database, MarkLogic Server also provides its Optic API as a unified query interface for multi-model data access. It provides flexible and easy access to data across all data models. You can create a single, composable query across documents, relational views, and semantic graphs (in any combination). For example, you can use the Optic API to search and filter documents, execute relational operations (like join or aggregate), and retrieve (or construct) documents on output. Try doing that with another multi-model database!
A multi-model database complements its data modeling flexibility and unified query interface with a single data security, governance, and transactional model. As a unified data platform, it increases developers’ productivity and operational efficiency.
As a true multi-model database, MarkLogic Server provides a unified data security, governance, and consistency model. It uses a shared-nothing architecture to provide scalability and availability, and reduces the operational footprint for development, testing, upgrades, backup and recovery, and more.
You should be aware of multi-model imposters, as many vendors are falsely advertising multi-model databases. There are two kinds of multi-model imposters in the market:
Underneath, though, these relational databases are still relational, not truly multi-model. Oracle’s documentation even states that Oracle 19c does not natively store JSON: “In Oracle Database, JSON data is stored using the common SQL data type VARCHAR2, CLOB, and BLOB (unlike XML data, which is stored using abstract SQL data type XMLType).” As a result, in order to retrieve a value, the entire JSON document must be traversed to locate the data. This is slow. There are two approaches Oracle recommends to improve performance. One is to extract the data into a materialized view, pushing values into another table (i.e., shredding). The other is a JSON search index, which does not maintain ACID compliance: it is only updated periodically when triggered. Other relational databases claiming multi-model status run into similar issues when they are not storing the NoSQL data types natively.
In general, multi-model workloads with multi-model imposters will be hard, brittle, or both. In addition to running into simple challenges like querying documents, they also cannot perform more advanced functions such as linking documents together with triples or querying XML and JSON together, tasks that come easy with MarkLogic Server.