While we have seen wide adoption of non-relational databases in the marketplace to build new applications and modernize legacy applications, why do we still see such big challenges with data integration?
Organizations have amassed large volumes of diverse data usually spread across many systems that operate in silos. And, these diverse data silos are only growing in number with the adoption of cloud services and applications. Hence, in order to build applications that meet quickly evolving business needs, organizations need to:
This comparison looks at how organizations can meet their data integration needs using the two cloud data platforms — MongoDB Atlas and MarkLogic Data Hub Service. In particular, it compares capabilities and underlying differences between the two cloud services for integrating data across an enterprise.
|MarkLogic Data Hub Service||MongoDB Atlas|
|Security & Governance||
MongoDB Atlas is a fully managed cloud database service (DBaaS). It makes it easy for organizations to migrate their applications to the cloud, use consumption-based pricing, and offload the administrative tasks to run, scale, and manage the MongoDB NoSQL database.
MongoDB has grown to be the most popular document database, leveraging its open source license and ease-of-use. Its distributed, scale-out architecture and unified query interface for data aggregation enable developers to quickly build highly-available and responsive applications. And, unlike relational databases, its document data model gives the flexibility to easily evolve application data models without having to remodel database schema, thus helping developers be more productive. More recently, new features like multi-document ACID transactions and new cloud services like MongoDB Stitch have made MongoDB Atlas more appealing for a broader set of use cases.
MarkLogic Data Hub Service is a serverless cloud data hub for agile data integration. It makes it easy for organizations to migrate their data integration workloads to the cloud and run transactional, operational, and analytical applications at scale for a predictable and low cost.
Superior to a DBaaS offering, Data Hub Service provides a unified architecture to ingest and curate any data, and govern the entire data integration lifecycle. The full suite of capabilities includes multi-model data management, built-in search, transactions, fine-grained security, Smart Mastering, and more. With these capabilities, organizations can quickly integrate data across their enterprise to create durable data assets that serve multiple use cases and user personas.
These are some of the core benefits of the Data Hub Service:
|MarkLogic Data Hub Service||MongoDB Atlas|
As the above table illustrates, MarkLogic Data Hub Service supports a variety of data formats, making it easier to accomplish data integration by ingesting data in its native form. The original data can then be progressively transformed into canonical data using built-in data curation tools, all the while recording valuable provenance and lineage metadata. Hence, the metadata and data stay together across the lifecycle and can be securely searched and queried for safe data sharing.
MarkLogic Data Hub Service and MongoDB Atlas are both built on a NoSQL (non-relational) foundation, but they serve rather different purposes. The sections below are illustrative of the key differences between the two data management platforms.
MarkLogic Data Hub Service is a cloud data hub featuring a full-stack enterprise data integration platform with none of the operational burden. Its unified architecture results in faster time-to-market and lower total cost of ownership (TCO) than stitching together your own data hub solution for data integration.
Data Hub Service provides the flexibility to load data as is, integrate and curate that data incrementally, and enrich it with semantic metadata — all without having to compromise on data quality, security, and governance. And, it integrates seamlessly with AWS and Azure platform services so that organizations can easily integrate Data Hub Service into their enterprise data architecture. Users can leverage various cloud platform services to build cloud-native applications, workflows, and pipelines that orchestrate data integration workloads and data services in Data Hub Service.
MongoDB Atlas is a document DBaaS offering. MongoDB also offers several add-on cloud services, such as Atlas Search, MongoDB Stitch, and MongoDB Charts, that integrate with Atlas and are billed separately from it.
MongoDB Atlas does not offer fine-grained data security policies and application services to build data services (or REST APIs). As a result, applications have to implement data processing logic and data security policies outside the database, making applications more vulnerable to security holes and scalability issues. In contrast, data services (or REST APIs) in MarkLogic Data Hub Service are implemented in the database and benefit from the built-in search engine, caching, and fine-grained data access policies for speedy and secure data processing.
Organizations have the option to subscribe to the MongoDB Stitch service that provides application services and fine-grained data access policies for Atlas, but with extra consumption-based cost resulting in higher TCO.
Lastly, MongoDB Atlas does not provide data integration tooling out of the box. Hence, organizations will have to use third-party tools to integrate data for use cases such as an operational data layer or legacy modernization. This not only increases TCO but also raises the governance challenge of keeping data and metadata consistent across multiple tools. In contrast, MarkLogic Data Hub Service tracks data and metadata together through the integration lifecycle so organizations can govern and secure it all in one place. Hence, MarkLogic has a significant advantage over MongoDB in integrating data from silos.
MarkLogic Data Hub Service is a single, purpose-built service for agile integration of multi-structured data. It enables users to take an iterative approach for building the canonical model, harmonizing and mastering the data, and enriching it with semantics for faster delivery of data services to applications. Using this approach, organizations have achieved much faster results.
Data Hub Service provides end-to-end data curation tooling that is not only adept in processing non-relational data but also tracks lineage, provenance, and other metadata so that users can govern the entire process. The low-code/no-code data curation tools include the following capabilities:
These point-and-click tools support progressive data curation. Users can validate every step of the process and fix issues (or accommodate changes) quickly by looping back to any step of the data integration lifecycle. Hence, the canonical model can easily be extended to include new data sources and deliver governed, curated data to solve immediate business problems.
To compare, MongoDB Atlas will require users to integrate various third-party tools like ETL (for harmonizing data), MDM (for mastering data), a graph database (for semantic enrichment), and more for integrating data from silos. Operationalizing and maintaining this data integration architecture is more expensive. It requires running more services, which increases the infrastructure footprint (data duplication, backups, etc.), DevOps complexity, and governance challenges (fragmented security policies, lineage, etc.).
Also, the third-party data integration tools (like ETL, MDM) are more adept at handling relational data than non-relational data. Hence, using these tools to harmonize and master complex hierarchical documents will require custom code that can be difficult to maintain and scale.
The unified architecture of MarkLogic Data Hub Service brings operational simplicity and increases productivity to build a secure and trusted integration hub. Using the iterative model-driven approach, organizations can easily integrate various data formats (like IoT, Mainframe, ERP, etc.) from any source (like Oracle, Teradata, Hadoop, etc.) to deliver consistent and fit-for-purpose data assets.
MarkLogic Data Hub Service is built on the proven foundation of MarkLogic Server, a multi-model database that stores data as documents and semantic graphs, and supports relational views for SQL querying. It gives the flexibility to manage high-level business concepts from multiple silos, materializing them as entities and relationships within a single integrated backend.
Users can store data in multiple formats (like JSON, XML, RDF triples, etc.), even for the same entity, and access data using standard query languages (like SQL, SPARQL) or REST APIs. This integration of semantic and document data adds meaning to data by capturing hierarchical relationships. For example, users can store entities as JSON documents and enrich them with domain-specific ontologies using RDF triples to build a knowledge graph for semantic searches. This makes MarkLogic much more adept at serving as a unified data intelligence platform to manage entities and relationships.
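The idea of enriching a JSON entity with triples can be sketched in plain Python. This is only an illustration of the concept, not MarkLogic's API: the `customer_doc` entity, the predicate names, and the `match` and `types_of` helpers are all hypothetical, and in MarkLogic the equivalent lookup would be expressed as a SPARQL query over stored RDF triples.

```python
import json

# Hypothetical customer entity, stored as a JSON document.
customer_doc = {
    "id": "customer-1001",
    "name": "Acme Corp",
    "industry": "Manufacturing",
}

# Hypothetical (subject, predicate, object) triples that enrich the entity
# with domain knowledge from a small ontology.
triples = [
    ("customer-1001", "isA", "Manufacturer"),
    ("Manufacturer", "subClassOf", "Organization"),
    ("customer-1001", "suppliedBy", "supplier-77"),
]


def match(pattern, facts):
    """Return the facts matching an (s, p, o) pattern; None acts as a wildcard."""
    return [
        fact
        for fact in facts
        if all(want is None or want == got for want, got in zip(pattern, fact))
    ]


def types_of(entity_id, facts):
    """Collect the entity's direct types plus ancestor classes via subClassOf."""
    found = []
    frontier = [o for _, _, o in match((entity_id, "isA", None), facts)]
    while frontier:
        cls = frontier.pop()
        if cls not in found:
            found.append(cls)
            frontier.extend(o for _, _, o in match((cls, "subClassOf", None), facts))
    return found


entity_types = types_of(customer_doc["id"], triples)
print(json.dumps({"entity": customer_doc["name"], "types": entity_types}))
```

A search for "Organization" can now find Acme Corp even though that word never appears in the JSON document itself, which is the kind of semantic search the paragraph above describes.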
MongoDB Atlas is a single-model document database. It does not support any other data models natively. It stores data as BSON, a binary serialization of JSON documents, and provides a flexible aggregation framework to process documents using various operators (like join, filter, etc.).
The document model makes for a super flexible database. However, it’s not efficient to store and analyze relationships or highly connected data. MongoDB Atlas provides an aggregation operator that performs recursive search on documents for simple field-value based graph traversal. However, it cannot support complex pattern matching graph queries to analyze multiple relationships in a data set. Hence, users cannot build analytical apps on graph data using MongoDB Atlas. In contrast, MarkLogic Data Hub Service natively supports graphs to describe relationships and enrich data with domain knowledge for semantic searches. With this approach, users can quickly build advanced analytical apps like social graphs, fraud detection, and more.
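The recursive aggregation operator mentioned above is `$graphLookup`. The sketch below builds such a stage as plain Python data and pairs it with a small in-memory function that approximates the traversal. The `employees` collection, its `name` and `reportsTo` fields, and the `reporting_chain` helper are assumptions made for illustration; nothing here connects to a database.

```python
import json

# A $graphLookup stage that follows reportsTo -> name links recursively.
graph_lookup_pipeline = [
    {"$match": {"name": "Alice"}},
    {
        "$graphLookup": {
            "from": "employees",
            "startWith": "$reportsTo",
            "connectFromField": "reportsTo",
            "connectToField": "name",
            "as": "reportingChain",
            "maxDepth": 5,
        }
    },
]

# Hypothetical sample documents used only by the local approximation below.
employees = [
    {"name": "Dana", "reportsTo": None},
    {"name": "Carol", "reportsTo": "Dana"},
    {"name": "Alice", "reportsTo": "Carol"},
]


def reporting_chain(start_name, docs):
    """Approximate the traversal: follow one field-value link at a time."""
    by_name = {d["name"]: d for d in docs}
    chain = []
    current = by_name[start_name]["reportsTo"]
    while current is not None and current not in chain:
        chain.append(current)
        current = by_name[current]["reportsTo"]
    return chain


print(json.dumps(graph_lookup_pipeline, indent=2))
print(reporting_chain("Alice", employees))
```

The traversal follows a single field-to-field link at each step. Matching a pattern that spans several relationship types at once, as a SPARQL query can, does not fit this shape.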
Lastly, collections in MongoDB Atlas are more restrictive as a document can only be assigned to one collection and the security policies are defined on collections, not documents. In contrast, collections in MarkLogic Data Hub Service are more like labels that allow for flexible slice and dice searches. And, the security policies can be defined at a more fine-grained level. They can be applied at the collection, document, and sub-document level.
When integrating data from silos, organizations need a flexible data foundation and MongoDB Atlas’ document-only data model is not enough. With MarkLogic Data Hub Service, users get a flexible multi-model data foundation in a single integrated backend for easier, faster, and more secure data integration.
MarkLogic has been powering high-performance transactional applications (or system of record) for over a decade. MarkLogic Data Hub Service supports transactions with 100% ACID compliance by providing strong data consistency for both read and write operations. It is built on the proven foundation of MarkLogic Server and provides enterprises with uncompromising data integrity and durability to run large-scale, operational systems for mission-critical use cases.
High-performance ACID transactions are a fundamental feature of operational and transactional data hubs. With Data Hub Service, users can quickly integrate data across lines of business to create a unified, consistent, and real-time view of data for operational applications. For more details on how MarkLogic supports ACID transactions, read this blog.
MongoDB's support for transactions has improved over the years, but it is still not a viable option for any use case that requires guaranteed consistency at scale with 100% ACID compliance. Transactions are a feature that developers building on MongoDB have wanted for years.
Its recent release added support for multi-document ACID transactions on replica set and sharded cluster deployments. By default, transactions in MongoDB are not ACID compliant due to weak defaults for read and write concern. Hence, the onus is on developers to explicitly set the strongest level of read concern (i.e. snapshot) and write concern (i.e. majority) on transactions to achieve ACID compliance. However, an independent report on the ACID compliance of MongoDB version 4.2.6 found that even at the strongest levels of read and write concern, MongoDB failed to preserve snapshot isolation. It concludes that describing MongoDB's transactions as "full ACID transactions" rather than "snapshot isolation" is questionable or misleading.
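As a minimal sketch of what "explicitly set" means here, the option document below uses the shape MongoDB accepts for transaction options, with read concern `snapshot` and write concern `majority`. The `uses_strongest_concerns` helper is hypothetical and exists only to show the check an application team might run over its own configuration; the code does not talk to a database.

```python
import json

# Transaction options with the read and write concern levels named in the text.
strong_options = {
    "readConcern": {"level": "snapshot"},
    "writeConcern": {"w": "majority"},
}


def uses_strongest_concerns(options):
    """Hypothetical helper: True only when both concerns are set explicitly."""
    read_level = options.get("readConcern", {}).get("level")
    write_ack = options.get("writeConcern", {}).get("w")
    return read_level == "snapshot" and write_ack == "majority"


print(json.dumps(strong_options))
print(uses_strongest_concerns(strong_options))
print(uses_strongest_concerns({}))  # options left at their defaults
```

A check like this only confirms that the options were requested. It says nothing about whether the database honors them, which is the question the independent report examined.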
Additionally, the performance of strongly consistent write operation (majority write concern) suffers from high latency and low throughput because MongoDB Atlas uses asynchronous data replication (or master-slave replication). In contrast, MarkLogic Data Hub Service uses synchronous data replication.
In general, applications cannot depend on MongoDB Atlas for data integrity and durability because it supports read and write operations with tunable consistency (i.e. ACID compliance can be enabled/disabled on a query-by-query basis). Hence, applications will have to use strongly consistent read operations (like linearizable read concern) to avoid dirty and stale reads. This again results in high latency, because a quorum is required at read time.
The lack of 100% ACID-compliant transactions in MongoDB Atlas means that applications will have to code transaction logic that is difficult to maintain and scale. Hence, MongoDB Atlas is not suited for mission-critical transactional and operational applications.
MarkLogic Data Hub Service provides fine-grained access control using document and element-level security, and rules-based redaction policies for data loss prevention. The security policies are defined using a flexible role-based model to redact and control access down to the level of individual fields. Once defined, these granular policies are enforced by the database as implicit constraints based on the user’s security profile. As a result, admins get to centrally administer security policies that are applied consistently across all data access (i.e. across both original data and integrated data) so that the organization can confidently share data and serve multiple use cases and user personas. In short, MarkLogic makes data integration a good thing for security and data governance. For more details on security, please refer to the Trust Center.
To compare, MongoDB does not have the same level of fine-grained security for controlling data access and sharing of sensitive data. As documented, MongoDB Atlas only secures data at the collection level, not at the document or sub-document level. Hence, the onus of ensuring field-level protection lies with the developers writing the applications, who must include field-level redaction logic (like a filter) on every query to the database. In contrast, MarkLogic Data Hub Service security policies are enforced by the database, thus freeing applications from implementing data security policies.
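To make the application-side burden concrete, here is a minimal sketch of the kind of redaction filter an application would have to apply to every result it returns. The `FIELD_POLICY` mapping, the role names, and the `redact` function are all hypothetical and are not part of any MongoDB or MarkLogic API.

```python
import copy

# Hypothetical policy: field name -> roles that are allowed to see it.
FIELD_POLICY = {
    "ssn": {"compliance"},
    "salary": {"compliance", "hr"},
}


def redact(document, user_roles, policy=FIELD_POLICY):
    """Return a copy of the document without fields the user may not see."""
    visible = copy.deepcopy(document)
    for field, allowed_roles in policy.items():
        if field in visible and not (allowed_roles & set(user_roles)):
            del visible[field]
    return visible


employee = {"name": "Alice", "salary": 120000, "ssn": "000-00-0000"}

hr_view = redact(employee, ["hr"])
analyst_view = redact(employee, ["analyst"])
print(hr_view)
print(analyst_view)
```

Every code path that reads the collection has to remember to call a function like this one. A single missed call exposes the protected fields, which is the risk the paragraph above points to.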
In the quest for data security, it is important to still maintain data sharing. MarkLogic’s robust security controls are proven in mission-critical systems around the world. MarkLogic has earned the trust and reputation of protecting data assets for major financial services organizations, healthcare providers, and government institutions.
MarkLogic has customers storing upwards of three petabytes of data. MarkLogic Data Hub Service uses a distributed shared-nothing architecture that scales elastically without having to worry about complex data sharding.
It uses a single-tenant, active-active high availability deployment architecture with synchronous data replication and automated failover. And, it independently auto-scales operational, analytical, and data integration workloads, and storage for high performance and reliability.
Lastly, with features like automated data balancing and a built-in search engine, users get consistent query performance as their applications scale, which minimizes the need for application-side database caching. For more details, please visit the Data Hub Service page.
MongoDB Atlas has a distributed architecture and provides horizontal scale-out using data sharding. Sharding allows MongoDB Replica Set deployments to scale beyond the limitations of a single server. This approach means that organizations must undertake a complex, manual migration effort to convert replica set deployment to sharded cluster, where each shard is modeled as an independent replica set.
Sharding is a complex undertaking for DataOps teams, as applications have to be consciously designed to benefit from data sharding (i.e. data models must be designed around the shard key, which cannot be changed later on). Moreover, multi-document ACID transactions in MongoDB suffer from high latency and weak data consistency at scale. Hence, users cannot use MongoDB Atlas for data curation and operational workloads. In contrast, MarkLogic Data Hub Service provides a 100% ACID-compliant scale-out architecture.
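Why the shard key choice matters so much can be shown with a simplified routing sketch. This is not MongoDB's actual algorithm, which assigns ranges of hashed values to chunks; it is a plain hash-modulo stand-in, and the `shard_for` function, the `orders` data, and the shard count of 4 are assumptions made for illustration.

```python
import hashlib
from collections import Counter


def shard_for(shard_key_value, shard_count):
    """Simplified routing: hash the shard key value and take it modulo the shard count."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


# Hypothetical orders: 1000 distinct order ids, but only two distinct regions.
orders = [{"order_id": i, "region": "EU" if i % 10 else "US"} for i in range(1000)]

# A high-cardinality key spreads documents across all shards.
by_order_id = Counter(shard_for(o["order_id"], 4) for o in orders)

# A low-cardinality key can reach at most as many shards as it has distinct values.
by_region = Counter(shard_for(o["region"], 4) for o in orders)

print("sharded by order_id:", dict(by_order_id))
print("sharded by region:  ", dict(by_region))
```

With `region` as the key, at least two of the four shards stay empty no matter how much data arrives, so the data model has to be designed around a suitable key before any data is loaded.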
Building a 100% ACID compliant, distributed scale-out system is hard. With MarkLogic Data Hub Service, organizations can simplify their data architecture and run mission-critical workloads at scale with none of the operational burden.
MarkLogic Data Hub Service is built to deliver high performance and increase self-service consumption of both original data and integrated data. It uses a search-based approach for data access and provides a unified search interface to query multiple data models.
It comes with a built-in search engine that auto-indexes the structure and content (including words, phrases, relationships, and values) of documents for efficient and fast search queries. Collectively, this indexed information is known as the Universal Index, and it is built immediately when data is loaded. Hence, users can run full-text search on any ingested content without having to set up specific indexes, which helps them profile original data without any formal modeling.
The indexes are always in sync with data making it easy to support transactional and operational use-cases. To support complex queries, there are additional indexes like range indexes, reverse indexes (used for real-time alerts), geospatial indexes, triple indexes (used for semantic graphs or RDF data) and more.
Lastly, with numerous out-of-the-box search features, users can quickly build advanced search applications. The list of search features includes proximity, wildcard, punctuation-sensitive, diacritic-sensitive, case-sensitive, spelling correction, thesaurus, snippets, facets, highlight search results, type-ahead features, stemming, relevance ranking, multi-language support, and more.
To compare, MongoDB Atlas supports ad-hoc document queries and aggregation queries using an aggregation pipeline framework that processes documents as a multi-stage pipeline. The aggregation pipeline framework is a unified query interface to create composable queries and provides a rich set of operators for various data operations (like redaction, facets, aggregation, join, filter, graph, geospatial, etc.).
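A multi-stage pipeline of this kind can be sketched as plain Python data, with a local function that mirrors each stage on in-memory documents. The `orders` documents, their field names, and the `run_locally` helper are hypothetical; the pipeline itself uses the standard `$match`, `$group`, `$sort`, and `$limit` stages and is not executed against a database here.

```python
from collections import defaultdict

# Each stage consumes the output of the previous one.
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$customerId", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
    {"$limit": 5},
]

# Hypothetical sample documents.
orders = [
    {"customerId": "c1", "status": "shipped", "amount": 40},
    {"customerId": "c2", "status": "shipped", "amount": 25},
    {"customerId": "c1", "status": "pending", "amount": 99},
    {"customerId": "c1", "status": "shipped", "amount": 10},
]


def run_locally(docs):
    """Mirror the four stages above on a list of dictionaries."""
    matched = [d for d in docs if d["status"] == "shipped"]  # $match
    totals = defaultdict(int)
    for d in matched:  # $group with $sum
        totals[d["customerId"]] += d["amount"]
    grouped = [{"_id": key, "total": value} for key, value in totals.items()]
    grouped.sort(key=lambda g: g["total"], reverse=True)  # $sort descending
    return grouped[:5]  # $limit


top_customers = run_locally(orders)
print(top_customers)
```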
Like other DBaaS offerings, MongoDB Atlas also relies on indexes to speed up query execution. By default, it creates a primary index (i.e. unique index on ‘_id’ field) and users can create additional secondary indexes (like Compound, Geospatial, Text, etc.) on any field(s) in the document. However, this requires DataOps teams to spend considerable effort in creating and managing indexes for query optimizations. For instance, in order to define indexes (like single-field, compound), users need to understand all data access patterns and how index intersections are used by MongoDB to resolve queries. In contrast, MarkLogic Data Hub Service’s built-in search and universal index make index management simpler and more efficient.
Additionally, the text search feature in MongoDB Atlas is too limited (e.g. no relevance ranking) for building high-performance search applications. To overcome this limitation, MongoDB released Atlas Search (based on Apache Lucene) as a separate service that provides advanced search capabilities using an aggregation pipeline operator. However, unlike MarkLogic Data Hub Service, Atlas Search indexes are eventually consistent. Hence, Atlas Search does not provide read-after-write consistency even for single-document transactions. Moreover, Atlas Search does not auto-index, nor is it used to power all data access.
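The aggregation pipeline operator in question is the `$search` stage. The sketch below only builds such a pipeline as Python data. The index name `default`, the `title` field, and the query text are assumptions for illustration, and a search index has to be defined on the collection separately before a pipeline like this returns results.

```python
import json

# $search must be the first stage of the pipeline.
search_pipeline = [
    {
        "$search": {
            "index": "default",
            "text": {"query": "data integration", "path": "title"},
        }
    },
    {"$limit": 10},
    # Expose the relevance score computed by the search stage.
    {"$project": {"title": 1, "score": {"$meta": "searchScore"}}},
]

print(json.dumps(search_pipeline, indent=2))
```

Because the search index is maintained separately from the collection, a document written a moment ago may not yet appear in the results of this pipeline, which is the eventual consistency described above.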
Lastly, MongoDB Atlas does not support industry-standard query languages (like SQL) but uses JSON syntax for querying documents.
Having the ability to use multiple lenses for analyzing multi-structured data is crucial to integrating data from silos. With MarkLogic Data Hub Service, users can do SQL analytics, semantic graph queries, documents search, and even combine them using Optic API to support various use cases and user personas. And, because the indexes are all optimized as part of a unified platform, the platform delivers consistent performance and scale.
MarkLogic Data Hub Service offers consumption-based pricing based on cloud credits. Organizations can pay as they go or get savings by buying credits in bulk in advance. The pricing is all-inclusive (includes hardware, software, operations, provisioning, and 24/7 support) and simple – organizations pay for compute, storage, and bandwidth. Also, it’s predictable when it comes to handling demand spikes due to time-varying (or unpredictable) workloads.
Organizations can accumulate up to 12x of their unused compute credits that are utilized during periods of demand spikes for bursting (or auto-scaling of compute) at no extra cost. This results in substantial savings as organizations do not have to provision capacity for peak usage. They also avoid costly billing due to auto-scaling when a demand spike happens. For more details on the pricing, visit the Data Hub Service page.
MongoDB Atlas also offers consumption-based pricing, but at a higher overall cost. And, there is no free automated bursting to meet demand spikes.
MongoDB Atlas pricing is more restrictive and expensive. You need to buy add-ons for LDAP integration, 24/7 support, BI connector, Stitch (application services for Atlas) and more. In contrast, MarkLogic Data Hub Service pricing is all-inclusive and predictable.
Transparent and predictable billing is an important consideration when using cloud services. With MarkLogic Data Hub Service, organizations can run data integration workloads and operational and analytical applications with high performance and reliability at a predictable and low cost.
MarkLogic Data Hub Service is ideal for complex data integration use cases – especially when you have large data sets with multiple data models. Whenever data is rapidly changing or the business needs are quickly evolving, it will work better in a Data Hub with a multi-model database.
Below are three broad buckets of Data Hub use cases:
There are industry-specific use cases as well, like consolidating financial trading data or building a universal bill of materials for a manufacturer.
Below are a few examples, where organizations specifically chose MarkLogic over MongoDB:
MongoDB Atlas gained popularity among developers as an open-source database and is a good choice for organizations looking for a cloud-neutral and easy-to-use document database for new, non-transactional applications.
For enterprise organizations that are looking to integrate data and power mission-critical use cases with a scalable, multi-model database, MarkLogic Data Hub Service is a better fit.
Read the MarkLogic Data Hub Service documentation.