MarkLogic FAQs

General Questions

MarkLogic provides the following products:

  • MarkLogic Data Hub Platform: A full-featured data integration and curation platform for all of your enterprise data. It can be deployed as a fully managed cloud service (MarkLogic Data Hub Service) or as a self-managed data hub on-premises or in a public cloud.
  • MarkLogic Data Hub Service: A cloud-neutral, fully managed cloud data hub service that features the full-stack MarkLogic Data Hub Platform to build your enterprise data hub for fast data integration and data management.
  • MarkLogic Server: A multi-model database with modern NoSQL and trusted enterprise capabilities. It is the foundation for the MarkLogic Data Hub Platform. It can be deployed in any environment (on-premises or public cloud) or can be consumed serverless with the fully managed MarkLogic Data Hub Service.

MarkLogic also develops associated tools and connectors for the ecosystem, which includes various APIs and connectors to MuleSoft, Apache NiFi, Kafka, Hadoop, Esri, and many BI tools.

MarkLogic solves the complex data silo problem. Almost every organization is facing challenges in managing and getting value from data due to data silo proliferation. Silos are tech debt accumulated over time as data migrated across various tools for different needs, and are on the rise with the adoption of cloud services. The result is that organizations face challenges building new data-driven apps and innovating in general because they do not have a stable integration hub for their most important data assets. To better understand the silo problem and learn how to “escape the matrix”, take this quick tour.

Simply put, the main reason customers use MarkLogic products is that they simplify the overall approach to complex data integration projects. By simplifying data integration, MarkLogic provides faster time to results, lower costs, and increased security of critical data assets. The investment is also future-proof, as customers can easily migrate their MarkLogic workloads to the cloud. For specific examples of the benefits of MarkLogic products, please visit our customer section.

The MarkLogic Data Hub Platform is a unified data platform that integrates, curates, and manages your multi-structured data. It works by ingesting data from any source, indexing it for immediate query and search, and curating it through a process of harmonization, mastering, and enrichment. It is powered by MarkLogic Server and can be deployed as a fully managed cloud service (MarkLogic Data Hub Service) or as a self-managed data hub on-premises or in a public cloud.

MarkLogic Data Hub Service is a fully managed cloud data hub. It is a cloud-neutral, serverless, SaaS solution that delivers best-in-class performance, scale, and granular security for fast data integration and data management. With MarkLogic Data Hub Service, organizations can speed up their cloud migration, moving their data integration projects from self-managed, on-premises deployments to a fully-managed cloud environment in months, not years.

MarkLogic Server is an enterprise-grade multi-model database. It is designed with both modern NoSQL and trusted enterprise capabilities to power transactional, operational, and analytical applications. It natively handles document data, semantic graph data, and geospatial data. You can also create relational views on top of the document model for SQL analytics. This all happens with a unified back-end that has built-in search, granular security controls, distributed ACID transactions, and more. These capabilities make it the best database to power a data hub. In fact, the MarkLogic Data Hub Platform is powered by MarkLogic Server.

MarkLogic Data Hub Platform offers a simpler, more agile approach to data integration than any other data integration and data management platform. Here’s why:

  • Fast pipelines vs waiting on complex ETL — MarkLogic Data Hub Platform ingests raw data as is, relying on its multi-model approach to handle any incoming data sources. This is in contrast to other platforms that require upfront data modeling and ETL, which can take months or years.
  • One unified platform vs many tools bolted together — The MarkLogic Data Hub Platform architecture is simple and holistic. It combines a multi-model database, full-text search, data ingestion tools, data harmonization and mastering capabilities, machine learning, and application services all within a single unified platform. To build a similar architecture with open source components or services offered by the cloud vendors, you would have to bolt together up to a dozen different things.
  • Agile data curation vs waterfall approach — With MarkLogic Data Hub Platform, users curate data iteratively. Users start with the business question, then define the necessary data service required. Then, only the data that is required for that initial data service is curated. As more data services are defined, more data is curated. This is very different from the old, “big bang” approach that required all data to be modeled and curated in advance.
  • Serious enterprise-grade security vs “check the box” security — MarkLogic has always taken security very seriously. MarkLogic Data Hub Platform was designed from the start to handle the mission-critical use cases. When it comes to data integration, this means full data lineage tracking and audit capabilities, granular role-based access control, advanced encryption, and more. The benefit is that you can really trust MarkLogic with important data assets, and you actually improve shareability of those assets because they are better governed.
  • Straightforward development vs requiring new skills — First, MarkLogic Data Hub Platform uses industry-standard APIs and programming languages and has easy integrations so that it plugs into existing environments. Second, it is designed so that developers can think of data as objects, not just rows and columns (though it does support standard SQL). This means there is no disconnect between applications that consume data as objects and the data layer. The sum of these benefits is that developers find MarkLogic easier and more natural to work with.

Broadly speaking, MarkLogic is used for data integration. As it applies to solving your business problems, this can take shape across many use cases. Often, it means using MarkLogic as an integration hub to build a reliable 360° view of a thing that’s important to your business, across many disconnected data sources – whether that “thing” be a customer you are serving, a product you are manufacturing, a program whose effectiveness you are evaluating, a person you are investigating, or any other entity of interest. MarkLogic is also great to use for regulatory compliance reporting and for metadata and content catalogs – or as a data platform for Artificial Intelligence and Machine Learning. Another broad benefit of MarkLogic is that it can speed up a customer’s journey to the cloud. It is very easy to build a data hub on-premises, migrate it to a cloud environment, and then eventually progress to a fully managed Data Hub Service.

MarkLogic is used in industries throughout the public and private sectors, including Financial Services, Insurance, Pharma, Manufacturing, Media (Publishing and Entertainment), National Security, and Government. All of these industries have experienced significant disruption requiring them to make better use of existing and new data sources – while ensuring their integrated data is reliable and secure.

We are a data platform company, not an application software company. That said, our solutions engineers and consultants can work with you to develop the applications that your business needs or to serve up integrated data to applications you already use, like BI tools or GIS.

MarkLogic Data Hub Platform is available to customers as a fully managed cloud service with MarkLogic Data Hub Service. For a self-managed data hub (on-premises or in a public cloud), customers will license MarkLogic Server and get MarkLogic Data Hub Software free of charge.

MarkLogic Data Hub Service, our fully managed cloud service, is not licensed, but rather consumed by purchasing MCUs (MarkLogic Capacity Units). MCUs provide hourly compute capacity and are purchased with pay-as-you-go (or usage-based) billing, or customers can make an up-front commitment with prepaid (or reserved) billing. Data Hub Service also charges customers an associated usage-based fee for the storage consumed. Bandwidth costs for running Data Hub Service are a pass-through from the public cloud providers. The benefit of this offering is that customers get everything the MarkLogic Data Hub Platform has to offer in the cloud, with one single usage-based bill.

MarkLogic Server is licensed as proprietary software with Essential Enterprise subscription licenses, sold in 8-core packs, for deployment in customer data centers or in a public cloud. Customers can also purchase Essential Enterprise licenses directly through the public cloud marketplaces on AWS or Microsoft Azure. For developers who want to get started immediately, MarkLogic also offers a free, full-featured Developer Edition.

Are you interested in more information about MarkLogic products or to get a quote?

Contact Us Today

Data Hub Service

Superior to a database-as-a-service offering, MarkLogic Data Hub Service offers a secure, future-ready data hub platform: a complete solution to integrate a variety of data. It is a fully managed cloud data integration solution based on the MarkLogic Data Hub Platform. It provides the ability to ingest and curate any data using fast data pipelines, and to execute pipelines or access data via standard interfaces like REST APIs. Finally, there is no customer lock-in. It is easy to get data in and out of Data Hub Service. And, as a cloud-neutral service, it supports open standards for ingesting, querying, and sharing data.

Data Hub Service is a serverless cloud data hub solution — inclusive of hardware, software, operations, and support. As an open data integration platform, it supports open standards like REST, ODBC, and Java to load, search, and share data. This results in flexibility for customers to use their favorite tools to load data as is and perform instant analysis on source data.
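As a rough illustration of loading and searching data over REST, the sketch below builds request URLs for the `/v1/documents` and `/v1/search` endpoints of the MarkLogic REST Client API. The host name, port, and document URI are placeholders chosen for this example, not values from this FAQ, and authentication is left out. Treat it as a minimal sketch rather than a complete client.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the host and port of your own service.
BASE_URL = "https://example-dhs-host:8011"


def document_url(uri, base_url=BASE_URL):
    """URL used to read (GET) or write (PUT) one document by its URI."""
    return f"{base_url}/v1/documents?{urlencode({'uri': uri})}"


def search_url(query, page_length=10, base_url=BASE_URL):
    """URL for a full-text search that asks for JSON results."""
    params = {"q": query, "format": "json", "pageLength": page_length}
    return f"{base_url}/v1/search?{urlencode(params)}"


print(document_url("/customers/1001.json"))
print(search_url("chicago insurance"))
```

Any HTTP client or BI connector can then issue these requests with the credentials assigned to the user's role.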

It is available on AWS (and coming soon on Azure), giving customers the choice to pick their preferred public cloud provider. It stores the customer’s data in the selected cloud region, in a dedicated VPC. Therefore, the data is stored and transacted through infrastructure and secured endpoints specifically dedicated to the customer. The secured endpoints are used to onboard users, assign roles, and deploy and access a data hub on Data Hub Service.

It comes bundled with MarkLogic Data Hub software and MarkLogic Server to build, deploy, and run your data hub. The Data Hub software provides a UI to configure data pipelines, build data services, and deploy them to the Data Hub Service. Customers can deploy one or more data hubs to Data Hub Service and get the same set of SLAs from MarkLogic.

With Data Hub Service, MarkLogic handles on-demand capacity provisioning, high availability, workload-aware autoscaling, and automated cluster lifecycle management. As a result, customers gain agility in getting value from data with significant TCO savings. For example, Data Hub Service frees the DataOps team from complex data sharding to scale applications by providing dynamic scaling with automatic rebalancing for maximum performance. Additionally, it runs queries using fast integrated search that minimizes the need for database caching by applications.

Data Hub Service provides an iterative, flexible data modeling approach to manage change, enable multiple use cases, and fast-track results. It delivers fast data ingestion performance and supports batch or real-time data curation pipelines that scale horizontally. It lets users query using Search, SQL, REST API, Java SDK, or familiar BI tools across operational and transactional data. The following lists a few capabilities of the Data Hub Service:

Multi-Model Data Management

  • Document data model, XML and JSON optimized
  • Semantic graph and search to connect, enhance, and query documents
  • Geospatial support for location-based search queries
  • Full-text search
  • Relational views for SQL analytics
  • Embedded machine learning and bring your own model using ONNX open standard

Data Orchestration

  • Load data as is and perform instant analysis
  • Data services for schema-on-read access
  • Real-time alerts and triggers for event-driven applications
  • Data provenance and lineage for improved governance
  • Data harmonization, enrichment, and mastering for high data fidelity

Data Security

  • Data anonymization and redaction for secure sharing
  • Single Sign-On (SSO) authentication using LDAP, SAML, or Kerberos
  • Role-based access control with document and element level security
  • Encryption at-rest and in-motion using client-owned encryption keys

Cloud Scale

  • Distributed ACID transactions
  • Big data (Petabyte) capacity
  • Auto-scaling for compute-intensive workloads
  • Massively parallel, scale-out, shared-nothing architecture
  • Active-Active high availability (HA) with synchronous data replication and automated failover

Cloud Economics

  • Free automated compute burst
  • Always-on, enterprise-grade reliability
  • 24x7 monitoring, auto backups, and upgrades

As a cloud-neutral serverless solution, Data Hub Service charges for the compute and storage consumed. Also, there will be nominal bandwidth (or data transfer) fees as charged by the public cloud provider.

The usage-based fee for storage is priced monthly (US $/GB/Month). It is normalized across all of the data managed by a customer’s Data Hub Service instance. It includes the customer’s data, indexes, logs, backups, etc. and is billed until the instance is deleted.

The compute capacity of Data Hub Service is measured in MarkLogic Capacity Units (MCUs). It is priced hourly (US $/MCU/Hour) using either consumption-based or prepaid billing. Customers get a significant discount (approximately 43%) for an upfront annual commitment using prepaid billing. Customer billing for compute can use a combination of prepaid and consumption-based billing.

If customers do not buy compute capacity upfront using prepaid billing, then it is billed using consumption-based billing. However, if customers buy compute capacity upfront (i.e. prepaid MCUs/hour) and are consuming compute in excess of the prepaid compute capacity, then any excess compute utilization (i.e. MCUs/hour) is billed using consumption-based billing. Also, since customers buy compute capacity upfront (i.e. prepaid MCUs/hour), any underutilization of compute has no impact on the billing.
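The split between prepaid and consumption-based billing can be sketched as simple arithmetic. The rates below are hypothetical and expressed in cents per MCU-hour (the prepaid rate is set about 43% below the on-demand rate to mirror the discount mentioned above); actual prices come from the Data Hub Service pricing guide. The sketch also ignores compute credits and free burst, which are covered separately.

```python
ON_DEMAND_CENTS = 100  # hypothetical on-demand rate, in cents per MCU-hour
PREPAID_CENTS = 57     # hypothetical prepaid rate, about 43% below on-demand


def hourly_compute_charge(used_mcus, prepaid_mcus):
    """Return (prepaid_cents, overage_cents) billed for one hour of compute."""
    # Prepaid capacity is paid for whether or not it is fully used.
    prepaid_cents = prepaid_mcus * PREPAID_CENTS
    # Only usage above the prepaid capacity is billed at the on-demand rate.
    overage_mcus = max(0, used_mcus - prepaid_mcus)
    overage_cents = overage_mcus * ON_DEMAND_CENTS
    return prepaid_cents, overage_cents


print(hourly_compute_charge(used_mcus=40, prepaid_mcus=32))  # 8 MCUs of excess
print(hourly_compute_charge(used_mcus=10, prepaid_mcus=32))  # underutilized
print(hourly_compute_charge(used_mcus=10, prepaid_mcus=0))   # no prepaid capacity
```

In the second call the prepaid charge is unchanged even though only 10 MCUs were used, which is the "underutilization has no impact on the billing" case.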

Customers are billed for compute as long as the Data Hub Service instance is live (i.e. running idle or being utilized). Compute billing stops when the customer stops or deletes the instance. For a stopped (inactive) Data Hub Service instance, billing for storage continues even though billing for compute stops, because the customer’s data is retained. If the customer deletes (terminates) the Data Hub Service instance, then billing stops for both compute and storage.
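The relationship between instance state and billing can be summarized in a small lookup. The state names here are illustrative labels for this sketch, not identifiers from the Data Hub Service console or API.

```python
from enum import Enum


class InstanceState(Enum):
    RUNNING = "running"  # live: idle or being utilized
    STOPPED = "stopped"  # inactive, but the customer's data is retained
    DELETED = "deleted"  # terminated, nothing left to bill


def billed_components(state):
    """Return which charges apply to an instance in the given state."""
    if state is InstanceState.RUNNING:
        return {"compute": True, "storage": True}
    if state is InstanceState.STOPPED:
        return {"compute": False, "storage": True}
    return {"compute": False, "storage": False}


for state in InstanceState:
    print(state.value, billed_components(state))
```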

Data Hub Service offers transparent and predictable pricing to serve unpredictable workloads or workloads with time-varying demand. The reason why the cost is predictable and low stems from how the compute credits are accumulated and utilized.

For a live Data Hub Service instance, customers accumulate compute credits (measured in MCUs) whenever the service is not being utilized (i.e. idling) or is under-utilized (i.e. running below the baseline, or provisioned, compute capacity).

The compute credits that can be accumulated are capped at 12X the baseline compute capacity. For instance, if a customer sizes a Data Hub Service instance for 32 MCUs per hour, then the maximum compute credits that can be accumulated is 32 * 12 = 384 MCUs. This accumulation of compute credits applies to both prepaid and consumption-based billing.

Production Data Hub Service instances allow for auto-scaling of compute capacity. This auto-scaling provides free compute burst up to 12X the baseline compute capacity (e.g. 64 baseline MCUs / hour supports free compute burst up to 64 * 12 = 768 MCUs).

The accumulated compute credits are used when Data Hub Service runs in burst mode (i.e. above the baseline compute capacity provisioned by the customer). Hence, instead of getting billed for burst compute via consumption-based billing, customers get compute burst for free as long as there are sufficient compute credits. If customers do not have sufficient compute credits then Data Hub Service compute capacity is throttled to the baseline compute capacity. As a result, customers are not billed for overuse of compute capacity because any overuse of compute capacity is paid for by the accumulated compute credits.
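The credit mechanics can be walked through with a small hour-by-hour simulation. This is a simplified model, not the service's actual metering: it assumes one credit accrues for each unused baseline MCU in an hour (the accrual rate is not stated here), and it throttles a burst to baseline whenever the credits on hand cannot cover the whole burst.

```python
def simulate_credits(baseline, demand_per_hour, cap_multiple=12):
    """Return a list of (delivered_mcus, credits_left), one entry per hour."""
    max_credits = cap_multiple * baseline  # e.g. 32 * 12 = 384
    max_burst = cap_multiple * baseline    # burst ceiling, also 12X baseline
    credits = 0
    history = []
    for demand in demand_per_hour:
        if demand <= baseline:
            # Idle or under-utilized: bank the unused capacity, up to the cap.
            credits = min(max_credits, credits + (baseline - demand))
            delivered = demand
        else:
            # Burst: capacity above baseline must be covered by credits.
            wanted = min(demand, max_burst) - baseline
            if credits >= wanted:
                credits -= wanted
                delivered = baseline + wanted
            else:
                # Not enough credits: throttle to baseline, no extra charge.
                delivered = baseline
        history.append((delivered, credits))
    return history


# 13 idle hours, then a covered burst, then a burst that exceeds the credits.
hours = [0] * 13 + [100, 400]
for hour, (delivered, credits) in enumerate(simulate_credits(32, hours), start=1):
    print(hour, delivered, credits)
```

With a 32 MCU baseline, credits reach the 384 MCU cap after 12 idle hours and stay there; the later burst to 100 MCUs is served from credits, while the larger burst is throttled back to 32 MCUs.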

The accumulated compute credits do not expire as long as the Data Hub Service instance is live. Also, the credits roll over from one billing cycle to the next for the life of the Data Hub Service contract. However, if a customer terminates the Data Hub Service instance or resizes it (i.e. scales the baseline compute capacity up or down), then all accumulated compute credits expire.

Data Hub Service billing is integrated with the public cloud provider billing. Customers will receive a monthly invoice for Data Hub Service in their public cloud provider account.

Please note that even though Data Hub Service bills separately for compute and storage, the monthly bill in the customer’s public cloud provider account will only have a single line item – MarkLogic Cloud Service – the service subscribed to with the public cloud provider. The billing line item for MarkLogic Cloud Service will aggregate billing across compute and storage for MarkLogic Data Hub Service.

A MarkLogic customer account is created when a user subscribes to MarkLogic Cloud Service via your public cloud provider. Within the MarkLogic account, customers can create one or more Data Hub Service instances. The billing is aggregated across all Data Hub Service instances in the customer’s account.

Data Hub Service does not enforce a hard limit on the storage allowed for a chosen compute capacity, so the service will never block the data ingestion process. However, Data Hub Service can perform poorly if the data volume exceeds the suggested storage volume for the provisioned (or baseline) compute capacity. For optimal performance, it is recommended to keep the baseline compute capacity consistent with the data volume stored in Data Hub Service. Please refer to the Data Hub Service pricing guide for more details.

Data Hub Service offers 99.95% read and write availability to build highly available apps. It provides active-active high availability (HA) with synchronous data replication. Each self-healing Data Hub Service cluster is distributed by design within a cloud region. For instance, Data Hub Service on AWS provides HA across three availability zones in a region. This enables elastic read and write scalability with zero data loss (i.e. RPO = 0) due to synchronous replication. The service automatically recovers from an availability zone (or data center) failure in less than a minute (i.e. RTO < 1 minute) and will cancel any in-flight transactions (i.e. ACID-compliant). For more information, please see the Data Hub Service service level agreement. Please contact MarkLogic Sales if you need cross-region high availability to support disaster recovery requirements.
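To put the 99.95% figure in concrete terms, the arithmetic below converts an availability percentage into the downtime it allows over a period. This is only the raw calculation; how availability is actually measured, and what counts as downtime, is defined in the service level agreement.

```python
def allowed_downtime_minutes(availability_percent, period_days):
    """Minutes of downtime permitted by an availability target over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


# A 30-day month and a 365-day year at 99.95% availability.
print(round(allowed_downtime_minutes(99.95, 30), 1))   # about 21.6 minutes
print(round(allowed_downtime_minutes(99.95, 365), 1))  # about 262.8 minutes
```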

Data Hub Service is a single-tenant data platform with a shared-nothing architecture. It runs in a MarkLogic-owned VPC, dedicated to the customer, in the region selected by the customer.

Customers can manage Data Hub Service users using their LDAP server and configure user privileges using role-based access control. Users can securely access Data Hub Service via public IP or can securely connect using VPC peering when Data Hub Service is configured with a private IP.

Data Hub Service stores encrypted data using customer-owned keys. Hence, MarkLogic’s operational support team will never be able to see a customer’s data. Learn more about Data Hub Service security and compliance certifications. Please consult release notes for details on support for external KMS.

Data Hub Service provides full weekly and incremental daily backups out-of-the-box. These automatic backups have a retention policy of four weeks. In order to perform point-in-time restore from backups, please contact MarkLogic Support.
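As a general illustration of how a weekly-full plus daily-incremental schedule supports a restore, the sketch below lists which backups would be needed for a given target date. It assumes full backups run on Sundays and that a restore replays the latest full backup followed by each later incremental; neither detail is stated here, and actual restores are handled by MarkLogic Support.

```python
from datetime import date, timedelta

RETENTION = timedelta(weeks=4)  # retention policy described above
FULL_BACKUP_WEEKDAY = 6         # assumption: full backups run on Sundays


def restore_chain(target, today):
    """Return backup dates needed to restore to `target`, or [] if unavailable.

    The first date is the full backup; the rest are daily incrementals.
    """
    if target > today:
        return []
    # Step back to the most recent full backup on or before the target date.
    days_back = (target.weekday() - FULL_BACKUP_WEEKDAY) % 7
    full = target - timedelta(days=days_back)
    # The whole chain must still be inside the retention window.
    if today - full > RETENTION:
        return []
    return [full + timedelta(days=i) for i in range(days_back + 1)]


for backup_day in restore_chain(date(2020, 3, 18), today=date(2020, 3, 20)):
    print(backup_day.isoformat())
```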

MarkLogic is responsible for Data Hub Service upgrades. Minor upgrades are performed in a rolling fashion to guarantee availability of service during upgrades. Major upgrades can result in minor planned downtime during maintenance windows. MarkLogic will notify users about upcoming upgrades and perform upgrades during scheduled maintenance windows.

Data Hub Service will support multiple versions of the MarkLogic Data Hub. Customers can choose the version to deploy and if a version is no longer supported (due to end-of-life), customers will be notified and will be required to deploy their data hub with the new supported version.

Data Hub Service offers a general-purpose data platform that provides immediate business value and re-use for a variety of use cases. Many organizations start with simple use cases like search applications but then quickly scale to complex use cases like Customer 360 when they see how fast it is to achieve business results.

MarkLogic Server

MarkLogic Server is a multi-model database with modern NoSQL capabilities and trusted enterprise capabilities. It is the best database to power a data hub. You can deploy it on-premises or in the public cloud or consume it as a serverless, fully-managed cloud service with MarkLogic Data Hub Service.

Enterprises need a database that supports more than just a single data model to unlock the value from all of their data in an efficient manner. A multi-model database combines the benefits of graph, document, geospatial, and relational models into a single, scalable, high-performance operational database. It creates integrated indexes and provides a unified query interface to securely access multiple types of data.

As a multi-model database, MarkLogic Server natively stores JSON, XML, text, geospatial, semantic triples and binaries (e.g. PDFs, videos) in one unified platform. This ability to store and query document data, location data, semantic graph data, and build relational views for SQL analytics results in unprecedented flexibility and agility when integrating data from silos.

Yes, MarkLogic Server natively stores JSON documents. You can ingest JSON and other documents without worrying about schema complexity.

In addition, you can use RDF Triples to enrich JSON documents with semantic metadata (or ontologies) or connect JSON documents to build a knowledge graph. This is important for performing semantic search queries to explore relationships and make new inferences, and also for data integration.
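The idea of enriching a JSON document with triples can be sketched with plain Python data. The document shape below, with a `triples` array of `triple` objects holding `subject`, `predicate`, and `object`, follows our understanding of how triples are embedded in JSON documents; the IRIs, field names, and helper functions are invented for this example, so check the Semantics documentation before relying on the exact format.

```python
import json

# Hypothetical customer document with one embedded triple.
customer_doc = {
    "customer": {"id": "1001", "name": "Acme Corp", "city": "Chicago"},
    "triples": [
        {
            "triple": {
                "subject": "http://example.org/customer/1001",
                "predicate": "http://example.org/ontology/locatedIn",
                "object": "http://example.org/city/Chicago",
            }
        }
    ],
}


def triples_in(doc):
    """Yield (subject, predicate, object) tuples embedded in a document."""
    for entry in doc.get("triples", []):
        t = entry["triple"]
        yield t["subject"], t["predicate"], t["object"]


def objects_of(docs, subject, predicate):
    """Follow one relationship across documents, as a graph query would."""
    return [o for doc in docs for s, p, o in triples_in(doc)
            if s == subject and p == predicate]


print(json.dumps(customer_doc, indent=2))
print(objects_of([customer_doc],
                 "http://example.org/customer/1001",
                 "http://example.org/ontology/locatedIn"))
```

The in-memory lookup only stands in for what a semantic query would do inside the database, where the triples are indexed alongside the document.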

Also, unlike other document databases, MarkLogic Server has built-in search to index all of the data at ingest time so that users can do immediate data discovery.

MarkLogic Server has had ACID transactions since its first version, allowing users to build operational and transactional applications. MarkLogic Server’s ACID properties are highly scalable across large clusters and apply to multi-document transactions, multi-statement transactions, and XA transactions (i.e. transactions between clusters). While standard in relational databases, ACID transactions differentiate MarkLogic Server from other NoSQL databases, providing the unique reliability to run large-scale operational systems for mission-critical use cases.

Built-in search is extremely useful for both data integration and application development. MarkLogic Server provides sophisticated indexing capabilities that reduce the time and effort to build and configure indexes for standard queries, and do not require a bolt-on search engine for full-text search like other databases. This enables users to immediately search and discover any new data loaded into MarkLogic Server, and also keep track of data as it is harmonized. Users can leverage search when building both transactional and analytical applications that require powerful queries (like geospatial, semantic) to be run efficiently, and when building Google-like, full-text search features into an application.

MarkLogic Server achieves HA/DR using a shared-nothing architecture that provides redundancy for failover and high-performance scaling to ensure that your data is always available.

  • Shared nothing architecture has no master-slave relationships, eliminating any single point of failure.
  • MarkLogic Server has point-in-time recovery and ACID transactions to ensure full redundancy and consistency
  • Changes do not require a server restart (re-indexing, adding nodes, or changing configurations)
  • Database replication between sites is secured with SSL out-of-the-box
  • Incremental backups consume less storage and can be completed quickly

MarkLogic Server runs some of the biggest NoSQL systems in the world. It is trusted by large investment banks, major healthcare organizations, and classified government systems to securely integrate their most critical data assets. It is a massively parallel processing database that scales horizontally in clusters on commodity hardware to hundreds of nodes, petabytes of data, and billions of documents—and still processes tens of thousands of transactions per second. When demand dissipates, MarkLogic Server can scale back down without having to worry about complex sharding. Organizations can easily run large scale web applications and handle incredible volumes of data on MarkLogic Server.

MarkLogic provides flexible deployment options. You can choose either a self-managed deployment or a fully managed one with MarkLogic Data Hub Service. With the self-managed option, you can deploy MarkLogic Server either on-premises or in the cloud.

MarkLogic Server handles both operational and transactional workloads, at scale, in a single database. This is possible because it provides modern NoSQL capabilities and trusted enterprise capabilities like flexible data model, ACID transactions, full-text search, real-time alerts, SQL analytics, and more. You can easily build transactional applications and perform real-time operational analytics.

MarkLogic Server is proprietary software but comes with a free developer’s license. This gives developers the ability to download and install the full capabilities of MarkLogic Server and get going in a few minutes. MarkLogic also makes some of its projects open source.

In general, MarkLogic’s philosophy is to build a proprietary core that uses open standards and provide open source connectors and APIs for easy integration into existing environments and ecosystems. This results in an open data platform with zero vendor lock-in and helps create a sustainable business model.

MarkLogic Server is licensed in 8-core packs for on-premises or cloud deployment. The number of licenses is determined based on the infrastructure, data volume, and your intended goals. You also have the option to consume MarkLogic Server as a serverless, fully-managed cloud service with MarkLogic Data Hub Service. MarkLogic Sales can help you optimize your licensing options.

Yes. Any of the MarkLogic Server additional options that are available can be bought at any time. For example, if a customer starts with no options, but wants to add Semantics later on, that is okay.

Customer Success

Yes. MarkLogic Consulting is a professional services arm of MarkLogic Corporation that provides support for implementations, expert advice, and also full-service development. Our consultants can assist your team with whatever level of support you need to ensure your next MarkLogic project is a success.

Yes. Many projects have a mix of resources, including MarkLogic consultants and one or more other partners. We have worked with a large number of organizations that have experience using MarkLogic.

See Preferred Service Partners

Every consulting engagement is different. There are many factors that are important to consider: existing experience with MarkLogic, the number of data sources to be loaded, the desired timeline, ongoing enhancements, and the budget or contract vehicle in place. In general, our goal is to partner with customers and act as an enabler so that a customer can, as soon as possible, take the training wheels off and be successful on their own with MarkLogic. But each project is unique and should be discussed so that we can tailor resourcing needs to your business goals.

Yes. Sometimes seeing is believing and customers want us to prove that our technology works with their data and environment to solve their particular business challenge. We enjoy showing off our technology and can work with you to scope a proof-of-concept to your needs. Learn more about our consulting Quick Start Services.

In general, the hourly billing rate for MarkLogic consultants is in line with the industry average. For each project, MarkLogic will work with you to understand the factors that can affect overall costs such as the number of data sources, amount of data harmonization required, application requirements, the level of your team’s involvement, and other factors that determine the overall cost.

Yes. Our consultants work with many government organizations, particularly in the U.S., that require active security clearances.

Training & Community

Free technical training is available through MarkLogic University. Training is organized into role-based learning tracks and courses are available through a variety of formats including live instructor-led online training events, self-paced video-based learning, and feature-focused tutorials. All training is designed to provide hands-on, project-based learning.

After completing a training course, attendees can complete a free learning assessment to validate that they have successfully achieved the learning objectives outlined for a specific training course.

And after completing their core learning track, technologists can continue to develop their skills by completing relevant specialty courses, getting involved with the MarkLogic Community, or completing the Data Hub Flight School project simulation.

Yes. After completing training and working on a project to gain some real-world experience we recommend that MarkLogic Professionals validate their expertise by earning the MarkLogic Professional Certification.

MarkLogic offers certification programs for developers and administrators. Earning the certification requires the successful completion of a written exam followed by the successful completion of a hands-on project-based exam.

A developer who is completely new to MarkLogic technology can complete the core MarkLogic Developer track in just four days.

And to make it even easier and more convenient to learn MarkLogic, we offer training through both scheduled online training events and self-paced training so that developers can have flexible learning options available to them.

This enables the learner to focus on getting trained while also ensuring they have time to dedicate to their normal job responsibilities.

MarkLogic has 250k active users who visit our developer site, explore the docs, take MarkLogic courses, contribute to tools, and download our products.

We encourage our community to ask and answer questions on Stack Overflow. Be sure to tag your question!
