Combine Hadoop with MarkLogic to get real-time Big Data applications along with the benefits of batch processing and cost-effective archival storage. The MarkLogic Connector for Hadoop lets you augment Hadoop with MarkLogic’s real-time capabilities, creating opportunities for new, immediate business insights.


Real-time Your Hadoop

Seamlessly combine the power of MapReduce with MarkLogic’s real-time, interactive analysis and indexing on a single, unified platform.


MarkLogic and Hadoop

Hadoop as ETL

Raw data stored in HDFS can be refined and transformed by Hadoop MapReduce before being consumed by MarkLogic via the Connector for Hadoop. This allows you to leverage a single ETL infrastructure for all your data-driven applications.

MarkLogic Connector for Hadoop

The MarkLogic Connector for Hadoop is a drop-in extension for Hadoop that provides efficient two-way communication between MarkLogic and the Hadoop Distributed File System (HDFS), using standard MapReduce jobs. It simplifies parallel loading from HDFS to MarkLogic and supports efficient parallel reads and writes between a MapReduce job and a MarkLogic database. Reads can additionally use MarkLogic’s rich indexes and security model, so MapReduce processing can leverage work the database has already done.

MapReduce and MarkLogic

Intermediate Intelligence

The Connector allows you to run Hadoop MapReduce jobs directly on data in a MarkLogic database. Hadoop can read MarkLogic data, refine or transform it, and push the results back into MarkLogic. For example:

  1. Financial data stored in MarkLogic is accessed by Hadoop.
  2. Hadoop then performs a variety of compliance calculations.
  3. The results of those calculations are automatically appended to the original records in MarkLogic.
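
The three steps above can be sketched in plain Python. This is only an illustration of the read, compute, and append pattern, not the Connector's API: the in-memory `database` dictionary, the record fields, the `REPORTING_THRESHOLD` rule, and every function name are hypothetical stand-ins for a MarkLogic database and a real MapReduce job.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a MarkLogic database: URI -> record.
database = {
    "/trades/1.json": {"account": "A", "amount": 12000},
    "/trades/2.json": {"account": "A", "amount": 300},
    "/trades/3.json": {"account": "B", "amount": 50},
}

REPORTING_THRESHOLD = 10000  # assumed compliance rule, for illustration only


def map_phase(records):
    """Step 1: read each record and emit (account, (uri, amount)) pairs."""
    for uri, record in records.items():
        yield record["account"], (uri, record["amount"])


def shuffle(pairs):
    """Group mapped values by key, as the MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(account, values):
    """Step 2: total each account and flag its records against the threshold."""
    total = sum(amount for _, amount in values)
    for uri, _ in values:
        yield uri, {"account_total": total, "flagged": total >= REPORTING_THRESHOLD}


def run_job(db):
    """Step 3: append each computed result to the original record."""
    for account, values in shuffle(map_phase(db)).items():
        for uri, result in reduce_phase(account, values):
            db[uri]["compliance"] = result
    return db


run_job(database)
```

After the job runs, each record keeps its original fields and gains a `compliance` entry. In a real deployment the Connector's input and output formats would take the place of the dictionary reads and writes.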

Hadoop for Archival Storage

Organizations need ready access to operational data along with the ability to retain older information in less expensive archival storage, where it can be accessed as needed. Customers can keep their infrequently accessed, non-operational data in HDFS and load it into MarkLogic when it is required.

Enhanced Security

MapReduce jobs running in situ on data in MarkLogic can use MarkLogic’s granular security model to restrict each analytic job to only the data it is entitled to access. This is critical for many government and enterprise applications.


Run MarkLogic Directly on Hadoop Distributed File System

MarkLogic is a real-time database for Hadoop that leverages HDFS for scalability, performance, and availability, enabling a fluid mix of data between operational and analytic workloads. You can run the MarkLogic Enterprise NoSQL database on top of HDFS to provide ACID transactions, role-based security, full-text search, and the flexibility of a granular document data model for real-time applications, all within your existing Hadoop infrastructure. Learn more about MarkLogic on HDFS.


MarkLogic Benefits

With Hadoop and MarkLogic together, you can segregate and move your data among different storage and computation environments.