MarkLogic & Hadoop
“The MarkLogic Connector for Hadoop allows us to do much more with data we extract using Hadoop. We can now run detailed, rich analytics of image fingerprints by putting the data into MarkLogic.”
Real-time Your Hadoop
Seamlessly combine the power of MapReduce with MarkLogic’s real-time, interactive analysis and indexing on a single, unified platform.
- Get more power out of Hadoop. Together, Hadoop and MarkLogic let you tackle problems that would be difficult or impossible for either technology alone.
- Save money by leveraging common infrastructure. Using MarkLogic and the Hadoop Distributed File System (HDFS) enables common batch-processing infrastructure to be used across many different projects and applications.
- Enterprise-class support for Hadoop. Our partnership with Intel provides a strong, supported platform for building secure, enterprise-class big data applications with Apache Hadoop.
Hadoop as ETL
Raw data stored in HDFS can be refined and transformed by Hadoop MapReduce before being consumed by MarkLogic via the Connector for Hadoop. This allows you to leverage a single ETL infrastructure for all your data-driven applications.
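As a toy illustration of this refine-then-load pattern, the sketch below simulates the map, shuffle, and reduce phases in plain Python. The record format, transform, and field names are hypothetical, and the final write into MarkLogic via the Connector is elided; a real job would run on Hadoop MapReduce rather than in-process.

```python
from collections import defaultdict

# Hypothetical raw records, as they might sit in HDFS before refinement.
raw_lines = [
    "acct-1,100.50",
    "acct-2,25.00",
    "acct-1,-40.25",
    "bad record",        # malformed input is filtered out in the map phase
]

def map_phase(line):
    """Parse one raw line; emit (account, amount), or nothing if malformed."""
    parts = line.split(",")
    if len(parts) == 2:
        try:
            yield parts[0], float(parts[1])
        except ValueError:
            return

def reduce_phase(account, amounts):
    """Aggregate all amounts for one account into a single refined record."""
    return {"account": account, "balance": round(sum(amounts), 2)}

# Shuffle: group mapped values by key, as MapReduce does between phases.
grouped = defaultdict(list)
for line in raw_lines:
    for key, value in map_phase(line):
        grouped[key].append(value)

# Refined records, ready to be written to MarkLogic via the Connector.
refined = [reduce_phase(k, v) for k, v in sorted(grouped.items())]
print(refined)
```

The same shape scales out on Hadoop: mappers clean and project each raw record in parallel, and reducers consolidate per-key results before the load step.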
MarkLogic Connector for Hadoop
The MarkLogic Connector for Hadoop is a powerful drop-in extension for Hadoop that provides efficient, two-way communication between a MarkLogic database and the Hadoop Distributed File System (HDFS), using standard MapReduce jobs. It simplifies parallel loading from HDFS into MarkLogic and enables efficient parallel reads and writes between a MapReduce job and a MarkLogic database. Reads can additionally leverage MarkLogic’s rich indexes and security model.
The Connector allows you to run Hadoop MapReduce jobs directly on data in a MarkLogic database. This lets Hadoop take MarkLogic data directly, refine or transform it, and push the results directly back into MarkLogic. For example:
- Financial data stored in MarkLogic is accessed by Hadoop.
- Hadoop then performs a variety of compliance calculations.
- The results of those calculations are automatically appended to the original records in MarkLogic.
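A compliance job like this is typically wired to MarkLogic through a Hadoop configuration file. The fragment below is illustrative only: the property names follow the connector’s `mapreduce.marklogic.*` convention, but the hosts, ports, credentials, and exact keys shown here are assumptions and should be verified against the Connector for Hadoop documentation for your release.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Illustrative settings; verify names and values against the
       MarkLogic Connector for Hadoop documentation for your version. -->
  <property>
    <name>mapreduce.marklogic.input.host</name>
    <value>ml-host.example.com</value>
  </property>
  <property>
    <name>mapreduce.marklogic.input.port</name>
    <value>8006</value>
  </property>
  <property>
    <name>mapreduce.marklogic.input.username</name>
    <value>compliance-reader</value>
  </property>
  <property>
    <name>mapreduce.marklogic.input.password</name>
    <value>*****</value>
  </property>
  <!-- Results are written back to the same database. -->
  <property>
    <name>mapreduce.marklogic.output.host</name>
    <value>ml-host.example.com</value>
  </property>
  <property>
    <name>mapreduce.marklogic.output.username</name>
    <value>compliance-writer</value>
  </property>
  <property>
    <name>mapreduce.marklogic.output.password</name>
    <value>*****</value>
  </property>
</configuration>
```

Separate input and output credentials, as sketched here, let the read side run with a narrowly scoped role while only the append step carries write privileges.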
Hadoop for Archival Storage
Organizations need ready access to operational data along with the ability to retain older information in less expensive archival storage, where it can be accessed as needed. Customers can keep infrequently accessed, non-operational data in HDFS and load it into MarkLogic when it’s required.
MapReduce jobs running in situ on data in MarkLogic can leverage MarkLogic’s granular security model to restrict each job to only the data its user is entitled to see, which is critical for many government and enterprise applications.
Run MarkLogic Directly on Hadoop Distributed File System
MarkLogic is a real-time database for Hadoop that leverages HDFS for scalability, performance, and availability, enabling a fluid mix of data between operational and analytic workloads. You can run the MarkLogic Enterprise NoSQL database on top of HDFS to provide ACID transactions, role-based security, full-text search, and the flexibility of a granular document data model for real-time applications, all within your existing Hadoop infrastructure. Learn more about MarkLogic on HDFS.
With Hadoop and MarkLogic together, you can segregate and move your data among different storage and computation environments.
- Financial services firms and government agencies often have data retention and legal reporting requirements. Hadoop and MarkLogic are a powerful solution for storing, accessing, reporting on, and analyzing massive amounts of information.
- Hadoop running with the Connector can take advantage of MarkLogic’s sophisticated security model for sensitive data.
- Using Hadoop and MarkLogic together enables businesses to run more efficiently by letting them optimize trade-offs among cost, performance, availability, and flexibility.