The Hadoop Distributed File System (HDFS) is the next generation of shared storage and the foundation of the Hadoop ecosystem. It provides reliable file storage at massive scale on commodity hardware. Most applications, however, need fast and secure access to specific pieces of data, not to large files. Get more value from your Hadoop investment by running the MarkLogic Enterprise NoSQL database on top of HDFS. MarkLogic adds ACID transactions, role-based security, full-text search, and the flexibility of a granular document data model for real-time applications, all within your existing Hadoop infrastructure.


Unify Your Operational and Analytic Workloads

Today you can configure MarkLogic to use direct-attached or shared SAN storage, and now you can also include Hadoop’s distributed file system. The result is a real-time database for Hadoop that leverages HDFS for scalability, performance, and availability, so data can move fluidly between operational and analytic workloads. Segregating data across different storage and computation tiers lets users optimize cost, performance, availability, and flexibility. Users can store data, indexes, and journals across a mixture of local (RAM, HDD, and SSD), SAN, and HDFS-based storage. This combination provides both secure, low-latency access to operational data and economical storage and processing of the remaining long tail.
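
To make the tiering idea concrete, here is a minimal sketch of a placement policy that sends recently updated or latency-sensitive data to local storage and the long tail to HDFS. It does not use any MarkLogic API: the `Partition` record, the `choose_tier` function, and the 30-day and 365-day thresholds are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """Storage tiers named in the text (illustrative labels only)."""
    LOCAL_SSD = "local-ssd"
    SAN = "san"
    HDFS = "hdfs"


@dataclass(frozen=True)
class Partition:
    """Hypothetical description of a chunk of data to be placed on a tier."""
    name: str
    days_since_last_update: int
    needs_low_latency: bool


def choose_tier(p: Partition, hot_days: int = 30, warm_days: int = 365) -> Tier:
    """Pick a tier: operational data stays local, the long tail goes to HDFS.

    The thresholds are assumptions, not recommended values.
    """
    if p.needs_low_latency or p.days_since_last_update <= hot_days:
        return Tier.LOCAL_SSD
    if p.days_since_last_update <= warm_days:
        return Tier.SAN
    return Tier.HDFS


if __name__ == "__main__":
    for part in (
        Partition("orders-current", 2, True),
        Partition("orders-last-quarter", 120, False),
        Partition("orders-archive", 2000, False),
    ):
        print(part.name, "->", choose_tier(part).value)
```

A real deployment would weigh more than age and latency (cost per terabyte, availability targets, analytic access patterns), but the shape of the decision is the same.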


Like HBase, But With Enterprise Capabilities Included

Just about anywhere you can access the file system in MarkLogic, you can now reference HDFS. From the DBA’s perspective, HDFS is just another file system. You can mount data and index partitions directly on HDFS. Architecturally, this is similar to the way that HBase works with Hadoop’s distributed file system. MarkLogic has the added benefit of production-tested enterprise features, such as ACID transactions, replication, and full-text search, built directly into the database kernel.
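
The "just another file system" idea can be illustrated with a short sketch that treats a storage location as either a plain local path or an `hdfs://` URI and handles both through one code path. This is an assumption about how such locations might be written, not a description of MarkLogic's configuration syntax; the `classify_data_directory` helper, the host name, and the paths are hypothetical.

```python
from urllib.parse import urlparse


def classify_data_directory(location: str) -> tuple[str, str, str]:
    """Return (kind, authority, path) for a storage location string.

    Assumes POSIX-style local paths; a location with any scheme other
    than hdfs is rejected rather than guessed at.
    """
    parsed = urlparse(location)
    if parsed.scheme == "hdfs":
        # e.g. hdfs://namenode:8020/some/dir -> NameNode address plus path
        return ("hdfs", parsed.netloc, parsed.path)
    if parsed.scheme == "":
        # No scheme: treat as an ordinary local or SAN-mounted path
        return ("local", "", parsed.path)
    raise ValueError(f"unsupported storage scheme: {parsed.scheme!r}")


if __name__ == "__main__":
    for loc in (
        "/var/data/forests/live-1",  # hypothetical local path
        "hdfs://namenode.example.com:8020/marklogic/forests/archive-1",  # hypothetical HDFS URI
    ):
        print(loc, "->", classify_data_directory(loc))
```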


Benefits of Running MarkLogic on Hadoop Distributed File System (HDFS)