I was just looking at your MarkLogic listing, and wanted to make you aware that there are actually three ways in which MarkLogic supports MapReduce:
1. Our Hadoop connector allows customers to run Hadoop MapReduce jobs on data in MarkLogic. This can be much faster and more secure than running Hadoop MapReduce jobs over the same corpus in HDFS, because you can specify query constraints on the universe of documents you want to map, and we use our fast index-based queries to map only the documents that match those constraints. In addition, we automatically security-trim the set of documents to map.
2. As of MarkLogic 7 (shipped November 2013), we support using HDFS to store our native data storage format (known as forests), so you can run your database directly from HDFS as if it were any other file system. If you do this, you can also run MapReduce jobs directly against those forests, even if they are not attached to a running MarkLogic instance. This is very useful for cases where customers want to archive data on HDFS, detach it so that it isn’t consuming MarkLogic compute cycles, but still have access to it for batch analytics without having to perform any ETL on the data. What’s more, in this scenario if the customer does want to interactively query the data, it can be re-mounted and queried in seconds.
3. Finally, we also have a non-Hadoop, in-database MapReduce capability for computing aggregates over large amounts of data in real time. Customers write a user-defined function (UDF) in C++ that follows the map/reduce pattern. These functions are pushed to the nodes where the data is managed and are executed in-process with the server. The source data for aggregation comes from our range indexes, which are memory-mapped files, so the entire process happens in memory, which allows it to work in real time.
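To make the third approach more concrete, here is a minimal Python sketch of the map/reduce aggregate pattern it describes. It is not the MarkLogic UDF API (real UDFs are written in C++). The class name `MeanAggregate`, the function `run_aggregate`, and the `partitions` lists standing in for each node's range index values are all assumptions for illustration.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class MeanAggregate:
    """Hypothetical partial state for a mean: a running sum and a count."""
    total: float = 0.0
    count: int = 0

    def map(self, values):
        # Map step: runs on each node over that node's local index values.
        for v in values:
            self.total += v
            self.count += 1
        return self

    def reduce(self, other):
        # Reduce step: fold another node's partial state into this one.
        self.total += other.total
        self.count += other.count
        return self

    def finish(self):
        # Final step: turn the merged state into the aggregate value.
        return self.total / self.count if self.count else None


def run_aggregate(partitions):
    # One partial per partition (standing in for one per node), then merge.
    partials = [MeanAggregate().map(p) for p in partitions]
    merged = reduce(lambda a, b: a.reduce(b), partials, MeanAggregate())
    return merged.finish()


# Assumed sample data: three nodes, each holding some numeric index values.
partitions = [[10.0, 20.0], [30.0], [40.0, 50.0, 60.0]]
result = run_aggregate(partitions)
```

Each partition produces only a small partial state (a sum and a count), so only those partials need to be combined, not the underlying values. That is the property that lets this kind of aggregate be pushed to where the data lives.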
It would be great to get our listing corrected to reflect this. Ideally you could list three different ways we allow MapReduce in that column (text in parentheses would work for the info icon tooltip):