In the past few years, the number of organizations using Hadoop, or contemplating using it, has grown astronomically. Most of these organizations share the same questions: are they really ready to implement Hadoop, and what are the best practices for success?
To help answer these questions, TDWI, an organization that provides research and advice on all things data, developed an Online Hadoop Readiness Assessment and Guide for organizations that are starting to work with Hadoop. The Assessment is free, and it offers a great way to analyze each dimension of readiness: organizational readiness, Big Data readiness, data management readiness, analytics readiness, and IT readiness.
Example of how the TDWI Hadoop Assessment scores results
One of the first challenges people face when getting started with Hadoop is simply navigating the myriad components that have popped up in recent years. I was at the Strata Hadoop conference in New York a month ago, and based on what I saw, I can understand the confusion around Hadoop with all of the crazy names being advertised: Mahout, Ambari, Avro, DataFu, Oozie, Tez, Chukwa, Trafodion, etc.
A few of the more popular Hadoop projects shown here
The rapidly changing Hadoop ecosystem is what makes Hadoop planning more critical than ever. Hadoop is no longer just HDFS and MapReduce (MapReduce actually seems to be falling quite a bit in popularity), but a family of tools that fall under the broad Hadoop umbrella and range in maturity from “university lab side-project” to production use at large companies.
We need resources to navigate the growing complexity in the Hadoop ecosystem
Many of the customers we talk to already use Hadoop, so one question comes up frequently: “Why do we need MarkLogic if we’re already using Hadoop?”
To put it simply, MarkLogic provides an enterprise-class, operational database, and Hadoop does not. Hadoop has many benefits, but it currently lacks some enterprise features that organizations require for production environments (e.g., Hadoop does not have robust security, and it does not provide the integrity guarantees needed for ACID transactions).
Typically, customers rely on MarkLogic as a persistent, operational database for low-latency transactions, and they use Hadoop as a low-cost place to store data and run batch analytics. Integrating the two systems is straightforward because there is a MarkLogic Connector for Hadoop. MarkLogic and Hadoop also handle data in similar ways, and both systems rely on MapReduce for loading data and doing analytics.
Customers such as KPMG, McGraw Financial, and a top investment bank have all found that this division of labor between MarkLogic and Hadoop works quite well. Below is a graphic that shows, at a high level, how these customers are using MarkLogic and Hadoop. Actual production systems vary greatly because of the number of different Hadoop components, but the general architectural pattern is the one shown here: MarkLogic is the database, and Hadoop provides a low-cost storage option for structured and unstructured data. More info on MarkLogic and Hadoop can be found here.
The MarkLogic Connector for Hadoop provides a seamless integration
So, with that introduction, we encourage you to try out the online TDWI Assessment Tool, download the Guide, and assess your organization’s readiness for Hadoop.