A Data Integration Wizard Talks About Microservices, Containers, and Data Governance
Companies large and small are looking for quick, flexible and safe solutions to integrate their data. Many are now turning to microservices and containers to help them achieve data governance.
As one of MarkLogic’s “Data Integration Wizards” involved in six large-scale data integration projects, Puneet Rawal, a Principal Sales Engineer at MarkLogic, has some great insights to share about the latest trends in data integration. I had a chance to sit down with him as he prepared his talks for the upcoming MarkLogic World Conference, May 7-10.
What does a typical data integration project look like for you?
Some of the organizations I work with have been using MarkLogic for many years and are now branching out into new use cases. Others, like a large bank I’m working with, are relatively new to using MarkLogic.
Veteran MarkLogic customers often have established best practices and frameworks. It’s exciting to work with the newer customers, though, because they get to start fresh with the newest tools available.
Every organization I work with, regardless of industry, faces similar challenges. They have a huge diversity of data, and they need a hub that they can use to integrate and leverage it.
How does MarkLogic help solve these data integration challenges?
These organizations are solving really difficult data integration challenges. A multi-model approach and flexibility with data modeling have huge advantages. They make it easy for these organizations to adopt and deploy applications, especially large organizations that need to get value really, really quickly.
If I were working for a big relational vendor doing the same sorts of projects, I might be working on the same project for years at a time. Fortunately, with MarkLogic, customers only have to put up with me for a few months before they see their first project in production.
Can you give an example of an organization that transitioned from relational to MarkLogic?
I can give you numerous examples of relational to MarkLogic transitions. But, here’s one good one:
I was recently speaking with an architect I work with at a large manufacturer. They tried using Oracle to integrate the data but after two years they hadn’t integrated anything. He said that if they continued down the same path with relational, they would complete the integration project by the time he retired. He is currently in his mid-thirties.
That organization, a multi-billion-dollar manufacturer, had over 200 ERP systems. Each of those systems had 2,000 to 4,000 domains and sub-domains—sales, purchase orders, inventory, etc. They needed to integrate those ERP systems to get a unified picture of their financial situation. For reporting and auditing in particular, they needed a unified view.
With MarkLogic, on the other hand, they were able to get their first project into production within seven months. That’s the typical data integration project I work on.
Most customers are facing the same data integration headaches—so, do all MarkLogic implementations look the same?
Each project I work on differs somewhat based on the organization’s level of NoSQL experience, and how willing they are to adopt a new paradigm.
For example, that large manufacturer I mentioned was already in the midst of tackling their data integration challenge with Oracle. Other organizations, like another large bank I work with, realized that relational wasn’t the answer and some of their developers had already started down a path using MongoDB.
With that bank, it was actually easier because they had already acknowledged they needed a change and had embraced a document model. Now they just needed the right technology for the job.
After some sessions with their developers to walk through how MarkLogic compared, they eventually came around to seeing how MarkLogic was a huge improvement over MongoDB. For example, they realized they could stop worrying about ACID transactions, because MarkLogic handles them in the database, and could adopt a simpler architecture with built-in search and more robust disaster recovery and security.
We’re hearing a lot about microservices. Do you see that as a big trend in the enterprise? What other trends do you see?
Yes, absolutely – using microservices is a huge trend. It allows large teams working on large projects to break things down into bite-sized chunks. It puts the business problem first and forces you to think about the interfaces. It’s more agile.
To aid the use of microservices, I’m also seeing the adoption of containers. Using containers is an enormous paradigm shift, and few technologies have seen such fast adoption. In 2016, over half of companies were investing in containers. Today, over two-thirds of companies are investing in containers.
And then, to make containers easier to deploy, I’m seeing the use of cloud frameworks. For example, Cloud Foundry is a PaaS that uses a container-based architecture and aids the rapid development and deployment of applications (i.e., continuous delivery). Cloud Foundry runs on all the leading cloud providers and is having a really positive impact on the efficiency of developers at large organizations.
Is MarkLogic compatible with microservices and containers?
At MarkLogic, we’re excited about microservices and containers. We believe containers and microservices are the paradigm for modern architectures. And, both work well with MarkLogic.
Currently, you can use the leading container technology, Docker, with MarkLogic. We provide full support to our customers that want to use Docker in development and test environments. MarkLogic also supports developing microservices, which of course is more of an approach than a technology.
What are the typical microservices architectures you see?
For the customers I work with, I see two approaches, both of which are valid.
- The first approach is to simply use MarkLogic as part of an independent microservice.
- The second approach is to use MarkLogic as an Operational Data Hub that has all microservices pointing to it.
In supporting an independent microservice, MarkLogic can be used to handle daily operations. Imagine a stock portfolio that changes in value, and you need a process that updates the value of the portfolio every day. Values get stored in MarkLogic, which just acts as an independent storage layer. If one part of the architecture fails, it’s easy to spin it up again.
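As a rough illustration of the portfolio example above, here is a minimal Python sketch of a daily revaluation job. Everything in it is an assumption made for illustration: `InMemoryStore` is a hypothetical stand-in for the storage layer rather than a MarkLogic API, and the URI layout and field names are invented.

```python
from datetime import date
from typing import Dict


class InMemoryStore:
    """Hypothetical stand-in for the storage layer (not a MarkLogic API)."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def put(self, uri: str, doc: dict) -> None:
        # Writing to an existing URI overwrites the previous document.
        self._docs[uri] = doc

    def get(self, uri: str) -> dict:
        return self._docs[uri]


def revalue_portfolio(
    store: InMemoryStore,
    portfolio_id: str,
    holdings: Dict[str, float],
    prices: Dict[str, float],
    as_of: date,
) -> str:
    """Compute one day's portfolio value and store it as a document."""
    value = sum(qty * prices[symbol] for symbol, qty in holdings.items())
    # Assumed URI scheme: one document per portfolio per day.
    uri = f"/portfolios/{portfolio_id}/{as_of.isoformat()}.json"
    store.put(
        uri,
        {"portfolioId": portfolio_id, "asOf": as_of.isoformat(), "value": value},
    )
    return uri
```

Because the service keeps no state of its own and the document URI depends only on the portfolio and the date, a restarted instance can simply rerun the day's job and overwrite the same document.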
The Operational Data Hub approach is much more interesting because the hub is used to integrate data from many sources and then data is accessed through microservices.
The organizations I work with may want to store all of their HR data together. Or their trade data. Or insurance claims data. All of that data comes into MarkLogic from upstream sources and is harmonized in MarkLogic. Then, that data is accessed by downstream systems for analysis, reporting, and other applications. For all of those connections, you can use microservices. Here is an interesting blog about different microservices models that can be deployed with MarkLogic.
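To make the hub pattern a little more concrete, the following Python sketch shows records from two upstream HR systems being harmonized into one canonical shape and then read back through a single accessor, the kind of call a downstream microservice might expose. The source names, field names, and the `Hub` class are all hypothetical; this is not how MarkLogic itself implements harmonization.

```python
from typing import Callable, Dict


def from_hr_a(rec: dict) -> dict:
    """Map a record from hypothetical upstream system 'hr_a' to the canonical shape."""
    return {
        "employeeId": str(rec["emp_no"]),
        "name": f'{rec["first"]} {rec["last"]}',
        "source": "hr_a",
    }


def from_hr_b(rec: dict) -> dict:
    """Map a record from hypothetical upstream system 'hr_b' to the canonical shape."""
    return {
        "employeeId": str(rec["ID"]),
        "name": rec["FullName"].strip(),
        "source": "hr_b",
    }


# One harmonizer per upstream source; adding a source means adding an entry here.
HARMONIZERS: Dict[str, Callable[[dict], dict]] = {
    "hr_a": from_hr_a,
    "hr_b": from_hr_b,
}


class Hub:
    """In-memory stand-in for the data hub (illustrative only)."""

    def __init__(self) -> None:
        self._employees: Dict[str, dict] = {}

    def ingest(self, source: str, record: dict) -> dict:
        # Harmonize on the way in so downstream readers see one shape.
        doc = HARMONIZERS[source](record)
        self._employees[doc["employeeId"]] = doc
        return doc

    def get_employee(self, employee_id: str) -> dict:
        # What a downstream microservice endpoint would return.
        return self._employees[employee_id]
```

The point of the design is that downstream consumers depend only on the canonical shape returned by `get_employee`, not on the field names of any upstream system.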
With this architecture, some customers might use an API management platform like MuleSoft to help manage the connections to MarkLogic in an Operational Data Hub. Some customers may offload data to Hadoop. Some might use Splunk for log consolidation.
Regardless of the different component technologies used, the important thing is to organize around agility and the idea of “API-first.” Get value out of the data as soon as possible and build the APIs to meet that initial business need. You can focus on features and performance in your apps later.
But that doesn’t mean sacrificing data governance, I hope?
Correct, data security – and data governance more generally – is very important for all customers. With most customers I work with, tools and processes were already in place to make sure data is high in quality, sensitive data is protected, and the right people have access to it.
For example, one customer uses an IBM Data Catalog that has an information analyzer and policy enforcement engine. It’s good because it helps consolidate data governance under one umbrella. Our goal with MarkLogic is not to disrupt that infrastructure. We just work to enhance it and make data governance easier, so that policies managed by that tool are easier to enforce in the MarkLogic data layer.
If you want to hear more about large data integration projects, Puneet will be speaking in two different talks with Aetna and Northern Trust at our upcoming MarkLogic World event in San Francisco on May 7-10, 2018.