HOAP, HTAP, Translytical, Huh? The Benefit Behind the Buzzword
I talk a lot about the benefit of having a unified data management strategy and platform. This was a key point in this year’s MarkLogic World talk, and thinking back, it has been a key point for us for years. A lot goes into making a data management strategy unified, but at a high level, what I mean is being able to do many different things in one platform.
These days there is a point solution for every data management need. But if an organization uses different point solutions, or even different instances of the same point solution, across its use cases, it is just creating more data silos.
Experts recommend the hybrid approach
There is a trend that is finally gaining attention as a good step in achieving a more unified approach: the ability to use a single platform for both operational and analytical use cases. Every analyst firm has a different term for it: HOAP (hybrid operational analytical processing), HTAP (hybrid transactional analytical processing), or translytical, to name a few.
451 Research – who uses the term HOAP, by the way – notes the uptick they see in enterprises gravitating to hybrid workloads, along with an increase in the number of vendors developing products to satisfy the hybrid processing need. As an example of the benefit, 451 Research explains: “For instance, combining transactional systems with analytics does reduce IT complexity. HOAP can also address customer engagement and possibly increase sales by applying recommendations to incoming transactions.”
Gartner has also merged two previously separate Magic Quadrant reports – the Operational Database Management Systems (ODBMS) report and the Data Management Solutions for Analytics (DMSA) report – into one upcoming report, as use cases can no longer be segmented neatly into operational or analytical.
Everyone tells you to think of your data as a strategic asset. What does that mean? Put simply, it means curating and stewarding your data for multiple present and future use cases. Your data integration efforts then focus on solving your pressing business problems by building a durable data asset that can be leveraged across your organization, rather than by building yet another point solution to the problem at hand. Done well, this lets you spend less effort onboarding new use cases and makes governing the data much easier.
So how do we do this?
The model dilemma
At the end of the day, data integration projects are all about taking data from many siloed sources and providing that data to operational or analytic uses. That data comes with a model from its source systems. For example, data coming from an ERP system comes modeled in a way that is specific to that system and optimized for that system’s purpose. You can’t tell the ERP system to change the way it models, stores, and manages data just to make it more convenient for you to re-use it. You have to accept that data as it is. We call the model coming from the source system the source model.
Similarly, every use for data also needs to consume that data in a particular way. If you’re putting data into a data warehouse, that warehouse has a format and schema the data has to conform to. If you’re sending customer profile data to a web page, that web application will probably need the data as a JSON object. If you’re analyzing data in a BI tool, you’ll most likely need to get that data into a table of rows and columns. We call the model that’s needed to serve the use case the consumption model.
Transforming many source models into many consumption models is a hard problem. There’s a lot of work required to massage that data into a form that’s fit for purpose – and every time something changes, or a new consumption model needs to be added to serve a new use case, you have to redo much of that work throughout the system. Because every source is mapped directly to every consumer, the number of transformations grows multiplicatively, and it’s also hard to trace how data got from multiple sources through all those transformations to the multiple consuming systems.
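The scaling problem described above can be made concrete with a toy sketch (the system names here are purely hypothetical, not from any real deployment): point-to-point integration needs one transformation per source/consumer pair, while a shared intermediate model needs only one per system.

```python
# Toy illustration of integration cost; the system names are hypothetical.
sources = ["erp", "crm", "billing"]          # M source models
consumers = ["warehouse", "web_app", "bi"]   # N consumption models

# Point-to-point: every source model must be mapped to every consumption model.
point_to_point = [(s, c) for s in sources for c in consumers]
print(len(point_to_point))   # M x N = 9 transformations to build and maintain

# With a shared intermediate model, each system maps to or from it exactly once.
via_hub = len(sources) + len(consumers)
print(via_hub)               # M + N = 6 transformations
```

Adding a fourth consumer to the point-to-point design adds three new transformations; with the intermediate model it adds just one.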
Enter a third model: the entity model
With MarkLogic Data Hub, we solve this by abstracting the source models from the consumption models with a third model, which we call the entity model.
Think of the entity model as a neutral model designed not for a specific use case, but to best represent how the business thinks about its data. This third model acts as a shield, protecting upstream and downstream systems from the impact of change – so if one day you add a new source, or an existing source model changes, you can plug the new source model into your entity model without changing any of the downstream systems that rely on data from your hub. It also gives you a single place to govern that data by attaching metadata to it. For example, defining what constitutes high-value data for your organization, and what policies attach to that data, is much easier if you have a model that represents your business concepts rather than a particular source or consumption format.
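To make the idea of a mediating entity model concrete, here is a minimal sketch in plain Python (deliberately not the MarkLogic Data Hub API – the field names, record shapes, and functions are all illustrative assumptions): a source record is mapped once into a neutral Customer entity, and each consumption model is then projected from that entity, never from the source directly.

```python
# Illustrative sketch only; field names and shapes are hypothetical.

def erp_to_entity(erp_record):
    """Map a hypothetical ERP source model into the neutral Customer entity."""
    return {
        "customerId": erp_record["CUST_NO"],
        "name": erp_record["CUST_NAME"].title(),
        "country": erp_record["CNTRY_CD"],
    }

def entity_to_web_json(entity):
    """Consumption model for a web app: a JSON-style profile object."""
    return {"id": entity["customerId"], "displayName": entity["name"]}

def entity_to_bi_row(entity):
    """Consumption model for a BI tool: a flat row of columns."""
    return (entity["customerId"], entity["name"], entity["country"])

erp_record = {"CUST_NO": "C-042", "CUST_NAME": "ACME CORP", "CNTRY_CD": "US"}
customer = erp_to_entity(erp_record)
print(entity_to_web_json(customer))  # {'id': 'C-042', 'displayName': 'Acme Corp'}
print(entity_to_bi_row(customer))    # ('C-042', 'Acme Corp', 'US')
```

If the ERP later renames `CUST_NO`, only `erp_to_entity` changes; the web and BI projections, and any other consumers of the entity, are untouched.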
And of course because of MarkLogic’s flexible schema and indexing, you don’t have to spend years figuring out the perfect entity model for all your use cases. You can start by making a simple model that solves your most pressing business problem, and then you can grow and refactor the model from there – all while abstracting upstream and downstream systems from changes in the model. As your model grows and changes, you’re enriching and enhancing your data, which makes it more valuable both for current and future use cases. That’s the durable data asset.
A unified data management strategy
The entity model enables what I described at the beginning of this blog – doing many different things in one platform, going far beyond just operational and analytical use cases.
Read 451 Research’s take on this in 451 Perspective: A HOAP-ful future for hybrid operational and analytical processing.
Click here for more information on MarkLogic Data Hub Service.