A database platform captures information and serves it up for later use.
Database tech has historically evolved by data type. The need to manage tabular data gave us relational models. Map coordinates needed a geospatial model. Hierarchies wanted a graph representation. Documents demanded, well, a document model. And so on.
As applications were created, they were built around the data types they required.
Modern cloud-native applications are built as collections of microservices, each encapsulating its own data and using whatever data types and database it needs. Developers select a database technology based on the data types their service requires.
Applications built this way lend themselves to divide-and-conquer, which can result in faster deployments and simplified maintenance. A change in one microservice doesn’t have to impact other ones. Yet no good deed goes unpunished.
From an operational perspective, these can be complex beasts with many moving pieces, built with a smorgasbord of data management technologies. In this regard, they are not that different from the complex legacy application landscapes that preceded them.
Nor do these “polyglot persistence” applications do any favors for business users looking to get a cross-functional view of data. Source data must be extracted from each data store, assembled, harmonized, etc. Again, not that dissimilar from our legacy application landscapes.
Multi-model databases arose as an answer to one — or sometimes both — of these problems.
Multi-model databases can store multiple data types in a single instance, avoiding the operational complexity that inevitably results from having many separate systems to be responsible for.
Without multi-model, each specialized data store has to be learned, configured, monitored, secured, and optimized. What might have sounded simple at the outset quickly becomes a can of worms as the numbers grow.
Multi-model becomes attractive in this context, as the operations team can use a single data store for multiple data types. Developers can continue to encapsulate data inside their microservices.
All other things being equal, this can be a win.
Don’t forget, we haven’t done anything for the business users with this approach. They will still find it difficult to connect multiple data types from multiple sources, whether the data is stored in a multi-model database or not.
To better understand what business people need — and how a multi-model database can help them — we have to take a quick step back.
Imagine you were researching your family history and had created a collection of index cards, each with information on a known relative. There’d be a lot of different ways to look at all of them.
You could arrange them as a graph if you wanted to see a family tree. You could ask for a list of names and addresses — much like a relational database. If there are photos or videos of them, that’d be nice to have as well, maybe for creating a new video.
You could plot addresses on a map, and see who lives next to each other — a geospatial representation. You could keep a small log on each index card: when it was created, when it was last updated, by whom, and so on, introducing a small measure of provenance. You might even keep older versions of cards, just in case.
The data on each card could be connected in any variety of ways, depending on what you’re looking for. And you’d use a different schema for each. A “schema” defines how we’d like our raw data to look: a phone number, an address, a point on the map, etc. It defines a human lens to what ultimately is a bunch of 1s and 0s.
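To make the "schema as lens" idea concrete, here is a minimal sketch in Python. The card format, field names, and values below are entirely hypothetical — the point is simply that the same raw bytes only become a phone number or a map point once a schema is applied to them.

```python
import json

# Hypothetical raw index card: just bytes until a schema interprets them.
raw = b'{"name": "Ada Byron", "phone": "555-0142", "lat": 51.5, "lon": -0.12}'

def apply_schema(raw_bytes: bytes) -> dict:
    """Apply a simple schema: a name, a phone number, and a geospatial point."""
    record = json.loads(raw_bytes)
    return {
        "name": str(record["name"]),
        "phone": str(record["phone"]),
        # The geospatial "lens": two floats become a point on the map.
        "location": (float(record["lat"]), float(record["lon"])),
    }

card = apply_schema(raw)
```

A different schema over the same bytes — say, one that only cares about names and last-updated timestamps — would yield a different view without touching the underlying data.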
What you’d want to avoid is having to make yet another copy of all your index cards depending on the specific data and the type of connection you wanted.
You don’t want to have to make a copy of everything just to view the family tree. Or another copy to get a list of names and addresses. Yet another copy to organize the media. Another copy again to view a map. Every time you wanted to make new connections, you’d make another copy.
Does this sound inefficient? It is. Complexity is being pushed to the specific use case — and user. This is exactly what can happen in enterprise IT: everyone wants a distinct view, so yet another copy is made.
Wouldn’t it be great if everyone could work off of essentially the same “golden copy” of data? And just serve it up as they’d like?
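The golden-copy idea can be sketched in a few lines of Python. The card fields here (name, parent, address, location) are made up for illustration; what matters is that each lens is computed as a view over the one canonical set of records, rather than as another copy of them.

```python
# One canonical "golden copy" of index cards (hypothetical data).
cards = [
    {"name": "Ada",   "parent": None,  "address": "1 Elm St",  "location": (51.5, -0.12)},
    {"name": "Byron", "parent": "Ada", "address": "2 Oak Ave", "location": (40.7, -74.0)},
]

def family_tree(cards):
    """Graph lens: child -> parent edges, for viewing the family tree."""
    return {c["name"]: c["parent"] for c in cards if c["parent"]}

def address_list(cards):
    """Relational lens: (name, address) rows, like a database table."""
    return [(c["name"], c["address"]) for c in cards]

def map_points(cards):
    """Geospatial lens: name -> (lat, lon), for plotting on a map."""
    return {c["name"]: c["location"] for c in cards}
```

Each function serves the data up differently, but none of them duplicates the cards — which is, in miniature, what a multi-model database that both stores and serves well aims to do at scale.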
Let’s take our family research project to the next step. Imagine a website with literally hundreds of millions of people researching their family history, each wanting information from a vast number of sources.
And you are now in charge of the application architecture.
You’d have to be able to ingest data in any form, serve it up in any form, and let people make any connection that might be relevant to them.
Every user might have a distinct interpretation of how they’d like to view their data and its connections, for example — do adoptions count?
This same data flow pattern — multiple data sources, multiple data types, multiple users, multiple lenses — is not at all uncommon. A multi-model database can potentially provide the agile yet robust platform required.
Interestingly enough, all database vendors claiming multi-model can store multiple data types effectively, but not all can serve them effectively. The distinction matters.
I don’t want to pick on the Oracle Database here, but it’s familiar to many. Oracle can easily store multiple data types quite effectively, so it’s multi-model. Serving that same data back to the business with agility and flexibility — not so much. Especially if they’re looking for connections across multiple data types.
To be fair, I can imagine use cases where a multi-model database that stores well — but does not serve well — might make sense for a particular, isolated application. If there is limited business interest in the underlying data, why bother?
My answer would be that “things change.” All data is potentially useful in a different context. As an example, nobody thought email and messages were important — until they were. Given that application architectures tend to outlast the people who create them … it’s something to consider.
One tranche of multi-model database technologies is appealing to developers. They want to do what needs to be done quickly, preferably with a tool that can be used on the next gig. If you’re developing a new application, speedy development is always popular.
Another tranche of multi-model database tools might not be the developer’s first choice, but is appealing to the operations team. They want a simpler, better, easier-to-manage footprint, so a win there.
A third, somewhat smaller tranche might be called the perfect trifecta: business users get data agility, developers get simplicity, the operations team gets a simplified footprint.
If you have multiple business functions at the table — or expect them before long — best to consider the third option.
Giving business users what they need usually leads to all sorts of great outcomes: new and better offerings, improved customer engagement, better risk mitigation, and more.
We at MarkLogic would be pleased to share what we can do with complex data. Including that genealogy website, among other interesting things.