Building Your Next Data Platform: Choices Ahead
There’s a drawing I’ve seen literally dozens of times in customer conference rooms.
On one side, there’s a long list of potential data sources. Different applications, different data, different formats, different owners.
On the other side, there are one or more really important business processes that want to cut across a majority of that data in interesting ways.
And the team is trying to figure out that middle part. It can be a hard problem.
Some Simple Examples?
Let’s say you’re a bank that wants to deliver a premium customer experience to your high-net-worth clients. At some point, you’d realize that you want to connect every scrap of information you have about each customer, in whatever format, to deliver better service. Call it Customer 360.
But there’s a problem: you’re a full-service financial services firm with a diverse portfolio. And you partner with others and their systems as well. That’s a lot of different information to try and connect in a useful way.
Or let’s say you’re in the business of researching new therapies across a broad portfolio. You’d want to present any researcher with useful, potentially relevant information, anywhere it could be found. But there’s a very long list of potential places.
Maybe your concern is spotting potential fraud in the context of multiple, complex transactions across many applications and systems.
In each case, you’ve got a strong incentive to connect data in ways it wasn’t originally intended to be connected.
So you go looking for alternatives. And — like anything else — patterns and anti-patterns quickly emerge.
Here’s How the Thinking Might Go
Obviously, there’s going to be work to do in transforming the raw data sources into something of value. The first question is — who does the work? If the sources of relevant data are few (and their owners are cooperative), some of the work can be pushed in that direction.
But as the number of relevant data contributors increases, you can’t count on everybody to shoulder your workload. You may also end up with a very complex, brittle, inflexible architecture that won’t react quickly when presented with new circumstances.
You decide a better approach might be to ingest all the different data as it’s given to you, add value to it in various ways, and then deliver it to the consumer(s) as they’d prefer to consume it.
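To make that ingest-enrich-deliver idea a bit more concrete, here’s a rough Python sketch. Everything in it (the Envelope class, the ingest, enrich, and deliver functions, and the customer_id field) is my own illustrative assumption, not any particular product’s API. The point is only that the raw data is stored untouched, and the added value lives alongside it as metadata.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Envelope:
    """Hypothetical wrapper: the raw record as received, plus added metadata."""
    source: str
    raw: Any
    metadata: dict = field(default_factory=dict)


def ingest(source: str, raw: Any) -> Envelope:
    # Accept the data as it's given to us; no upfront transformation.
    return Envelope(source=source, raw=raw)


def enrich(env: Envelope) -> Envelope:
    # Add value next to the raw data instead of rewriting it.
    env.metadata["format"] = type(env.raw).__name__
    if isinstance(env.raw, dict) and "customer_id" in env.raw:
        # Normalize the key so records from different sources line up.
        env.metadata["customer_id"] = str(env.raw["customer_id"])
    return env


def deliver(envelopes: list, customer_id: str) -> list:
    # Hand consumers every raw record connected to one customer.
    return [e.raw for e in envelopes
            if e.metadata.get("customer_id") == customer_id]


store = [
    enrich(ingest("crm", {"customer_id": 42, "name": "A. Client"})),
    enrich(ingest("email", "free text that mentions nobody in particular")),
    enrich(ingest("trading", {"customer_id": "42", "position": "bonds"})),
]

# Both the CRM and the trading record come back, despite the
# different id types (int vs. str) in the two sources.
print(deliver(store, "42"))
```

Notice that the email record is still ingested even though nothing useful could be extracted from it yet. Under this approach, you can come back and enrich it later without asking the source owner to change anything.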
Enter the world of data marts, data warehouses, data labs, and the like. At the core of each, there will be a data management platform of some sort.
Doing this sort of thing with tabular data (think rows and columns) and well-behaved “unstructured” data (think XML and JSON) is hard enough.
But, just for fun, let’s make the problem even harder.
Let’s throw in some really rich data. Real-world documents. Emails with attachments. Geospatially tagged data. Structured lab notes. Logs and traces. Maybe even stuff that you’re not quite sure about, but it sure looks interesting!
An Important Fork In the Road
It is at this exact point that leadership will have to make a few important architectural decisions as to how they’re going to build their platform, especially as they consider richer data types.
One option is to avoid dealing with more difficult data types until absolutely necessary. It’s seductive: we’ll just work on the harder stuff later, right? Not surprisingly, we see a lot of teams take this route.
One potential problem is that implementor priorities may not align with business priorities. It’s often the case that the difficult data is important as well, perhaps even more important.
And when you eventually have to deal with rich, complex data, you end up attempting to glue various data management components together. Remember, the goal here is to connect information *across* sources, and not only within them.
Some people get to the same place a different way. They start by looking for a collection of special-purpose databases (usually open source) that they intend to use as a toolbox. Some assembly will be required.
Choosing multiple best-of-breed data management tools means you’ve decided the potential for optimization outweighs the cost, complexity, and technical risk of assembling your own platform. And I can certainly name situations where that might make sense.
For everyone else, the idea of a single multi-model database that can add multiple forms of value to any type of data starts to look very appealing. Speed and agility might matter more than building the ultimate mousetrap.
But there’s one more important aspect to consider.
Are You Building the New Truth?
Let’s step back a moment, and take a look at a very successful pattern I’ve seen across multiple industries.
If you are seen as successful at building this new data factory, your baby becomes the new system of record — the One Source Of Truth — for all of the newer, high-value business processes that will be constructed on top of it.
The price of success? You’ll have to deliver a full production platform, along with everything that entails: scalability, availability, recoverability, advanced security, and more.
That’s another reason why the best-of-breed vs. universal platform decision is so important.
And That’s Where MarkLogic Comes In
Now that you understand the problem, hopefully you can better appreciate just what I find so appealing about MarkLogic Server, and the ecosystem it brings with it.
It solves a very interesting problem in a unique way, and it’s a problem you’ll find in a surprisingly large number of places today, with more in the future.
Maybe even in your world?