Choices In Enterprise Data Infrastructure

If you’re an armchair student of public policy, you know that infrastructure (transportation, communication, etc.) really matters in any economic growth discussion.

Invest in the right kind of infrastructure — at the right time — and you get great results. Make a poor choice, and the results are usually obvious as well.

If all of our organizations are trying to become more data- and information-literate (to inform better decisions, re-engineer processes around connected data, etc.), there’s going to be some infrastructure involved.

For decades, I’ve watched the skirmishes play out between thirsty data consumers and drought-stricken IT groups trying their best to supply them.

The consumers’ thirst is easy to understand: give them the data to answer their questions, and they’ll quickly start asking harder questions that require, errr, more data. That thirst isn’t going away, nor should it.

IT is always in a tough place, but real-world experience has shown that some answers are better than others. The conceptual goal is to build better data infrastructure between people and data.

At the risk of oversimplification, let’s take a look at different ways to build data infrastructure.

Here’s the Data, Come and Get It

The simplest approach is to run reports and/or dumps against production databases, and make them available to authorized downstream consumers. I think of it as asking people to carry water on their heads.

Not exactly the most efficient way to get data into the hands of the people who need it, but — hey — if you’re thirsty, there’s water here.

From a pure efficiency perspective, there’s obvious room for improvement. Data consumers have to be aware of where the data is, how it was captured, how it’s structured. They have to invest effort to move data to another location, and then start massaging it to be usable.

This approach puts the onus entirely on the consumer, and that’s not ideal.

How About We Build You a Water Tank?

The search for a somewhat better answer led to the evolution of data marts, data warehouses, and the like. IT is willing to move the data on a regular basis to a place where you can work on it.

You can analyze more data that way, and people will stop bugging IT for ad-hoc data requests.

But as a consumer, you have no say in the data’s format, its cleanliness, or anything else about how it arrives. Dealing with all of that is your job.

Back to our water and infrastructure analogy, IT is willing to help you build a big tank — and periodically fill it with some sort of water. The rest is up to you, of course. Not ideal, but better than having to carry water.

Obviously, this pattern leads to many different and specialized water tanks, each aligned with a unique mission. This results in a couple of difficult problems.

First, it’s not efficient. Lots of data marts and warehouses, lots of tech, lots of effort, complexity, etc. Any long-term goal of simplifying and standardizing takes a serious hit.

Second, it can lead to poor outcomes. You now have multiple, disconnected “sources of truth” scattered throughout your organization. That makes it hard to make informed decisions around important things in your world: maybe customers, products, health outcomes, etc.
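To see how quickly those “sources of truth” can drift apart, here is a minimal Python sketch. It assumes two data marts that each hold their own copy of customer records; the mart names, fields, values, and the `find_conflicts` helper are all hypothetical and exist only for illustration.

```python
# Hypothetical extracts from two separate data marts, keyed by customer id.
sales_mart = {
    "cust-1": {"email": "ops@acme.example", "tier": "gold"},
    "cust-2": {"email": "it@globex.example", "tier": "silver"},
}
support_mart = {
    "cust-1": {"email": "ops@acme.example", "tier": "silver"},
    "cust-3": {"email": "hq@initech.example", "tier": "gold"},
}


def find_conflicts(a, b):
    """Return {key: {field: (value_in_a, value_in_b)}} for shared keys that disagree."""
    conflicts = {}
    for key in a.keys() & b.keys():  # only customers present in both marts
        fields = a[key].keys() | b[key].keys()
        diffs = {
            field: (a[key].get(field), b[key].get(field))
            for field in fields
            if a[key].get(field) != b[key].get(field)
        }
        if diffs:
            conflicts[key] = diffs
    return conflicts


conflicts = find_conflicts(sales_mart, support_mart)
```

In this made-up data, the two marts disagree about the tier of the one customer they share, and each of them also knows about a customer the other has never seen. Neither mart is wrong on its own terms, which is exactly the problem.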

Modern Plumbing, Anyone?

What we’d like to get to is something analogous to modern plumbing: high-quality water, any temperature, use it any way you want, etc. Simply turn on the faucet.

If we dig a little deeper, there are some interesting aspects to this analogy. The consumer doesn’t have to care where the water is coming from: river, reservoir, rainfall, etc. The water is tested regularly, and delivered at sufficient quality for most purposes.

If you need something special, like distilled water for your newborn, the effort is minimal. And anyone who doesn’t like the shared service is welcome to dig a well, invest in pumps and filtration, etc.

The real benefit? As a consumer, I can get on with life without having to think much about water, or being thirsty. But there was some serious infrastructure that made it all happen.

Data Fabrics, Data Mesh, Data Pipelines, and More

Many of the newer memes in this space try to capture this infrastructure-oriented approach to making more and better data available, and presenting it in a way that lets people easily consume it and make better decisions.

As long as we’re thinking about the ideal data infrastructure along these lines, what would we put on our list?

I think we’d start by insisting that we could ingest, process, and add value to data no matter where it’s coming from, in any form.

The data should be immediately usable — to some degree — upon ingestion. Sure, we can add refinements later, but the notion of using data “where is, as is” — and not having to impose a format on it a priori — is very appealing.

People want to search and structure their data in different ways. Everyone has their own lens. It should be easy to build any lens you might need.

There are the familiar rows and columns, documents, relationship graphs, geospatial views, RDF triples, ontologies, and so on. Again, in an ideal world, why would you arbitrarily restrict people to looking at their data in only one particular way?
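To make the idea of lenses a little more concrete, here is a minimal Python sketch. It assumes records are kept exactly as they arrive, as plain dictionaries, and that a lens is nothing more than a function that reads them back in a particular shape. The names `ingest`, `as_rows`, and `as_triples`, along with the sample records, are hypothetical and not any product’s API.

```python
from typing import Any

# Hypothetical in-memory store: records are kept exactly as they arrive.
store: list[dict[str, Any]] = []


def ingest(record: dict[str, Any]) -> None:
    """Store a record as-is; no schema is imposed up front."""
    store.append(record)


def as_rows(columns: list[str]) -> list[tuple]:
    """Rows-and-columns lens: fields a record lacks come back as None."""
    return [tuple(rec.get(col) for col in columns) for rec in store]


def as_triples() -> list[tuple[str, str, Any]]:
    """Triple lens: (subject, predicate, object) for every field except the id."""
    triples = []
    for rec in store:
        subject = rec["id"]  # assumption: every record carries an "id" field
        for key, value in rec.items():
            if key != "id":
                triples.append((subject, key, value))
    return triples


# Two differently shaped records go in without any upfront modeling.
ingest({"id": "cust-1", "name": "Acme", "region": "EMEA"})
ingest({"id": "order-7", "customer": "cust-1", "total": 120.0})

rows = as_rows(["id", "name", "total"])
triples = as_triples()
```

The same two records can be read as a sparse table or as a small graph in which the order points back at the customer. A real platform does far more than this, of course, but the design choice is the same: store first, and pick the lens at read time.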

Not everyone lives in spreadsheets. And the answers to life’s really interesting questions don’t typically live in spreadsheets. The really useful stuff usually lies in connecting scraps of data that weren’t intended to be connected.

Finally, this is real infrastructure. It has to be scalable, robust, recoverable, secure, auditable, etc., etc. When the town water system has a bad day, everyone has a bad day.

Data Infrastructure Can Be Fun

Perhaps one of the most interesting parts of my job is learning what people have done with modern data infrastructure.

It’s always the same pattern: what new and impactful things can now be done by simply cutting across and connecting multiple data sources quickly and efficiently? It’s fun to see the enthusiasm as the team now realizes they can drive a slew of cool new applications and really move the needle.

Just like modern plumbing has done for most of us.

Chuck joined the MarkLogic team in 2021, coming from Oracle as SVP Portfolio Management. Prior to Oracle, he was at VMware working on virtual storage. Chuck came to VMware after almost 20 years at EMC, working in a variety of field, product, and alliance leadership roles.

Chuck lives in Vero Beach, Florida with his wife and three dogs. He enjoys discussing the big ideas that are shaping the IT industry.
