The title of the blog had been so appealing: “No More Silos: How to Integrate Your Databases with Apache Kafka and CDC.” But then I got hit with a power question:
“Why introduce a database into an architecture if we could use a streaming platform such as Kafka instead?”
I say “hit” because, as a Solution Architect at MarkLogic, I’m quite partial to adding databases into architectures. And the blog had already raised some very good points, like this one:
“It’s important to challenge assumptions about how systems are built.”
“Yes,” I agreed. But surely not the we-need-a-database-here assumption.
But it’s a good question, isn’t it? In an event-driven architecture, why should systems get data out of databases when they can get it straight from Kafka? Wouldn’t it be easier, quicker and cheaper to cut out the middleman? Do you really need a database if you’re already streaming data into Kafka?
This question posed a bit of a problem for me. And as they say, a problem shared is a problem two people have, so I decided to share that problem with some fellow MarkLogic folks.
David Gorbet, Engineering SVP at MarkLogic, wasn’t ruffled by it. He agreed that “a message-/event-based system is a smart way to go for many problems, and I think Kafka is a good technology to use for this,” but he made it clear that for many architectures a database is essential. That’s because if a database is used to harmonize siloed data (including your event messages), then:
“If there’s ever a question about the data, you can use persistence and indexing of messages to enable traceability for operational issues. You may not need to keep all messages indefinitely, but you should be thinking about keeping them around and queryable for long enough to trace errors.”
And it’s not just spotting errors that a database within an event-driven architecture can help with:
“It’s also going to prevent data inconsistencies that are inevitable with individual microservices having separate, overlapping data stores, significantly simplify the security architecture, and provide one place to secure and apply policy to data for things like it being fit-for-purpose for GDPR reporting, anonymization, etc., as well as being a way to track sources and uses of data.”
David left me with something to ponder:
“It’s not that an architecture without a database for persistence is wrong, it’s just incomplete; it’s basically an application integration architecture, not a data integration architecture.”
Now, it turned out that Ken Krupa, MarkLogic’s VP of Global Solutions Engineering, had heard that question (“Do we need a database in an event-based architecture?”) before. He explained how he’d found that customers had been unable to get an agreed-to, trusted, comprehensive view of things from messages alone. In fact, one customer referred to it as being:
“… like trying to reconstruct the chicken from the chicken soup.”
As Kafka becomes ever more popular, and more enterprise-wide architectures are built using event-based patterns, the question of why databases should be introduced at all looks set to be one that a lot of people will be asking.
However, I think the answer to this really boils down to one key factor: If you’ll ever need to get an answer about an entity as it exists across the business as a whole (and in a hurry), you’ll need a persistent, harmonized representation of it that you can easily and quickly retrieve (for example, via indexes).
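To make that concrete, here is a minimal sketch in plain Python. It is not Kafka or MarkLogic code: a list stands in for a topic, a small class stands in for the database, and every name in it (`EVENTS`, `rebuild_from_log`, `HarmonizedStore`) is hypothetical and invented purely for illustration. It contrasts rebuilding one customer by scanning every message with looking up a harmonized record that was kept up to date as the messages arrived.

```python
from collections import defaultdict

# Hypothetical events standing in for messages on a topic (illustrative data only).
EVENTS = [
    {"offset": 0, "source": "crm", "customer_id": "c1", "field": "email", "value": "ann@example.com"},
    {"offset": 1, "source": "billing", "customer_id": "c2", "field": "plan", "value": "basic"},
    {"offset": 2, "source": "billing", "customer_id": "c1", "field": "plan", "value": "pro"},
    {"offset": 3, "source": "crm", "customer_id": "c1", "field": "email", "value": "ann@corp.example"},
]


def rebuild_from_log(events, customer_id):
    """Reconstruct one entity by scanning every message: the chicken-soup route."""
    entity = {}
    for event in events:
        if event["customer_id"] == customer_id:
            entity[event["field"]] = event["value"]  # later messages win
    return entity


class HarmonizedStore:
    """Toy stand-in for a database: one merged record per entity, keyed by ID."""

    def __init__(self):
        self._records = defaultdict(dict)      # customer_id -> merged fields
        self._provenance = defaultdict(list)   # customer_id -> (offset, source) pairs

    def apply(self, event):
        """Fold one incoming message into the persistent, harmonized view."""
        key = event["customer_id"]
        self._records[key][event["field"]] = event["value"]
        self._provenance[key].append((event["offset"], event["source"]))

    def get(self, customer_id):
        """Keyed lookup: no scan of the message history is needed."""
        return dict(self._records.get(customer_id, {}))

    def sources(self, customer_id):
        """Which messages, from which systems, contributed to this record."""
        return list(self._provenance.get(customer_id, []))


store = HarmonizedStore()
for event in EVENTS:
    store.apply(event)

print(rebuild_from_log(EVENTS, "c1"))
print(store.get("c1"))
print(store.sources("c1"))
```

Both routes arrive at the same customer record here, but the first has to read the whole history every time someone asks, while the second answers from a keyed lookup and can also say where each piece of data came from. A real database adds durability, richer indexes and security on top of this idea.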
And if you’re not sure if you’ll need a database? Just remember the chicken.