MLW Preview: Introducing the Data-Centered Data Center
I had a chance to catch up with David Gorbet, VP Engineering at MarkLogic, about his upcoming keynote at MarkLogic World. He might be an engineer, but he is an engaging and colorful speaker who likes to show as well as tell. His talk will be about changing the approach to data management from application-centric to data-centric. The result? The Data-Centered Data Center.
That mouthful is a sea change. “Today’s way of developing is to build out a database to power applications,” David begins. “This database needs a schema, and that schema is optimized for the application. To build this schema, you need to understand both the data you’ll be using and the queries that the application requires. So you have to know in advance everything the application is going to do before you can build anything. What’s more, you then have to ETL this data from wherever it lives into the application-specific database.
“Now, if you want another application, you have to do the same thing. Pretty soon, you have hundreds of data stores with data duplicated all over the place. Actually, it’s not really duplicated; it’s data derived from other data, because as you ETL the data you change its form, losing some of the context and combining what’s left with bits of data from other sources. That’s even worse than straight-up duplication, because provenance is seldom retained through this process, so it’s really hard to tell where data came from and trace it back to its source.”
The Data-Centered approach makes the data itself, how it is used, and how it is governed throughout its lifecycle the primary consideration. It’s architected to allow a single management and governance model, and to bring the applications to the data, rather than copying data to the applications. With the right technologies, you can build a data-centered data center that minimizes data duplication, gives you consistent data governance, and offers flexibility both in developing applications over the data and in scaling capacity up and down to match demand. The result is data you can manage securely and cost-effectively throughout its lifecycle.
So without giving the entire talk away, here are some key elements of David’s Data-Centered Data Center (DCDC).
- Change your mindset. Think about the data as the center of everything.
- Get the right technology stack: functionality for transactions, search and discovery, analytics, and batch computation, all with a single governance and scale model. You need a storage system that gives great SLAs on high-value data and great TCO on lower-value data, without ETL. You need the ability to expand and contract compute power to serve application needs in real time without downtime, and to run this infrastructure on premises or in the cloud.
- Manage data throughout its lifecycle.
- Make sure the technology is enterprise-grade, including ACID transactions.
- Rev up the UX team*
David’s keynote is on Wednesday, April 9, in San Francisco.
*They will be building so many applications once your architecture is data-centered!