A large customer of ours is standardizing on Data as a Service (DaaS) for systems development throughout their enterprise. Their plans are great, but I see confusion even within their own team on what, exactly, Data as a Service is and what benefits it is supposed to provide, so I’m writing this blog post to clarify the what, the how and the why of DaaS.
MarkLogic is an Enterprise NoSQL database platform that natively exposes REST or SOAP services (and has for over 10 years), so you could say MarkLogic was DaaS before DaaS was cool. As a result, we have a lot of experience with how to do (and how not to do) DaaS, and what benefits to expect.
What is Data as a Service?
DaaS is one of the new “as a service” approaches that abstract away complex, costly software tasks to make them easier to manage and more cost-effective. However, most “as a service” offerings, such as SaaS or PaaS, focus on shrink-wrapped, generic services such as human resources software, CRM software, or relational SQL persistence. These are arguably well-understood, commodity services.
In contrast, data is the most valuable thing in an enterprise, and data services should be customized and designed to meet the individual needs of your company or organization. There is therefore a lot of design and thinking needed to incorporate DaaS into your enterprise, and the data services you create must be designed and refined so they support your specific operations.
DaaS is an approach to making data available whenever it is needed, and it fits into the larger Service-Oriented Architecture (SOA) design pattern. Within SOA, DaaS is the approach that values, shares and focuses on data.
The DaaS approach is in contrast to starting with higher level “business services,” and providing data to those services as an afterthought through whatever means are convenient. Paradoxically, your real business needs are better served in the long term by understanding and modeling the data, rather than focusing on so-called “business services” at the expense of your data services.
My client had a lot of this right: they released a visionary architectural guidance document specifying how data services would be defined and released. Yet they were in danger of encountering some of the failure modes I’ve listed below, which I believe is because they lost track along the way of why they were taking a DaaS approach and what benefits it is supposed to provide.
Why Data as a Service?
First of all, creating a bunch of services that move data around does not constitute DaaS unless those services are designed to yield certain benefits. Here are the key benefits that motivate and define DaaS:
Valuable, re-usable and uniform
Data Services in a DaaS environment should have value across multiple projects, and the data formats and data services should be designed so that their value both outlasts and exceeds that of the particular systems that first use them.
Security, in particular, must be uniform and ubiquitous. It is a barrier to adoption if some underlying systems use different security models. Different groups will not share data without built-in security, and too much data without controls becomes a privacy and compliance risk.
Virtual data and abstraction
Data Services should abstract away from underlying data stores and locations, including “silo busting” combinations of data from multiple sources in multiple formats, presented seamlessly as one service.
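To make the “silo busting” idea concrete, here is a minimal Python sketch. Everything in it is hypothetical: the CustomerView shape, the CustomerDataService class, and the two in-memory “silos” (a CRM keyed by customer id and an order list of tuples) are illustrations I made up, not part of any real product. The point is only that the caller sees one service and one format, and never learns which stores were consulted.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CustomerView:
    """The single shape callers see, wherever the fields come from."""
    customer_id: str
    name: str
    open_orders: int


class CustomerDataService:
    """Hypothetical data service that hides two differently shaped silos."""

    def __init__(self, crm: Dict[str, dict], orders: List[Tuple[str, str]]) -> None:
        self._crm = crm          # silo 1: records keyed by customer id
        self._orders = orders    # silo 2: (customer_id, status) tuples

    def get_customer(self, customer_id: str) -> CustomerView:
        # Combine both silos into one answer; the caller sees neither.
        name = self._crm[customer_id]["full_name"]
        open_orders = sum(
            1 for cid, status in self._orders
            if cid == customer_id and status == "open"
        )
        return CustomerView(customer_id, name, open_orders)


service = CustomerDataService(
    crm={"c1": {"full_name": "Ada"}},
    orders=[("c1", "open"), ("c1", "closed"), ("c2", "open")],
)
view = service.get_customer("c1")
```

If the CRM is later replaced or the orders move to another store, only the inside of CustomerDataService changes; CustomerView and its callers stay put.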
How to Fail at Developing DaaS
The biggest danger to the successful adoption of a new paradigm is our existing expertise that has worked well for decades – our knowledge about the old paradigm. Let’s talk about what not to do first, and then next week I’ll write another post on success factors.
Do not focus on the plumbing (enabling technologies)
Think about what would happen if you hired a plumber as the architect for your house. You’d likely end up with pipes, valves and other exposed internals running through your living room – complicating and cluttering, rather than making your house livable. Plumbing should be hidden and transparent, and invisibly enable your structure to function.
DaaS is an architectural pattern. Most developers know how to add SOAP services with WSDL definitions or REST calls, and how to pass XML (or JSON or RDF) around. These techniques may be necessary, but applying them gratuitously does not help create a DaaS architecture. Focus on data formats and service definitions, not the protocols and technologies used to expose and wire them up.
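One way to picture “formats and definitions first, plumbing second” is the following Python sketch. The TradeSummary format and the get_trade_summary service are invented for illustration, and the hard-coded store stands in for whatever systems would really answer the request. The service definition knows nothing about protocols; JSON and XML are thin adapters added afterwards.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TradeSummary:
    """Hypothetical data format: this is what gets designed first."""
    trade_id: str
    notional: float
    currency: str


def get_trade_summary(trade_id: str) -> TradeSummary:
    """The service definition: input, output, meaning. No protocol here."""
    store = {"T-1": TradeSummary("T-1", 1_000_000.0, "USD")}  # stand-in data
    return store[trade_id]


def as_json(summary: TradeSummary) -> str:
    """Plumbing adapter: serialize the same format as JSON."""
    return json.dumps(asdict(summary), sort_keys=True)


def as_xml(summary: TradeSummary) -> str:
    """Plumbing adapter: serialize the same format as XML."""
    root = ET.Element("tradeSummary")
    for field, value in asdict(summary).items():
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


summary = get_trade_summary("T-1")
json_payload = as_json(summary)
xml_payload = as_xml(summary)
```

Swapping REST for SOAP, or JSON for XML, touches only the adapters. The format and the service definition, which are where the lasting value lives, do not move.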
Don’t Confuse DaaS With Cloud
Just as plumbing is an enabling technology, cloud computing is an infrastructure approach. DaaS is about the architecture, so it must focus on how data is formatted and transmitted, and on the interfaces between subsystems. What servers a system runs on is very important, but should not be confused with the DaaS pattern. Yes, you can put a server hosting DaaS services in the cloud. No, you don’t have to.
Understand Why DaaS is Not an Enterprise Data Warehouse
EDW efforts often fail because of modeling complexity. DaaS is more agile in that you can roll out individual services without modeling your entire enterprise first.
A “big design up front” modeling exercise that involves underlying databases and E-R diagrams will have the same failure modes as a large Enterprise Data Warehouse. Which is to say: many.
Forget about relational modeling (at least at first)
In DaaS, the service formats are king – aka the “wire” formats used to integrate components in your enterprise. The point is to abstract away from which underlying system or systems participate in serving the request, including abstracting away from your relational database and its physical model. The underlying systems could be sets of SOAP services from COTS products, relational databases, search engines, NoSQL databases, or triple stores.
Well – don’t completely forget it. Just don’t let it drive your service modeling activities. Great service modelers tend to be involved in standards bodies, XML standardization, perhaps object modeling, but not (primarily) E-R modeling or 3rd normal form.
Minimize data movement
Re-think what “business services” are and classify them as data services if they filter, join, transform, validate, copy, export or batch-process a lot of data. This will allow data-intensive tasks to be performed “close to the data” and realize benefits of data locality. In other words, push data-intensive tasks down to your data services, rather than putting that burden on the business services.
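Here is a small Python sketch of the difference, under loud assumptions: OrderDataService, its two methods and the records_sent counter are all hypothetical, and an in-memory list stands in for a real data store. The counter simply tallies how many records cross the service boundary for the same question asked two ways.

```python
class OrderDataService:
    """Hypothetical data service over an in-memory list of order records."""

    def __init__(self, orders):
        self._orders = list(orders)
        self.records_sent = 0  # how many records crossed the service boundary

    def export_all(self):
        """Data-moving style: ship every record and let the caller filter."""
        self.records_sent += len(self._orders)
        return list(self._orders)

    def total_by_region(self, region):
        """Data-local style: filter and sum next to the data, return one value."""
        self.records_sent += 1
        return sum(o["amount"] for o in self._orders if o["region"] == region)


orders = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 5},
    {"region": "EU", "amount": 7},
]

# A "business service" that pulls everything and filters on its own side.
pulling = OrderDataService(orders)
pulled_total = sum(o["amount"] for o in pulling.export_all() if o["region"] == "EU")

# The same question pushed down to the data service.
pushing = OrderDataService(orders)
pushed_total = pushing.total_by_region("EU")
```

Both approaches produce the same total, but the first moves every record across the boundary and the second moves a single value. With three records the gap is trivial; with a few hundred million it decides whether the service is usable.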
Don’t let data service development be anyone’s second job
A mentor of mine once pointed out that “every organization is destined to build an enterprise architecture that mirrors their org chart” and I have found that to be absolutely true.
If you let your business service modelers, developers or DBAs define your data services, you will end up with services that are only good for the immediate task at hand, and do not provide the lasting value and abstractions you need for a good DaaS architecture.
Instead, empower a team to own the data services and take a stand for clean data services that yield lasting value. That debate and negotiation will improve your entire enterprise.
MarkLogic’s History With DaaS
I started out with the heady claim that MarkLogic has been doing DaaS since before DaaS existed as an identified pattern. As DaaS becomes popular, MarkLogic customers are saying to themselves “Oh, that’s what we’ve been doing all these years.” Why were MarkLogic customers so far ahead of this trend?
It is not that our customers – or MarkLogic for that matter – identified DaaS a decade ago and advocated for it as such. MarkLogic was focused on handling “any data, in any schema” at massive scale. But what happened is that we started to work with customers who had large volumes of data streaming in – often in XML or JSON formats defined by standards bodies. Our customers quickly learned that the overhead, delay and expense of mapping and re-formatting that data to relational schemas yielded no value, so they put the wire format data directly into MarkLogic and moved on.
We started out with a large footprint in publishing, and saw a lot of DocBook. Then Microsoft started to use Office Open XML internally in its .docx, .pptx, and .xlsx formats, so we stored that. Then the financial industry adopted FpML and FIXML – again we handle it natively. DoD uses DDMS and other government agencies use NIEM. You get the idea – it turns out almost everyone has an existing data standard for much of their data nowadays.
Voila. MarkLogic’s customers focused on the data that is used to communicate and integrate, no longer spending time or money reformatting data for relational, persistent storage, and that amounts to “DaaS before DaaS was cool.”
But that bit of history veers off into the benefits of DaaS and how to do it right – which is the topic for my next post in a week or two.