DaaS: Benefits and How to Do it Right

A few weeks ago, I blogged about Data as a Service – “DaaS” – what it is, its pitfalls, and why it is valuable. Today I’m following that up with strategies and techniques for getting it right.

My summary from the previous blog post

I defined DaaS as an architectural pattern where your most valuable services and data formats are developed first and intended for re-use over the long term. Two parts of that post are worth recalling here: the definition of DaaS and the value you should expect from it.

DaaS Defined

DaaS is an approach to making data available whenever it is needed, and fits into the larger Service-Oriented Architecture (“SOA”) design pattern. Within SOA, DaaS is the approach that values, shares, and focuses on data.

DaaS Value and Key Attributes

    • High-value, re-usable data services
    • Secure (security model is embedded in the services)
    • Focused on developing valuable data formats and services

Some Things DaaS Is Not

    • Not merely the use of some techniques like SOAP, ESBs, Hadoop
    • Not an Enterprise Data Warehouse
    • Not tied to the database model, particularly not tied to a Relational E-R model

Success Factors for Data as a Service

On to the main point of this follow-up blog: What are some things you can do to have a successful Data as a Service implementation that includes a set of harmonized, high-value, durable data services?

Think First About Data Services – and Build a Team Around It

A key to success is to have the right team roles and focus. As a mentor from my early years pointed out, “every organization is destined to build their org chart into all their software systems.” For DaaS, this means that if you don’t define data service modeling as a separate role, you won’t get Data Services as a high-quality deliverable.

Rather than asking your System Architects and Designers to fold some notion of re-use and data services into their day jobs, empower a separate group of Data Service Architects to model data formats and services, with an eye toward what will provide lasting value for the enterprise.

Focus on the “Wire Formats”

Wire formats are the data structures that integrate systems across your enterprise. They are usually XML documents that travel as REST or SOAP messages, though JSON and RDF payloads sent over REST are increasingly common.
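To make the idea concrete, here is a minimal sketch of a hypothetical XML wire format built with Python’s standard library. The “customer” message and its element names are invented for illustration, not drawn from any standard:

```python
import xml.etree.ElementTree as ET

# Hypothetical wire format: a small "customer" message that could travel
# as the body of a REST call. Element names are illustrative only.
def build_customer_message(customer_id, name, email):
    root = ET.Element("customer", attrib={"id": customer_id})
    ET.SubElement(root, "name").text = name
    ET.SubElement(root, "email").text = email
    return ET.tostring(root, encoding="unicode")

msg = build_customer_message("c-1001", "Ada Lovelace", "ada@example.com")
```

The point is that the message itself, not any database schema, is the unit you design and version.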

Don’t build a comprehensive relational model in 3rd normal form up front. Such models are complex, tightly coupled to a relational database, resistant to change, and, being physical models, offer no abstraction. Instead, focus on the payloads of your data services. Big up-front modeling is typical of enterprise data modeling efforts, which are slow to yield benefits and prone to failure.

Another way to think of it is that every wire format is a de facto “Interface Control Document” (ICD) that specifies the contract between data providers and consumers, enabling stability as both systems evolve.
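The contract idea can be sketched in a few lines: a service validates each incoming message against its ICD before accepting it. The required fields below are hypothetical:

```python
# A wire format as a de facto contract: the service rejects payloads that
# break the agreement. Field names are illustrative, not a real standard.
REQUIRED_FIELDS = {"id", "type", "timestamp", "payload"}

def validate_message(message: dict) -> list:
    """Return a list of contract violations; an empty list means the message conforms."""
    errors = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - message.keys())]
    if "type" in message and not isinstance(message["type"], str):
        errors.append("field 'type' must be a string")
    return errors

ok = validate_message({"id": "m1", "type": "customer.created",
                       "timestamp": "2015-01-01T00:00:00Z", "payload": {}})
bad = validate_message({"id": "m2"})
```

Because both sides code against the contract rather than each other’s internals, either system can change its implementation without breaking the other.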

Re-Use & Recycle

Where possible, align your internal Data Service formats with industry and global standards such as Dublin Core, HL7, DocBook, NIEM, DDMS, and the like. These standards are compiled by working groups or companies that have done a lot of hard-fought data modeling for you. It will also be easier to transform your data into standard formats for integration with other systems or internal components if your formats are at least close to existing standards. Note that this does not mean building a full implementation of a complex standard, since many of your Data Services will use only a small subset of a larger standard.
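For instance, here is a sketch of a record that borrows a small Dublin Core subset (the dc:title, dc:creator, and dc:date elements) instead of inventing local names for common descriptive fields. The wrapping record element and the sample values are our own, illustrative additions:

```python
import xml.etree.ElementTree as ET

# Dublin Core Metadata Element Set namespace (elements like title, creator, date).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def dc_record(title, creator, date):
    # "record" is our local wrapper; the child elements come from Dublin Core.
    rec = ET.Element("record")
    ET.SubElement(rec, f"{{{DC_NS}}}title").text = title
    ET.SubElement(rec, f"{{{DC_NS}}}creator").text = creator
    ET.SubElement(rec, f"{{{DC_NS}}}date").text = date
    return ET.tostring(rec, encoding="unicode")

xml = dc_record("Quarterly Report", "Finance Team", "2015-01-01")
```

Any downstream system that understands Dublin Core can consume these fields without a custom mapping.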

Think About Metadata & RDF

Just as data formats and access patterns should be uniform, so should the ways you expose and query metadata (what is available, what formats exist, sources) and RDF (semantic data and relationships about your data). There’s a lot to talk about around metadata and RDF that won’t fit in this blog post, though.
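To give a flavor of it, here is a minimal sketch of metadata exposed as subject–predicate–object triples with one uniform query pattern, which is the core idea behind RDF. The URIs and predicate names are invented for illustration:

```python
# Metadata about data, held as subject-predicate-object triples.
# The URIs and predicates below are illustrative, not real vocabularies.
TRIPLES = [
    ("urn:dataset:orders",    "dc:format",           "application/json"),
    ("urn:dataset:orders",    "prov:wasDerivedFrom", "urn:system:erp"),
    ("urn:dataset:customers", "dc:format",           "application/xml"),
]

def query(subject=None, predicate=None, obj=None):
    """Match triples against a pattern; None acts as a wildcard."""
    return [t for t in TRIPLES
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

formats = query(predicate="dc:format")          # what formats exist?
sources = query(predicate="prov:wasDerivedFrom") # where did data come from?
```

One query shape answers every “what is available, in what format, from where” question, which is exactly the uniformity the section above argues for.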

Push Security Down to the Data Services

Don’t allow services to be exposed without security. Unsecured services ultimately become an obstacle to data sharing within your enterprise, complicate and weaken your enterprise security posture, and slow applications down at runtime by forcing every calling application to implement its own security filtering.
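Here is a sketch of the security model embedded in the service itself, with roles and records invented for illustration. The service filters results by the caller’s roles before returning anything, so no caller can bypass the check or be burdened with re-implementing it:

```python
# Security pushed down into the data service: every result is filtered by
# the caller's roles. Roles and records below are illustrative only.
RECORDS = [
    {"id": 1, "body": "public notice",     "allowed_roles": {"reader", "analyst"}},
    {"id": 2, "body": "restricted report", "allowed_roles": {"analyst"}},
]

def get_records(caller_roles: set) -> list:
    """Return only the records the caller's roles entitle them to see."""
    return [r for r in RECORDS if r["allowed_roles"] & caller_roles]

reader_view = get_records({"reader"})
analyst_view = get_records({"analyst"})
```

The calling applications never see unauthorized data, so they cannot leak it, and they carry no security-filtering code of their own.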

MarkLogic & DaaS

So those are some general tips for data modeling toward DaaS. But this is a MarkLogic Blog too, so here is how the MarkLogic Server product can be used to facilitate this.

Key features of what we now call “DaaS” have been baked into MarkLogic for over a decade. Things like built-in security, data transforms (using XSLT, JavaScript, XQuery), text search, geospatial search, alerting/monitoring, clustering, high availability, DR, elasticity, and data adapters are all included. At this point, MarkLogic natively stores almost any data format without mapping it to relational tables: XML, RDF, JSON, Text, and Binary formats, including metadata or text extraction adapters for most binaries. RESTful and SOAP services are provided out of the box to expose it all.
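As a small illustration of the out-of-the-box REST exposure, here is a sketch that builds the address of MarkLogic’s document endpoint (/v1/documents) using only the standard library. The host, port, and document URI are assumptions for illustration, and authentication is omitted; nothing is actually sent:

```python
from urllib.parse import urlencode

# Build the URL for a PUT/GET against MarkLogic's REST document endpoint.
# Host, port, and the document URI are illustrative assumptions.
def document_url(host, port, doc_uri):
    query = urlencode({"uri": doc_uri})
    return f"http://{host}:{port}/v1/documents?{query}"

url = document_url("localhost", 8000, "/orders/1001.json")
```

A PUT to such a URL stores the document as-is – XML, JSON, text, or binary – with no relational mapping step in between.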

So as an organization starts to focus on the messages that move around the enterprise, and the wire formats that define those messages, MarkLogic becomes a cheap and appealing place to store that data, together with metadata, provenance and relationships. Better yet, MarkLogic persists it all natively with zero modeling.

This makes MarkLogic a natural component of most DaaS solutions, but by no means the only component. Existing, legacy systems, relational databases, and almost anything that contributes value and has data can and should be integrated into a DaaS architectural approach over time.

Simple Advice

To sum it up – create a team that is empowered to advocate for data as a valuable, secure asset that will last many years and is exposed in a coherent, secure way.

This team should focus on the “wire formats” of the messages flying around your enterprise – these data formats define how your data will be understood and re-used throughout your enterprise. The alternative is to allow direct access to underlying databases, which is complex, lacks a good security model, and quickly becomes an obstacle to change as dependencies accumulate between multiple applications and their underlying databases.

Solutions Director

Damon is a passionate “Mark-Logician,” having been with the company for over 7 years as it has evolved into the company it is today. He has worked on or led some of the largest MarkLogic projects for customers ranging from the US Intelligence Community to HealthCare.gov to private insurance companies.

Prior to joining MarkLogic, Damon held positions spanning product development for multiple startups, founding of one startup, consulting for a semantic technology company, and leading the architecture for the IMSMA humanitarian landmine remediation and tracking system.

He holds a BA in Mathematics from the University of Chicago and a Ph.D. in Computer Science from Tulane University.
