
Beware of “Graft” on GDPR and CCPA

Don’t be offended by the play on words. Graph databases are very powerful and not generally involved in bribery. In fact, given its ability to discover fraudulent activity through the relationships it captures, a graph database is quite good at uncovering transgressions such as payoffs and other forms of corruption.

With that caveat aside, let’s explore why a graph database should not be the ONLY data-management technology for capturing 360-degree views, especially of customers, in the context of GDPR and the recently enacted California Consumer Privacy Act (CCPA).

As my colleague David Gorbet wrote in a recent SC Media article, “California Consumer Privacy Act: Challenge and Opportunity,” the CCPA is

“considered the most comprehensive of any state privacy law, provides consumers with new rights, including a right to transparency about data collection, a right to be forgotten and a right to opt out of having their data sold.”

David goes on to discuss the importance of viewing data as an asset, inventorying it properly, centralizing governance policies and moving past point solutions.

Attempting to do all of this strictly with a graph database is not the right approach. As with highly normalized relational databases, collecting everything there is to know about a customer and shredding it into a graph model is like taking your car apart and putting its thousands of pieces on shelves every time you pull into the garage. Needless to say, reassembling the car for day-to-day use becomes expensive, tedious and unreliable (oops! forgot the brake linings).
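To make the analogy concrete, here is a minimal Python sketch, assuming a simple nested customer profile; the field names, the `cust:42` identifier and the node-naming scheme are all hypothetical. `shred` breaks the document into subject-predicate-object triples, turning each nested object into its own node, and `reassemble` must find every triple about a node and recurse into its children to get the original document back. Lists are left out to keep it short.

```python
def shred(node_id, doc, triples):
    """Break a nested document into (subject, predicate, object) triples.

    Each nested object becomes its own node, linked to its parent by an edge
    whose object is a ("node", child_id) reference. Lists are not handled.
    """
    for key, value in doc.items():
        if isinstance(value, dict):
            child_id = f"{node_id}/{key}"  # hypothetical node-naming scheme
            triples.append((node_id, key, ("node", child_id)))
            shred(child_id, value, triples)
        else:
            triples.append((node_id, key, value))
    return triples


def reassemble(node_id, triples):
    """Rebuild a document from triples; returns (document, passes over the triples)."""
    doc, passes = {}, 1
    for subject, predicate, obj in triples:  # full scan: no index in this sketch
        if subject != node_id:
            continue
        if isinstance(obj, tuple) and obj[0] == "node":
            # Follow the edge to a child node and rebuild it recursively.
            child, child_passes = reassemble(obj[1], triples)
            doc[predicate] = child
            passes += child_passes
        else:
            doc[predicate] = obj
    return doc, passes


profile = {
    "name": "Ana Ruiz",
    "email": "ana@example.com",
    "address": {"city": "Sacramento", "geo": {"lat": 38.58, "lon": -121.49}},
}
triples = shred("cust:42", profile, [])
rebuilt, passes = reassemble("cust:42", triples)
print(len(triples))                # 7
print(rebuilt == profile, passes)  # True 3
```

The sketch scans the whole triple list once per node for simplicity. A real graph database would use indexes instead, but it still performs a lookup or join per node to rebuild the view, and that work multiplies with the number of customers being reassembled.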

A better approach for meeting regulatory requirements and reducing the risk of non-compliance is to implement a multi-model strategy. Such an approach incorporates document, relational and graph structures along with their respective query mechanisms, i.e., NoSQL document search, SQL relational access and SPARQL semantic/graph access. In fact, the ability to combine all of these access mechanisms in a single query that spans all three data models is a powerful feature for GDPR/CCPA solutions.

As described in David’s article and in “Companies: Lean into Consumer Privacy to Win” (by another colleague, Ken Krupa),

“It’s difficult to ensure trust and accountability in data when data is sourced from different silos and applied to many different use cases.”

Think of all the touchpoints an enterprise has with its consumers and the form in which those interactions are captured. For example:

  • Orders for purchases are likely captured in several relational databases of transactions spanning the enterprise.
  • Profile information is likely kept in several document databases.
  • Householding information, e.g., relationships to a spouse, children or friends, could be kept in a graph database.

Information is naturally kept in table form for transactions, document form for profile information and graph form for relationships that spider out from consumers to spouses, friends and other associations.

In a multi-model approach, the first step in fulfilling a customer’s request to “forget me” would be a powerful document search that pulls this information together. The documents (e.g., XML, JSON or free text) would contain much of the sought-after information and link to other information via graph structures.

Returning to the “car shredding/assembly” analogy, this would be like leaving the engine, transmission, wheels and body intact so they retain their integrity as composite entities, while still being able to reassemble them with “Transformer”-like agility (and coolness, I might add) into a complete view of a car … or, in our case, a customer.

A query that simultaneously performs a NoSQL search across documents, an SQL query against relational rows and a SPARQL query against semantic graphs retrieves all the data more reliably, which greatly reduces the risk of non-compliance. Also, because search filters the data first, this approach avoids much of the massive compute infrastructure that would be required to rejoin customer data at scale if everything were stored in a graph model.
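The search-first pattern can be sketched in plain Python, with the caveat that this is not MarkLogic’s API or any product’s query language. The sample data and the functions `make_orders_db`, `search_documents` and `forget_me_view` are hypothetical; an in-memory SQLite table stands in for the relational orders store, a list of dicts stands in for profile documents, and a set of tuples stands in for an RDF graph. The search narrows the data to the matching customer first, and only that customer’s ID is then used to pull relational rows and graph edges.

```python
import sqlite3

# Hypothetical sample data, for illustration only.
# Profile documents (a stand-in for a JSON document store).
DOCUMENTS = [
    {"id": "cust:42", "name": "Ana Ruiz", "email": "ana@example.com",
     "notes": "Prefers email contact"},
    {"id": "cust:77", "name": "Ben Ko", "email": "ben@example.com",
     "notes": "Loyalty member"},
]

# Householding relationships as (subject, predicate, object) triples
# (a stand-in for an RDF graph).
TRIPLES = {
    ("cust:42", "spouseOf", "cust:77"),
    ("cust:42", "parentOf", "person:9"),
    ("cust:77", "friendOf", "person:3"),
}


def make_orders_db():
    """Create an in-memory orders table (a stand-in for a transactional database)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id TEXT, total REAL)")
    db.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "cust:42", 19.99), (2, "cust:77", 5.00), (3, "cust:42", 42.50)],
    )
    return db


def search_documents(term):
    """Step 1: naive case-insensitive substring match over every field (stand-in for a search index)."""
    term = term.lower()
    return [d for d in DOCUMENTS if any(term in str(v).lower() for v in d.values())]


def forget_me_view(term, db):
    """Gather profile, orders and relationships for each customer the search finds."""
    view = []
    for doc in search_documents(term):
        cid = doc["id"]
        # Step 2: relational rows for this customer only, via a parameterized SQL query.
        orders = db.execute(
            "SELECT order_id, total FROM orders WHERE customer_id = ? ORDER BY order_id",
            (cid,),
        ).fetchall()
        # Step 3: graph edges where the customer is either subject or object.
        edges = sorted(t for t in TRIPLES if cid in (t[0], t[2]))
        view.append({"profile": doc, "orders": orders, "relationships": edges})
    return view


db = make_orders_db()
result = forget_me_view("ana@example.com", db)
print(result[0]["orders"])         # [(1, 19.99), (3, 42.5)]
print(result[0]["relationships"])  # [('cust:42', 'parentOf', 'person:9'), ('cust:42', 'spouseOf', 'cust:77')]
```

Edges are matched in both directions because, for an erasure request, a relationship that points at the customer matters as much as one that starts from them. Chaining three separate stores by hand, as this sketch does, is the kind of integration work that a single multi-model query is meant to spare you.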

One final point. It’s possible to pull together the recommended solution with readily available technology components such as an open source NoSQL document database, relational database, search engine and graph database. But integrating all of these fast-moving pieces into a reliable, enterprise-ready platform that accounts for security, data consistency, ACID transactions and overall governance is a formidable challenge.

MarkLogic’s Data Hub Platform addresses this challenge. As a multi-model database with NoSQL search, SQL access and SPARQL query features, it relieves enterprises of the burden of spending valuable technical resources on integration tasks and allows them to focus on higher-value business activities. MarkLogic’s Data Hub is a platform that can help an enterprise optimize resources, reduce risk and remain compliant with GDPR and CCPA regulations.

Michael Malgeri - Principal Technologist | MarkLogic

Michael Malgeri is a Principal Technologist with MarkLogic. He works with companies to match their business requirements with MarkLogic’s enterprise NoSQL database and semantic features. He helps organizations reduce costs, automate processes, find new opportunities and create applications that bring high value to businesses and their customers. Michael focuses on the media and entertainment industry, where content providers, distributors and related companies are seeking to leverage the power of data in order to capture new opportunities driven by expanding global information consumption.

Michael holds Master’s Degrees in Computer Science, Business and Mechanical Engineering. He's been a Certified Project Management Professional since 2011.
