Don’t be offended by the play on words. Graph databases are very powerful and not generally involved in bribery. In fact, given its ability to discover fraudulent activity through the relationships it captures, a graph database is quite good at uncovering transgressions such as payoffs and other forms of corruption.
With that caveat aside, let’s explore why a graph database should not be the ONLY data-management technology for capturing various 360 views—especially for customers—in the context of GDPR and the recently enacted California Consumer Privacy Act (CCPA).
As my colleague David Gorbet wrote in a recent SC Media article, “California Consumer Privacy Act: Challenge and Opportunity,” CCPA,
“considered the most comprehensive of any state privacy law, provides consumers with new rights, including a right to transparency about data collection, a right to be forgotten and a right to opt out of having their data sold.”
David goes on to discuss the importance of viewing data as an asset, inventorying it properly, centralizing governance policies and moving past point solutions.
Attempting to do all of this strictly with a graph database is not the right approach. As with highly normalized relational databases, collecting all there is to know about a customer and shredding it into a graph model is like taking apart one’s car and putting its thousands of pieces on shelves each time one enters their garage. Needless to say, the task of assembling the car for day-to-day use becomes expensive, tedious and unreliable (oops! forgot the brake liners).
A better approach for meeting regulatory requirements and reducing the risk of non-compliance is to implement a multi-model strategy. Such an approach incorporates document, relational and graph structures along with their respective query mechanisms, i.e., NoSQL document search, SQL relational access and SPARQL semantic/graph access. In fact, having the ability to leverage all of these access mechanisms in a single, complex query across all three data models simultaneously is a powerful feature for GDPR/CCPA solutions.
As described in David’s article and in “Companies: Lean into Consumer Privacy to Win” (by another colleague, Ken Krupa),
“It’s difficult to ensure trust and accountability in data when data is sourced from different silos and applied to many different use cases.”
Think of all the touchpoints an enterprise has with its consumers and the form in which those interactions are captured.
For example, information is naturally kept in table form for transactions, document form for profile information and graph form for relationships that spider out from consumers to spouses, friends and other associations.
In a multi-model approach, pulling this information together in response to a customer request to “forget me” would be fulfilled first by performing a powerful document search. The documents (e.g., XML, JSON or free text) would contain much of the sought-after information and link to other information via graph structures.
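As a rough illustration of that flow, here is a minimal sketch in plain Python. It does not use any particular database: the document fields, the predicate names and the helper `find_forget_me_targets` are all hypothetical, and the in-memory dictionaries simply stand in for a document store and a triple store.

```python
# Hypothetical in-memory stand-ins for a document store and a triple store.
documents = {
    "doc-1": {"customer_id": "c42", "name": "Pat Doe", "notes": "Prefers email contact"},
    "doc-2": {"customer_id": "c77", "name": "Sam Roe", "notes": "Spouse of Pat Doe"},
    "doc-3": {"customer_id": "c99", "name": "Lee Poe", "notes": "No known associations"},
}

# (subject, predicate, object) triples linking documents to other records.
triples = [
    ("doc-1", "hasOrder", "order-1001"),
    ("doc-1", "hasSpouseProfile", "doc-2"),
    ("doc-3", "hasOrder", "order-2002"),
]


def search_documents(term):
    """Return ids of documents with any field containing the term (case-insensitive)."""
    term = term.lower()
    return sorted(
        doc_id
        for doc_id, doc in documents.items()
        if any(term in str(value).lower() for value in doc.values())
    )


def find_forget_me_targets(term):
    """Search documents first, then follow graph links out of the matching documents."""
    hits = search_documents(term)
    linked = sorted({obj for subj, _pred, obj in triples if subj in hits})
    return {"documents": hits, "linked": linked}


result = find_forget_me_targets("Pat Doe")
print(result)
```

The search step surfaces both the customer's own profile and a document that merely mentions the customer, and the triples then point to related records. In a real "forget me" workflow, each of these would be a candidate for review rather than something to delete automatically.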
Returning to the “car shredding/assembly” analogy, this would be like keeping the engine, transmission, wheels and body intact so that they retain their integrity as composite entities, while keeping the ability to reassemble them with “Transformer”-like agility (and coolness, I might add) into a complete view of a car … or, in our case, a customer.
A query that simultaneously performs a NoSQL search across documents, an SQL query against relational rows and a SPARQL query against semantic graphs gets all the data more reliably, which greatly reduces the risk of non-compliance. Also, by filtering first with search, it mitigates the need for a massive compute infrastructure required to rejoin customer data, at scale, when everything is stored in a graph model.
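The idea of filtering first with search and then pulling matching rows and graph relationships can be sketched as follows. This is an illustration under stated assumptions, not MarkLogic's API: a Python list stands in for the document store, the standard-library `sqlite3` module stands in for relational access, a list of tuples stands in for a semantic graph, and the function `customer_view` and all sample data are hypothetical.

```python
import sqlite3

# Hypothetical profile documents (document model).
documents = [
    {"customer_id": "c42", "profile": "Pat Doe, prefers email contact"},
    {"customer_id": "c77", "profile": "Sam Roe, prefers phone contact"},
]

# Hypothetical relationship triples (graph model).
triples = [
    ("c42", "spouseOf", "c77"),
    ("c42", "friendOf", "c13"),
    ("c77", "friendOf", "c58"),
]

# Hypothetical transaction rows (relational model), held in an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("c42", 19.99), ("c42", 5.00), ("c77", 120.00)],
)


def customer_view(search_term):
    """Build a combined view: search narrows the set, then rows and triples are fetched."""
    term = search_term.lower()
    # Step 1: the document search filters down to the matching customers.
    matches = [d for d in documents if term in d["profile"].lower()]
    view = {}
    for doc in matches:
        cid = doc["customer_id"]
        # Step 2: relational access, limited to the customers found by search.
        rows = conn.execute(
            "SELECT amount FROM transactions WHERE customer_id = ? ORDER BY amount",
            (cid,),
        ).fetchall()
        # Step 3: graph access, again limited to the same customers.
        relationships = [(pred, obj) for subj, pred, obj in triples if subj == cid]
        view[cid] = {
            "profile": doc["profile"],
            "transactions": [amount for (amount,) in rows],
            "relationships": relationships,
        }
    return view


view = customer_view("pat doe")
print(view)
```

Because the search runs first, the SQL and graph lookups touch only the customers it returns, which is the point of the filtering argument above. A multi-model database would express this as one query rather than three separate steps in application code.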
One final point. It’s possible to pull together the recommended solution with readily available technology components such as an open source NoSQL document database, a relational database, a search engine and a graph database. But integrating all of these fast-moving pieces into a reliable, enterprise-ready platform that accounts for security, data consistency, ACID transactions and overall governance is a formidable challenge.
MarkLogic’s Data Hub Platform addresses this challenge. As a multi-model database with NoSQL search, SQL access and SPARQL query features, it relieves enterprises of the burden of spending valuable technical resources on integration tasks and allows them to focus on higher-value business activities. In doing so, it can help an enterprise optimize resources, reduce risk and remain compliant with GDPR and CCPA regulations.