In the second blog post in our Operational Data Hub (ODH) series, we discussed what technical debt is and how it manifests. Now, let’s talk about some root causes of technical debt and how the ODH helps solve it.
When it comes to the technical debt associated with integrating data from silos, many seemingly unrelated business problems often arise from a common root cause. Consider the following two examples that most companies can likely relate to:
In the above case, the root cause lies with the limitations of modeling with a relational database management system (RDBMS), where schema modeling is a prerequisite to development. Moreover, as shown in the next example, because nearly every model change in a relational database is accompanied by non-trivial code changes and back-testing, modelers attempt to design schemas that account for as many scenarios as possible, which can make the modeling exercise very complex and time-consuming. In many cases, this complexity leads to compromises in the modeling process in an attempt to meet a deadline or otherwise “save time.”
So how can the ODH solve issues like those above? Although ODH implementations vary, they all share certain foundational principles that address data management challenges.
Use of document/object models to represent business entities. Self-describing documents (such as XML or JSON) are a natural way to represent business entities or objects. They do not suffer from the so-called “impedance mismatch” associated with object-to-relational mapping and come with many benefits such as:
a) The ability to treat schema as data, given that every payload may also include schema information. This is what gives schemas and models the same level of agility as the data itself.
b) The ability to allow for multiple models that represent the same class of business entity. For example, multiple systems may model customer data in different ways. In an ODH architecture, all of those models may be represented concurrently.
c) The ability to store metadata and data together. This allows provenance and lineage to be captured and provides a strong foundation for data governance.
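As a rough illustration of points a) through c), the sketch below wraps two differently shaped customer records in a self-describing JSON document that carries metadata next to the unchanged source data. The envelope layout, the field names and the `wrap_in_envelope` helper are assumptions made for this example only; they are not a prescribed ODH format.

```python
import json
from datetime import datetime, timezone

def wrap_in_envelope(source_system, payload, ingested_at=None):
    """Wrap a source record, unchanged, with metadata about where it came from.

    The envelope layout is hypothetical and only illustrates the idea of
    storing data, metadata and schema hints together in one document.
    """
    if ingested_at is None:
        ingested_at = datetime.now(timezone.utc).isoformat()
    return {
        "metadata": {
            "sourceSystem": source_system,
            "ingestedAt": ingested_at,
            # The field list travels with the payload: schema treated as data.
            "fields": sorted(payload.keys()),
        },
        "instance": payload,
    }

# Two systems model the same business entity (a customer) differently.
crm_customer = {"custName": "Ada Lovelace", "custEmail": "ada@example.com"}
billing_customer = {"name": {"first": "Ada", "last": "Lovelace"}, "accountId": 42}

# Both models are kept side by side; neither is forced into the other's shape.
documents = [
    wrap_in_envelope("crm", crm_customer, "2020-01-01T00:00:00+00:00"),
    wrap_in_envelope("billing", billing_customer, "2020-01-01T00:00:00+00:00"),
]

for doc in documents:
    print(json.dumps(doc, indent=2))
```

Because each document describes itself, a new source system with yet another customer shape could be added without changing the documents already stored.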
Data harmonization. Most approaches to integrating disparate data models involve coming up with a new model followed by attempts to “force fit” (by way of ETL) all source data into the new model. Data harmonization, on the other hand, starts with the premise that all source models are not only valid, but also valuable, and hence should be retained as is in an integrated context. These source models are then leveraged to create an integrated canonical model (or models) on an as-needed basis, all the while recording valuable provenance and lineage metadata inside the ODH itself. The result is that instead of a lowest-common-denominator subset of integrated data, the ODH creates an agile superset of the source data.
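The harmonization idea can be sketched in a few lines: the source record is retained as is, and a canonical view plus provenance is added alongside it. The document layout, the per-source mapping rules and the `harmonize_customer` function are hypothetical and exist only to make the principle concrete.

```python
import copy

# Hypothetical source documents, each kept in its original shape.
crm_doc = {
    "metadata": {"sourceSystem": "crm"},
    "instance": {"custName": "Ada Lovelace"},
}
billing_doc = {
    "metadata": {"sourceSystem": "billing"},
    "instance": {"name": {"first": "Ada", "last": "Lovelace"}},
}

def harmonize_customer(envelope):
    """Return a copy of the document with a canonical view and provenance added.

    The source instance is not modified or discarded; the canonical section
    is an addition, so the result is a superset of the source data.
    """
    source = envelope["metadata"]["sourceSystem"]
    instance = envelope["instance"]
    if source == "crm":
        full_name = instance["custName"]
        mapped_from = ["custName"]
    elif source == "billing":
        full_name = f"{instance['name']['first']} {instance['name']['last']}"
        mapped_from = ["name.first", "name.last"]
    else:
        raise ValueError(f"no mapping defined for source {source!r}")

    harmonized = copy.deepcopy(envelope)
    harmonized["canonical"] = {"fullName": full_name}
    # Record where the canonical value came from (lineage).
    harmonized["provenance"] = {"derivedFrom": source, "mappedFrom": mapped_from}
    return harmonized

harmonized_docs = [harmonize_customer(d) for d in (crm_doc, billing_doc)]
for doc in harmonized_docs:
    print(doc["canonical"], doc["provenance"])
```

Only the canonical properties that are needed today have to be mapped; further properties can be added later because the full source record is still present.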
Use of semantic RDF triples to represent relationships. The Resource Description Framework (RDF) is a set of W3C standards for representing machine-readable concepts about things and relationships between things. It also forms the basis for the concept of the Semantic Web. The unit of representation is called a triple, which consists of a subject, a predicate and an object, collectively comprising a fact/concept or a relationship (e.g., “Euro typeOf currency”). In an ODH, RDF triples provide a myriad of capabilities with respect to managing data and the complexities of data integration.
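To make the subject, predicate, object structure concrete, here is a minimal in-memory sketch that stores a handful of triples and answers simple pattern queries. It does not use an RDF library or SPARQL, and every fact other than “Euro typeOf currency” is invented for illustration.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

# A tiny set of facts; only the first one comes from the text above.
triples = [
    Triple("Euro", "typeOf", "currency"),
    Triple("USDollar", "typeOf", "currency"),
    Triple("Germany", "usesCurrency", "Euro"),
    Triple("France", "usesCurrency", "Euro"),
]

def match(store, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]

# Which things are currencies?
currencies = [t.subject for t in match(triples, predicate="typeOf", obj="currency")]

# Which countries use the Euro?
euro_countries = [t.subject for t in match(triples, predicate="usesCurrency", obj="Euro")]

print(currencies)
print(euro_countries)
```

New kinds of relationships are added by appending triples with a new predicate, without altering any existing structure.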
Indexing to support real-time ad hoc queries and searches. Unlike a data lake that depends on subsequent brute-force processing for data querying, an ODH indexes all data on ingestion to ensure that data is “queryable” as soon as it is loaded.
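The difference between indexing on ingestion and scanning at query time can be shown with a toy store that updates an inverted index whenever a document is loaded. The `IndexedStore` class and its methods are assumptions for this sketch, and it indexes only top-level field values; a real ODH's indexes would be far richer.

```python
from collections import defaultdict

class IndexedStore:
    """Toy document store that indexes every document as it is ingested."""

    def __init__(self):
        self.documents = {}
        # Maps (field, normalized value) to the ids of documents containing it.
        self.index = defaultdict(set)

    def ingest(self, doc_id, document):
        self.documents[doc_id] = document
        for field, value in document.items():
            self.index[(field, str(value).lower())].add(doc_id)

    def find(self, field, value):
        """Look up matching ids in the index without scanning the documents."""
        return sorted(self.index.get((field, str(value).lower()), set()))

store = IndexedStore()
store.ingest("c1", {"name": "Ada Lovelace", "country": "UK"})
store.ingest("c2", {"name": "Grace Hopper", "country": "US"})
store.ingest("c3", {"name": "Alan Turing", "country": "UK"})

# The data is queryable immediately after loading.
uk_customers = store.find("country", "uk")
print(uk_customers)
```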
Support for bidirectional data access. Unlike patterns that support either “run-the-business” or “observe-the-business” functions, the ODH supports both. By allowing real-time updates with transactional support, alongside the ability to change schemas and data in a way that can be tracked and audited, the ODH is a safe place in which direct updates may be made to integrated data without compromising data governance and accuracy.
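One way to picture tracked and audited updates is a store that records who changed what on every write. The sketch below covers only the audit trail; it does not implement transactions, and the `AuditedStore` class, its methods and the log format are hypothetical.

```python
import copy

class AuditedStore:
    """Toy store in which every write appends a before/after audit entry."""

    def __init__(self):
        self._docs = {}
        self.audit_log = []

    def put(self, doc_id, document, user):
        # Capture the previous state (None if the document is new).
        before = copy.deepcopy(self._docs.get(doc_id))
        self._docs[doc_id] = copy.deepcopy(document)
        self.audit_log.append({
            "docId": doc_id,
            "user": user,
            "before": before,
            "after": copy.deepcopy(document),
        })

    def get(self, doc_id):
        # Return a copy so callers cannot bypass the audit trail.
        return copy.deepcopy(self._docs[doc_id])

store = AuditedStore()
store.put("c1", {"email": "ada@example.com"}, user="loader")
store.put("c1", {"email": "ada.lovelace@example.com"}, user="support-agent")

# Reads serve "observe-the-business"; audited writes serve "run-the-business".
print(store.get("c1"))
for entry in store.audit_log:
    print(entry["user"], entry["before"], "->", entry["after"])
```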
In the next post in this series, we will dig deeper into semantics and RDF triples and discuss what they can do for an organization’s data integration and management.