Relational databases keep you stuck in a fragmented, rows-and-columns world. It's a rigid matrix that controls everything you do. You're a slave to the machines, bending your will to conform to their strict rules, shaping your world to the tyranny of their structure. It's time to escape the matrix of relational databases. It's time to embrace a modern, multi-model approach to data integration that frees your data and opens up endless possibilities.
You now have more data about your business, coming from more sources, than ever before. But it’s more than likely spread across silos and stored in relational data models, such as registration databases, fulfillment and CRM systems, and ad serving platforms. Unless you can quickly integrate this data to create a 360-degree view of your business, the data isn’t actionable.
Every time you merge data from different relational databases you must extract, transform and load (ETL) the data, which is a slow, complex and expensive process.
A new application requires a unique view of your data. A merger or acquisition changes the amount and type of data you manage. A new compliance rule changes how you govern data. And as your business undergoes digital transformation, it’s critical that your data can be used in real-time.
Operational "run the business" data is used in real time more than ever as businesses undergo digital transformations. This data can span multiple lines of business and live in multiple databases. Being unable to integrate data from those operational silos fast enough, or with enough agility to keep up with new business requirements, can be fatal to your digital transformation.
At the same time, you also want to integrate the data behind your analytical “observe the business” functions without a proliferation of special-purpose data silos.
The problem is that relational databases make it difficult and expensive to use and integrate data in real-time.
Consider a simple entity relationship diagram: A customer completing a transaction involving a product. Conceptually, this is a very simple entity relationship. But due to the rigidity of relational databases, even simple entity relationships become complex. And in a relational database, similar information can be modeled differently, which makes integrating data difficult.
Looking at Blue Gear Auto Parts' simple schema, you’ll notice a problem. It features a join table, “Tlines,” that doesn’t map to our conceptual understanding of the transaction. Since transactions can include multiple products and products can be included in multiple transactions, separate join tables are necessary to store product information. Join tables exist because relational databases are rigid and force data into columns and rows.
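To make the join-table point concrete, here is a minimal sketch using Python's built-in sqlite3 module. Only the join-table name “Tlines” comes from the schema described above; every other table name, column name, and the product ID 7002 are assumptions made for illustration.

```python
import sqlite3

# Hypothetical, simplified customer/transaction/product schema.
# Only "Tlines" is taken from the text; all other names are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers    (cust_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Products     (prod_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Transactions (trans_id INTEGER PRIMARY KEY,
                               cust_id  INTEGER REFERENCES Customers(cust_id));
    -- Join table: one row per (transaction, product) pair.
    CREATE TABLE Tlines       (trans_id INTEGER REFERENCES Transactions(trans_id),
                               prod_id  INTEGER REFERENCES Products(prod_id));

    INSERT INTO Customers    VALUES (2001, 'Karen Bender');
    INSERT INTO Products     VALUES (7001, 'Battery'), (7002, 'Jump Starter');
    INSERT INTO Transactions VALUES (1, 2001);
    INSERT INTO Tlines       VALUES (1, 7001), (1, 7002);
""")

# Answering "what did this customer buy?" takes three joins,
# one of them through a table that has no business meaning of its own.
rows = conn.execute("""
    SELECT c.name, p.name
    FROM Customers c
    JOIN Transactions t ON t.cust_id  = c.cust_id
    JOIN Tlines l       ON l.trans_id = t.trans_id
    JOIN Products p     ON p.prod_id  = l.prod_id
    ORDER BY p.prod_id
""").fetchall()

print(rows)
```

The conceptual model has three entities, but the physical model needs four tables before a single question can be answered.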
Relational schemas are also difficult to understand intuitively. For example, what does “Addr” represent in this customer table? You’d need to examine the table closely to understand that “Addr” holds the street portion of the address.
As information changes, the rigid nature of relational data models presents more problems. For example, let’s say you need to add a second address or phone number to a customer’s record. In a relational model this requires new join tables, which further complicates your schema.
The rigid nature of relational data models creates another dilemma. Once you design a schema, the only way to change it is to break it apart and start over. So how do you create a schema that predicts your future needs? What if your business changes and you need to model your data differently?
You can choose the slow and expensive route and design a new schema and ETL your data. Or you can shortcut that process and force your data into your existing schema. But that would cause a data quality problem, because your schema wouldn’t match your data.
This schema allows only one phone number because Blue Gear Auto Parts didn’t account for mobile phones. Instead of redesigning the schema, developers forced two numbers into the field.
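The data quality cost of that shortcut can be sketched in a few lines of Python. The field name "Phone", the phone numbers, and the "/" separator are all hypothetical; the point is only that a field designed for one value stops behaving predictably once two values are squeezed into it.

```python
# Hypothetical customer row with two numbers forced into one field.
row = {"Cust_ID": 2001, "Name": "Karen Bender",
       "Phone": "555-0100 / 555-0199 (mobile)"}

def find_by_phone(rows, number):
    """Exact-match lookup, the way a simple WHERE Phone = ? query behaves."""
    return [r for r in rows if r["Phone"] == number]

def split_phones(value):
    """Best-effort cleanup that every consumer of the field must now reimplement."""
    parts = [part.strip() for part in value.split("/")]
    return [part.split(" ")[0] for part in parts]

# The exact-match lookup no longer finds the customer by her home number.
exact_hits = find_by_phone([row], "555-0100")
print(exact_hits)

# Recovering the individual numbers depends on guessing the ad hoc format.
print(split_phones(row["Phone"]))
```

Each application that reads the field has to guess the same ad hoc format, and nothing in the schema enforces it.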
Therein lies the fundamental flaw in relational data models: they’re rigid and resistant to change. This hinders business agility.
Unlike Blue Gear Auto Parts, Red Motor Equipment’s schema allows for multiple customer phone numbers and addresses. Perhaps this division of Red Motor Equipment processes billing and shipping information and therefore requires both addresses whereas Blue Gear Auto Parts only processes shipping addresses.
Data schemas are much more complex in the real world. Here’s what a schema for one simple business application might look like. Imagine the complexity of an ERP application schema with tens of thousands of tables.
Let’s say Blue Gear Auto Parts acquires Red Motor Equipment and wants to integrate the two companies’ data.
To do that, Blue Gear Auto Parts would need to create a single schema that accommodates all the data from the two companies’ source schemas. This will create a more complicated “uber” schema. And in the process, Blue Gear Auto Parts will have to decide what information to keep and what to leave behind.
Red Motor Equipment has more information than Blue Gear Auto Parts. Perhaps it doesn’t need billing addresses now. But what if that changes?
Here’s what that “uber” schema might look like. As you can see, this schema has more tables and more joins, making it much more complex.
In reality, Blue Gear Auto Parts would want to simplify this “uber” schema so it is more manageable. But the only way to simplify the schema is to design schemas for specific use cases. And if Blue Gear Auto Parts did that, they would perpetuate a continuous cycle of data silos and ETL.
With relational databases, this results in different versions of your data in separate silos: your original source data, modified versions of your source data, and more adaptations of your data for every unique business need. Further, every time you modify your data, you leave some of it behind, making data lineage difficult to track.
Let’s look at our simplified case of Blue Gear Auto Parts. When Blue Gear Auto Parts buys Red Motor Equipment, it makes sense for the data to be consolidated. But because ETL is expensive and complex, and makes system downtime difficult to avoid, they might not merge data until it's necessary for a specific application.
Let’s say Blue Gear Auto Parts now needs a view of their customers to understand their customer profile. They’ll have to ETL the customer data from Blue Gear Auto Parts and Red Motor Equipment. Afterwards they might want to analyze their products and customers to better predict what people might be interested in. Further into the year, Blue Gear Auto Parts needs to consolidate their sales data for a regulator. So they ETL the data again, removing information the regulator doesn’t need to see in the process. Then, they ETL the data yet again to send sales data to a revenue dashboard.
But what’s next? Every time there’s a new business need, Blue Gear Auto Parts will need to ETL data. This slows their business down.
Each time ETL occurs, a new silo is created. As a result, there is no central source of data. Instead, different versions of your data exist across the organization: your original source data, modified versions of your source data, and more adaptations of your data for every particular business need. This also creates a data governance nightmare, as it is impossible to keep track of data lineage, provenance, and quality across the various silos.
Document databases, such as MarkLogic, have a more flexible data model. The customer record on the left, represented in a relational data model, is forced into rigid rows and columns.
The same customer record is represented as a JSON document on the right. It shows the hierarchical structure of a document model, which organizes data naturally, like a document. It features arrays (shipping address, billing address) and nested arrays (phone number).
When adding information to MarkLogic, you simply add whatever information you have, and leave out any information you don’t have. You can represent repeating hierarchical attributes like phone numbers and addresses naturally, without having to build out separate tables. And because MarkLogic indexes data as you add it, it’s immediately queryable.
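As a rough illustration of the document model, the sketch below builds two customer documents as plain Python dictionaries and serializes one to JSON. All field names and values are hypothetical, and nothing here uses a MarkLogic API; it only shows how repeating and missing attributes sit naturally inside a document.

```python
import json

# Hypothetical customer document: repeating phone numbers live in an array
# inside the record instead of in a separate table.
customer = {
    "Customer_ID": 2001,
    "Name": "Karen Bender",
    "Shipping_Address": {"Street": "123 Main St", "City": "Springfield"},
    "Phone": [
        {"Type": "Home", "Number": "555-0100"},
        {"Type": "Mobile", "Number": "555-0199"},
    ],
}

# A second customer for whom no phone or address is known:
# the missing information is simply left out.
other_customer = {"Customer_ID": 2002, "Name": "Sam Ortiz"}

def phone_numbers(doc):
    """Return every phone number in a document, or an empty list if none exist."""
    return [phone["Number"] for phone in doc.get("Phone", [])]

print(json.dumps(customer, indent=2))
print(phone_numbers(customer))
print(phone_numbers(other_customer))
```

Adding a third phone number, or a billing address, means adding a value to the document rather than redesigning a schema.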
Rather than using traditional ETL processes that transform data before loading it into a database, MarkLogic’s flexible document data model allows you to harmonize the data you need, when you need it. This harmonization happens in a fully-transactional database where data can be tracked and managed.
In this example, the zip code is being harmonized. In the harmonized canonical model, the zip code fields from both source data models are consistently named "Postal."
With MarkLogic, you can easily pull pieces of data into a canonical model so that data can be queried consistently. You can evolve your model over time, as your business needs change. And at no point are you required to destroy your raw data.
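The harmonization idea can be sketched independently of any product. In the Python below, only the canonical name "Postal" comes from the example above; the source field names "Zip" and "Postal_Code" and the sample records are assumptions, and the function is an illustration rather than MarkLogic's own mechanism.

```python
# Hypothetical names the two source systems use for the same concept.
POSTAL_ALIASES = ("Zip", "Postal_Code")

def harmonize_postal(raw):
    """Build a canonical view with a "Postal" field; the raw record is left untouched."""
    canonical = {}
    for alias in POSTAL_ALIASES:
        if alias in raw:
            canonical["Postal"] = str(raw[alias])
            break
    return canonical

blue_gear_record = {"Cust_ID": 2001, "Zip": "94111"}
red_motor_record = {"Customer_ID": "C-88", "Postal_Code": "10001"}

# Both sources can now be queried through the same field name.
print(harmonize_postal(blue_gear_record))
print(harmonize_postal(red_motor_record))

# The source records still carry their original field names.
print(blue_gear_record)
```

Because the canonical view is derived rather than substituted, more fields can be harmonized later without reloading the sources.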
MarkLogic uses an envelope pattern to model data where metadata can be easily added to the document to preserve data provenance and lineage. It allows you to store and query metadata, source data and canonical data all in a single document.
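One way to picture the envelope pattern is as a wrapper with three sections: metadata, canonical data, and the untouched source. The Python sketch below assumes the section names "headers", "instance", and "attachments" and a hypothetical source-system label; treat the layout as an illustration of the idea, not as the exact structure MarkLogic produces.

```python
from datetime import datetime, timezone

def wrap_in_envelope(raw, source_name, canonical):
    """Wrap a raw record so metadata, canonical data, and source data travel together."""
    return {
        "envelope": {
            # Metadata: where the record came from and when it was ingested.
            "headers": {
                "source": source_name,
                "ingested_at": datetime.now(timezone.utc).isoformat(),
            },
            # Canonical data: the harmonized fields used for consistent queries.
            "instance": canonical,
            # Source data: the original record, preserved as it arrived.
            "attachments": raw,
        }
    }

raw_record = {"Cust_ID": 2001, "Name": "Karen Bender", "Zip": "94111"}
document = wrap_in_envelope(
    raw_record,
    source_name="blue-gear-crm",  # hypothetical source system label
    canonical={"Postal": raw_record["Zip"]},
)

print(document["envelope"]["headers"]["source"])
print(document["envelope"]["instance"])
print(document["envelope"]["attachments"])
```

Keeping all three sections in one document is what makes lineage questions answerable: the canonical value and the raw value it came from are never separated.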
The document data model offers the most flexible and iterative approach for modeling business entities.
Relational databases aren’t good at showing the meaning of entity relationships. For example, Red Motor Equipment’s customer record shows numerous joins indicating relationships between the tables. But understanding the meaning of these relationships is difficult.
Here’s a view of the same customer record in MarkLogic.
You can quickly see the customer completed a transaction that included a “Battery” and “Jump Starter.” The entity relationships are much easier to understand.
Relational data models require that you understand your schema in order to query your data. That’s not the case in MarkLogic.
With the power of semantics, MarkLogic makes entity relationships simple to understand, making it easier and faster to query and discover data. Semantic relationships in MarkLogic are rich with meaning: you can define relationships explicitly and query the meaning of relationships directly. You can also draw inferences from the meaning of relationships to create new information.
A relational database assembles data by joining along primary or foreign key relationships. But these relationships tell us nothing about the actual nature of the relationship. The true nature of the relationship is buried in the application code. Some of these joins represent real relationships as we would understand them in the business context, and some of them are spurious relationships that exist only because the relational model isn’t rich enough to accommodate anything other than a row in a table. How can we tell which is which? Without knowing something about the semantics of the model, we really can’t.
With semantics, however, we can give context and explicit meaning to this transaction. For example, in MarkLogic, the same data can be defined as Karen Bender, customer 2001, bought product 7001, a battery.
Using triples, semantics provides the best way to model relationships in your data. Triples consist of a subject, predicate, and object, and they eliminate the need for foreign keys, nested queries, and complex joins. When you combine a document model with semantics you get a multi-model approach for all your data.
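A minimal sketch of the triple model in plain Python follows. The customer and product come from the example above; the identifier format and the predicate names "hasName" and "bought" are assumptions, and the pattern matcher is a toy stand-in for a real triple store and query language.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

# Hypothetical identifiers and predicate names.
triples = [
    Triple("customer/2001", "hasName", "Karen Bender"),
    Triple("customer/2001", "bought", "product/7001"),
    Triple("product/7001", "hasName", "Battery"),
]

def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that match every position that is not None."""
    return [t for t in store
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)]

def products_bought_by(store, customer_name):
    """Follow name -> customer -> purchase -> product name by chaining patterns."""
    names = []
    for who in match(store, predicate="hasName", obj=customer_name):
        for purchase in match(store, subject=who.subject, predicate="bought"):
            for label in match(store, subject=purchase.obj, predicate="hasName"):
                names.append(label.obj)
    return names

print(products_bought_by(triples, "Karen Bender"))
```

The relationship "bought" is stated in the data itself, so the query reads like the business question instead of like a chain of key lookups.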
Semantic relationships in MarkLogic are rich with meaning. You can define that meaning explicitly, and you can query that meaning directly. You can draw inferences from that meaning and create new information.
Semantics gives your data intelligence and allows you to do things with your data that were previously difficult. For example:
Semantics can be used to define concepts that might be the same but named differently in different data sources. Blue Gear Auto Parts might have labeled a portable jump starter “Jump Starter” while Red Motor Equipment calls it a “Jumper Pack.” With semantics, they can define these concepts as the same. When Blue Gear Auto Parts searches for jump starter purchases after the merger with Red Motor Equipment they get the results they were after.
You can relate information, such as one product accessorizing another, to create a recommendation system or to better understand your customers’ purchases. Blue Gear Auto Parts, for example, might relate battery and jump starter as items that complement each other. With semantics, they can build this relationship so that when Karen purchases a battery in the online store, the recommendation system can suggest that she buy a jump starter to go with it.
These two examples are only a small subset of what you can build with semantics. You can use semantics to disassociate information, relate information to build a graph of your data, create alerts, and much more.
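Both examples above can be sketched with a handful of facts in plain Python. The predicate names "sameAs" and "accessorizes", the second customer, and the purchase records are hypothetical, and the functions below are a simplified illustration of equivalence and relationship lookups rather than MarkLogic's inference engine.

```python
# Hypothetical facts expressed as (subject, predicate, object).
facts = {
    ("Jumper Pack", "sameAs", "Jump Starter"),
    ("Jump Starter", "accessorizes", "Battery"),
}

# Hypothetical purchase records: (company, customer, product label).
purchases = [
    ("Blue Gear Auto Parts", "Karen Bender", "Jump Starter"),
    ("Red Motor Equipment", "Luis Ortega", "Jumper Pack"),
]

def equivalents(term):
    """All labels declared equivalent to term, treating sameAs as symmetric and transitive."""
    seen = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in facts:
            if predicate != "sameAs":
                continue
            for a, b in ((subject, obj), (obj, subject)):
                if a == current and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return seen

def buyers_of(term):
    """Customers who bought the product under any of its equivalent labels."""
    labels = equivalents(term)
    return sorted(customer for _, customer, product in purchases if product in labels)

def recommend_for(purchased):
    """Products recorded as accessories of the purchased product."""
    labels = equivalents(purchased)
    return sorted(s for s, p, o in facts if p == "accessorizes" and o in labels)

print(buyers_of("Jump Starter"))
print(recommend_for("Battery"))
```

A search for one label finds purchases recorded under the other, and the accessory relationship drives the recommendation without any change to the purchase records.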
With relational databases, integrating data follows a familiar sequence:
1. Gather all the schemas you want to integrate. Figure out what the schemas mean, how they work, and what they have in common.
2. Decide what data to leave behind. Design a new schema that can accommodate all the data you decided to keep.
3. Write ETL code to extract the data from the source, transform it into your new schema, and then load it into a new database with that target schema.
4. Create the indexes that power search.
5. Then build the application.
Restart steps 1-5 every time your business needs change.
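The ETL step in the relational sequence above might look like the following sketch, written with Python's built-in sqlite3 module. Both schemas, all column names, and the sample row are hypothetical; the example is only meant to show where data gets left behind.

```python
import sqlite3

# Hypothetical source system that tracks shipping and billing postal codes.
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE Customer (Customer_ID INTEGER PRIMARY KEY, Full_Name TEXT,
                           Ship_Postal TEXT, Bill_Postal TEXT);
    INSERT INTO Customer VALUES (1, ' Karen Bender ', '94111', '94105');
""")

# Hypothetical target schema, designed up front with a single postal code.
target = sqlite3.connect(":memory:")
target.execute(
    "CREATE TABLE Customers (Cust_ID INTEGER PRIMARY KEY, Name TEXT, Postal TEXT)"
)

def etl(src, dst):
    """Extract from the source, transform to the target shape, then load."""
    # Extract: only the columns the target schema has room for.
    rows = src.execute(
        "SELECT Customer_ID, Full_Name, Ship_Postal FROM Customer"
    ).fetchall()
    # Transform: tidy values to fit the new schema.
    transformed = [(cid, name.strip(), postal) for cid, name, postal in rows]
    # Load: write into the new database.
    dst.executemany("INSERT INTO Customers VALUES (?, ?, ?)", transformed)
    dst.commit()
    return len(transformed)

loaded = etl(source, target)
target_columns = [info[1] for info in target.execute("PRAGMA table_info(Customers)")]

print(loaded)
print(target_columns)
```

The billing postal code never reaches the target database. If a later requirement needs it, the schema, the ETL code, and the load all have to be revisited.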
With MarkLogic, the process looks different:
1. Pick a couple of source schemas to start with and dump out a few sample records.
2. Load those into MarkLogic as is and use MarkLogic’s built-in search capability to understand what you have.
3. Harmonize the data iteratively and build your model in an agile way.
4. Access the data as needed.
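The load-as-is approach can be imitated in a few lines of Python to show why no target schema is needed up front. The document URIs, field names, and sample records are hypothetical, and the naive substring search is a stand-in for the idea of searching raw data, not a model of MarkLogic's indexes or APIs.

```python
# Hypothetical sample records from two sources, stored exactly as they arrived.
documents = {
    "/blue-gear/customer/2001.json": {
        "Cust_ID": 2001, "Name": "Karen Bender", "Zip": "94111",
    },
    "/red-motor/customer/88.json": {
        "Customer_ID": "C-88",
        "Full_Name": "Karen Bender",
        "Addresses": [{"Kind": "Billing", "Postal_Code": "94105"}],
    },
}

def iter_values(node):
    """Yield every leaf value in a nested structure of dicts and lists."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_values(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_values(value)
    else:
        yield node

def search(docs, term):
    """Return URIs of documents containing the term in any value, whatever the field name."""
    needle = term.lower()
    return [uri for uri, doc in docs.items()
            if any(needle in str(value).lower() for value in iter_values(doc))]

print(search(documents, "bender"))
print(search(documents, "94105"))
```

The same customer is found in both sources even though the two records share no field names, which is the starting point for deciding what to harmonize first.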
With MarkLogic, you're not just building an application; you're building a platform for all your data with endless possibilities. You may be starting out with one specific use case, but the data remains open to many others, so you’re not just creating a new silo that works for only one application.
Here you can see how MarkLogic’s Operational Data Hub (ODH) might be incorporated into your architecture to ingest data from various data sources and deliver value. Within the platform, MarkLogic stores diverse data including JSON, XML, text, geospatial, and semantic triples complete with indexing for fast search. In addition, it offers APIs for fast data access and application development to quickly add value to your business. With MarkLogic’s ODH, changing data won’t hold your business back.
MarkLogic has industry-leading capabilities in security, privacy, compliance, and lifecycle management to ensure that your data governance rules are enforced. Not only will your system be transactional, with full multi-document ACID transactions, it will also be reliable and available. Whether it’s on-premises or in the cloud, MarkLogic has proven high availability and replication capabilities that will keep your data flowing even in the most demanding environments.
With MarkLogic you can change your data over time, as you need it. This allows you to proceed faster with less risk. And, to top it all off, once the project is done you’ve created an asset for your organization that will continue to evolve to meet your business needs.
MarkLogic's flexible data model allows you to integrate data faster and more cost effectively. Free your data with MarkLogic.