Today, change occurs frequently, and data modeling is a huge challenge because of the time and resources that relational databases require. Unfortunately, when using a relational database, even a simple change like adding or replacing a column in a table might be a million dollar task.
From CIOs to developers, everyone is realizing that relational databases were simply not designed for the challenges with today’s data. That’s why there’s been an explosion of data and new database products that have come on the market recently. The list just gets bigger and bigger every year. In fact, this trend has been going on for a number of years now, and the now former CIO of the U.S. Federal Government, Vivek Kundra, even commented on it at a conference back in 2009:
“This notion of thinking about data in a structured, relational database is dead.”
Vivek Kundra, Former CIO of the U.S. Federal Government
In the coming weeks, I am going to write a series of blog posts that explore the key reasons why I believe this statement to be true – why relational databases aren’t working and we need a new model. In each post, I am going to look at one key reason why relational databases aren’t solving the challenges with today’s data. The first reason is that relational databases are not designed to handle change.
Everything is faster in today’s world. Data is created faster and data changes faster. And, the questions asked of the data also change faster to meet new business requirements laid out to handle rapid in market dynamics, new management, on-demand services, or acquisitions and spin-offs. Today, decisions get made in minutes, not days, and data to support those decisions must be delivered in the right format — with reduced latency and greater efficiency.
Furthermore, the variety of data is increasing as organizations struggle to handle the many shapes, sizes, and types of data that are quickly growing in volume and changing. New applications, mergers and acquisitions, and re-purposing of data are common reasons that lead to the disparity of structured data, not to mention all the unstructured data that is going unaccounted for.
To understand why this is problematic, it’s best to take a closer look at how data is modeled in a relational database…
Relational databases organize data in tables with rows and columns, much like spreadsheets in Microsoft Excel. Each row represents a unique entry and each column describes unique attributes. One column is chosen as the primary key to uniquely identify each row in the table.
So, let’s suppose you were modeling a relational database for customers and products they ordered. You might start by creating a “Customers” table with a column called “CustomerID” to be used as the primary key. You would create additional columns for each attribute about each customer, such as “FirstName,” “LastName,” and “Address,” defining the type of data that will be stored in each. You then link the “CustomerID” to another table, “Orders,” that stores information about a customer’s purchases. Each row in the “Orders” table would have its own unique identifier and also a reference to the primary key of the “Customers” table.
You continue this process of creating various tables, ensuring your design meets all of the entity integrity and referential integrity constraints, and everything is properly “normalized” so that there are no repeating columns, that columns are all dependent on their primary, or unique key, and no tables duplicate any information. Those constraints are what maintain data consistency and ensure fast queries — hallmarks of the relational model.
This process of designing the data model, or schema, involves a dedicated team getting together to decide what tables should be created and what the column names will be. It is an important process, and the end result is often proudly depicted with a large entity-relationship diagram (ERD) that gets printed out and hung prominently in the hallway so that everyone can reference the glorious schema.
You’ve seen one of these, right?
Oracle BRM 7.3 database, picture courtesy of Tridens IT Solutions
Modeling data is incredibly important for many reasons. Schemas create a consistent way to describe and query your data and no organization would do well if they weren’t careful in how they modeled their data. Schemas are not the problem. It’s that relational databases require doing the modeling all up front. And, they don’t handle schemas very well when there are differing schemas or when a schema or schemas need to change. Relational modeling devolves into something that is way to cumbersome for organizations to deal with today…
FIRST, the process can take months, if not years, depending on the size of the database. Relational schemas are complex, and all of the modeling must be done in advance of loading any data or building the application. I used to do product and project management for large government organizations, and will be the first to admit that no plan goes according to the Gantt chart that was built in advance, and getting people to agree on anything upfront is near impossible.
SECOND, if a change is required after applications are built on top of the database, it becomes a time and resource-intensive process that can take another few months or years. One of the beautiful things about relational databases is the carefully engineered relationships and elimination of redundancies that make the system work very quickly as a whole. But the Achilles heel of this approach is that a small change to one table can cause a ripple of changes across the system that must be carefully accounted for.
“A good analogy is to think of the relational model is like a sensitive, complex rainforest ecosystem in which one small change can result in cascading impacts across the database and through the application stack.”
Even a simple change like adding or replacing a column in a table might be a million dollar task, as estimated by one MarkLogic customer (a leading Fortune 100 technology company). We’ve also heard higher estimates for more complex data modeling projects that last around five years and cost millions upon millions of dollars.
Today, change occurs frequently, and data modeling is a huge challenge because of the time and resources that relational databases require. Each year, billions of dollars are spent on data modeling and ETL processes to create and recreate more “perfect” data models that will never change.
BUT THEY ALWAYS DO.
Read the next blog in this series that discusses why relational databases are not designed to handle data variety or download the white paper, Beyond Relational, which covers all of the reasons why relational databases aren’t working in more depth.
All posts in this series:
Like what you just read, here are a few more articles for you to check out or you can visit our blog overview page to see more.
Learn about data bias in AI, ways technology can help overcome it, why AI still needs humans, and how you can achieve transparency.
Successfully responding to changes in the business landscape requires data agility. Learn what visionary organizations have done, and how you can start your journey.
Sharing data can be relatively easy. Sharing our specialized knowledge about data is harder – and current approaches don’t scale.
Don’t waste time stitching together components. MarkLogic combines the power of a multi-model database, search, and semantic AI technology in a single platform with mastering, metadata management, government-grade security and more.Request a Demo