With data security, it is important to address the tactical details (incident patterns, attack vectors, dynamic testing, and so on). But it is also important to look at the broader, more strategic concerns.
In talking to many of our customers, we identified three top strategic issues with data security that CIOs, architects, and business leaders are most concerned about:
Concern #1: How traditional data integration with relational databases creates security vulnerabilities
Concern #2: How application developers are unduly burdened with data security
Concern #3: How insider threats create unknown, unmanaged data security risks within the network perimeter
In this post, I take a closer look at these issues that are particularly relevant to data integration, and discuss how MarkLogic helps address them.
Headlines reporting cyberattacks, ransomware, and compromises in data security are increasingly common. Data security is now a top priority — the risk of not securing data is simply too high.
There is no shortage of splashy numbers that highlight the problem:
Despite increasing awareness and spending, data security is getting worse, at least if you’re measuring by the number of attacks and level of damage.
Every organization is looking for better solutions, but it’s a particularly difficult problem to solve with large-scale data integration projects that involve a variety of data silos that house mission-critical data.
Organizations that do not take a comprehensive approach to data security, focusing only on tactical details or only on protecting the perimeter, open themselves up to enormous cyber risk. So, with that, let’s jump into discussing the top strategic issues that we identified.
The traditional approach to data integration with relational databases and ETL leads to data loss and governance problems.
Role- and policy-based access controls are essential to govern, preserve, and audit data and associated entitlements. If these controls are not managed, you introduce unnecessary complexity and risk.
Unfortunately, most organizations have a proliferation of relational database silos. Each one has separate security access controls that make it virtually impossible to adequately track and protect all of the data.
Additionally, there are multiple ETL tools with obfuscated code and integration points, not to mention their own access controls that need to be managed. With an increasing number of data silos, there are more opportunities for exploits.
Often, what happens is a team builds a complex ETL process from multiple databases to a centralized analytical data warehouse—all using relational databases.
The ETL is done for two reasons:
But, step 2 often fails to ensure quality. In fact, the cleansing process may actually reduce quality by removing important data.
To a data analyst, some metadata may seem like “data lint” that needs to be laundered, but to a compliance analyst or data modeler, that same “data lint” may be required for critical business reasons (say, to prove to a regulatory agency that your trades were legal in order to avoid a hefty fine).
According to Mike Fillion, Director of Architecture at Aetna:
(Watch the full presentation from MarkLogic World 2017)
Over time, it becomes more and more difficult to maintain data governance (i.e., quality, lineage and provenance, security and privacy, compliance requirements, availability).
Failing to pay close attention to each aspect of data governance across the entire lifecycle of data creates additional cyber risk.
MarkLogic makes data integration a good thing for security and data governance.
First, MarkLogic reduces the burden of traditional ETL. By handling the process of ingesting source data as is and transforming and harmonizing the data inside MarkLogic, the whole process of integrating data becomes faster and more seamless. No data gets discarded during the process.
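The "ingest as is, then harmonize" idea can be illustrated with a minimal Python sketch. It is illustrative only and does not use MarkLogic's APIs: the `wrap_as_is` and `harmonize` helpers, the envelope layout, and the field names are all hypothetical. The point it shows is that harmonized fields are added alongside the untouched source record rather than replacing it.

```python
import copy
from datetime import datetime, timezone


def wrap_as_is(source_record, source_system):
    """Wrap a raw record in an envelope without altering it (hypothetical layout)."""
    return {
        "headers": {
            "source": source_system,
            "ingested_at": datetime.now(timezone.utc).isoformat(),
        },
        "canonical": {},  # filled in later by harmonize()
        "original": copy.deepcopy(source_record),  # untouched source data
    }


def harmonize(envelope, field_map):
    """Add canonical fields next to the original; nothing is discarded."""
    result = copy.deepcopy(envelope)
    for source_field, canonical_field in field_map.items():
        if source_field in result["original"]:
            result["canonical"][canonical_field] = result["original"][source_field]
    return result


# Hypothetical trade record from a legacy silo, including a flag that a
# cleansing step might otherwise drop as "data lint".
raw = {"CUST_NM": "Acme Corp", "TRD_AMT": "1500.00", "LEGACY_AUDIT_FLAG": "Y"}
envelope = harmonize(
    wrap_as_is(raw, "trading-silo-a"),
    {"CUST_NM": "customerName", "TRD_AMT": "tradeAmount"},
)
```

Because the original record travels with its harmonized view, a compliance analyst can still find `LEGACY_AUDIT_FLAG` later, even though no analytical field was mapped from it.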
Second, MarkLogic’s multi-model approach using documents and triples is better for governing data over time. You can manage high level business concepts from multiple silos, materializing them as entities and relationships. Data and metadata stay together and you can track the details across the lifecycle—its provenance, who can see it, how it changed—all in a single system. (To learn more, download the free e-book, Building on Multi-Model Databases.)
Aetna is one company that has embraced this approach, and according to Mike Fillion, Director of Architecture:
By taking a more comprehensive approach, MarkLogic reduces opportunities for exploits and provides a more agile platform to handle new and changing regulations.
Unless security is handled in a more centralized database, what results is a spaghetti architecture that leads to more vulnerabilities. This graphic does not even depict the systems for backup and recovery, development, and testing that also require security monitoring and maintenance.
It’s really hard to secure data across multiple data silos at every layer. Unfortunately, data is not secured in one central place, and not in the database layer. Usually, the burden is simply put on developers to do their best to secure data at the application layer for every new application.
With regulation around data privacy and security that organizations now have to account for (HIPAA, SEC17a-4, FINRA, GDPR, etc.), the stakes are higher and the burden is growing.
This is problematic because development and security teams are often disconnected.
A disconnect has grown because of the move towards DevOps and agile development. Both are positive improvements to software development that enable shorter release cycles.
Unfortunately, security teams cannot keep up. Security review cycles are designed to take weeks or months, and security certifications and accreditations are bound to waterfall methods, not continuous improvement. Most developers know the OWASP Top Ten, but the real security experts are only brought into the development process to do a final check before go-live.
According to Gartner, 90 percent of companies using DevOps consider security an afterthought. (Source: Gartner)
It is no surprise, then, that according to the Department of Homeland Security, 90 percent of exploits are due to defective software. (Source: Homeland Security)
There is a disconnect between DevOps and security teams. Security is often only worked on during testing and release rather than through the whole lifecycle.
One example showing the disconnect between teams is at Intuit, which adopted an agile, DevOps approach for their 3,000-person team. Shannon Lietz, senior manager for cloud security engineering at Intuit, said in an interview (Source: TechTarget):
While most organizations are not the size of Intuit, the challenge is often similar. A development team is tasked with stitching together multiple technologies with different, usually quite limited security capabilities. The security team is out of sync and cannot keep up.
To solve this problem, organizations should implement many tactical recommendations:
Additionally, it is important to take a broader, more strategic look at how data is managed at the lowest possible level—in the database.
The goal is to keep data governance governable across the stack.
If you move to using a centralized database to govern and secure the data, securing applications becomes easier and faster. The work of data governance happens in one place. One change in data policy at the database level can be automatically applied to a hundred applications.
MarkLogic has extensive capabilities to govern and secure data in the database, which in turn helps with many of the aspects of application security.
The SANS Institute, a well-known cybersecurity training organization, provides a SWAT checklist to help development teams. (Note: This checklist includes references to the common weakness enumerators referenced by the OWASP Top Ten, which many people are more familiar with).
Of this list, MarkLogic fully addresses numbers 1, 2, and 7 – error handling and logging, data protection, and access control – and also helps address the rest (3, 4, 5, and 6).
By addressing many of these concerns in the database, the attack surface is decreased significantly.
One of MarkLogic’s key underlying capabilities that makes data security stronger and easier to implement is Role Based Access Control (RBAC). RBAC governs who can access what data based on their privileges and permissions. These privileges and permissions work to secure data at the document level.
MarkLogic also has Element Level Security, which makes it possible to secure pieces of data inside documents (more on this later). Working together, these features make life easier on developers by managing the access controls in the database.
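To make the combination of document-level and element-level controls concrete, here is a minimal Python sketch. It is a conceptual model only, not MarkLogic's implementation or API: the `Document` class, the `read_document` function, and the role names are hypothetical. It assumes a role needs a `read` capability on a document to see it at all, and that individual fields can be further restricted to specific roles.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    uri: str
    content: dict
    # Document-level permissions: role name -> set of capabilities.
    permissions: dict = field(default_factory=dict)
    # Element-level rules: field name -> roles allowed to see that field.
    protected_fields: dict = field(default_factory=dict)


def can_read(user_roles, doc):
    """True if any of the user's roles holds 'read' on the document."""
    return any("read" in doc.permissions.get(role, set()) for role in user_roles)


def read_document(user_roles, doc):
    """Return the visible part of a document, or None if access is denied."""
    if not can_read(user_roles, doc):
        return None
    visible = {}
    for name, value in doc.content.items():
        allowed = doc.protected_fields.get(name)
        # Unprotected fields are visible to anyone who can read the document.
        if allowed is None or allowed & set(user_roles):
            visible[name] = value
    return visible


# Hypothetical patient record with one protected field.
patient = Document(
    uri="/patients/1001.json",
    content={"name": "J. Doe", "plan": "Gold", "ssn": "000-00-0000"},
    permissions={"claims-reader": {"read"}, "compliance": {"read", "update"}},
    protected_fields={"ssn": {"compliance"}},
)

claims_view = read_document({"claims-reader"}, patient)
compliance_view = read_document({"compliance"}, patient)
marketing_view = read_document({"marketing"}, patient)
```

In this sketch the `claims-reader` role sees the document without the `ssn` field, `compliance` sees everything, and `marketing` sees nothing. Because the check lives with the data rather than in each application, every application reading the document gets the same answer.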
Additionally, MarkLogic has programming APIs so developers can create and execute policies utilizing all of the security and data protection capabilities in MarkLogic (e.g., backup, retention, data access, data lifecycle, and authentication).
Policies can be associated with data, metadata, and data attributes so that policies such as those for privacy or compliance can be easily executed. And, the security controls and checks are transparent to developers.
Beyond these features, MarkLogic also has additional out-of-the-box features designed to help organizations with compliance:
All of these features mean smarter data management in the database, less work for developers to do at the application level, reduced time and complexity around security testing, and better security resilience.
Focusing only on network security may create a secure perimeter, but the data in the “squishy middle” is then vulnerable.
Typically, most organizations put an immense focus on implementing endpoint, application, perimeter, and network security—and for good reason. Preventing intrusion into your network is a critical part of securing your infrastructure.
Some companies see hundreds of thousands of intrusion attempts against their network—every single day.
But focusing only on network security is like creating a hard shell around a soft, squishy middle. If you can get in, you’re in. The truth is, no network perimeter will ever be impenetrable. There are likely bad actors already in the network.
Some of the biggest data breaches have occurred because an insider got the keys to the kingdom. And, the number of incidents involving internal actors is increasing.
The numbers vary, but in general, internal actors are involved in 25 percent of all breaches (Source: Verizon).
In the healthcare industry, insiders are responsible for 68 percent of breaches (Source: IBM).
Unfortunately, many systems are vulnerable to such attacks because they only have all-or-none data access rather than fine-grained security controls.
Complicating the insider threat problem is the fact that modern enterprises have staff, contractors, sub-contractors, trading partners, consultants, auditors, and other people involved. It is very difficult to discern just who is ‘inside’ and who is ‘outside.’
Sometimes, it is relatively innocuous data management decisions that can create the biggest insider threats. For example, many organizations have data lakes that are virtual treasure troves of data with broad access to users.
One global bank we work with spent years building a data lake using another technology. But they shut it down for security and compliance reasons when they realized the new system did not have proper controls and that they were potentially violating certain rules and regulations regarding customer data.
Organizations today need better data security. It is not an option, however, to just lock everything down. While the most secure database in the world might be one that is locked in a safe and dropped to the bottom of the ocean, that data would not be very shareable.
In the quest for data security, it is important to still maintain data sharing.
Organizations must have proper security controls to ensure that the right portions of data are accessible and shareable with those in and outside the company who are granted proper access. And, there must be a separation of duties so that administrators granting access do not themselves have access to the crown jewels.
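The separation-of-duties requirement can be sketched as a simple rule check. This Python example is a hypothetical illustration, not a description of how any particular product enforces it: the duty names, the `CONFLICTING_DUTIES` list, and the `AccessRegistry` class are assumptions made for the example. It rejects any assignment that would let one person both grant access and read sensitive data.

```python
# Hypothetical duty names; each set lists duties that must not be combined.
CONFLICTING_DUTIES = [{"grant-access", "read-sensitive"}]


def violates_separation(duties):
    """True if the given duties include a full conflicting combination."""
    return any(conflict <= set(duties) for conflict in CONFLICTING_DUTIES)


class AccessRegistry:
    """Tracks which duties each user holds and refuses conflicting ones."""

    def __init__(self):
        self._duties = {}

    def assign(self, user, duty):
        proposed = self._duties.get(user, set()) | {duty}
        if violates_separation(proposed):
            raise PermissionError(
                f"{user} cannot hold {sorted(proposed)} together"
            )
        self._duties[user] = proposed

    def duties(self, user):
        return set(self._duties.get(user, set()))


registry = AccessRegistry()
registry.assign("admin-alice", "grant-access")
registry.assign("analyst-bob", "read-sensitive")

# The administrator who grants access must not also read the crown jewels.
try:
    registry.assign("admin-alice", "read-sensitive")
    blocked = False
except PermissionError:
    blocked = True
```

The design choice here is to validate at assignment time, so a conflicting combination never exists in the registry, rather than detecting it after the fact in an audit.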
As discussed in the previous section, MarkLogic has fine-grained access controls designed to provide optimal data security even when sharing data. One additional feature that directly addresses the problem of insider threats is Advanced Encryption.
Without encryption, or even with file system encryption, the system administrator, cloud operator, or hacker could access or modify files—including the files that comprise the database.
MarkLogic’s Advanced Encryption allows data, configuration, and logs to be encrypted on disk (i.e., encrypted at rest). This feature requires no modification to applications developed on MarkLogic. And, the optional use of an External Key Management System (KMS) further ensures separation of duties and integration into existing security infrastructure.
In this post, I covered the top strategic data security issues that many of our customers are working on. The list is not comprehensive, nor does every organization struggle with all three. Regardless, it is important for every organization to think strategically about the vulnerabilities throughout their data ecosystem.
How is your organization addressing these problems? Are there any additional issues to add to the list?
If you’re interested in learning more about MarkLogic’s approach to security and data governance, here are some key resources:
White Paper – Top Data Security Concerns When Integrating Data
Presentation – Security Keynote: SVP of Engineering
David Gorbet, SVP of Engineering, MarkLogic
Presentation – Data Security In Practice
Caio Milani, Director of Product Management, MarkLogic
Presentation – Data Governance in an Unpredictable World
Damon Feldman, Ph.D., Solutions Director, MarkLogic