The recently published DoD Data Strategy, “Unleashing Data to Advance the National Defense Strategy,” articulates a vision for turning the US Department of Defense into a data-centric organization.
We at MarkLogic have worked with data for a long time, in a variety of industries – including government, banking, healthcare, and more – helping our customers engage in large-scale digital transformation and modernization efforts. Based on the expertise we’ve developed over the years, we’re sharing some commentary and recommendations to help people translate strategic principles into real-life results.
This is the first post of a blog series where we will explore the key concepts laid out in the DoD Data Strategy. In this post, I’ll outline what data platform capabilities are essential to ensure alignment to the strategy’s 8 Guiding Principles that are foundational to all data efforts in the DoD.
Data Is a Strategic Asset
Treat your data as a treasured commodity: focus on the security, lineage, and provenance of the data you ingest, as well as its quality. Best practice is to record data as is and retain its lineage, so that you can understand what happened to the data while it was used for a strategic mission or purpose.
You need to be able to integrate data on demand, store it in data stores according to mission priorities, tag it on ingest, and catalog its original state for provenance and lineage tracking. It’s also critical that your data can be made available immediately so that your mission can execute securely at an enterprise level, under your management and governance.
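One way to picture "tag it on ingest and catalog its original state" is an envelope that wraps each raw record with metadata and a content hash. The following is a minimal sketch in Python, not any particular platform's API; the function names (`ingest`, `is_unmodified`) and the metadata fields are assumptions made for illustration.

```python
import copy
import hashlib
import json
from datetime import datetime, timezone


def _fingerprint(record: dict) -> str:
    """Hash a canonical JSON form so equal records always hash the same."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def ingest(raw_record: dict, source: str, tags: list[str]) -> dict:
    """Wrap a raw record in an envelope; the record itself is stored as is."""
    return {
        "metadata": {
            "source": source,                      # where the data came from
            "tags": sorted(tags),                  # tags applied at ingest
            "ingested_at": datetime.now(timezone.utc).isoformat(),
            "sha256": _fingerprint(raw_record),    # fingerprint of original state
        },
        # Deep copy so later changes to the caller's object do not leak in.
        "original": copy.deepcopy(raw_record),
    }


def is_unmodified(envelope: dict) -> bool:
    """Check that the stored record still matches its ingest-time fingerprint."""
    return _fingerprint(envelope["original"]) == envelope["metadata"]["sha256"]
```

Because the original record is kept untouched next to its metadata, any later harmonized or enriched version can be compared against it, and the hash gives a simple check that the cataloged original has not drifted.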
Collective Data Stewardship
How can you ensure accountability throughout the entire data lifecycle? Data stewards, custodians, and functional managers need to be able to assign access in order to protect and disseminate data and to enforce data policy, which in turn promotes governance and data quality. Make sure you can implement role-based data access in your security implementation. Having a data platform with strict ACID compliance is also critical, to ensure the data is reliable.
The DoD mission requires fast, timely results with reduced risk and at a favorable cost, without sacrificing the effectiveness or efficiency of the mission. Your data platform should support role-based security and accountability out of the box, as well as data provenance to ensure full understanding of the data lifecycle.
Data integration can involve multiple stakeholders, each holding a piece of the integration milestones with multiple touchpoints throughout the data lifecycle. A data hub can enable all of these stakeholders – data architects, data stewards, custodians, and functional data managers like system analysts and business analysts – to iteratively collaborate and apply their expertise to build and test data in the enterprise at speed with accuracy and accountability.
All of these data management personnel need to be properly trained, of course, so be sure to build this time and any associated cost into your plan. Ideally, your data platform provider will offer free training on its software.
As your data takes its path through your governance and controls, you need visibility and reporting capability all the way through your mission, including the ability to return any data to its original state. Make sure you know what has occurred with your data by using a platform that tracks and reports on its provenance and lineage and can evaluate impact at every step.
Laws and regulations often govern DoD mission data; ACID compliance provides the data reliability this environment demands. Additionally, your data platform must support the security rules your governance defines in order to follow law, regulation, and access or visibility requirements. Users should be able to interact directly with the data as they define the entity model, configure curation services to harmonize and master the data, and explore and share curated (and even raw) data according to the roles they are assigned.
When integrating your data, make sure that it is tagged, stored, and cataloged at ingest for use in the mission. Automate the data integration effort as much as possible, to eliminate the untrusted ETL process and its long development cycle.
To ensure consistency and provenance, your data should be indexed, cataloged, and stored securely as it is integrated, providing a clear understanding of its origin and a record of every action taken against it. Your data platform should enable you to easily tag and track data over its lifecycle of use, giving you Command and Control knowledge of what was done to the data as well as the governance and rationale for the data when it is needed.
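To make "tracking of every action taken against the data" concrete, here is a small Python sketch of an append-only provenance log. It is only an illustration under stated assumptions: the class names `ProvenanceEvent` and `ProvenanceLog`, and the fields they record, are hypothetical and do not come from the DoD Data Strategy or from any specific product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceEvent:
    """One immutable entry: who did what to which document, and when."""
    doc_id: str
    action: str
    actor: str
    timestamp: str


@dataclass
class ProvenanceLog:
    """Append-only log; events are added but never edited or removed."""
    _events: list = field(default_factory=list)

    def record(self, doc_id: str, action: str, actor: str) -> ProvenanceEvent:
        event = ProvenanceEvent(
            doc_id, action, actor, datetime.now(timezone.utc).isoformat()
        )
        self._events.append(event)
        return event

    def lineage(self, doc_id: str) -> list:
        """Return every recorded action for one document, oldest first."""
        return [e for e in self._events if e.doc_id == doc_id]
```

A steward could call `log.record("doc-1", "harmonized", "steward-a")` at each curation step and later use `log.lineage("doc-1")` to report what was done to that document and by whom.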
Enterprise-Wide Data Access and Availability
Your data platform must provide secure enterprise access and quality to meet the dynamics of today’s defense mission – including support for disconnected or latent communications and information sharing. The right enterprise platform will ensure your data is secured and sharable with confidence when required. Enterprise-grade quality includes security, monitoring, and data replication for COOP/COG requirements.
Focus on simplifying the most complex IT challenge, data integration. For advanced security needs, track metadata alongside the data itself instead of using a disjointed data lineage approach.
Instead of handing over security to app developers to worry about, your database should manage the roles, permissions, privileges, etc. – implement security at the data layer instead of the application layer. And, instead of worrying whether to lock data up or risk sharing it, ensure that your administrators have extremely granular, tight control over exactly what data gets shared with whom.
Data for Artificial Intelligence Training
Creating, managing, protecting, and exploiting datasets for Artificial Intelligence (AI) training requires integrated data you can trust. AI and machine learning make analytics more sensitive to bad data. Curated, integrated, governed data is the foundation of any successful AI and machine learning pipeline. How do you know you used the right training data? Where did that data come from? Does it include PII? Those are all questions that need to be answered about the data going into and out of an AI or machine learning system.
Since both analytic and operational needs change over time, instead of worrying about mapping schemas together, integrating an MDM tool, writing custom algorithms, and other non-value-adding tasks, we recommend a “smart curation” process that enables users to leverage built-in, smart, and automated capabilities to enrich, harmonize, and master data more easily and quickly.
Your data platform should serve both your AI and machine learning analytic needs and your operational needs – securely delivering reliable information for whatever purpose. The best platforms include built-in machine learning capabilities that allow processing to occur close to the data, both for performance reasons and to eliminate the need to export highly secure data to outside platforms.
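As a rough illustration of the "Does it include PII?" question, the Python sketch below scans the string fields of a candidate training record for two PII-like patterns. The patterns and the `find_pii` helper are assumptions for this example only; real PII detection needs far broader rules and human review, so treat this as a starting point rather than a complete check.

```python
import re

# Deliberately simple, illustrative patterns (assumptions, not a full PII rule set).
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def find_pii(record: dict) -> dict:
    """Map each string field to the names of the PII patterns found in it."""
    hits = {}
    for field_name, value in record.items():
        if not isinstance(value, str):
            continue  # only text fields are scanned in this sketch
        matched = sorted(
            name for name, pattern in PII_PATTERNS.items() if pattern.search(value)
        )
        if matched:
            hits[field_name] = matched
    return hits
```

Running a check like this before a record enters a training set, and storing the result alongside the record's provenance, is one way to be able to answer the question later.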
Delivering data to meet DoD’s AI and machine learning needs requires:
- The ability to curate the quality data needed for input
- The ability to keep your most sensitive data safe and secure
- The power to apply the most advanced algorithms close to the data
- The flexibility to manage and understand varied and multi-dimensional inputs and outputs
Data Fit for Purpose
Look for providers who have built their platforms with US Government needs in mind. That includes understanding the ethical concerns around data use, as well as the mission need to adhere to the US laws, governing guidance, and mission or unit regulations that provide oversight of data. Because data is a central asset of your organization’s mission, your data platform should make it easy to centrally define and enforce governance, security, access control, and other data lifecycle policies.
Rapid data integration to deliver fit-for-purpose data is best achieved by an iterative, model-driven process that includes consideration of data sharing and data access control requirements. Ideally this process will be supported with a user interface that allows agile teams to collaborate.
The goal is to make it simple for the teams to harmonize, master, and enrich multi-structured data to create durable data assets for multiple use cases. DoD should have the flexibility to use multiple lenses for exploring and analyzing the whole gamut of curated data assets, with complete security and governance.
A multi-model database that can store and query documents, graph data, or relational data from a single database provides incredible flexibility. Gartner believes multi-model makes things simpler and recommends multi-model for certain analytics use cases:
“Multi-model DBMSs can reduce the complexity of existing portfolios of production systems. They can often more consistently apply auditing, concurrency controls, versioning, distributed data complexity management, points of governance and security.”
Using a multi-model platform with built-in universal indexing and search can reduce the time and effort needed to build and configure indexes for standard queries, and it eliminates the need to bolt on a separate search engine for full-text search. This is incredibly helpful for data integration because it saves time during the curation process, provides users immediate access to their ingested data, and enables users to ask more complex questions of integrated data.
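To give a feel for what indexing everything at ingest buys you, here is a toy Python sketch of an inverted index built over every field of every document, with no per-query index configuration. It is a simplified illustration only: `build_index` and `search` are hypothetical helpers, and a production universal index is far more sophisticated than this.

```python
import re
from collections import defaultdict


def build_index(docs: dict) -> dict:
    """Index every word of every top-level field value, keyed by lowercase token."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for value in doc.values():
            for token in re.findall(r"\w+", str(value).lower()):
                index[token].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Return the IDs of documents containing every word in the query."""
    tokens = re.findall(r"\w+", query.lower())
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

Because every field is indexed as soon as a document is loaded, a query such as `search(index, "fuel alpha")` works immediately, without anyone having decided in advance which fields would be searched.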
Finally, the platform you use should support cloud operations; even if you’re initially planning to deploy on-premises, it is likely that you’ll want to move to the cloud at some point. Serverless cloud data hubs can also deliver significant O&M resource savings and faster results than traditional solutions. Make sure that your platform provider is flexible enough to support the infrastructure environments you prefer – now and in the future.
Design for Compliance
Clearly, automation is key. Your platform should keep your data properly secured, fully managed, and stored; make it accessible according to policy, rights, and role; maintain its lineage and provenance; and govern it according to your reporting requirements.
Your data must be properly secured and maintained throughout its lifecycle; make sure that your data platform has been proven in high-security environments and has been certified by third parties.
While integrated data must be secure, it also must be accessible and shareable; that is ultimately how you get value out of your data. The challenge is how to do that safely. Key capabilities in this area include:
- Granular access controls so that you have full control over exactly what data is accessible, by whom, and when; different users will have different views based on what they are allowed to see
- Strong data governance, including audit trails so you can track lineage and provenance in the metadata, ensure data quality and availability, and apply governance rules and policies as needed
- Advanced encryption that allows you to store your most sensitive data in a public cloud environment, e.g. AWS. This level of encryption allows you to take advantage of modern cloud architecture without the risk of a database administrator who runs those systems gaining access to your data
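The first capability above, different views for different users, can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical per-field policy table (`FIELD_ROLES`) and role names invented for the example; a real platform would enforce such rules inside the database rather than in application code.

```python
# Hypothetical policy: which roles may see each field of a record.
FIELD_ROLES = {
    "unit": {"analyst", "commander"},
    "location": {"commander"},
    "status": {"analyst", "commander", "auditor"},
}


def view_for(record: dict, user_roles: set) -> dict:
    """Return only the fields that at least one of the user's roles may see.

    Fields with no policy entry are hidden (default deny).
    """
    return {
        name: value
        for name, value in record.items()
        if FIELD_ROLES.get(name, set()) & user_roles
    }
```

With this policy, an analyst and a commander querying the same record get different views of it, and a field that nobody has written a policy for is not shown to anyone.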
MarkLogic believes it is critical for our Federal customers to select the right technology to power their data enterprise. Here are additional resources to help you evaluate the right approach for developing a data solution within the DoD to solve the data challenges of the 21st century.