Data Strategy Factors for Threat Management
In the wake of the extremist attacks in Paris and in San Bernardino, California, some suggested the tragedies were a complete failure of intelligence services. That would be hyperbole. However, both of these attacks do serve to highlight the difficulty in protecting modern democratic societies from small group and lone gunman threats. The countries that most treasure “openness” and civil liberties walk a particularly fine line.
Short of creating a Stasi-like surveillance state, government and industry have to ask themselves: is there something we can do better in screening and risk assessment from an information technology standpoint? As it is now, by the time you get to watch-listing, the aperture of situational awareness is often too small.
Over the last year MarkLogic Global Public Sector has engaged with National Security and Public Safety leadership in Europe, North America, Asia, and Australia. Increasingly, post-attack investigations of acts of extremism, whether committed by homegrown or transnational extremists, are revealing data strategy shortcomings that affect threat management and screening processes. These limitations slow the ability of agencies, whether at the country or regional level, to evolve quickly and adjust to shifting threats. Here are the four most pressing challenges.
Challenge #1: Data and IT Systems Proliferation
Agencies have many systems from which information must be aggregated. These systems grew up organically: each was bought for specific reasons, on its own timeline, by a different department. They were procured rationally, with market studies and analysis of alternatives. Requests for Information were issued; industry days may have been conducted. Tenders were posted, and perhaps even a narrowed list of choices was evaluated based on functionality and cost reasonableness. These systems were then implemented largely according to plan and met or exceeded the stated requirements.
For example, an intelligence agency may have procured Geospatial Information Systems, Link Analysis, Case Management, Biometrics, and search tools, all at different times, for different reasons, and with different funding. These systems and tools do exactly, or nearly, what is expected of them. What was never considered, however, was integrating them so that new information applications could be created: applications agile enough to respond to new threats, integrate new sensors or screening methods, and match evolving analytical intelligence techniques.
The solution to this problem can’t be “replace everything.” Not only is that economically untenable, but it’s technocentric – ignoring the reality that users are trained on their old systems, relatively productive with them, and used to all of their quirks. Equally, the answer can’t be incremental, point-to-point integration of these systems – which dooms organizations to a forever-loop of engineering and increasingly complex maintenance and quality assurance. You’re fighting against the arithmetic of systems engineering at that point.
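The “arithmetic of systems engineering” can be made concrete. The following is a minimal sketch, assuming a worst case in which every pair of systems needs its own integration; the function names are illustrative, and the hub comparison assumes one connection per system to a shared integration point.

```python
def point_to_point_links(n_systems: int) -> int:
    """Number of distinct pairwise integrations among n systems: n(n-1)/2."""
    return n_systems * (n_systems - 1) // 2


def links_added_by_next_system(n_systems: int) -> int:
    """Integrations a newly added system needs to reach every existing one."""
    return n_systems


def shared_hub_links(n_systems: int) -> int:
    """Assumption: each system connects only to one shared integration point."""
    return n_systems


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(f"{n} systems: {point_to_point_links(n)} pairwise links, "
              f"{shared_hub_links(n)} hub links")
```

Under these assumptions the pairwise count grows quadratically while the hub count grows linearly, which is why each additional system makes point-to-point maintenance disproportionately harder.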
System proliferation, especially when it results from department-to-department or agency-to-agency bureaucracy, directly impacts a government’s ability to protect its borders by making it likely that some data is orphaned or even unmanaged.
Challenge #2: Silos of “Excellence”
Sometimes the above-described proliferation takes the form of stove-piped analytical environments. Organized around applications or integrated systems such as statistics, link analysis, GeoINT, Signals, and OSINT, these environments present the organization with a double-edged sword. In exchange for a robust user experience and productivity within that single discipline, data feeds, works-in-progress, and even finished intelligence products are trapped in their own silos. This arrangement complicates interoperability, creates synchronization and data consistency issues, and reduces the return on investment from initiatives such as data center consolidation and adoption of cloud architectures (whether private, commercial, or hybrid).
Using individual applications, each with its own database, as the focus of fused information (particularly objects and entities related to people, organizations, events, places, and chronologies) greatly limits counter-extremism and border control organizations’ ability to adapt to threats. It also diverts money, time, and resources to the care and feeding of infrastructure rather than operations and analysis.
Challenge #3: Multiple Communities of Interest
If you look at a complex function such as Threat Management, Screening, and the subsequent watch-listing, the reality is that the interests of multiple groups, or communities of interest, are at play. Besides Public Safety and Law Enforcement at local, state, national, and international levels, those responsible for monitoring and operating critical infrastructure may all have reasons to see similar data, but they will use it in vastly different ways. Furthermore, information coming from these stakeholders may require granular security controls at the attribute or value level, so that cooperating organizations can live up to “need to share” while still safeguarding sensitive content such as sources and methods.
Even beyond this, when you look at end-to-end activities related to extremism, such as Organized Crime, Human Smuggling & Trafficking, and illegal commerce in drugs and weapons, the interplay between poverty, education, availability of social services, and transportation is extremely complex. It’s hard enough for Defense, Intelligence, and Law Enforcement to share information (just from a political standpoint). But what does the architecture look like that allows you to bring together the indicators and warnings from the social fabric, the ones that would provide even the roughest idea of what puts young men and women at risk, whether through hopelessness, economic strife, isolation, or issues of language, culture, and religion? The greatest care has to be taken to respect Personally Identifiable Information, privacy, and even health records.
If threat management systems aren’t designed from the ground up with the expectation that there are multiple communities of interest involved in combatting extremism, true information sharing and collaboration will be elusive.
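To illustrate what security controls “at the attribute or value level” might look like, here is a minimal sketch. It assumes a simple label model in which a reader sees an attribute only if they hold every label it requires; the class, function, label names, and sample values are all hypothetical and do not represent any particular product’s security model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    value: str
    # Labels a reader must hold to see this value (assumed model).
    # An empty set means the value is shareable with every community.
    required_labels: frozenset = frozenset()


def visible_view(record, reader_labels):
    """Return only the attributes whose required labels the reader fully holds."""
    held = frozenset(reader_labels)
    return {a.name: a.value for a in record if a.required_labels <= held}


# Invented sample record: one entity, three attributes, different sensitivities.
record = [
    Attribute("name", "Alex Doe"),
    Attribute("last_seen", "Port district", frozenset({"law-enforcement"})),
    Attribute("reporting_source", "Informant 12", frozenset({"sources-and-methods"})),
]

police_view = visible_view(record, {"law-enforcement"})
infrastructure_view = visible_view(record, set())
```

The same record yields a different view for each community of interest, so the data can be shared without exposing the attribute that reveals sources and methods.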
Challenge #4: Making Data Science Operational
There’s no doubt that innovation in the area of big data and data science is going to transform many different aspects of Security and Public Safety. However, looking broadly across industries, including government, financial services, healthcare, and life sciences, it is clear that current investments in data science are experimental in nature. This characterization is not meant to denigrate any of these efforts, including those specifically focused on security, border control, and public safety. However, there seem to be two gaps that need to be better addressed:
- Data Science has to conform to the scientific method. The rigor applied to creating a screening algorithm for border control has to be the same as what you would apply to any other experiment. Over time, this will undoubtedly improve as the “science” of data science matures. However, today, in many organizations data science is being pursued as if you’re seeing what will grow in a petri dish without isolating one variable at a time.
- The IT architecture surrounding data science, frequently a collection of open source tools anchored by Hadoop, requires so much effort and time to wire together that, instead of being a platform on which to conduct experiments, it becomes the experiment itself.
What’s needed is a platform that can both support the scientific process unimpeded and operationalize the algorithms, models, filters, and pattern detectors created ‘on the workbench.’
The system architectures and tools thus far evangelized and adopted for Data Science are likely not sufficient to transfer these insights to an operational environment. For threat management, the feedback loop between algorithm and model creation and real-world application has to be dependable and rigorously real-time.
The answer to the above challenges is not purely technical. There are significant organizational, cultural, and process changes that have to be addressed. However, based on what we’ve seen in working with dozens of National Security and Public Safety organizations around the world, there are aspects of threat management and watch-listing processes that may be better managed by taking a fresh look at Data Strategy.
What Do We Mean by Data Strategy?
Because of the above four challenges, as well as the variety and volatility of the data that is critical to providing situational awareness to counter-extremism operations, a coherent approach to data curation, integration, governance, sharing, and security is difficult. When we think about data strategy, it must encompass all of this and do so in a way that spans applications, systems, communities of interest, organizations, and even nations. Data should be managed independently of individual applications. The strategy has to incorporate data life cycle, stakeholder attributes, access controls, master data management, and geospatial, temporal, and semantic context.
When we look broadly at data strategy it’s clear we are at a crossroads. Across National Security, Defense, and Public Safety organizations, there are dozens of systems implemented and procured in an unsynchronized way. As we discussed above, there’s no way to do wholesale modernization that requires “rip and replace” of all systems.
There is another approach.
The Operational Data Hub
One answer may lie in an enterprise architectural pattern known as an Operational Data Hub (ODH).
An ODH brings all of the relevant mission data together regardless of format or schema. It provides the ability to index all structured, unstructured, semantic, geospatial, temporal, metadata, and security information and securely expose all of this information for search, data matching/alerting, and exploration via tools such as link analysis, geospatial information systems, and statistical packages. Integration and dissemination are made simpler by providing hooks into data and functionality via RESTful web services.
The ODH is not designed for analytics or business intelligence, but it greatly reduces the time, resources, and complexity of the ETL, aggregation, and related data management work that analytics and data science require. The ODH is a way to avoid all of the point-to-point integration typical of complex information environments. The ODH also reduces the need for costly wholesale IT modernization efforts.
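As a rough illustration of the pattern, the sketch below ingests records of different shapes as they are and indexes every value for search, with no shared schema defined up front. It is a toy, in-memory stand-in: `MiniHub`, its methods, and the sample records are hypothetical and are not the API of any ODH product.

```python
from collections import defaultdict


class MiniHub:
    """Toy in-memory hub: stores documents of any shape and indexes their words."""

    def __init__(self):
        self.docs = {}
        self.index = defaultdict(set)  # word -> ids of documents containing it

    def ingest(self, doc_id, doc):
        """Store the document unchanged and index every value found in it."""
        self.docs[doc_id] = doc
        for token in self._tokens(doc):
            self.index[token].add(doc_id)

    def _tokens(self, node):
        """Walk nested dicts and lists, yielding lower-cased words from leaf values."""
        if isinstance(node, dict):
            for value in node.values():
                yield from self._tokens(value)
        elif isinstance(node, (list, tuple)):
            for value in node:
                yield from self._tokens(value)
        else:
            for word in str(node).lower().split():
                word = word.strip(".,;:!?()\"'")
                if word:
                    yield word

    def search(self, term):
        """Return the ids of all documents containing the term, sorted."""
        return sorted(self.index.get(term.lower(), set()))


hub = MiniHub()
# Two invented sources with different structures, loaded without a common schema.
hub.ingest("case-1", {"type": "case", "subject": {"name": "Alex Doe", "aliases": ["A. Doe"]}})
hub.ingest("report-7", {"text": "Sighting of Alex near the port."})
matches = hub.search("alex")
```

A single query reaches both the structured case record and the free-text report, which is the kind of cross-source discovery that point-to-point integration makes expensive.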
Object-Based Intelligence & Production
While an ODH provides one mechanism to organize, search, and tag all of the relevant data, entities such as people, organizations, events, observations, and chronologies are so central to counter-extremism and threat management work that something more is needed. Those involved with counter-terrorism and threat management need to be able to create, share, discover, and relate these entities or objects. Each of these is composed of multiple attributes, each potentially with multiple values. Specialized metadata denoting pedigree, provenance, timespan validity, analyst comments, and security tags can also be included. Object Based Production provides the counter-extremism and security community with a way to make the intelligence life cycle more dynamic.
Both the ODH and these Object & Entity services expose shortcomings in RDBMS platforms, to the point that those platforms would crumble under the variability of the content and user types.
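One way to picture such an object is sketched below: an entity whose attributes can each hold several values, with every value carrying its own provenance, validity window, security tags, and analyst comment. The class and field names and the sample data are assumptions made for illustration, not a standard Object Based Production schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class AttributeValue:
    value: str
    source: str                        # provenance: where this value came from
    valid_from: Optional[str] = None   # timespan validity, as ISO dates
    valid_to: Optional[str] = None
    security_tags: Tuple[str, ...] = ()
    analyst_comment: str = ""


@dataclass
class Entity:
    entity_id: str
    kind: str  # e.g. "person", "organization", "event"
    attributes: Dict[str, List[AttributeValue]] = field(default_factory=dict)
    relations: Set[Tuple[str, str]] = field(default_factory=set)

    def add_value(self, name, attribute_value):
        """Record another value for an attribute without overwriting earlier ones."""
        self.attributes.setdefault(name, []).append(attribute_value)

    def values(self, name):
        """Return all recorded values for an attribute, in insertion order."""
        return [av.value for av in self.attributes.get(name, [])]


def relate(subject, relation, target):
    """Link one entity to another by id, e.g. person -> member_of -> organization."""
    subject.relations.add((relation, target.entity_id))


# Invented sample data.
person = Entity("person-1", "person")
person.add_value("name", AttributeValue("Alex Doe", source="case file 17"))
person.add_value("name", AttributeValue("A. Doe", source="travel record",
                                        analyst_comment="possible alias"))
org = Entity("org-9", "organization")
relate(person, "member_of", org)
```

Because each value keeps its own source and comments, conflicting or evolving facts can sit side by side instead of being flattened into a single column.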
There’s another equally powerful reason to consider moving to an Object or Entity Services Approach to Counter-Extremism. Consider the existing intelligence life cycle, characterized by stages such as Collection, Processing, Exploitation, and Dissemination.
Some countries spend billions on Collection, hundreds of millions on Processing, put their best people and analytical tools on Exploitation, then shove everything into PDFs and PPTs for Dissemination, trapping important insights and data about threats, people, organizations, and locations inside files that are hard to discover and relate.
The promise of an Object or Entity based approach is liberating facts from the confines of their underlying sources or summary documents. This means all of the cooperating agencies and even countries can more flexibly and securely share the information they need to combat extremism.
The False Dichotomy – Enterprise RDBMS Products Versus NoSQL Projects
In general there is little debate that the Relational Database Management System (RDBMS) epoch is far closer to its end than its beginning. Innovation has slowed, and the areas where the leading RDBMS vendors do invest provide only diminishing marginal utility. These investments frequently reflect defensive moves to keep the product seemingly competitive with newer approaches like NoSQL. Under examination, the embrace of newer modalities is cursory and is little more than a way to pull unstructured data into the RDBMS core.
Built upon RDBMS, enterprise architectural patterns such as data warehouses or data marts do address some of the data challenges. However, they begin and end with highly structured sources. RDBMS-based data warehouses and data marts are inflexible and brittle. This is largely due to the need to express and organize data in harmonized and normalized ‘star’ schemas. This ends up flattening out the kind of ad hoc and nested document-based information that is so vital to the counter-extremism and threat management mission.
Certainly many NoSQL databases can answer Challenges 1 and 2, with their promise to take in any type of data. But the bulk of NoSQL options available are open source projects, not enterprise products. Open source is very alluring: seemingly lower licensing costs, the ability to tailor for a particular organization or mission, and the innovation of entire communities working on common problems. However, these NoSQL databases are anemic when it comes to data consistency, disaster recovery and backup, replication, ability to handle all data types, and government-grade security. That leaves Challenge #3 unmet, unless organizations take on the heavy software engineering tasks akin to those typically done by independent software vendors, not customers.
The bottom line is that counter-extremism and security operations present data management challenges both for legacy RDBMS and for less-than-enterprise-ready NoSQL. The panacea would be a database that has the agility of NoSQL and the reliability of enterprise relational.
To Reboot Your Data Strategy
If rethinking your data strategy is central to rethinking counter-extremism and security operations, then how do you get started? Well, there are a few things your organization can do right away:
- Consider the variety of data needed to do Threat Management. Is it documents? Video? Biometrics? Are the sources volatile and variable?
- Before you build a relational data warehouse, consider how many of the sources started out in JSON or XML and then were deconstructed into tables
- If you have a data warehouse and it doesn’t include all documents, PDFs, PowerPoint slides, and other content on your network share drive – imagine the work the users need to do to bring together all data
- If your business processes revolve around entity management and geospatial analysis, consider how operations could be improved if entities and geospatial features were application-independent and managed by an operational data hub
- If business processes rely on business intelligence reports to answer “can I find…” type questions – consider what impact integrated search & DBMS services can have on individual productivity
- Lastly, resistance to data strategy initiatives often comes down to “if it isn’t broken, don’t fix it…” This is an important mantra when considering an Operational Data Hub (ODH). Frequently, investments in legacy systems can be extended with an ODH because it removes the burden of point-to-point interoperability from the individual application or system
When considering the scope of challenges associated with Threat Management, the answer is rarely ‘just buy and implement product X’. These are tough problems that have to be addressed at many levels: laws, policies, funding, organizational culture, process changes, and technology. However, as governments and agencies consider how to address Data Strategy challenges, the advantages of implementing an Operational Data Hub, or of moving away from static intelligence life cycles via Object or Entity methods, become more apparent and more crucial.