The efficiency of everyday operational processes can be improved by moving older, unused data into a separate system or archive. In regulated industries such as financial markets, rules govern how long these old records must be kept and require that historical information cannot be changed.
Firms often use external services to archive voice, chat, and email. They may well have more than one enterprise content management system to handle office files and scanned images, with formal workflows to produce records. The most difficult area is business applications, which typically run on a relational database (RDBMS). These may be archived into separate relationally structured data warehouses, or have application code that writes records out, but each will have its own set of formats, tables and keys. Long-term storage for these platforms will generally include Write Once Read Many (WORM) storage, to meet SEC Rule 17a-4 immutability rules, or cheaper storage such as Hadoop.
Use of this historical data generally falls into two requirements. First, from a business intelligence viewpoint, older operational data can provide a rich data lake for statistical mining. Second, older information may be needed for investigations – either regulatory or legal. In the latter, investigations can lead to extending the safekeeping time for the dataset, sometimes known as a legal hold.
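As a rough illustration of how a legal hold interacts with a normal retention period, here is a minimal Python sketch. The `ArchivedRecord` class, its field names and the `can_dispose` helper are hypothetical and exist only for this example; real retention rules depend on the regulation and on the firm's own policy.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ArchivedRecord:
    record_id: str
    retain_until: date  # end of the normal retention period (assumed policy)
    legal_holds: set[str] = field(default_factory=set)  # IDs of open investigations


def can_dispose(record: ArchivedRecord, today: date) -> bool:
    """A record may be disposed of only once retention has ended and no hold is open."""
    return today > record.retain_until and not record.legal_holds


record = ArchivedRecord("trade-001", retain_until=date(2025, 1, 1))
record.legal_holds.add("case-42")               # an investigation places a hold
print(can_dispose(record, date(2026, 1, 1)))    # retention has ended, hold still open
record.legal_holds.discard("case-42")           # the investigation closes
print(can_dispose(record, date(2026, 1, 1)))    # retention has ended, no holds remain
```

The point of the sketch is that a hold overrides the retention date rather than replacing it: the record keeps its original schedule and only becomes disposable once every hold has been released.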
Generally, internal audit is tasked with making sure processes are being maintained properly.
One forensic firm addresses an “event” (criminal, civil, regulatory, or otherwise) by taking samples from a range of different systems (spreadsheets, records of transactions, documents and even communications) to try to piece the transaction history together. These types of fire drills are disruptive and expensive. If the internal team can’t pull it together, an external firm will be brought in to help re-assemble it, at great cost.
Recreating histories is therefore nearly impossible as long as data silos (in this case, data archive silos) persist.
Four Reasons Why Compliance Archives Are So Difficult to Navigate
Based on my 30+ years of experience in financial markets, I can identify four major challenges to solve:
- Most firms have become used to the fact that data, such as emails, will be stored in separate archives. Email may be manageable because its structure stays the same. But the same cannot be said for the archives of relational data from applications. There may be hundreds of these archives, and they become harder to access over time as schemas change, older schemas fall further out of date, and knowledge of the earlier applications evaporates. So simply being able to access these archives can be a challenge.
- Field names will not be consistent across schemas – there may be different names for the same fields or the same name for different fields. This can impact completeness and introduce quality issues when searching.
- Predetermined metadata is out of date. Basically, metadata decisions are made on what we think people will want to search for in the future without knowing what those requirements will be. This is also a common issue with content management systems, since often the metadata has been enriched manually and is not consistent. The usual way around this is to undertake a complete text index and search on the whole repository, which is not practical with large data volumes – since you would need to sweep the whole archive to know what is in it! (Certainly not practical if you have billions of documents.)
- Access to the archive systems is restricted in different ways across the silos, so it is difficult to organize consistent control over who can see what. In other cases, all access is limited to a narrow group of human record keepers. Role-based access controls would allow you to create rules that span the various archives.
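The field-naming challenge in the list above can be made concrete with a small Python sketch. The archive names, field names and the `FIELD_MAP` table are invented for illustration; the idea is only that each archive's local names are mapped to a shared vocabulary at query time, so a single search can span archives without rewriting the stored rows.

```python
# Hypothetical mappings from each archive's local field names to canonical terms.
FIELD_MAP = {
    "equities_2009": {"cpty": "counterparty", "trd_dt": "trade_date"},
    "fx_2015": {"counterparty_name": "counterparty", "date": "trade_date"},
}


def harmonize(archive: str, row: dict) -> dict:
    """Return a copy of row keyed by canonical names; unmapped fields keep their name."""
    mapping = FIELD_MAP[archive]
    return {mapping.get(name, name): value for name, value in row.items()}


def search(archives: dict, field_name: str, value) -> list:
    """Find rows in any archive whose canonical field equals the given value."""
    hits = []
    for archive, rows in archives.items():
        for row in rows:
            if harmonize(archive, row).get(field_name) == value:
                hits.append((archive, row))  # return the original, unmodified row
    return hits


archives = {
    "equities_2009": [{"cpty": "ACME", "trd_dt": "2009-03-02"}],
    "fx_2015": [
        {"counterparty_name": "ACME", "date": "2015-07-11"},
        {"counterparty_name": "Globex", "date": "2015-07-12"},
    ],
}
for archive, row in search(archives, "counterparty", "ACME"):
    print(archive, row)
```

Without the mapping, a search for `counterparty` would silently miss both archives, which is the completeness problem described above.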
To deal with yesterday’s, today’s and tomorrow’s requirements you need to architect a platform that can handle archive data as you would operational data.
Compliance Archive on a Regulatory Reporting Platform
Instead of creating yet another series of archive silos, consider creating a compliance archive on the same common infrastructure as your Regulatory Reporting Platform. If that platform is built on a multi-model database, you can create a single archive with mixed formats and multiple schemas.
A multi-model database platform allows you to load data without upfront ETL. Instead, you develop a data harmonization layer where data is either transformed or, where data integrity must be preserved, described by added metadata. By creating metadata to harmonize, you can provide consistent data terminology while maintaining governance. Through the use of MarkLogic’s Universal Index, you can search on values and terms that were not predetermined. Any reporting system needs fine-grained security controls on who can see what, and it must be able to redact fields if necessary. Finally, you need your platform to integrate with Hadoop and WORM for cost-effective and compliant storage.
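To make the combination of harmonizing metadata and role-based redaction more tangible, here is a minimal plain-Python sketch. It does not use or represent MarkLogic's API. The `wrap` and `view` functions, the `ROLES` table and all field names are hypothetical, chosen only to show the original record being kept intact next to added metadata.

```python
import copy

# Hypothetical role rules: fields to redact and whether the raw source is visible.
ROLES = {
    "analyst": {"redact": {"client_name"}, "see_source": False},
    "records_keeper": {"redact": set(), "see_source": True},
}


def wrap(source: dict, canonical: dict) -> dict:
    """Store harmonized metadata beside an unmodified copy of the original record."""
    return {"metadata": dict(canonical), "source": copy.deepcopy(source)}


def view(doc: dict, role: str) -> dict:
    """Return what the given role may see, redacting metadata fields as configured."""
    rule = ROLES[role]
    metadata = {
        name: ("[REDACTED]" if name in rule["redact"] else value)
        for name, value in doc["metadata"].items()
    }
    result = {"metadata": metadata}
    if rule["see_source"]:
        result["source"] = copy.deepcopy(doc["source"])
    return result


original = {"CLNT_NM": "ACME Ltd", "TRD_DT": "2015-07-11"}
doc = wrap(original, {"client_name": "ACME Ltd", "trade_date": "2015-07-11"})

print(view(doc, "analyst"))         # client name redacted, raw source withheld
print(view(doc, "records_keeper"))  # full metadata plus the untouched source
```

Because redaction is applied when a document is viewed rather than when it is stored, the archived record itself never changes, which is consistent with the immutability requirement discussed earlier.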
For three top-tier investment banks, one asset manager and one of the top three global brokers, MarkLogic’s database platform is the choice to satisfy MiFID II requirements such as transaction reporting, transparency reporting, and record keeping. It is the ideal infrastructure for a compliance archive datastore that can also be extended to a full Regulatory Reporting Platform. In fact, these customers have extended their regulatory frameworks to comply with other regulations, e.g. Dodd-Frank, MAS 610 and GDPR, and to stay in control of upcoming regulations.
And unlike solutions built on top of relational databases, MarkLogic’s data integration approach offers the flexibility, cost efficiency and faster time to market needed to address current, future and additional regulatory requirements. We consistently outperform benchmarks:
- Scale: integrating 30+ trading systems, 140+ data sources, and managing over 8 million documents
- Resilience: managing 1,600+ requests per second
- Reach and Timeliness: over 300 users can run sub-second full-text queries on post-trade information updated daily, and we also do real-time data delivery
- Precision: metadata storage for 200 attributes to refine search and deliver better insight
- Global coverage: multi-lingual support for 200+ languages and advanced support for English and 15 other languages
Archives are going to remain a fact of life for financial services. But they don’t have to be the (costly) headache they used to be.