
Big Data, Dark Data in Property Casualty Insurance

7 minute read

In our first blog in the series on Electronic Content Management (ECM) for insurance, we looked at how specialized ECM systems that allow storage of binaries and metadata are brutally expensive. Now we are going to focus on how “big data” hampers the industry — particularly Property and Casualty (P&C).

P&C is a complex business that relies on more than just the highly structured data found in the myriad databases throughout the enterprise. It also requires the other 80 percent of the company’s data: the unstructured text that resides as binaries in a multitude of ECM systems. This type of content is often referred to as dark data because it is hard to find when you need it.

Managing digital content across all these document repositories has been a special sort of big data hell. In fact, most insurance companies don’t do it well, for a multitude of reasons. Let’s list a few of them: Volume, Variety, Security, Knowledge of Process, and Tooling. We will look at each in turn.

Variety of Data Types — and Sources

Variety is by far the most challenging problem in document management; it’s not just about dealing with PDF, Word, and Excel files any more. Critical, structured information needs to be gathered from the relational databases behind the operational systems, and those records must integrate into the customer or business 360 view.

Variety challenges come from the different functions and tools used by the business. It includes the different systems — policy, quoting, claims, accident and DMV reports — none of which speak to each other. And it includes the technical bits & bytes of the document formats such as Word, PDF, scanned docs, images, and spreadsheets. These formats must be understood — “cracked” so to speak — so their content can be managed. All of these documents have to have metadata extracted and maintained for search – and regulatory reporting.
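To make the “cracking” step concrete, here is a minimal sketch of normalizing metadata from different document formats into one common envelope for search and reporting. The extractor functions and field names are illustrative assumptions, not a real parser or a MarkLogic API:

```python
# Sketch: normalize metadata extracted from varied document formats into
# a single envelope that search and regulatory reporting can rely on.
# Extractors and field names here are hypothetical stand-ins.

def extract_pdf_metadata(raw):
    # Stand-in for a real PDF parser ("cracking" the format).
    return {"title": raw.get("Title"), "author": raw.get("Author")}

def extract_docx_metadata(raw):
    # Stand-in for reading Office document properties.
    return {"title": raw.get("dc:title"), "author": raw.get("dc:creator")}

EXTRACTORS = {"pdf": extract_pdf_metadata, "docx": extract_docx_metadata}

def to_envelope(doc_id, fmt, raw, source_system):
    meta = EXTRACTORS[fmt](raw)
    return {
        "id": doc_id,
        "format": fmt,
        "source": source_system,   # e.g. claims, policy, quoting
        "title": meta.get("title"),
        "author": meta.get("author"),
    }

env = to_envelope("claim-123.pdf", "pdf",
                  {"Title": "Accident Report", "Author": "J. Doe"},
                  "claims")
print(env["title"])  # Accident Report
```

However the per-format extraction is done, the point is that every document lands in the same searchable shape regardless of where it came from.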

Customer communications, including policy repositories, emails, efaxes, ad hoc claims notes, underwriting reports, and so forth, all have their own structures, formats, and life cycles. The operational systems that support these functional components are often fragmented across the enterprise, so users must go to specific systems for the documents they need.

The business is forced to do “swivel-chair integration” as workers bounce between systems to locate what they need for a 360 view. This is time-consuming and prone to error; it’s easy to miss what you don’t know to look for. The 360 view, or Golden Record, is the holistic customer view insurers want to rely on to understand customer behaviors and the needs connected to the various life events they may be going through. The Household View, which shows all the customers from a single household, is another critical element in insurers’ decision support. The challenge is that these views can get stale fast.
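The Household View idea can be sketched in a few lines: group customer records on a shared household key. In practice the matching is far fuzzier than the simple address key assumed here:

```python
from collections import defaultdict

# Sketch: build a simple Household View by grouping customer records on
# a shared household key (address here; real-world matching is fuzzier
# and uses many signals).
customers = [
    {"name": "Pat Lee",  "address": "12 Elm St", "policies": ["auto"]},
    {"name": "Sam Lee",  "address": "12 Elm St", "policies": ["home"]},
    {"name": "Ana Cruz", "address": "9 Oak Ave", "policies": ["auto"]},
]

def household_view(records):
    households = defaultdict(list)
    for rec in records:
        households[rec["address"]].append(rec["name"])
    return dict(households)

print(household_view(customers))
# {'12 Elm St': ['Pat Lee', 'Sam Lee'], '9 Oak Ave': ['Ana Cruz']}
```

The staleness problem mentioned above is exactly that this grouping must be recomputed, or maintained incrementally, as records change across source systems.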

“Static views of customers on a dashboard having ‘fixed’ connectors can get passé very soon,” warns Amit Unde, CTO Insurance, at LTI. “Analyzing customer data merely across products might be good enough for a customer service rep, but a product manager might be additionally interested in analyzing customer behavior across distribution channels, across period of time and across geographies; and be able to perform a more sophisticated analysis such as the ‘household view of the customer’ which will help him link other customer entities.”

All insurance companies have systems for dealing with different aspects of their business. Some of the smaller companies I’ve worked with are even fortunate to have a single system for different operational areas. A ratings system that is uniformly used across all specialty lines is vastly better than having multiple ratings systems, but even so, digging further, the ratings data is still isolated to that system. It’s just a larger silo that needs to be leveraged with the operational information from the rest of the business areas. Think about joining the cross-LOB ratings data with actual customer claims, marketing profiles, and billing data. A single view of a customer, line of business, or operating region becomes possible when the structural silos are removed.

This leads directly to another problem: the lack of proper classification of documents. The type of content, its source, its provenance, and its security considerations all matter when you’re creating an enterprise content management system. Including a document’s full context and the metadata behind it helps guide a user’s search. Unfortunately, this information is typically ignored; most organizations settle for a few simple tags and keywords provided by the author or report creator when documents get filed. We will revisit this when we look at some common tools used by the enterprise.

Volume of Data to Support Millions of Customers

We’ve touched on the variety problem: the many sources content comes from and the many formats it takes. Now, imagine how that must scale to support the demands of a million customers. How about 20 million? Solutions need to scale as well as cut across the operations and lines of business.

The key to locating and securing information is comprehensive indexing of all that content (including text, metadata, and structured data). Shared file systems scale in volume, but search is sacrificed. Think of your three-year-old laptop: can you find anything on it? Is it governed, or even very secure? We will touch more on this under tools.
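The core idea behind that comprehensive indexing is an inverted index: map each term to the documents containing it, so lookup goes by term rather than by scanning files. A deliberately minimal sketch (a real system also indexes metadata fields, enforces security, and handles stemming and tokenization):

```python
from collections import defaultdict

# Minimal sketch of an inverted index: each term maps to the set of
# documents containing it, so search is a lookup, not a file scan.
def build_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, term):
    return sorted(index.get(term.lower(), set()))

docs = {
    "claim-1":  "hail damage to roof",
    "claim-2":  "water damage in basement",
    "policy-9": "homeowner policy rider",
}
idx = build_index(docs)
print(search(idx, "damage"))  # ['claim-1', 'claim-2']
```

A plain shared file system offers nothing like this index, which is why search on it degrades as volume grows.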

Hadoop is often used to address the volume problem because of the way HDFS can scale cost effectively, but there is that word again, “file-system.” Indexing and locating data in Hadoop-based initiatives is a major undertaking made difficult by the variety of content.

In some of the better-organized companies I’ve spoken with, SharePoint is used to manage the volume and deal with a fair cross section of enterprise content. That’s not surprising as web pages and supporting content are easily composed and made available with SharePoint. The most successful implementations have dedicated people to curate content and limit the number of folders/pages. They also work hard to ensure content is properly tagged so it shows up when searched. However, one of the most common complaints I hear is how hard it is to tag all that content.

Further, when the number of users gets large, or when operational data like policy documents or claim submissions gets moved into SharePoint, the operational nature and sheer volume of updates make it hard for SharePoint to keep up.

Proliferation of Tools

Shared file systems are not content management systems, yet I still see them used as primary sources for document sharing — even at large carriers. Network drives might be great for quick sharing and for IT to manage your home folders, but expect sprawl, duplication, and sacrificed governance.

As stated in our previous post, large organizations address the file system sprawl with highly expensive systems like SharePoint or lower cost wikis — but both make it difficult to locate anything. One friend shared their organization’s running joke “it’s on the wiki” as shorthand for “what you’re looking for exists, good luck finding it.” You just need to be “In the know …”

As the sector is told to digitize, organizations are scanning literally thousands of emails, faxes, and claim forms every single day. The result is typically a bloated TIFF file that is not easily searched. As stated above, finding information in a TIFF requires thorough tagging. OCR allows a text rendition to be married to the file, which aids search, though more content, even a text file, means more bloat. The enterprise must maintain a careful linkage between the original scan and the digitally usable, OCR-enriched version. Security, content tagging, related ontologies, and enrichment all drive the ability to locate the right information at the right time.
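One simple way to maintain that scan-to-OCR linkage is a record that ties the derived text back to the original image, so tags and security decisions apply to both versions together. The field names below are illustrative assumptions:

```python
# Sketch: keep the original TIFF scan and its OCR-derived text linked in
# one record, so classification and security cover both versions.
# Field names are hypothetical, not a real repository schema.
def link_scan_and_ocr(scan_path, ocr_text, tags):
    return {
        "original": scan_path,       # immutable source of record
        "ocr_text": ocr_text,        # searchable derived content
        "tags": tags,                # shared classification
        "derived_from": scan_path,   # provenance pointer back to the scan
    }

record = link_scan_and_ocr(
    "scans/claim-form-0042.tiff",
    "Claimant reports hail damage to roof on 2019-05-03.",
    ["claim", "hail", "property"],
)
print(record["derived_from"])  # scans/claim-form-0042.tiff
```

Searching then runs against `ocr_text` while the provenance pointer lets an adjuster or auditor always retrieve the original scan.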

The Digital Transformation imperative requires a different approach to content management, not only to cater to changing customer needs and their ‘digital’ behavior, but also to improve inter-departmental collaboration and transparency within large insurance providers, and to stay competitive vis-à-vis InsurTech threats. According to ComputerWeekly, 70 percent of Europe’s largest insurance companies were found to have appointed a new CEO in the past 18 months, and it was these leaders who were implementing IT innovation strategies.

In the Know

Another common problem with document systems is they become islands of information that you have to know how to navigate. You have to know…

  • Where to search
  • What to look for
  • How to find it

Consider the underwriter’s role; they must evaluate the risk for a given policy and make the decision to write it or not based on the risk. To do this effectively they must have access to the relevant underwriting guides, application information, previous applications, current policies, riders, and claim history. This information is found in many different systems, for example:

  • Underwriting Guide – each line of business’s SharePoint portal
  • Application information – scanned from a paper form in email
  • Current and previous policies/riders – line of business policy management systems
  • Claim history – each line’s claim system

To this list we could add client and agent communications, so more emails may need to be reviewed to fully round out the information, likely increasing the number of systems an underwriter must consult.

We will explore technical solutions to this problem in future blogs, but a consolidated content repository is an excellent place to start. In our next blog we will look at 5 Key Functions an Insurance Electronic Document Store Must Have.

Derek Laufenberg
