Progress Acquires MarkLogic! Learn More
BLOG ARTICLE

How NoSQL Can Help Analytics in Life Sciences

Back to blog
10.25.2016
2 minute read
Back to blog
10.25.2016
2 minute read
Horizon of AI in Government

I often get asked — how does MarkLogic help with analytics? Where does MarkLogic fit in the analytics landscape?

Before I answer that question – let’s look at what happens today.

The world of Artificial Intelligence (AI) and cognitive computing is all about extracting information and relationships from piles of data. The insights (or signals) are typically buried in very particular parts of the data that can be narrowed by the “features” of interest. A good example of this is a life sciences related Real World Evidence (RWE) study that Stanford performed for detect adverse events from EMR clinical data, where the Vioxx-MI association could have been detected three years prior to the drug’s recall.

Speed Up Filtering ‘Features’ of Interest

Adverse events are typically closely tied to drugs and the patients pre-existing conditions or co-morbidities. In this case, if I had 1 billion records, the brute force way of narrowing down the datasets in a Hadoop infrastructure would be to load everything and filter away the narrow “features” of interest such as Vioxx, cardiac disease, and death. Typically, this brute force method takes hours to sort through and filter the data before it gets to the machine learning aspect of detecting signals.

BUT what if you had all this data in a real-time, search-oriented database so you could narrow the set down by 10x or 100x to just the features of interest? You could shave that machine learning cycle to only minutes — even seconds.

What you need is a NoSQL database that has co-occurrence capabilities. MarkLogic allows you to find value pairs and runs such queries against any number of indexes and any type.

The picture below provides a nice summary of how MarkLogic can help narrow down the data sets to the features of interest using its real-time co-occurrence capabilities. In this example, we can see healthcare-related co-occurrences for diseases treatments symptoms from more than 2.6 million articles.

Narrow Down Data

There is a second way MarkLogic can help too. And that’s to operationalize the analytics.

Now that we have generated the insights, what do we do with them? Per the diagram below, MarkLogic provides a search focused multi-model transactional operational data hub. This means we can store flexible schema agnostic content as documents and graphs and can provide access to the data via various indexing models such as full-token-search, key-values, row-column, documents, semantic graphs, and geospatial views. Typically, cognitive computing insights can be mapped nicely into semantic graphs. MarkLogic provides a very nice way to tie these insights to content in the database via embedded triples or via RDF inferencing. As new insights are generated, they become directly accessible by applications running on MarkLogic.

Seeking Clarity in the World of Data

Back to the question: How does MarkLogic help with analytics? MarkLogic’s real-time content indexes can speed up signal detection 10x or 100x by providing the algorithms and just the data they need to generate the insights. Once the insights are created, MarkLogic can store the insights as RDF graphs that can then be used to build semantically smart real-time applications. And you can take action with full confidence that you are using all your data.

Imran Chaudhri

Imran joined MarkLogic to focus on bringing enterprise quality NoSQL solutions for managing large diverse data integrations and analytics to the healthcare IT marketplace. Imran co-founded Apixio with the vision of solving the clinical data overload problem and has been developing a HIPAA compliant clinical big data analytics platform. The big data platform makes extensive use of cloud computing based NOSQL technolgies such as Hadoop, Cassandra, and Solr. Previously, Imran co-founded Anka Systems and focused on the execution of EyeRoute's business development, product definition, engineering, and operations. EyeRoute was the world's first distributed big data ophthalmology image management system. Imran was also the IHE EyeCare Technical Committee Co-chair. Before Anka Systems, Imran was a founder and CTO of FastTide, the worlds first operational performance based meta-content delivery network. Imran has an undergraduate degree in electrical engineering from McGill University, a Masters degree in the same field from Cornell University and over 25 years of experience in the industry.

Read more by this author

Share this article

Read More

Related Posts

Like what you just read, here are a few more articles for you to check out or you can visit our blog overview page to see more.

Use Cases

Extracting Hidden Value from Your Content

Get more value from your content by improving discoverability. Learn about the key elements of a smarter content roadmap, and what steps most enterprises are missing.

All Blog Articles
Use Cases

Using MarkLogic for Cyber Threat Analytics and Awareness

MarkLogic helps agency decision-makers connect to their IT data quickly, make better decisions, become more agile, and solve their most difficult cyber challenges.

All Blog Articles
Product

Standardizing Internal Data Models on FHIR

Learn about MarkLogic’s work on a FHIR-based standardized data model to support persisted payer data for our Medicaid Accelerators.

All Blog Articles

Sign up for a Demo

Don’t waste time stitching together components. MarkLogic combines the power of a multi-model database, search, and semantic AI technology in a single platform with mastering, metadata management, government-grade security and more.

Request a Demo