How NoSQL Can Help Analytics in Life Sciences
I often get asked: How does MarkLogic help with analytics? Where does MarkLogic fit in the analytics landscape?
Before I answer that question, let's look at what happens today.
The world of Artificial Intelligence (AI) and cognitive computing is all about extracting information and relationships from piles of data. The insights (or signals) are typically buried in very particular parts of the data that can be narrowed down by the "features" of interest. A good example from life sciences is a Real World Evidence (RWE) study that Stanford performed to detect adverse events from EMR clinical data, which found that the Vioxx-MI (myocardial infarction) association could have been detected three years prior to the drug's recall.
Speed Up Filtering ‘Features’ of Interest
Adverse events are typically closely tied to drugs and the patient's pre-existing conditions or co-morbidities. In this case, if I had 1 billion records, the brute-force way of narrowing down the dataset in a Hadoop infrastructure would be to load everything and then filter for the narrow "features" of interest, such as Vioxx, cardiac disease, and death. This brute-force method typically takes hours to sort through and filter the data before the machine learning step of detecting signals even begins.
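To make the cost of the brute-force approach concrete, here is a minimal sketch in Python. The record fields (`drug`, `conditions`, `outcome`) are illustrative, not a real EMR schema; the point is that every record must be touched, matching or not.

```python
# Hypothetical sketch: brute-force feature filtering over raw records.
# Field names are illustrative, not an actual EMR schema.

def brute_force_filter(records, drug, condition, outcome):
    """Scan every record and keep only those matching all features."""
    hits = []
    for rec in records:  # touches all N records, even non-matches
        if (rec.get("drug") == drug
                and condition in rec.get("conditions", [])
                and rec.get("outcome") == outcome):
            hits.append(rec)
    return hits

records = [
    {"drug": "Vioxx", "conditions": ["cardiac disease"], "outcome": "death"},
    {"drug": "Aspirin", "conditions": ["arthritis"], "outcome": "recovered"},
]
matches = brute_force_filter(records, "Vioxx", "cardiac disease", "death")
print(len(matches))  # only the first record matches
```

At a billion records, that linear scan is exactly the work that takes hours before any signal detection can start.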
BUT what if you had all this data in a real-time, search-oriented database, so you could narrow the set down by 10x or 100x to just the features of interest? You could shave that machine learning cycle down to minutes, even seconds.
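The reason a search-oriented database can do this quickly is the inverted index: each feature maps to the set of record IDs containing it, so narrowing becomes a set intersection rather than a full scan. A minimal sketch of that idea (a toy index, not MarkLogic's actual internals):

```python
# Hypothetical sketch: an inverted index maps each feature to the set of
# record IDs containing it; narrowing is then a set intersection whose cost
# scales with the matching records, not the total record count.
from collections import defaultdict

def build_index(records):
    index = defaultdict(set)
    for rec_id, features in records.items():
        for feature in features:
            index[feature].add(rec_id)
    return index

def narrow(index, *features):
    """Intersect the posting sets for all requested features."""
    sets = [index[f] for f in features]
    return set.intersection(*sets) if sets else set()

records = {
    1: {"Vioxx", "cardiac disease", "death"},
    2: {"Aspirin", "arthritis"},
    3: {"Vioxx", "arthritis"},
}
idx = build_index(records)
print(narrow(idx, "Vioxx", "cardiac disease", "death"))  # {1}
```

Only record 1 matches all three features, and finding it never touched records 2 and 3's contents.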
What you need is a NoSQL database with co-occurrence capabilities. MarkLogic lets you find value pairs, running such queries against any number of indexes of any type.
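Conceptually, a co-occurrence query asks: for two fields, which value pairs appear together in the same record, and how often? The sketch below computes this by scanning; the point of a database like MarkLogic is that its range indexes can answer the same question in real time without a scan. Field names here are illustrative.

```python
# Hypothetical sketch of value co-occurrence: count which (field_a, field_b)
# value pairs appear together in the same record. A range-index-backed
# co-occurrence query returns this kind of result directly from its indexes.
from collections import Counter
from itertools import product

def co_occurrences(records, field_a, field_b):
    pairs = Counter()
    for rec in records:
        for a, b in product(rec.get(field_a, []), rec.get(field_b, [])):
            pairs[(a, b)] += 1
    return pairs

records = [
    {"drug": ["Vioxx"], "condition": ["cardiac disease", "hypertension"]},
    {"drug": ["Vioxx"], "condition": ["cardiac disease"]},
]
print(co_occurrences(records, "drug", "condition").most_common(1))
# [(('Vioxx', 'cardiac disease'), 2)]
```

The strongly co-occurring pair (Vioxx, cardiac disease) is exactly the kind of feature combination you would hand to a signal-detection algorithm.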
The picture below provides a nice summary of how MarkLogic can narrow down the data sets to the features of interest using its real-time co-occurrence capabilities. In this example, we can see healthcare-related co-occurrences for diseases, treatments, and symptoms from more than 2.6 million articles.
There is a second way MarkLogic can help too. And that’s to operationalize the analytics.
Now that we have generated the insights, what do we do with them? Per the diagram below, MarkLogic provides a search-focused, multi-model, transactional operational data hub. This means we can store flexible, schema-agnostic content as documents and graphs, and can provide access to the data via various indexing models such as full-text search, key-value, row-column, document, semantic graph, and geospatial views. Cognitive computing insights typically map nicely into semantic graphs, and MarkLogic provides a very nice way to tie these insights to content in the database via embedded triples or via RDF inferencing. As new insights are generated, they become directly accessible to applications running on MarkLogic.
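To illustrate the idea of insights as triples, here is a minimal sketch: each insight is a subject-predicate-object statement, and applications retrieve them by pattern matching. The predicate names and the `match` helper are illustrative, not MarkLogic's actual semantics API (which uses SPARQL over RDF).

```python
# Hypothetical sketch: insights stored as subject-predicate-object triples,
# then queried by pattern matching. Predicate names are illustrative.

triples = [
    ("Vioxx", "associatedWith", "myocardial infarction"),
    ("Vioxx", "treats", "arthritis"),
    ("myocardial infarction", "isA", "adverse event"),
]

def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# What is Vioxx associated with?
print(match(triples, s="Vioxx", p="associatedWith"))
# [('Vioxx', 'associatedWith', 'myocardial infarction')]
```

Because each new insight is just another triple added to the store, applications see it immediately, which is the "operationalize the analytics" point above.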
Back to the question: How does MarkLogic help with analytics? MarkLogic's real-time content indexes can speed up signal detection 10x or 100x by feeding the algorithms just the data they need to generate insights. Once the insights are created, MarkLogic can store them as RDF graphs that can then be used to build semantically smart, real-time applications. And you can take action with full confidence that you are using all your data.