How NoSQL Can Help Analytics in Life Sciences

I often get asked: how does MarkLogic help with analytics? Where does MarkLogic fit in the analytics landscape?

Before I answer that question, let’s look at what happens today.

The world of Artificial Intelligence (AI) and cognitive computing is all about extracting information and relationships from piles of data. The insights (or signals) are typically buried in very particular parts of the data, which can be narrowed down by the “features” of interest. A good example is a life sciences Real World Evidence (RWE) study that Stanford performed to detect adverse events from EMR clinical data, in which the Vioxx-MI association could have been detected three years prior to the drug’s recall.

Speed Up Filtering ‘Features’ of Interest

Adverse events are typically closely tied to drugs and to the patient’s pre-existing conditions or co-morbidities. In this case, if I had 1 billion records, the brute force way of narrowing down the datasets in a Hadoop infrastructure would be to load everything and then filter down to the narrow “features” of interest, such as Vioxx, cardiac disease, and death. Typically, this brute force method takes hours to sort through and filter the data before it ever reaches the machine learning step of detecting signals.

But what if you had all this data in a real-time, search-oriented database, so you could narrow the set down by 10x or 100x to just the features of interest? You could shave that machine learning cycle to only minutes, or even seconds.
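To make the contrast concrete, here is a minimal sketch in plain Python, not MarkLogic code. It compares a full scan with a lookup against a pre-built index. The records, field names, and helper functions are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical toy records; the field names are illustrative, not a real EMR schema.
RECORDS = [
    {"id": 1, "drug": "Vioxx", "condition": "cardiac disease", "outcome": "death"},
    {"id": 2, "drug": "Vioxx", "condition": "arthritis", "outcome": "recovered"},
    {"id": 3, "drug": "Aspirin", "condition": "cardiac disease", "outcome": "recovered"},
    {"id": 4, "drug": "Vioxx", "condition": "cardiac disease", "outcome": "recovered"},
]


def brute_force(records, **wanted):
    """Scan every record and keep the ids of those matching all wanted values."""
    return [r["id"] for r in records
            if all(r.get(field) == value for field, value in wanted.items())]


def build_index(records):
    """Map each (field, value) pair to the set of record ids that contain it."""
    index = defaultdict(set)
    for r in records:
        for field, value in r.items():
            if field != "id":
                index[(field, value)].add(r["id"])
    return index


def indexed_lookup(index, **wanted):
    """Intersect the id sets for each wanted pair; no record is rescanned."""
    postings = [index.get(pair, set()) for pair in wanted.items()]
    return sorted(set.intersection(*postings)) if postings else []


INDEX = build_index(RECORDS)  # built once, reused for every query
```

Both `brute_force(RECORDS, drug="Vioxx", condition="cardiac disease")` and `indexed_lookup(INDEX, drug="Vioxx", condition="cardiac disease")` return the same ids. The difference is that the scan touches every record on every query, while the index lookup only intersects two small sets.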

What you need is a NoSQL database that has co-occurrence capabilities. MarkLogic lets you find co-occurring value pairs and run such queries across any number of indexes, of any type.
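As a rough illustration of what a co-occurrence query computes, the sketch below counts how often pairs of values appear together in the same record. It is plain Python over an in-memory list rather than MarkLogic's own API, and the sample articles, field names, and `co_occurrences` helper are made up for the example.

```python
from collections import Counter
from itertools import product


def co_occurrences(records, field_a, field_b):
    """Count how many records contain each (value_a, value_b) pair together."""
    counts = Counter()
    for record in records:
        values_a = record.get(field_a, [])
        values_b = record.get(field_b, [])
        for pair in product(values_a, values_b):
            counts[pair] += 1
    return counts


# Hypothetical tagged articles; the tags are illustrative only.
ARTICLES = [
    {"diseases": ["hypertension"], "treatments": ["ACE inhibitor", "diuretic"]},
    {"diseases": ["hypertension", "diabetes"], "treatments": ["ACE inhibitor"]},
    {"diseases": ["diabetes"], "treatments": ["metformin"]},
]

PAIRS = co_occurrences(ARTICLES, "diseases", "treatments")
TOP_PAIR = PAIRS.most_common(1)[0]  # the most frequent disease/treatment pair
```

Here `TOP_PAIR` is `(("hypertension", "ACE inhibitor"), 2)`, because that pair appears in two of the three toy articles. A database that maintains such value indexes can answer the same question without reading the documents themselves.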

The picture below provides a nice summary of how MarkLogic can help narrow down the data sets to the features of interest using its real-time co-occurrence capabilities. In this example, we can see healthcare-related co-occurrences for diseases, treatments, and symptoms from more than 2.6 million articles.

Narrow Down Data

There is a second way MarkLogic can help too. And that’s to operationalize the analytics.

Now that we have generated the insights, what do we do with them? Per the diagram below, MarkLogic provides a search-focused, multi-model, transactional, operational data hub. This means we can store flexible, schema-agnostic content as documents and graphs, and can provide access to the data via various indexing models such as full-token search, key-values, row-column, documents, semantic graphs, and geospatial views. Typically, cognitive computing insights can be mapped nicely into semantic graphs. MarkLogic provides a very nice way to tie these insights to content in the database via embedded triples or via RDF inferencing. As new insights are generated, they become directly accessible to applications running on MarkLogic.
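The idea of embedding triples in content can be sketched as follows. This is a simplified, in-memory Python model and not MarkLogic's storage format or query API. The document URI, the identifiers such as `drug:Vioxx`, and the `match` and `documents_for` helpers are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object statement, in the style of RDF."""
    subject: str
    predicate: str
    obj: str


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)]


def documents_for(docs, **pattern):
    """Return the URIs of documents whose embedded triples fit the pattern."""
    return [d["uri"] for d in docs if match(d["triples"], **pattern)]


# Hypothetical documents: each carries its content plus the insights about it.
DOCS = [
    {
        "uri": "/insights/signal-001.json",
        "text": "Detected signal linking a drug to an adverse event.",
        "triples": [Triple("drug:Vioxx", "associatedWith", "event:MI")],
    },
    {
        "uri": "/notes/visit-042.json",
        "text": "Routine follow-up visit.",
        "triples": [],
    },
]
```

With this layout, `documents_for(DOCS, subject="drug:Vioxx", predicate="associatedWith")` returns `["/insights/signal-001.json"]`. Because the insight travels with the document it describes, an application can move from a graph pattern straight to the underlying content.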

Seeking Clarity in the World of Data

Back to the question: How does MarkLogic help with analytics? MarkLogic’s real-time content indexes can speed up signal detection 10x or 100x by providing the algorithms with just the data they need to generate the insights. Once the insights are created, MarkLogic can store them as RDF graphs that can then be used to build semantically smart real-time applications. And you can take action with full confidence that you are using all your data.

Imran joined MarkLogic to focus on bringing enterprise-quality NoSQL solutions for managing large, diverse data integrations and analytics to the healthcare IT marketplace. Imran co-founded Apixio with the vision of solving the clinical data overload problem and has been developing a HIPAA-compliant clinical big data analytics platform. The big data platform makes extensive use of cloud-computing-based NoSQL technologies such as Hadoop, Cassandra, and Solr. Previously, Imran co-founded Anka Systems and focused on the execution of EyeRoute's business development, product definition, engineering, and operations. EyeRoute was the world's first distributed big data ophthalmology image management system. Imran was also the IHE EyeCare Technical Committee Co-chair. Before Anka Systems, Imran was a founder and CTO of FastTide, the world's first operational performance-based meta-content delivery network. Imran has an undergraduate degree in electrical engineering from McGill University, a master's degree in the same field from Cornell University, and over 25 years of experience in the industry.
