Big Data Nation Boston: Triples & NoSQL
The tour rolled into Boston this week and, despite the skies opening, we drew a robust and avid audience. My MarkLogic colleague Amir Halfon, CTO of Financial Services, was my co-presenter, and we spent a good portion of the time demystifying how a NoSQL (“not only SQL”) database works. Not surprisingly, with seemingly a zillion NoSQL options out there, people are completely confused. Amir explained that NoSQL comes in three flavors: document, graph, and key-value stores. MarkLogic is, of course, a document store, which author Dan McCreary points out is the most flexible and comprehensive of the NoSQL stores.
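To make the three flavors concrete, here is a hypothetical sketch (all names and data invented for illustration) of the same customer record shaped for each kind of store, and why the document shape is the easiest to query:

```python
# Key-value store: the value is opaque to the database -- fast lookups
# by key, but you cannot query inside the value.
kv_store = {
    "customer:1001": '{"name": "Ada Lane", "branch": "Boston"}',
}

# Document store: the record is a structured document, so every field
# (including nested ones) is indexable and queryable.
doc_store = {
    "customer:1001": {
        "name": "Ada Lane",
        "branch": "Boston",
        "accounts": [{"type": "mortgage", "balance": 250_000}],
    },
}

# Graph store: the same facts as subject-predicate-object triples,
# which makes relationships between entities first-class.
graph_store = [
    ("customer:1001", "name", "Ada Lane"),
    ("customer:1001", "holds", "mortgage:77"),
    ("mortgage:77", "balance", 250_000),
]

# With the document shape, "which Boston customers hold mortgages?"
# is a straightforward structured query.
boston_mortgages = [
    doc["name"]
    for doc in doc_store.values()
    if doc["branch"] == "Boston"
    and any(a["type"] == "mortgage" for a in doc["accounts"])
]
print(boston_mortgages)  # ['Ada Lane']
```

The point is not that one shape is always right; as Amir notes later, data that “unifies” well in one model can be a disaster in another.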
As Boston was our sixth city, we had developed a rhythm. The worst thing any technology company can do is assume that everyone out there “gets it,” and in the NoSQL/Big Data space there is a lot of hype and confusion. At a guess, 90% of the NoSQL/Big Data vendors out there say, “We unify all your data and let you unlock valuable insights.” We say it, they say it, so why should we assume the folks in the room have any clue how one product differs from another, since the vendors don’t spell it out?
So we spell it out. It always astounds me when non-native English speakers eloquently turn a phrase. My colleague Amir, who is Israeli by birth, is a prime example. When defining the different types of databases, he makes sure to note that some data “unifies” very well in one type of database and is a disaster in another. Describing Anti-Money Laundering (AML) initiatives, he explains how hierarchical and semantic linking of data lets you see correlations you never could have seen before: “If the document is the customer’s mortgage, we can now cross-reference it to the fact that the person sits on the board of a company that transferred money yesterday … which is good news for banks, as regulators are saying, ‘Too bad if you couldn’t find the relationships; you are still culpable.’”
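Amir’s AML example is exactly what the triples in the session title are for. Here is a hypothetical sketch (entities and predicates invented for illustration) of how a few subject-predicate-object triples let a simple traversal surface a relationship two hops away from the mortgage holder:

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
triples = [
    ("person:42", "holds", "mortgage:77"),
    ("person:42", "board_member_of", "company:9"),
    ("company:9", "transferred_funds_to", "account:offshore_1"),
]

def related(entity, triples, depth=2):
    """Collect every object reachable from `entity` within `depth` hops."""
    frontier, seen = {entity}, set()
    for _ in range(depth):
        next_frontier = set()
        for subj, _pred, obj in triples:
            if subj in frontier and obj not in seen:
                seen.add(obj)
                next_frontier.add(obj)
        frontier = next_frontier
    return seen

# Starting from the mortgage holder, the suspicious transfer made by a
# company whose board they sit on shows up two hops out.
print(related("person:42", triples))
```

A real system would use a proper graph index and SPARQL-style queries rather than a linear scan, but the cross-referencing idea is the same.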
In the Big Data space, Hadoop is the elephant in the room, but in real life it has been difficult to move from science project to production environment. Intel’s Hadoop distribution and MarkLogic are working together to create an enterprise-hardened environment that lets you pull historic transaction data into HDFS and run massively parallel processing on it. In tandem, with MarkLogic sitting atop HDFS, data streams back into MarkLogic. “We do that through reverse queries,” explained Amir.
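The reverse-query idea deserves a moment: instead of running a query against stored documents, you store the queries and match each incoming document against them. Here is a hypothetical sketch of that inversion (query names and document fields invented; MarkLogic’s actual implementation indexes the saved queries themselves for scale):

```python
# Saved queries, each a predicate over an incoming document (a dict).
saved_queries = {
    "large-transfer-alert": lambda doc: doc.get("type") == "transfer"
                                        and doc.get("amount", 0) > 10_000,
    "new-mortgage-alert": lambda doc: doc.get("type") == "mortgage",
}

def match_incoming(doc):
    """Return the names of all saved queries this document satisfies."""
    return [name for name, query in saved_queries.items() if query(doc)]

# As data streams in, each document is checked against the stored
# queries, and any match fires an alert.
incoming = {"type": "transfer", "amount": 50_000, "from": "company:9"}
print(match_incoming(incoming))  # ['large-transfer-alert']
```

Looping over every saved query works for a toy; the production trick is indexing the queries so millions of them can be matched per document without a scan.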
As you bring in more data, though, you need to turn it into information, and one of the most efficient ways to do that is through semantic enrichment. Temis’ John Paty was on hand to talk about how semantic enrichment is moving beyond the media world, its original beachhead, as financial services and other industries finally start to turn to text analytics. So, to recap: take lots of data, add enrichment so you can actually query it, store the long-tail data in Hadoop, and bring it back into MarkLogic to run against saved queries for alerts. So simple! :)
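To show what enrichment buys you, here is a deliberately tiny, hypothetical sketch: a real engine like Temis’ uses linguistic models, while this toy version just matches a hand-built entity dictionary, but the output shape is the idea — raw text becomes a document with queryable annotations:

```python
# Invented entity dictionary standing in for a real text-analytics model.
ENTITIES = {
    "acme corp": "Organization",
    "boston": "Place",
}

def enrich(text):
    """Annotate a raw string with the known entities found in it."""
    lowered = text.lower()
    found = [
        {"text": phrase, "type": entity_type}
        for phrase, entity_type in ENTITIES.items()
        if phrase in lowered
    ]
    return {"text": text, "entities": found}

doc = enrich("Acme Corp opened a branch in Boston.")
print([e["type"] for e in doc["entities"]])  # ['Organization', 'Place']
```

Once the entities are explicit fields on the document, they can feed the same saved-query alerting described above.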
Next stop: New York.