Data Integration: We’ve All Been There
We have all been there: rushing from the car park to the customer's office, stressed from being late. I heard the muffled sound of voices as I approached the room, loud for only an instant after I opened the door, then followed by a hushed silence and ten pairs of eyes staring up at me.
“I’m so sorry I’m late everyone. The M6 was a nightmare,” I said, answered by vague murmurs of agreement.
“Everyone, this is our technical expert,” announced our Account Manager.
I made my way to the only free chair and helped myself to coffee, its welcome aroma awakening my senses after the early start and long torturous drive.
After the customary introductions, the prospect stood up and started to white-board the problem. After the horrendous journey and embarrassing entrance, I was hoping for an easy meeting. Not so!
A familiar story was being told: complex data integration from a myriad of data sources, some of them external sources that would change the schema of their data at any time, and ever-increasing business requirements that expected IT to match the agility the rest of the business was accustomed to.
Dynamic Data From Various Sources
The customer was a leader in market research. It collected large amounts of highly dynamic data from various sources. The nature of this data wasn't really relational, as it had hierarchies and dynamic relationships based on the individual's or organization's demographics, preferences, and the nature of the research being undertaken. It was a very mature player in the market and had historical data dating back several decades. This data was a competitive advantage, but each generation of data varied massively in structure and content.
The firm needed to collate this data for analytics. But the data models were getting so large and unwieldy that processing and integrating the data was becoming prohibitively expensive.
Since it was a mature player in the market, its software had been built in-house from esoteric components, owing to the complex nature of its data requirements. This software wasn't scaling in terms of analytical capabilities, and with the advent of Big Data, its competitive edge was being marginalized.
Throwing all this data into a Data Lake wasn't an option either, because the company required:
- Rich metadata associated with the data, and support for complex (yet structured) queries alongside rich search through free-text fields
- Strict security models and strong governance — due to the sensitive nature of the collected data
- Zero data loss and zero downtime, as this software was the life-blood of the company, which faced frequent tight deadlines
After the company laid out its challenges, it was my turn. I'd been there before. I worked for a large software company with a veritable Smörgåsbord of components and tools that, we would tell our customers, could solve any problem.
Except there was an elephant in the room — an elephant this customer had just drawn up on the board: For all the array of expensive software in our arsenal, nothing adequately addressed these complex data integration problems.
So I got up and drew my Frankenstein monster: an aberration of expensive software components held together by duct tape and more than a veil of smoke and mirrors. It included a mishmash of data modeling tools, ETL, metadata management, and all sorts of over-engineered automation and generation components. Of course, it could solve the problem, and had done so for previous customers, but the time and cost in manpower would be enormous. And honestly, those costs would be never-ending as the business demanded constant change to suit the modern world.
We broke late for lunch. A trolley of uninspiring sandwiches with bland fillings awaited us, their edges starting to go stale from having waited there too long. The customer offered no peace as I tried to answer tricky questions between hastily swallowed mouthfuls.
The meeting continued into the afternoon, a familiar theater of difficult questions and unsatisfactory answers. It was a role the customer had no doubt played with the other vendors we would probably be competing against. This was a modern problem being addressed by software giants (or perhaps dinosaurs) with tools and techniques developed for the previous decade.
The team understood it was being presented with imperfect solutions. After exhausting the list of large vendors, they would inevitably sit there assuming the problem had no adequate solution. They would either be forced to choose one vendor (based on who knows what: price, or perhaps whichever sales team they liked most) or decide the problem couldn't suitably be solved and reprioritize other needs.
Never Found Out Who Won
I never got to find out if my drawing on the board delivered the sale. I had accepted a job at MarkLogic only the week before. And while normally I would have been put on so-called "gardening leave," my employer had been going through an aggressive "resource optimization" exercise and was hugely understaffed, so it asked me to work up until my very last day.
The next day I left my old job, and England, behind and flew to San Francisco (MarkLogic's headquarters) to start sales training. Amidst a cocktail of jet-lag fuzziness and the adrenaline of starting a new job, I was introduced to the company and its product.
And it wasn't long before it dawned on me that MarkLogic was the right tool for the problem I had previously been unable to solve. If only I had been working for MarkLogic when I stood in front of that customer, I would have been able to stand up, confident, and draw a genuine solution to the problem.
Complex Data Integration … Solved!
You see, MarkLogic turns the problem of complex data integration on its head. Rather than requiring months of data modeling and ETL, you just load the data as-is. Its clever "zero-latency" indexing and semantics technology lets you organize and find your data in any fashion, enabling a fast response to business change. Moreover, MarkLogic handles any kind of data: hierarchical data, free text, dynamic relationships, all without sacrificing the enterprise features you'd expect from the traditional big-vendor solutions.
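To make the "load as-is" idea concrete, here is a minimal, hypothetical sketch of the general technique, sometimes called schema-on-read: store every document unchanged, whatever its shape, and build a universal index over all of its paths and values so it is immediately queryable with no up-front modeling. This is a generic illustration in plain Python, not the MarkLogic API; the class and method names are invented for the example.

```python
# Hypothetical sketch of schema-on-read ingestion (NOT the MarkLogic API):
# documents are stored unchanged, and a single inverted index over
# (path, value) pairs makes every generation of data queryable without
# any up-front data modeling or ETL.
from collections import defaultdict

class DocStore:
    def __init__(self):
        self.docs = {}                 # doc_id -> original document, as-is
        self.index = defaultdict(set)  # (path, value) -> set of doc_ids

    def _walk(self, node, path=""):
        """Yield (path, leaf_value) pairs for every leaf in the document."""
        if isinstance(node, dict):
            for key, value in node.items():
                yield from self._walk(value, f"{path}/{key}")
        elif isinstance(node, list):
            for item in node:
                yield from self._walk(item, path)
        else:
            yield path, node

    def load(self, doc_id, doc):
        """Store the document unchanged and index every leaf value."""
        self.docs[doc_id] = doc
        for path, value in self._walk(doc):
            self.index[(path, value)].add(doc_id)

    def query(self, path, value):
        """Find documents by any indexed path, regardless of schema."""
        return sorted(self.index.get((path, value), set()))

store = DocStore()
# Two "generations" of survey data with different schemas, loaded unchanged.
store.load("s1", {"respondent": {"age": 34},
                  "answers": [{"q": "brand", "v": "A"}]})
store.load("s2", {"panelist": {"age": 34, "region": "UK"}})

print(store.query("/respondent/age", 34))   # matches only the old schema
print(store.query("/panelist/region", "UK"))  # matches only the new schema
```

The point of the sketch is that neither document had to be reshaped to fit the other: each schema stays intact, and the index is what makes them jointly searchable.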
At first my mind cried out "Black magic!" Years of relational thinking rebelled at the idea. But after hearing of our wins against similar challenges at big names like the BBC and Healthcare.gov, I realized this wasn't magic at all. It was a fresh look at the problem, coupled with over a decade of brilliant engineering.
Sales meetings are rarely easy. I now find my biggest challenge is convincing my prospects that MarkLogic is for real! For they too have had years of relational thinking and large software vendors drawing out large and expensive solutions that barely solve the problem.
Now, I can face my prospects with a genuine solution to their complex data integration problems. Sadly, it doesn’t prevent the M6 motorway from being a nightmare.