I remember an interesting meeting at a very large global company where I worked years ago. Our product documentation needed to be translated into dozens of different languages – which was a very costly process. We had a great team that did the work, and the team leader was very innovative. He was always looking for ways to lower the cost per word while maintaining high quality. In a staff meeting one day he was telling us about the latest refinement his team had developed, which reduced the cost per word a little further; maybe another 0.5%. Every improvement helped – there were a lot of products and a lot of words.
Most of us were impressed by the continued progress, but I remember our boss saying, “We’re solving the wrong problem. We’re decreasing our cost per word by another 0.5%, but we should be focused on decreasing the number of words. Why are our products so complicated? Why does it require so many words to describe them? That’s the right problem to fix.” This reframing is something that really stuck with me. You can optimize the heck out of a system, but if you’re turning the wrong set of knobs, you’re never going to have the kind of impact that you really want – the kind you really need.
How does this relate to the database world? Substitute “transformation” for “translation.” You may have a superb team that builds your ETL processes – particularly the transformation part. They may be wringing every last efficiency out of the process. You allow them to spend a lot of time optimizing transformations because transformation is costing you so much. That’s great, but they’re solving the wrong problem. The problem isn’t how to shave off another 0.5%; it’s that you have to do so much work in the first place.
Our approach with MarkLogic turns the normal process on its head. We aren’t out to make ETL simpler; we’re out to remove the need for it. We’re not focused on the next 0.5%, but rather on the other 99.5%. That’s where the massive savings are. I know, it’s easy to say that we do that, but where’s the proof? Rather than try to include it all here, I’ll point you to another set of posts that describe what makes MarkLogic so different from relational databases and ETL:
- It’s Time to Rip Off The ETL Band-Aid
- Decimating Data Silos With Multi-Model Databases
- Relational Databases Are Not Designed For Heterogeneous Data
Having said that, here’s the short answer. MarkLogic discovers structure rather than making you declare it up front, as you must with a relational database. MarkLogic loads data “as-is” and accesses it with a universal index. Our flexible data model allows you to normalize and harmonize data as you need to. We don’t make you boil the ocean trying to create an uber-schema before you get any value.
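To make the “load as-is, then query through one index” idea a little more concrete, here is a toy sketch in Python. It is not MarkLogic’s API or implementation; the `UniversalIndex` class, the `walk` helper, and the sample records are all made up for illustration. The only point is that documents with different shapes go in unchanged, and the index is derived from whatever structure they happen to have.

```python
from collections import defaultdict


def walk(node, path=()):
    """Yield (path, leaf_value) pairs for every leaf in a nested document."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, path + (key,))
    elif isinstance(node, list):
        for child in node:
            yield from walk(child, path)
    else:
        yield path, node


class UniversalIndex:
    """Hypothetical toy index: no schema is declared before loading."""

    def __init__(self):
        self.docs = {}
        self.by_field = defaultdict(set)  # (field name, value) -> document ids
        self.by_word = defaultdict(set)   # lowercased word -> document ids

    def load(self, doc_id, doc):
        """Store the document as-is and index every leaf that is found in it."""
        self.docs[doc_id] = doc
        for path, value in walk(doc):
            if path:
                # Index by the innermost field name, wherever it is nested.
                self.by_field[(path[-1], value)].add(doc_id)
            for word in str(value).lower().split():
                self.by_word[word].add(doc_id)

    def find_field(self, name, value):
        return sorted(self.by_field.get((name, value), set()))

    def find_word(self, word):
        return sorted(self.by_word.get(word.lower(), set()))


# Three records from three imaginary source systems, each with its own shape.
index = UniversalIndex()
index.load("crm-1", {"name": "Ada Lovelace", "email": "ada@example.com"})
index.load("erp-7", {"customer": {"name": "Ada Lovelace",
                                  "orders": [{"sku": "A-100", "qty": 2}]}})
index.load("web-3", {"visitor": "anonymous", "notes": "Asked about Ada"})

print(index.find_field("name", "Ada Lovelace"))  # ['crm-1', 'erp-7']
print(index.find_word("ada"))                    # ['crm-1', 'erp-7', 'web-3']
```

Notice that nothing had to be transformed into a common schema before the first query returned something useful. Harmonizing the records (for example, deciding that `customer.name` and `name` mean the same thing) can happen later, and only for the fields you actually need.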
The bottom line is that to get really big wins, you need to stop what you’re doing for a moment and ask yourself if you’re solving the right problem. If you’re not, and if your tools are creating the problems rather than helping you solve them, it’s time to move on.