Explaining ‘Load As Is’
Technology marketers walk a fine line. Often we rely on metaphors and snappy language to articulate differentiators and capture imaginations. Sometimes those phrases work really well. But at other times, particularly as the product evolves, we can out-clever ourselves and cause confusion or, worse, jeopardize credibility. One of the terms we came up with over the last decade of MarkLogic’s existence is that we “load [data] as is.” Some quibble with that, and with good reason. But there was no malice or misinformation when we coined it. To understand the phrase is to understand our roots, when we were largely an XML database. MarkLogic ingested text and PDFs and, as such, “ingested, as is.”
But when we are talking about ingesting relational or unstructured data that has been sharded (and therefore resides across many tables in a relational database), well, our pithy phrase loses its luster and sets unrealistic expectations. I asked our Chief Architect Jason Hunter to help come up with a new metaphor or a more accurate description for load as is. Instead, as he is wont to do, he came up with a story:
Relational data requires you to define everything beforehand. Relational assumes you know everything’s type, maximum length, and cardinality. It assumes you know the full set of things you’ll want to model. It also assumes you know what your queries will look like so that indexes can be optimized for them.
But what if you don’t?
MarkLogic lets you load without telling us what’s coming. We’ll figure out its type. We don’t care about lengths. We don’t need to worry about cardinality. And if something unexpected shows up tomorrow, we can still store it just fine. And our index model makes it so more queries can be run quickly without having to hand-code an index for that particular query.
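To make the idea concrete, here is a toy sketch in Python. It is not MarkLogic's API or its actual index implementation; `ToyDocumentStore`, its methods, and the sample URIs are all hypothetical names invented for illustration. It only shows the general shape of the idea: documents of different shapes are stored without a declared schema, and every field gets indexed on the way in, so a lookup needs no hand-built index.

```python
import json
from collections import defaultdict


class ToyDocumentStore:
    """Hypothetical in-memory store; no schema is declared up front."""

    def __init__(self):
        self.docs = {}                 # uri -> parsed document
        self.index = defaultdict(set)  # (path, value) -> set of uris

    def load(self, uri, raw_json):
        doc = json.loads(raw_json)     # types are discovered from the data itself
        self.docs[uri] = doc
        # Index every leaf value under its path, whatever the document's shape.
        for path, value in self._walk(doc):
            self.index[(path, value)].add(uri)

    def _walk(self, node, path=""):
        if isinstance(node, dict):
            for key, child in node.items():
                yield from self._walk(child, f"{path}/{key}")
        elif isinstance(node, list):
            # Repeated values share one path, so cardinality never needs declaring.
            for child in node:
                yield from self._walk(child, path)
        else:
            yield path, node

    def find(self, path, value):
        return sorted(self.index.get((path, value), set()))


store = ToyDocumentStore()
store.load("/people/1.json", '{"name": "Ann", "city": "Oslo"}')
store.load("/people/2.json", '{"name": "Bob", "city": "Oslo", "tags": ["vip", "beta"]}')
# A field nobody planned for shows up later; it is stored and indexed anyway.
store.load("/people/3.json", '{"name": "Cy", "employer": {"name": "Acme"}}')

print(store.find("/city", "Oslo"))
print(store.find("/tags", "beta"))
print(store.find("/employer/name", "Acme"))
```

A real database does far more (text indexes, range indexes, persistence), but the contrast with declaring columns, lengths, and cardinality in advance is the point of the sketch.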
This makes the process of loading data easier. It tends to win us business when the data being loaded isn’t centrally controlled, is changing over time, or there are new sources being added. Or when the data has text components, which can really leverage our indexes.
One client of ours was receiving XML payloads they wanted to store. It would be vastly easier to save them in MarkLogic, because they really would be saved “as is.” But the client’s data was held in Oracle across a wide array of tables. Loading from Oracle into MarkLogic required putting the data back together in its original form.
Mongo CEO Max Schireson’s analogy for Mongo as an object store is a parking garage. If you park your car in a relational database, they’ll take it apart, put the spark plugs over here, the tires over there, and the steering wheel somewhere else. Then when you want your car back, they’ll reassemble it. Document stores just store the car as a car, easy to retrieve and easy to query. So in that way we store cars “as is.”
But if you start with spark plugs and tires and steering wheels (as you have in a third normal form relational model), then you still have to put the cars back together to load them into MarkLogic. But hey, we take a car of any shape, no matter how many wheels!
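The reassembly step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the client's actual migration: SQLite stands in for Oracle, and the `orders` and `order_items` tables, their columns, and the `assemble_order` helper are all hypothetical. It shows one document being rebuilt from rows spread across two tables before it could be loaded anywhere as a whole.

```python
import json
import sqlite3

# SQLite stands in for the relational source; the schema is a made-up example.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
    CREATE TABLE order_items (
        order_id INTEGER NOT NULL REFERENCES orders(id),
        sku TEXT NOT NULL,
        qty INTEGER NOT NULL
    );
    INSERT INTO orders VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO order_items VALUES (1, 'A-100', 2), (1, 'B-200', 1), (2, 'A-100', 5);
""")


def assemble_order(conn, order_id):
    """Rebuild one order document from its normalized rows."""
    order = conn.execute(
        "SELECT id, customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    if order is None:
        return None
    items = conn.execute(
        "SELECT sku, qty FROM order_items WHERE order_id = ? ORDER BY sku",
        (order_id,),
    ).fetchall()
    return {
        "id": order["id"],
        "customer": order["customer"],
        "items": [{"sku": row["sku"], "qty": row["qty"]} for row in items],
    }


order_ids = [row["id"] for row in conn.execute("SELECT id FROM orders ORDER BY id").fetchall()]
documents = [assemble_order(conn, order_id) for order_id in order_ids]

# Each assembled document is now a single unit that a document store could ingest whole.
print(json.dumps(documents[0], indent=2))
```

With two tables this is trivial; across a wide array of tables, the joins and the decisions about what belongs inside which document are where the real work lies.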
Ok, it’s not pithy. But it’s accurate.