NoSQL Without Multi-Document Transactions? No Way.
The promise of NoSQL is speed; but like a car without brakes, not very safe unless it allows multi-document transactions.
I recently read a blog post that took me down memory lane. It reminded me of what it was like to work a database that didn’t support transactions.
About 20+ years ago I started using MySQL. It was a fast, lightweight, free and open database. I could use MySQL from PHP or Perl code without paying for a commercial database, and it was a better option than some of the other free databases (like Postgres or Ingres). At the time, I built most of my own Linux kernels and building software from source was fine (and even preferable).
MySQL was touted as fast because it did not have transaction support. Speed and freedom were selling points. When you executed a statement in MySQL, it did not have to worry about issues like locking. Like present day Mongo (and many other NoSQL databases), inserts were atomic, but you could not perform multi-statement transactions. And because it lacked transactional consistency, the MySQL developers had the presence of mind to not add stored procedures to the database. Stored procedures without transactions is a recipe for disaster (when random failures in the hard to debug stored procedure code lead to inconsistent data). When MySQL added the InnoDB storage engine, and transactions were available, other features like referential integrity constraints became possible (another data quality ‘must-have’ in relational databases).
Yep, I used MySQL but I think I was clear-eyed about what a lack of transaction meant for writing applications. I could be fast but not necessarily correct. Fast was important to me – and to many others. For example, aggregating logs does not require a complex series of inserts but may require your solution to be fast. Atomicity is good enough when one row in the database is equal to one log entry in the HTTP access or error log. Loading data into a data warehouse was also a use-case where complex transactions were not required. So there were niches where a fast database without transactions could be used safely, but for most users of general-purpose databases, a lack of transaction support was a deal breaker.
On Usenet, it was common to see examples of MySQL workarounds and code to kind of perform a transaction. There were various terms for the code that cleaned up partial work, but what I remember was a ‘compensator.’ If a series of MySQL calls did not execute successfully, all a developer had to do was to write a compensator to clean up the database. Or, maybe schedule some cleanup code to run every hour and clean-up half-completed work. Of course, this is largely rubbish if you are really interested in data consistency. It is probably a case of bias to justify the fact that (compared to the Oracle and SQL Server wars) the open and friendly MySQL community was a breath of fresh air.
Multi-Statement, Multi-Document Transactions for the Win
In the end, MySQL finally adopted multi-statement transactions for various reasons. When it did, it became a database people could use (reliably) outside of specific use cases; if you want to be taken seriously as a database, you need to support multiple statement transactions. There was a transition period where the story was around “best of both worlds,” but I no longer see MySQL’s ability to shut off transactions as a selling point. In fact, it’s one of those things you almost never do unless you have a special purpose database for tasks like collecting web server logs.
Why the long story about MySQL? While history doesn’t really repeat itself, it does rhyme. Currently there’s a debate about atomicity in NoSQL versus full transactional support. Multi-statement or as Mongo calls them multi-document transactions are hard. They do have a performance impact since you obviously are writing to disk, but I don’t think you can be useful as a general-purpose database without supporting transactions. It’s like a car without brakes. It can go fast, but you lose a degree of control that’s essential to safe, general-purpose driving. You can justify a lack of transactions through “simple clean up code,” or some statement like X % of users don’t really need transactions, but at the end of the day transactions are key to consistent data. A database without consistent data is little better than writing the data to the filesystem. This is something we’ve known for at least the last 40 or 50 years of general business computing.
Eventually consistent #FiveWordTechHorrors
— Stewart Smith (@stewartsmith) December 11, 2013
Why Eventually Consistent Is … BAD!
But sigh, free is compelling. For a long time, despite the flaws, people built solutions based on MySQL that processed on-line commerce. I suppose we’ll never know how many companies sold something on the Internet but didn’t ship it because some PHP code died after completing the sale (or vice-versa). We’ll never know how many hours were spent writing ‘compensators.’ We’ll never know how many companies spent how many hours dealing with ‘support tickets,’ because some key part of the database wasn’t correctly updated. We’ll never know how many projects were shelved or canned because the system just ‘didn’t seem to work,’ largely because of data consistency problems.
This isn’t to condemn MySQL or Mongo. This is a message to database users to remind them that no matter how ‘free’ or ‘open’ a database may be, without certain key properties, a database isn’t really a general-purpose database. Aggregate your web server logs in Mongo? Sure. But do you really want to process a health care claim when it’s possible that only some of the data was written to the database? This is not a message spreading fear, uncertainty, and doubt. But just like a car without effective brakes might be good for the drag strip, it isn’t good for driving to the grocery store. It’s good to remember a database without transactions should not be considered a general-purpose database.