The 2015-16 Christmas and New Year period in the UK will be remembered yet again as a festive season dominated by news footage of flooded homes, collapsed bridges and breached defences coupled with the human misery of refuse skips full of sodden carpets, rotten MDF furniture and wrecked electrical appliances. This year it was the North of England’s turn to bear the brunt of the rainfall on both sides of the Pennines with knock-on effects being felt along the tributaries and then cumulatively in the core river basins.
More insidiously, these floods overwhelm the delicate balance between the drinking water and sewage systems, so it’s not just a simple question of drying out, haggling with the loss adjusters and redecorating – all of the water supply infrastructure needs to be checked for cross-contamination.
As ever, media attention focused on what was responsible – was it El Niño, or, after the recent Paris accord, are we really seeing the climate change effects of global warming?
As enterprises drown in data, it might be best to look to hydrology – the science of macro-scale water management – for insights into how best to run enterprise and public data management services. Whereas we have had only a few years to grapple with our tsunami of bits and bytes, mankind has been managing the collection, distribution and cleansing of water since ancient times, both for agriculture and sustenance and as a sustainable source of mechanical and electrical power. Let’s take a page from the hydrologists and look at what we may learn.
The use of physical structures to collect and store water for future use has been key to enabling the human race to colonise areas of the earth with only limited periods of rainfall during the year. Other species such as beavers are also well known for their construction of dams as part of managing their habitats.
The Romans are probably the best-known exponents of building water distribution systems at scale to support the expansion of their empire and to demonstrate the value of their governance model to the tribes and countries they controlled.
Since the origin of the written word we have also been storing information and managing it as a scarce resource — the Great Library of Alexandria is the best-known example from the ancient world.
Note, however, that we should not consider these ancient libraries as simple data vaults or lakes. Many ancillary functions grew up around them. For example, in the monastic libraries of the Middle Ages, religious texts were copied, translated and checked for theological purity before being distributed to the parishes.
The arrival of the printing press started the technological transformation of data. Jacquard’s punched cards then provided a means of standardising the storage of data for mechanical consumption, and the telegraph enabled automated dissemination alongside the railway networks that funded its deployment.
As well as collecting water to sustain life, mankind has needed to build defences against flooding both from tidal seaside threats and inland rivers. The Dutch have become acknowledged as the leaders in not only protecting their country from the sea — but also reclaiming large areas of land using dykes and polders.
Before humans developed the tools and techniques to reshape the physical world, they learned that natural buffering mechanisms such as floodplains and water meadows not only allowed for natural changes in water levels but also supported specialised wildlife and agriculture.
Just as the scribes and monks developed supply chains to copy and translate handwritten documents, the pioneers of data automation in the punched card and tabulator era invented processes and technologies to handle the physical volume of cards and paper tapes, and to repair them when they got torn or wore out. These techniques evolved further in the batch processing and magnetic tape era of 1960s and ’70s COBOL-dominated data processing.
Ever since humanity started to live in communities there has been a need to use water for hygiene functions and as a transport to get rid of refuse and waste. The development of London’s sewage system by Bazalgette is now recognised alongside the contributions of Stephenson and Brunel as a key social transformation of the UK in the 19th Century.
More recently, the continuous challenge of measuring and maintaining water purity across a complex distribution network was highlighted by the cryptosporidium outbreak in Northwest England in 2015, which took much longer than anticipated to purge.
We have many international standards for data interchange but very few for quality – even basic checksums to ensure integrity during delivery are not consistently defined and used.
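To illustrate how small that gap is, a delivery-integrity check takes only a few lines in most languages. The sketch below (the payload format is invented for the example) pairs a cheap CRC for catching transit corruption with a strong digest for end-to-end verification:

```python
import hashlib
import zlib

def sha256_digest(payload: bytes) -> str:
    """Strong content digest, suitable for end-to-end integrity checks."""
    return hashlib.sha256(payload).hexdigest()

def crc32_checksum(payload: bytes) -> int:
    """Cheap per-message checksum for catching corruption in transit."""
    return zlib.crc32(payload)

# The sender computes and transmits the digest alongside the payload;
# the receiver recomputes it and compares.
payload = b"ISIN=GB00B03MLX29;qty=100;price=24.15"
sent_digest = sha256_digest(payload)

received = payload  # in reality, the bytes that arrive after the network hop
assert sha256_digest(received) == sent_digest  # delivery integrity confirmed
```

The point is not the particular algorithm but that neither the digest nor where it travels is consistently standardised across data interchange formats.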
The Credit Crisis of 2008 has led to some thinking and analysis of the systemic risk caused by the mass shared consumption of highly interdependent data but it is only the recent rise of hacking and cyber attacks such as Stuxnet that has focused attention on the quality of the content and the need to protect against tampering.
Much of the debate and angst about the level of detail required by regulators for Dodd-Frank and MiFID II transaction reporting has yet to play out – it will be interesting to see what is learned and published over time about what is really flowing through our financial data networks, and how much of it is “Garbage In, Garbage Out”.
The use of water as a power source provides the natural conjunction between the world of hydrology and information technology.
Pumped-storage hydroelectric schemes such as Dinorwig, which provide burst capacity for peak consumer loads, are also needed to support dynamically scaling cloud computing platforms that can quickly require extra megawatts for processing and cooling.
Mankind has used water as a form of transport for mobility and commerce since first floating down rivers on fallen trees. Archimedes’ famous principle allowed the systematic design and construction of larger scale vessels, which of course enabled the great voyages of discovery — and the subsequent trading corridors that they in turn enabled.
The development of the Suez and Panama canals has direct parallels with the evolution of data networking. Both canals have recently had significant physical upgrades so that larger container ships and bulk carriers can use the locks and basins, much as the trajectories of Moore’s and Metcalfe’s laws have provided the capacity and distribution models for the internet age.
Water has been a core tenet of many political and military strategies including Roosevelt’s construction of the Hoover Dam to mitigate some of the effects of the depression in the 1930s.
It is traditional in the UK over the festive season to show a number of famous films depicting British military successes – it was somewhat ironic however that The Dam Busters was shown on Christmas Eve just before the second tranche of rainfall arrived that caused the major flooding.
Information has always been a core plank of strategic and political intrigue, and lies at the heart of the dark arts of codes, ciphers, diplomacy and espionage. The key differentiators in the digital age are speed and latency – huge amounts of money are still being spent to optimise paths between participants on electronic exchanges to increase trading profits – and the balance between cryptography and surveillance.
History has taught us to study and celebrate many of our engineering feats — the heroes of data management and engineering, however, are much less well known than their peers.
Physical water management infrastructures such as the Pont du Gard or the Thames Barrier have far more visual impact than the hum of cooling fans and the array of flashing network port lights in an anonymous datacentre.
It is worth noting that while we continue to store data in ever-denser media that makes it seem invisible — we cannot compress water (remember 1 cubic metre weighs 1 tonne). However the impending demise of Moore’s law as we hit the limits of photolithographic manufacturing will start to change our thinking back to physical volume management.
The civil engineering profession has made great strides in recent decades in delivering large scale projects on time and budget – this has helped garner trust and support from political leaders wanting to champion infrastructure projects.
Data projects also need strong, sustained leadership and support within organisations – many a reference data or MDM project has run into quicksand and been shelved after only a couple of investment cycles, as political support has evaporated and no tangible deliveries have emerged.
As with rainfall and its subsequent journey out to sea, the manufacture, delivery and aggregation of data varies over time, so point solutions soon become obsolete – as the UK has learned to its cost this winter, with many local flood defence schemes being overwhelmed.
Data management is capital intensive – whilst we treat drinking water as a scarce resource that needs to be preserved, the production, storage and consumption of data globally continue to grow at an exponential rate.
Even within a single IT project, it is normal to manufacture and store multiple copies both to preserve the integrity of the production data and also support the engineering development of the systems that consume it.
Ironically, unlike water, these copies are usually made less pure due to the deliberate obfuscation of sensitive content such as account details, social security numbers etc.
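A minimal sketch of that deliberate obfuscation, with invented field names for the example, might tokenise sensitive values with a one-way hash so that development copies remain consistent and joinable while the originals stay unrecoverable:

```python
import hashlib

# Hypothetical field names for illustration only.
SENSITIVE_FIELDS = {"account_number", "social_security_number"}

def mask_record(record: dict, salt: str = "dev-copy") -> dict:
    """Replace sensitive values with a one-way token. The same input always
    yields the same token, so joins across masked copies still work."""
    masked = {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS and value is not None:
            token = hashlib.sha256((salt + str(value)).encode()).hexdigest()[:12]
            masked[field] = f"MASKED-{token}"
        else:
            masked[field] = value
    return masked

prod = {"customer": "A Smith", "account_number": "12345678", "balance": 99.5}
dev = mask_record(prod)
# dev["account_number"] is now an opaque token rather than the real number
```

In the water metaphor, this is an impurity added on purpose: the copy is made less “pure” precisely so that it is safe to consume downstream.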
Data projects have a pervasive effect on the information and business process ecosystems they underpin in the same way that the civil engineering activities in a large scale hydrology project will have significant environmental impact.
As a result, many special interest groups and “protected” species & habitats are uncovered during the planning and design phases that can lead to extended review and design cycles.
We have yet to embrace the notions of “heritage” and “conservation” in the world of data management because as we have already noted most content can be cheaply copied and transformed for backwards/legacy compatibility.
Information, like water, is a core element of human society; without it, we have no shared understanding about which we can communicate or transact. However it can also be both physically hazardous and toxic to personal and economic life when not understood, managed or purified.
The piecemeal “make do and mend” approach that has plagued many political responses to flooding around the globe has direct parallels with the lack of consistent data management approaches in governments and enterprises.
It took from the dawn of mankind until Snow and Whitehead in the Victorian era to demonstrate the need to systematically map the consumption of water in order to trace the spread of cholera, and to then regulate the quality of its content and distribution – hopefully, we will learn our data quality and management lessons faster in the digital era.
Hopefully this article has highlighted that, with both water and data, it is not enough to build collection and distribution systems at scale – the lineage, purity and consumption of the content are always changing due to many factors, and need to be constantly monitored in real time throughout the infrastructure.
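What such continuous monitoring could look like in code is sketched below – a rolling window over an invented feed, with an illustrative validity check standing in for real schema and range rules:

```python
from collections import deque

class QualityMonitor:
    """Rolling 'purity' monitor for a data feed: raises an alert when the
    fraction of invalid records in the recent window exceeds a threshold."""

    def __init__(self, window: int = 100, max_failure_rate: float = 0.05):
        self.results = deque(maxlen=window)
        self.max_failure_rate = max_failure_rate

    def observe(self, record: dict) -> None:
        # A real deployment would plug in proper schema and range checks here.
        price = record.get("price")
        valid = isinstance(price, (int, float)) and price > 0
        self.results.append(valid)

    @property
    def alert(self) -> bool:
        if not self.results:
            return False
        failure_rate = self.results.count(False) / len(self.results)
        return failure_rate > self.max_failure_rate
```

Like a river gauge, it says nothing about any single drop of water but continuously reports whether the flow as a whole is still fit to drink.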