January is named after Janus, the ancient Roman god of beginnings and transitions, who is depicted with two faces so that he may look forward into the new year and back into the one just past. It is a fitting time for reflection and for looking ahead.
With the introduction of MarkLogic 7 late last year, we find ourselves at a point of transition where we can look back at how we used to structure our information and look forward to 7’s Semantic Web functionality that enables us to bring our information and applications into the realm of Linked Data.
Whatever form your data currently takes (highly structured relational, semi-structured XML or even unstructured text), the Resource Description Framework (RDF) model is flexible and well suited to joining disparate datasets within an organization, describing metadata about existing assets, or identifying the entities and relationships found by enriching plain text.
In assessing the possibilities, we have to ask ourselves some important questions about the information we have: is it already interlinked? How should we identify the entities (things) within it? What questions would we like to ask of it? And how do we create the kinds of relationships, both internally amongst the data and externally to third-party data, that will enrich its value and allow more useful applications to emerge from it?
Is your data interlinked?
As the name ‘Linked Data’ suggests, information that is modeled using RDF, which underpins Linked Data, can be connected to form a graph of relationships, a ‘web of data’ as it’s often termed. Much of the information you have may fall naturally into the pattern of a graph because you are capturing information about real-world things, such as the prices and details of products and commodities, or about the people and organizations that purchase, use or trade in these things.
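To make the graph idea concrete, here is a minimal sketch in plain Python rather than in any RDF toolkit. It assumes each statement is a (subject, predicate, object) tuple; every name and the price below are invented for illustration.

```python
# Hypothetical data: each triple is (subject, predicate, object).
# All names and the price are made up for illustration.
triples = [
    ("acme_ltd", "purchases", "copper"),
    ("copper", "is_a", "commodity"),
    ("copper", "price_per_tonne_usd", 7000),
    ("jane_doe", "works_for", "acme_ltd"),
    ("jane_doe", "trades_in", "copper"),
]

def connected_to(node, triples):
    """Return (predicate, other_node) for every triple that touches `node`."""
    links = []
    for s, p, o in triples:
        if s == node:
            links.append((p, o))   # node is the subject
        elif o == node:
            links.append((p, s))   # node is the object
    return links

print(connected_to("copper", triples))
```

Because the product, the company and the person all share nodes, asking what touches "copper" already crosses what might otherwise be three separate tables.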
How do you identify the things?
Whether your information is purely for internal consumption or you wish to publish it on the web, we now live and work within the web, so we need to use web-based identifiers such as the Uniform Resource Identifier (URI). There are useful Identifier Patterns you can follow, as described in “Linked Data Patterns” by Leigh Dodds and Ian Davis. I will be making further references to this book, as it is very useful indeed.
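As a rough illustration of a predictable identifier scheme, the sketch below mints URIs of the form base/collection/key. The base domain, the collection names and the `mint_uri` helper are all assumptions made for this example, not something prescribed by the book or by MarkLogic.

```python
from urllib.parse import quote

BASE = "http://example.org"  # assumption: placeholder domain

def mint_uri(collection, key):
    """Build a URI of the form BASE/collection/slug from a human-readable key."""
    # Lower-case the key and turn spaces into hyphens, then percent-encode
    # anything that is not safe to use in a URI path segment.
    slug = quote(key.strip().lower().replace(" ", "-"), safe="-")
    return f"{BASE}/{collection}/{slug}"

print(mint_uri("authors", "Jane Austen"))
print(mint_uri("books", "Pride and Prejudice"))
```

Keeping the pattern regular means anyone in the organization can guess, or generate, the identifier for a thing without looking it up first.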
What questions are we asking?
One of the most fundamental ways we make use of a knowledge domain is in how we phrase questions to retrieve existing information or gain new insights. We often start by constructing sentences that describe what we want:
“As an end-user I want a list of all the books by author X.”
As you delve further into the Semantic Web technologies you will find, I think, that the sentence-oriented structure of RDF assertions (subject, predicate, object) aligns quite nicely with natural language, and this is also true of RDF’s query language, SPARQL. With this in mind, the domain experts in your organization, or your customer’s, may find that this model makes for more effective communication of requirements between teams.
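The “books by author X” sentence maps almost word for word onto a triple pattern. The sketch below imitates that in plain Python, with `None` standing in for a query variable; the namespace, the `author` predicate and the book identifiers are hypothetical. A roughly equivalent SPARQL query is shown in the comment.

```python
EX = "http://example.org/"        # hypothetical namespace
AUTHOR = EX + "vocab/author"      # hypothetical predicate
AUTHOR_X = EX + "authors/x"       # hypothetical author identifiers
AUTHOR_Y = EX + "authors/y"

triples = [
    (EX + "books/1", AUTHOR, AUTHOR_X),
    (EX + "books/2", AUTHOR, AUTHOR_Y),
    (EX + "books/3", AUTHOR, AUTHOR_X),
]

def match(pattern, triples):
    """Return triples that fit the pattern; None is a wildcard, like a SPARQL variable."""
    return [
        t for t in triples
        if all(want is None or want == got for want, got in zip(pattern, t))
    ]

# Roughly equivalent SPARQL:
#   SELECT ?book WHERE {
#     ?book <http://example.org/vocab/author> <http://example.org/authors/x>
#   }
def books_by(author_uri):
    return [s for s, _, _ in match((None, AUTHOR, author_uri), triples)]

print(books_by(AUTHOR_X))
```

The requirement, the pattern and the query all read as the same sentence: some book has author X.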
How should I assert a relationship?
The way we make links between pieces of information influences the types of questions we can ask and the ease with which we can implement those questions. Consider Dodds and Davis’ Qualified Relation pattern: it may not be enough to say that “Mr. Smith went to Washington,” as in Frank Capra’s film ‘Mr. Smith Goes to Washington,’ because there is much more to be said about that trip than this simple relationship allows. For a start, you can expand the assertion and say that “Mr. Smith went on a journey to Washington.” By making a thing out of the “journey” you can make additional assertions to qualify it, such as when it happened and who was involved in it. Let’s not forget that Mr. Smith and Washington are things too, and there is much we can say about them that will enrich our pool of information.
Therefore, it is important to consider how the way you express relationships will support the querying of your data. Sometimes simple relationships are enough, and they are easy to query, as in “who went to Washington”. At other times you will need to qualify them so you can ask “find me all the people who made journeys to Washington in 1939”. The judgment is yours to make, but one of the nice things about RDF is that you can easily extend simple relations by adding the qualified ones without disrupting your existing data. In that respect, RDF is a very agile data model that allows your data to evolve as and when you need it to.
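Here is a small sketch of the simple and the qualified forms living side by side, again using plain Python tuples as stand-ins for RDF triples. The predicate names (`went_to`, `traveller`, `destination`, `year`) and the `journey_1` identifier are invented for illustration, and the year is simply the one used in the example question.

```python
# Start with the simple relation only.
triples = [
    ("mr_smith", "went_to", "washington"),
]

# Later, add a qualified relation: the journey becomes a thing in its own right.
# The original triple is left untouched.
triples += [
    ("journey_1", "type", "Journey"),
    ("journey_1", "traveller", "mr_smith"),
    ("journey_1", "destination", "washington"),
    ("journey_1", "year", 1939),
]

def subjects(predicate, obj):
    """All subjects that have the given predicate and object."""
    return {s for s, p, o in triples if p == predicate and o == obj}

def objects(subject, predicate):
    """All objects attached to a subject by the given predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def who_went_to(place):
    # Answered by the simple relation alone.
    return subjects("went_to", place)

def travellers_to_in(place, year):
    # Needs the qualified relation: journeys with this destination and year.
    journeys = subjects("destination", place) & subjects("year", year)
    return {t for j in journeys for t in objects(j, "traveller")}

print(who_went_to("washington"))
print(travellers_to_in("washington", 1939))
```

The first question keeps working exactly as before once the journey triples are added, which is the sense in which the model can evolve without disrupting existing data.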
Beginning the transition
The key to such a transition is to begin with a few simple questions like the ones I’ve briefly described above. I hope you find these useful as you look back on your existing applications and begin to plan how you can add value to your information and services this coming year.
Happy New Year.