Cooking Up a Data Hub with MarkLogic
What’s in Your Data Hub?
Data Lake, Data Warehouse, Data Mart, Data Hub.
With so many similar buzzwords floating about we might as well call it Data Soup! What are all these things? How are they similar? How do they differ? I’m thinking, “I just need to load a bunch of data into one place to get at it. How hard is it to take all of my data and put it in one place and then use it?”
“It’s Actually Quite Difficult”
The answer is that it’s quite difficult to put a bunch of different data in one place and use it. Data Soup is a very fitting name. Companies spend a lot of money paying highly qualified people to cook up data soup every day. The process these chefs follow usually goes something like this.
- They understand all the data they need to load
- They design an all-encompassing data model to represent everything, or at least the most important parts
- They spend a great deal of time cooking their data soup
- They find out they didn’t account for everything
- They go back and add new ingredients (data sources) and repeat the process
That’s the way it’s often done with relational technology. MarkLogic makes it a lot easier because we don’t have to design the all-encompassing data model. (Read our blog post on how MarkLogic allows you to load data “as is”.) Regardless of MarkLogic’s ease of use, it still takes time to build a Data Hub. If only there were some sort of recipe for creating a Data Hub with MarkLogic, we would have more time left over to make dessert!
It’s your lucky day. We created a framework that speeds up the process of building a Data Hub. We are calling it the MarkLogic Data Hub. Boring name, awesome software. We distilled the common processes in building Data Hubs and used them to build our framework. This was easy to do because a lot of customers are building Data Hubs on MarkLogic these days.
Building a Data Hub really boils down to these 3 steps: Ingest, Curate, Access.
In the Ingest step we load ALL of our heterogeneous data into the hub as-is. Simply dump all those ingredients in the pot. If you are coming from a relational background this might sound strange. Fear not! Many ideas seem strange at first, like the notion that the Earth isn’t flat.
Think of the Curate step as stirring the pot. Now that we have our ingredients we must mix them together. During curation we enrich our data to meet our needs. Some common types of enrichment include:
- Normalizing dates and other fields
- Formatting data for indexing
- Enriching data with additional data/markup
- Using semantic triples to enhance our understanding of the data
- Performing conflict resolution between differing values from multiple systems
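To make two of these enrichments concrete, here is a minimal Python sketch of date normalization and conflict resolution. It is not the MarkLogic Data Hub's own code or API. The record fields (`source`, `email`, `updated`), the list of date formats, and the "latest update wins" policy are all assumptions made up for illustration.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical date formats seen across source systems (an assumption).
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_date(raw: str) -> str:
    """Return the date as an ISO 8601 string (YYYY-MM-DD)."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"Unrecognized date format: {raw!r}")


def resolve_conflict(candidates: list[dict]) -> dict:
    """Pick the most recently updated record.

    'Latest update wins' is only one possible policy, chosen for simplicity.
    ISO 8601 date strings sort chronologically, so plain comparison works.
    """
    return max(candidates, key=lambda record: record["updated"])


def curate(records: list[dict]) -> dict:
    """Merge raw records about one customer into a single curated view."""
    normalized = [{**r, "updated": normalize_date(r["updated"])} for r in records]
    winner = resolve_conflict(normalized)
    return {
        "email": winner["email"],
        "updated": winner["updated"],
        "sources": sorted(r["source"] for r in normalized),
    }


# Two systems disagree about the same customer's email address.
raw_records = [
    {"source": "crm", "email": "pat@example.com", "updated": "03/15/2016"},
    {"source": "billing", "email": "pat.smith@example.com", "updated": "2016-04-02"},
]
curated = curate(raw_records)
print(curated)
```

The raw records are left untouched and the curated view is built alongside them, which fits the idea of loading data as-is first and enriching it afterwards.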
The last step is Access. Now that we have created our Data Hub, we need to be able to access the data to serve our customers. MarkLogic provides a full-featured REST API for getting that data out of the Hub. You can also build your own data services API that gets just the data your customers need.
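As a rough sketch of what access over REST can look like, the Python snippet below builds request URLs for two endpoints of the MarkLogic REST Client API: `/v1/documents` (read one document by URI) and `/v1/search` (string query). The host, the port, and the example document URI are assumptions; your hub's app server will likely use different values, and a real request would also need credentials.

```python
from urllib.parse import urlencode

# Assumed connection details; adjust for your own server.
HUB_BASE_URL = "http://localhost:8000"


def document_url(uri: str, base_url: str = HUB_BASE_URL) -> str:
    """Build the URL for reading a single document by its URI."""
    return f"{base_url}/v1/documents?{urlencode({'uri': uri})}"


def search_url(query: str, page_length: int = 10, base_url: str = HUB_BASE_URL) -> str:
    """Build the URL for a string search that returns JSON results."""
    params = {"q": query, "format": "json", "pageLength": page_length}
    return f"{base_url}/v1/search?{urlencode(params)}"


# The document URI below is hypothetical.
print(document_url("/customers/42.json"))
print(search_url("pat smith"))
```

Any HTTP client can then issue a GET against these URLs. The sketch only builds them, so it runs without a server.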
Much like many other popular frameworks, the MarkLogic Data Hub uses Convention over Configuration to guide us along the process of creating a Data Hub. Simply stick to the hub’s conventions and we can have a functional Data Hub in 15 minutes.
Try it yourself. We have a Getting Started tutorial that gets you up and running with some sample data.
Do you prefer to learn through hands-on training with a live instructor?
Sign up for our FREE, 8-hour Using the MarkLogic Data Hub course.
Hungry for more details?
Great! Check out our MarkLogic Data Hub site for more information.