Cooking Up a Data Hub with MarkLogic

by Paxton Hare

Posted on April 20, 2016

What’s in Your Data Hub?

Data Lake, Data Warehouse, Data Mart, Data Hub.

With so many similar buzzwords floating about, we might as well call it Data Soup! What are all these things? How are they similar? How do they differ? I’m thinking, “I just need to load a bunch of data into one place to get at it. How hard is it to take all of my data, put it in one place and then use it?”

“It’s Actually Quite Difficult”

The answer is that it’s quite difficult to put a bunch of different data in one place and use it. Data Soup is a very fitting name. Companies spend a lot of money paying highly qualified people to cook up data soup every day. The process these chefs follow usually goes something like this.

They understand all the data they need to load
They design an all-encompassing data model to represent everything, or at least the most important parts
They spend a great deal of time cooking their data soup
They find out they didn’t account for everything
They go back and add new ingredients (data sources) and repeat the process

That’s the way it’s often done with relational technology. MarkLogic makes it a lot easier because we don’t have to design the all-encompassing data model. (Read our blog post on how MarkLogic allows you to load data “as is”.) Regardless of MarkLogic’s ease of use, it still takes time to build a Data Hub. If only there were some sort of recipe for creating a Data Hub with MarkLogic, we would have more time left over to make dessert!

A Recipe

It’s your lucky day. We created a framework that speeds up the process of building a Data Hub. We are calling it the MarkLogic Data Hub. Boring name; awesome software. We distilled the common processes in building Data Hubs and used them to build our framework. This was easy to do because a lot of customers are building Data Hubs on MarkLogic these days.

Building a Data Hub really boils down to these 3 steps: Ingest, Curate, Access.

Ingest

In this step we load ALL of our heterogeneous data into the hub as-is. Simply dump all those ingredients in the pot. If you are coming from a relational background this might sound strange. Fear not! Many ideas seem strange at first, like the notion that the Earth isn’t flat.
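To make "as-is" concrete, here is a minimal sketch in plain Python. It is not the MarkLogic Data Hub API: the `hub` dictionary, the `ingest` function, the URIs and the sample records are all made up for illustration. The point is only that nothing gets reshaped on the way in.

```python
import json

# Hypothetical in-memory "hub": maps a document URI to the raw record, untouched.
hub = {}

def ingest(uri, raw, source):
    """Store a record exactly as received, tagged only with where it came from."""
    hub[uri] = {"source": source, "raw": raw}

# Two ingredients with different shapes go into the same pot.
crm_record = json.loads('{"name": "Ada Lovelace", "dob": "12/10/1815"}')
billing_row = "77,LOVELACE ADA,1815-12-10"

ingest("/crm/1001.json", crm_record, "crm")
ingest("/billing/77.csv", billing_row, "billing")
```

Notice that the JSON record and the CSV row sit side by side without a shared schema. Making sense of them is the next step's job.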

Curate

Think of this step like stirring the pot. Now that we have our ingredients we must mix them together. During the curation step we are enriching our data to meet our needs. Some common types of enrichment include:

Normalizing dates and other fields
Formatting data for indexing
Enriching data with additional data/markup
Using semantic triples to enhance our understanding of the data
Performing conflict resolution between differing values from multiple systems
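Two of the enrichments above, normalizing dates and resolving conflicts, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not how the MarkLogic Data Hub does it: the list of date formats, the function names and the "trust the first source that has a value" rule are all hypothetical.

```python
from datetime import datetime

# Date formats we assume the source systems use (an assumption for this sketch).
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%Y%m%d"]

def normalize_date(value):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None

def resolve(values_by_source, priority):
    """Pick the value from the most trusted source that actually has one."""
    for source in priority:
        value = values_by_source.get(source)
        if value:
            return value
    return None

posted = normalize_date("04/20/2016")
email = resolve({"crm": "", "billing": "ada@example.com"}, ["crm", "billing"])
```

Here the CRM is trusted first, but its email is empty, so the billing system's value wins. A real hub would likely need richer rules, such as the most recent value or a per-field priority.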

Access

Now that we have created our Data Hub, we need to be able to access the data to serve our customers. MarkLogic provides a full-featured REST API for getting that data out of the Hub. You can also build your own data services API that returns just the data your customers need.
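The idea of a data service that returns "just the data your customers need" can be sketched as a simple projection. This is plain Python, not the MarkLogic REST API: the `curated` dictionary, the `customer_summary` function and the sample documents are hypothetical stand-ins.

```python
# Hypothetical curated documents, keyed by URI (stand-ins for what lives in the hub).
curated = {
    "/customers/1001.json": {"name": "Ada Lovelace", "dob": "1815-12-10", "ssn": "000-00-0000"},
    "/customers/1002.json": {"name": "Alan Turing", "dob": "1912-06-23", "ssn": "000-00-0001"},
}

def customer_summary(uri, fields=("name", "dob")):
    """Return only the requested fields for one document, or None if it is missing."""
    doc = curated.get(uri)
    if doc is None:
        return None
    return {field: doc[field] for field in fields if field in doc}

summary = customer_summary("/customers/1001.json")
```

The caller gets the name and date of birth and never sees the `ssn` field. Keeping that choice in one function means the hub can hold everything while each service exposes only its own slice.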

But How?

Much like many other popular frameworks, the MarkLogic Data Hub uses Convention over Configuration to guide us along the process of creating a Data Hub. Simply stick to the hub’s conventions and we can have a functional Data Hub in 15 minutes.

Feeling Skeptical?

Try it yourself. We have a Getting Started tutorial that gets you up and running with some sample data.

Do you prefer to learn through a hands-on training with a live instructor?

Hungry for more details?

Great! Check out our MarkLogic Data Hub site for more information.
