
Create Custom Steps Without Writing Code with Pipes

03.06.2020
4 minute read

Are you more comfortable working in a graphical user interface (GUI) than writing code? Do you want a visual representation of your data transformation pipelines? What if there were a way to empower users to visually enrich content and drive data pipelines without writing code?

With the community tool Pipes for MarkLogic Data Hub, you can. Pipes allows you to create a custom step for your Data Hub without writing code – instead, you simply connect blocks. Pipes provides a low-code solution to designing and running logic within a MarkLogic Data Hub.

What is Pipes?

Pipes is a tool for the MarkLogic Data Hub that produces code for a custom step using a GUI. There may be a case when you need to extend out-of-the-box Data Hub functionality with a custom step – with Pipes, you create Data Hub 5.x custom steps without coding.

Note: Pipes for MarkLogic Data Hub is a community tool. As such, it is not supported by MarkLogic Corporation and is only updated and corrected based on a best-effort approach. Any contribution or feedback is welcomed to make the tool better. Pipes is designed to run on MarkLogic 10.0-2 with DHF 5.1.0 installed.

Who is Pipes for?

Pipes is targeted towards data analysts and Data Hub developers. For data analysts, Pipes allows users to drive logic inside the Data Hub using a GUI, and build and tweak flows on their own. Instead of writing code for your custom step, you can define your complex data transformations with mouse clicks, much like drawing a diagram on a whiteboard.

For developers, Pipes gives you a starting point to accomplish tasks very quickly using building blocks so you don’t have to start from scratch. In addition, using building blocks in a GUI allows you to better communicate custom step functionality to business users, ensuring everyone is on the same page.

How Does it Work?

Simply put, Pipes converts visual blocks in a GUI into JavaScript for a custom step. The GUI uses LiteGraph, an open-source, node-based programming framework, which provides a UI and an engine for designing and executing a visual graph in JavaScript. The graph is composed of building blocks, each associated with code executed in MarkLogic shared libraries. You design your step using these blocks. In the current version, the LiteGraph engine then executes the graph inside MarkLogic; in the upcoming version, Pipes will directly produce plain JavaScript code to be executed in MarkLogic.
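
To make the idea of "executing a graph of blocks" concrete, here is a minimal sketch in plain JavaScript. It is not the Pipes or LiteGraph API: the block shape (`id`, `inputs`, `run`) and the `runGraph` function are hypothetical names chosen for illustration. Each block runs only after the blocks wired into its inputs have produced their values.

```javascript
// Hypothetical illustration only: this is not the Pipes or LiteGraph API.
// Each block has an id, the ids of the blocks feeding it, and a function.
function runGraph(blocks, outputId) {
  const byId = new Map(blocks.map((b) => [b.id, b]));
  const cache = new Map();

  // Evaluate a block after evaluating the blocks wired into its inputs.
  function evaluate(id, visiting = new Set()) {
    if (cache.has(id)) return cache.get(id);
    if (visiting.has(id)) throw new Error(`Cycle detected at block "${id}"`);
    const block = byId.get(id);
    if (!block) throw new Error(`Unknown block "${id}"`);
    visiting.add(id);
    const args = (block.inputs || []).map((inputId) => evaluate(inputId, visiting));
    visiting.delete(id);
    const value = block.run(...args);
    cache.set(id, value); // each block runs at most once per execution
    return value;
  }

  return evaluate(outputId);
}

// A three-block graph: source document -> uppercase name -> wrap as entity.
const sourceDoc = { name: "ada lovelace", country: "uk" };
const blocks = [
  { id: "source", inputs: [], run: () => sourceDoc },
  { id: "upperName", inputs: ["source"], run: (doc) => doc.name.toUpperCase() },
  {
    id: "entity",
    inputs: ["source", "upperName"],
    run: (doc, name) => ({ Person: { name, country: doc.country } }),
  },
];

const graphResult = runGraph(blocks, "entity");
console.log(JSON.stringify(graphResult));
// prints {"Person":{"name":"ADA LOVELACE","country":"uk"}}
```

Caching each block's result means a block that feeds several others, like `source` here, is computed once and reused.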

Pipes can be used to build out complex scenarios, such as:

  • Transforming multiple different source documents into multiple harmonized entities in one go
  • Producing an array of entities from a single source file
  • Harmonizing an entity from different source files (e.g., document + reference / metadata)
  • Documenting the provenance (origin) of every harmonized data point

In addition, Pipes has a Live-Preview Function where you can preview exactly what your custom step will output, either from a random doc in the source collection, or using a specific URI.

Create Your Own Blocks

You can extend Pipes by creating your own blocks to add features and functionality. If you need a specific computation or transformation that you plan to re-use in multiple places, you can create a custom block for it. Because a block can implement any logic and have its own settings, it’s also possible to provide high-level features such as:

  • Value mapping: Map values from input to output based on a dictionary that is configured in the UI.
  • Customer transactions: The block takes the customer ID as an input and returns the transaction statistics configured in the block (e.g., the sum of the transactions performed during the last year).
  • PROV-O: The graph can generate provenance data to be stored alongside the data.
  • Custom transformations: Use existing or purpose-built JavaScript libraries to transform data, for example, to do coordinate conversion or buffering.
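
As a rough idea of the logic a value-mapping block might wrap, the sketch below builds a lookup function from a dictionary. It is plain JavaScript and does not use the Pipes block API; `createValueMapper`, the sample dictionary, and the fallback value are all hypothetical. It assumes the dictionary keys are stored in lowercase.

```javascript
// Hypothetical sketch: not the Pipes block API.
// Builds a mapping function from a dictionary, as a block's settings might.
// Assumes dictionary keys are lowercase; input values are normalized to match.
function createValueMapper(dictionary, fallback = null) {
  const lookup = new Map(Object.entries(dictionary));
  return (value) => {
    const key = String(value).trim().toLowerCase();
    return lookup.has(key) ? lookup.get(key) : fallback;
  };
}

// Example dictionary, standing in for one configured in the UI.
const mapCountry = createValueMapper(
  { uk: "United Kingdom", fr: "France" },
  "Unknown"
);

console.log(mapCountry(" FR ")); // prints France
console.log(mapCountry("de")); // prints Unknown
```

Returning an explicit fallback for unmapped values keeps the output predictable when the source data contains codes the dictionary does not cover.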

Use Cases for Pipes

While working on Pipes, I have come across several situations where the tool has been handy for speeding up development, including:

  • Customizing envelopes: Add information to headers and triples, as well as dynamically add collections (e.g., based on some lookup, logic, computation, etc.). If you don’t want to include the attachment or want to put only URIs (no content) or some other lineage information, you can do that with Pipes.
  • Customizing URIs in final database: Define your own URI for documents in Final, instead of using the one from Staging.
  • Nesting and harmonizing: Re-use “values” in multiple contexts by computing sub-objects, which can then feed into multiple different elements of the final entities.
  • Handling data without 1:1 mapping: Handle mismatches between input and output records when raw input records don’t match your entity model (e.g., you have 10 CSV “rows” and need to join them).
  • Combining operations into a single step: Combine multiple “steps” into one transaction to avoid dealing with the partial states in between two steps, in case one fails.
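
The first two use cases above can be sketched in plain JavaScript. This is a sketch under stated assumptions, not code generated by Pipes: the `buildFinalContent` function, the field names of the source record, and the header names are hypothetical, and the `uri` / `value` / `context.collections` content shape and the `envelope` layout are assumed to match what a Data Hub custom step works with.

```javascript
// Hypothetical sketch: the content shape and all field names are assumptions.
// Takes a staging record and returns content for the Final database with a
// custom URI, custom headers, and collections computed from the data.
function buildFinalContent(content) {
  const source = content.value;

  const instance = {
    Customer: { id: source.customerId, name: source.name },
  };

  // Keep only the source URI as lineage information instead of an attachment.
  const headers = { sourceUri: content.uri, harmonizedBy: "example-step" };

  // Add collections dynamically, based on values in the record.
  const collections = ["Customer"];
  if (source.country) {
    collections.push(`country-${source.country.toLowerCase()}`);
  }

  return {
    uri: `/customers/${source.customerId}.json`, // custom URI for Final
    value: { envelope: { headers, triples: [], instance } },
    context: { collections },
  };
}

const finalContent = buildFinalContent({
  uri: "/staging/customers/row-17.json",
  value: { customerId: "C42", name: "Ada Lovelace", country: "UK" },
});

console.log(finalContent.uri); // prints /customers/C42.json
console.log(finalContent.context.collections.join(",")); // prints Customer,country-uk
```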

Future Plans

Pipes currently uses the LiteGraph engine to execute the graph. We are preparing a new engine that generates JavaScript code within MarkLogic, so that Pipes produces the code directly for better performance. The new engine is in beta and will soon be able to manage all existing Pipes blocks.

Get Started

Play around with the tool and share your thoughts and feedback so we can improve it. We would like to hear your ideas for further improvements and to understand how your projects benefit from Pipes. Or just pinpoint what’s missing (and add it yourself).

Get started today with this GitHub Wiki guide for Pipes.

Related Resources

Pipes GitHub Wiki Documentation — Get started with your first Pipes project.

Pipes GitHub Repository — Clone or download the tool today. Explore documentation and videos. Submit issues or tickets using GitHub issues.

Pipes Technical Resources — Explore the technical resources related to Pipes. Find documentation, blogs, demos, and more.

Eric Poilvet

Eric is Director of Solutions Architecture at MarkLogic. He supports the company's customers from the pre-sales phase through to the release of solutions. He works with the manufacturing, media and insurance industries, among others, on the design of innovative solutions based on the MarkLogic operational Data Hub.

Eric is currently based in France and previously spent two years in MarkLogic's London office.
