MarkLogic Connector for AWS Glue Now Available on AWS Marketplace

by Ankur Jain

Posted on April 07, 2021

We are excited to announce the availability of the MarkLogic Connector for AWS Glue. AWS Glue is a serverless ETL tool provided as a managed service in the AWS cloud ecosystem. When connected to MarkLogic, AWS Glue provides a simple way to build data pipelines for moving data in and out of MarkLogic using visual and code-based interfaces. To get started, subscribe to the MarkLogic Connector for AWS Glue on the AWS Marketplace.

What Is AWS Glue?

AWS Glue provides a fully managed, serverless Apache Spark infrastructure for graphically creating, running, and monitoring ETL pipelines. Its graphical interface, Glue Studio, automatically generates code, sparing developers the time and effort of hand-coding and optimizing Spark jobs.

Using AWS Glue, developers can build ETL pipelines with readily available connectors for AWS services like Aurora, RDS, S3, Redshift, Kinesis, and DynamoDB, as well as third-party databases like Oracle or Snowflake. It provides a data catalog and a rich library of out-of-the-box data transformations (such as filter and join) for modeling ETL pipelines in Glue Studio. Additionally, developers can choose to code a data pipeline in either Scala or Python.

For those who want to use Apache Spark with MarkLogic without the AWS Glue service, we have also released a MarkLogic Connector for Apache Spark.

Using AWS Glue with MarkLogic

MarkLogic customers can now use AWS Glue to implement Spark ETL pipelines for fast data ingestion and data export.

MarkLogic Connector for AWS Glue

High-performance Data Ingestion

The MarkLogic Connector for AWS Glue makes it simple to bulk load or stream relational and non-relational data into MarkLogic as is. It also lets you use Glue’s data transformation capabilities to combine tabular data from multiple sources and reshape it into hierarchical formats like JSON before loading it into MarkLogic.
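To make the tabular-to-hierarchical idea concrete, here is a minimal sketch in plain Python. It does not use AWS Glue, Spark, or the connector's API; the `customers` and `orders` rows, their field names, and the `nest_orders` helper are all hypothetical, chosen only to show how rows from two flat sources can be joined and nested into one JSON document per entity.

```python
import json
from collections import defaultdict

# Hypothetical tabular inputs, standing in for two relational sources.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
    {"order_id": 12, "customer_id": 2, "total": 15.5},
]


def nest_orders(customers, orders):
    """Join orders to customers and return one nested document per customer."""
    by_customer = defaultdict(list)
    for order in orders:
        # Drop the join key from the child record; the parent already carries it.
        child = {k: v for k, v in order.items() if k != "customer_id"}
        by_customer[order["customer_id"]].append(child)
    # Customers without orders get an empty list rather than a missing field.
    return [
        {**customer, "orders": by_customer.get(customer["customer_id"], [])}
        for customer in customers
    ]


documents = nest_orders(customers, orders)
print(json.dumps(documents[0], indent=2))
```

In a Glue job the same join-then-nest step would typically be expressed with Spark or Glue transformations rather than Python lists; the shape of the resulting document is the point here.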

As an example, users can easily use the new Glue connector to build a batch or a change data capture pipeline to load complex data (or source entities) into MarkLogic Data Hub Service. Once loaded, Data Hub Service has the necessary capabilities to integrate source data into durable data assets for later use in operational and analytical applications.
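The difference between a batch load and a change data capture (CDC) pipeline can be sketched as follows. This is an illustration under stated assumptions, not the connector's behavior: the event format (`op`, `uri`, `doc`), the `apply_changes` helper, and the in-memory dictionary standing in for the target database are all hypothetical.

```python
def apply_changes(store, events):
    """Apply insert/update/delete events, in order, to a dict keyed by document URI."""
    for event in events:
        op, uri = event["op"], event["uri"]
        if op in ("insert", "update"):
            # Treat both as an upsert: the latest document for a URI wins.
            store[uri] = event["doc"]
        elif op == "delete":
            # Deleting a URI that is not present is a no-op.
            store.pop(uri, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return store


# Hypothetical change events captured from a source system.
events = [
    {"op": "insert", "uri": "/customers/1.json", "doc": {"name": "Ada"}},
    {"op": "insert", "uri": "/customers/2.json", "doc": {"name": "Grace"}},
    {"op": "update", "uri": "/customers/1.json", "doc": {"name": "Ada L."}},
    {"op": "delete", "uri": "/customers/2.json"},
]

target = apply_changes({}, events)
print(target)
```

A batch pipeline would instead reload the full set of documents on each run; a CDC pipeline only replays the changes since the last run, which is why event order matters in the sketch above.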

Secure Data Sharing

The MarkLogic Connector for AWS Glue also makes it easy to consume data from MarkLogic while preserving security and governance. Users can build scalable data pipelines for complex analytical processing using Spark libraries (such as machine learning and SQL) on clean, curated, and governed data in MarkLogic. Users can also leverage MarkLogic’s multi-model querying capabilities to securely share fit-for-purpose data with AWS services like SageMaker, Redshift, and S3, and with third-party data stores like Snowflake.

Get Started

To use the MarkLogic Connector for AWS Glue, subscribe to the connector on the AWS Marketplace. Once you have subscribed, the MarkLogic connector will appear in AWS Glue Studio, where you can graphically build data pipelines.

To get started, follow along with the hands-on, step-by-step tutorial. To learn more about configuring the MarkLogic Connector for AWS Glue, please check out the documentation here. AWS Glue documentation is available here.
