We are excited to announce the availability of the MarkLogic Connector for AWS Glue. AWS Glue is a serverless ETL tool provided as a managed service in the AWS cloud ecosystem. When connected to MarkLogic, AWS Glue provides a simple way to build data pipelines for moving data in and out of MarkLogic using visual and code-based interfaces. To get started, subscribe to the MarkLogic Connector for AWS Glue on the AWS Marketplace.
AWS Glue provides a fully managed, serverless Apache Spark infrastructure to graphically create, run, and monitor ETL pipelines. Its graphical interface, Glue Studio, automatically generates code, saving developers the time and effort of writing and optimizing Spark jobs by hand.
Using AWS Glue, developers can build ETL pipelines using readily available connectors for AWS services like Aurora, RDS, S3, Redshift, Kinesis, and DynamoDB, as well as third-party databases like Oracle or Snowflake. It provides a data catalog and a rich library of out-of-the-box data transformations (such as filters and joins) to easily model ETL pipelines in Glue Studio. Additionally, developers can choose to code a data pipeline in either Scala or Python.
For those who want to use Apache Spark with MarkLogic without the AWS Glue service, we have also released a MarkLogic Connector for Apache Spark.
MarkLogic customers can now easily use AWS Glue to implement Spark ETL pipelines for fast data ingestion and data export.
The MarkLogic Connector for AWS Glue makes it simple to bulk load or stream relational and non-relational data as is into MarkLogic. Additionally, it provides the flexibility of using Glue’s data transformation capabilities to combine and transform tabular data from multiple sources into hierarchical data formats like JSON before loading into MarkLogic.
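To make the idea of turning tabular data into hierarchical JSON concrete, here is a minimal sketch in plain Python. It does not use the Glue or MarkLogic connector APIs; the function name `nest_orders`, the field names, and the sample rows are all hypothetical and chosen only for illustration. In an actual Glue job, the same reshaping would typically be expressed with Glue or Spark transformations.

```python
import json
from collections import defaultdict


def nest_orders(customers, orders):
    """Join two tabular sources on customer_id and nest orders under each customer.

    Hypothetical helper for illustration; not part of any connector API.
    """
    orders_by_customer = defaultdict(list)
    for order in orders:
        # Drop the join key from the nested record; the parent already carries it.
        nested = {k: v for k, v in order.items() if k != "customer_id"}
        orders_by_customer[order["customer_id"]].append(nested)

    documents = []
    for customer in customers:
        doc = dict(customer)
        # Customers with no matching rows get an empty list.
        doc["orders"] = orders_by_customer.get(customer["customer_id"], [])
        documents.append(doc)
    return documents


# Two flat "tables" (sample data, assumed for this sketch).
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.5},
]

documents = nest_orders(customers, orders)
for doc in documents:
    # Each printed line is one hierarchical JSON document.
    print(json.dumps(doc))
```

Each resulting document holds one customer together with its orders, which is the kind of hierarchical shape that suits a document database better than separate flat rows.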
As an example, users can easily use the new Glue connector to build a batch or a change data capture pipeline to load complex data (or source entities) into MarkLogic Data Hub Service. Once loaded, Data Hub Service has the necessary capabilities to integrate source data into durable data assets for later use in operational and analytical applications.
The MarkLogic Connector for AWS Glue also makes it easy to consume data from MarkLogic with complete security and governance. Users can easily build scalable data pipelines for complex analytical processing using Spark libraries (such as machine learning and SQL) on clean, curated, and governed data in MarkLogic. Users can also leverage MarkLogic’s multi-model querying capabilities to securely share fit-for-purpose data with AWS services like SageMaker, Redshift, and S3, as well as third-party data stores like Snowflake.
To use the MarkLogic Connector for AWS Glue, simply subscribe to the connector in the AWS Marketplace. Once subscribed, the MarkLogic connector will appear in AWS Glue Studio, where you can graphically build data pipelines.
To get started, follow along with the hands-on, step-by-step tutorial. To learn more about configuring the MarkLogic Connector for AWS Glue, please check out the documentation here. AWS Glue documentation is available here.