MarkLogic Data Curation

On-Demand Training

Course Description

Learn to build a MarkLogic Data Hub, powered by the MarkLogic database, to accelerate data integration projects and deliver faster time to value to your customers. This course is recommended only if you are using the MarkLogic Data Hub or want to learn the Hub Central interface.

By completing this course you will be able to:

  • Develop, test, debug and deploy custom code using a local IDE (Visual Studio Code)
  • Use custom code during ingest, mapping and mastering
  • Implement an entity model that includes nesting and relationships
  • Load data from a variety of sources
  • Load data using a variety of methods and describe the use cases and best practices for each method
  • Use custom code during data ingest
  • Implement mapping configurations for a more complex data model
  • Implement smart mastering configurations with more complexity and customization

Audience

Data Architect, MarkLogic Developer, Data Engineer

Duration

8 hours

Course Outline

Data Services First

  • Understand the high-level approach to data integration projects using the MarkLogic Data Hub
  • Understand the customer and business requirements for the course hands-on project
  • Understand the user stories and technical requirements for the course hands-on project
  • Understand the data sources available for the course hands-on project

The MarkLogic Data Hub

  • Understand what it is
  • Understand what it does
  • Initialize and install a new MarkLogic Data Hub project

Implement Security

  • Create users and roles for both business users and members of the technical project team
  • Understand how to use Data Hub specific roles
  • Implement role hierarchies
  • Assign execute privileges necessary to meet project requirements
  • Deploy security configuration using QuickStart and ml-gradle

Create an Entity

  • Create a new entity
  • Define properties
  • Configure indexes
  • Protect access to PII (personally identifiable information)

Ingest Data

  • Create flow pipelines
  • Configure ingestion steps
  • Understand the purpose and use of the staging and final databases in a MarkLogic Data Hub
  • Implement key data modeling concepts including document URIs, collections, document permissions, property naming best practices, geospatial data modeling patterns, denormalization, and the use of the envelope pattern

Curate Data

  • Configure mapping steps
  • Use pre-built mapping functions
  • Program, deploy and use a custom mapping function
  • Test and debug mapping steps

Use Semantics

  • Understand key semantic data modeling concepts including triples, IRIs, ontology triples, and managed and unmanaged triples
  • Load triples to a MarkLogic Data Hub
  • Program, deploy and use a custom harmonization step to add triples to the envelope of a document

Access Data

  • Explore the use of JavaScript APIs
  • Explore the use of SPARQL
  • Validate that the curated data from the hub can be used to meet the business and technical requirements for the hands-on project

Adapt to Change: Perform Another Iteration of Ingest | Curate | Access

  • Ingest a new data source
  • Curate the new data so that it can be consumed in the same way as existing data

Use Smart Mastering

  • Configure a matching step
  • Configure a merging step
  • Test Smart Mastering
  • Explore mastered data

How to Subscribe

Instructor-Led Option

This course is available as a free, publicly scheduled instructor-led course! Please refer to our schedule to select the most suitable date for you.

See dates
