Using the UML-to-Entity Services Toolkit for UML Modeling with Data Hub and Semantics
Welcome to the third blog in my series exploring modeling for MarkLogic's Entity Services using the Unified Modeling Language (UML). In Part 1, I introduce the concept of using UML notation to depict the model visually and explain how to transform the UML model into MarkLogic's Entity Services model descriptor format. In Part 2, I demonstrate that transformation using movies as an example and introduce my UML-to-Entity Services toolkit for model-driven MarkLogic development.
Here, we will examine modeling for a mixed document/semantic database. Besides showcasing semantics, my example also demonstrates how to use UML modeling with MarkLogic’s Data Hub Framework (DHF). The source code for this UML modeling example is on GitHub.
Overview of the Employee-Department Data Model Using Semantics
Our data model (Figure 1) describes employees and departments in the human resources repository of GlobalCorp, a fictional company. The model is based on an example included in DHF's GitHub repository.
Figure 1: Sample data model with departments and employees
The two main classes are Department and Employee. A department has an ID (departmentId) and a name (departmentName). An employee has an ID (employeeId), a name (which includes lastName), salary information (which includes bonus), a hire status and dates, and a title (title), plus addresses, phone numbers, and emails. The latter are complex types, hence the four additional classes (one each for addresses, phone numbers, and emails, plus GeoCoordinates, a type within Address), which serve as datatypes for Employee's attributes.
Relationships are particularly significant in this example. For example, an employee reportsTo another employee, and an employee is a memberOf a department. We might physically represent these relationships using document references or containment. The employee document, for example, could contain an attribute called memberOf whose value is the ID of the corresponding department. However, GlobalCorp has decided to represent these relationships with semantic triples instead, for the following reasons:
- Since reporting structures and department memberships change frequently, GlobalCorp prefers to keep the representation of these relationships soft and to maintain current relationships by updating triples rather than by re-routing document references.
- GlobalCorp recently acquired rival firm ACME and has decided to use the standard W3C Organization Ontology to represent that merger semantically. Having already started down the semantic road, GlobalCorp has decided to use the same ontology to represent human resource relationships.
- GlobalCorp realizes the potential of SPARQL to run powerful human resource queries, such as the ability to build an organizational reporting tree without having to traverse employee documents.
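To make the last point concrete, here is a minimal Python sketch that builds a reporting tree from reportsTo triples alone, without opening any employee document. It is only a stand-in for what a SPARQL query would do inside MarkLogic. The triples are plain (subject, predicate, object) tuples held in memory, the ex:empN identifiers are hypothetical placeholders rather than GlobalCorp's actual employee IRIs, and the function name org_chart is illustrative.

```python
from collections import defaultdict

# Predicate from the W3C Organization Ontology, as used in the model.
ORG_REPORTS_TO = "http://www.w3.org/ns/org#reportsTo"


def org_chart(triples):
    """Return an indented org chart built only from reportsTo triples.

    Assumes the reporting structure is acyclic (nobody indirectly reports to themselves).
    """
    reports = defaultdict(list)  # manager IRI -> list of direct-report IRIs
    for subject, predicate, obj in triples:
        if predicate == ORG_REPORTS_TO:
            reports[obj].append(subject)

    # Roots are managers who do not themselves report to anyone.
    subordinates = {s for subs in reports.values() for s in subs}
    roots = sorted(m for m in reports if m not in subordinates)

    lines = []

    def walk(employee, depth):
        lines.append("  " * depth + employee)
        for sub in sorted(reports.get(employee, [])):
            walk(sub, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


# Hypothetical sample data; triples with other predicates are simply ignored.
sample = [
    ("ex:emp114", ORG_REPORTS_TO, "ex:emp1"),
    ("ex:emp2", ORG_REPORTS_TO, "ex:emp1"),
    ("ex:emp200", ORG_REPORTS_TO, "ex:emp114"),
    ("ex:emp114", "http://www.w3.org/ns/org#memberOf", "ex:d4"),
]

for line in org_chart(sample):
    print(line)
```

Reassigning a manager means replacing a single reportsTo triple, and the chart reflects the change on the next call. This is the "soft" representation GlobalCorp is after.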
If you look carefully at the model, you will see it is peppered with semantic ("sem") stereotypes. The model uses the Entity Services UML profile included in the toolkit. This profile defines the semantic and other stereotypes used to map UML to Entity Services. With these stereotypes, GlobalCorp describes the IRIs, RDF types, and RDFS labels of employees and departments directly in the model. It also relates these entities using predicates defined by the W3C Organization Ontology.
Here is a breakdown of the semantic stereotypes used in GlobalCorp’s model:
- semType stereotype: Both the Department and Employee classes bear the stereotype semType. This stereotype associates an RDF type with each class. Department's RDF type is http://www.w3.org/ns/org#OrganizationalUnit, so from a semantic perspective a department is an organizational unit as defined by the W3C Organization Ontology. Employee's RDF type is http://xmlns.com/foaf/0.1/Agent, an agent as defined by the friend-of-a-friend (FOAF) ontology.
- IRI definitions: The Department and Employee classes also define an IRI. The IRI uniquely identifies a department or employee when we use it as the subject or object of a semantic triple. In each class, we nominate one attribute to serve as the IRI and stereotype it accordingly. In Department, that attribute is deptIRI; in Employee, it is empIRI. Notice that each IRI attribute also bears additional stereotypes, including exclude. These IRI attributes are merely calculated fields used to help construct triples, so they are not included in the XML document representation of the department or employee. The concat tag indicates how the IRI's value is calculated. For example, deptIRI is the concatenation of "http://www.w3.org/ns/org#d" and the department ID.
- RDFS labels: Each class also defines an RDFS label. In semantics, it is good practice to associate a user-friendly label with an IRI. As with the IRI, we nominate one attribute in each class to serve as the label and stereotype it accordingly. In Department, that attribute is departmentName; in Employee, it is empLabel. Notice that departmentName is not a calculated field. It is a full-fledged attribute that also appears in the department's XML document. empLabel, on the other hand, is an excluded field whose value is calculated from the employee's name.
- Employee reportsTo Employee: The association shown as reportsTo, which relates one employee to another, is a semProperty with the predicate http://www.w3.org/ns/org#reportsTo. If employee A reports to employee B, we construct a triple with the following parts:
  - subject: the IRI of employee A (employee A's empIRI)
  - predicate: the one given
  - object: the IRI of employee B (employee B's empIRI)
  Notice the exclude stereotype: the XML representation of an employee will not contain a reportsTo element. We maintain the relationship solely through a triple.
- Employee memberOf Department: The memberOf association between Employee and Department is also a semProperty, with the predicate http://www.w3.org/ns/org#memberOf. The triple we create has the employee's empIRI as subject, the given predicate, and the department's deptIRI as object. This relationship is likewise excluded from the document.
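The calculated-field and exclude stereotypes can be illustrated with a short Python sketch for a department. This is not the toolkit's generated code, which is XQuery running in MarkLogic. The function names, the dictionary representation of an instance, and the choice to return the document and the full instance separately are assumptions made for illustration. The only model-derived details are the deptIRI concatenation rule, the exclusion of deptIRI from the document, and the RDF type and label described above.

```python
# Namespaces used in the model.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
ORG = "http://www.w3.org/ns/org#"

# deptIRI's concat tag: "http://www.w3.org/ns/org#d" followed by the department ID.
DEPT_IRI_PREFIX = ORG + "d"

# Attributes stereotyped "exclude": they feed triples but never reach the XML document.
DEPT_EXCLUDED = {"deptIRI"}


def build_department(source):
    """Return (document_fields, full_instance) for a department source record."""
    instance = dict(source)
    # Calculated field: derived from other attributes, not read from the source.
    instance["deptIRI"] = DEPT_IRI_PREFIX + str(source["departmentId"])
    document = {k: v for k, v in instance.items() if k not in DEPT_EXCLUDED}
    return document, instance


def department_triples(instance):
    """Triples implied by semType (OrganizationalUnit) and the departmentName label."""
    iri = instance["deptIRI"]
    return [
        (iri, RDF_TYPE, ORG + "OrganizationalUnit"),
        (iri, RDFS_LABEL, instance["departmentName"]),
    ]


doc, inst = build_department({"departmentId": 4, "departmentName": "R&D"})
print(doc)                       # deptIRI is absent from the document...
print(department_triples(inst))  # ...but it is the subject of both triples
```

Keeping the calculated IRI in the instance but out of the document mirrors the model's intent. The IRI exists only to wire up triples, while departmentName serves both the document and the label triple.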
Specifying the stereotypes in the model is beneficial because the toolkit's transform module, which maps the UML model to Entity Services, understands these semantic stereotypes and generates code that creates triples from the content of the document. Figure 2, for example, shows the code the toolkit generates to create employee triples. Every aspect of this code arises from the semantic stereotypes:
```xquery
let $semIRI := map:get($options, "empIRI")
return (
  sem:triple(sem:iri($semIRI),
             sem:iri("http://www.w3.org/2000/01/rdf-schema#label"),
             map:get($options, "empLabel")),
  sem:triple(sem:iri($semIRI),
             sem:iri("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
             sem:iri("http://xmlns.com/foaf/0.1/Agent")),
  sem:triple(sem:iri($semIRI),
             sem:iri("http://www.w3.org/ns/org#memberOf"),
             sem:iri(map:get($options, "memberOf"))),
  sem:triple(sem:iri($semIRI),
             sem:iri("http://www.w3.org/ns/org#reportsTo"),
             sem:iri(map:get($options, "reportsTo"))),
  sem:triple(sem:iri($semIRI),
             sem:iri("http://xmlns.com/foaf/0.1/name"),
             map:get($options, "empLabel"))
)
```
Figure 2: Auto-generated code creating triples based on semantic stereotypes
Figure 3 shows some example triples describing employee 114, his superior, and his department. He is a FOAF agent named Earl Garza who reports to Ruth Shaw (employee 1) and is a member of R&D (department 4).
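The following Python sketch mirrors Figure 2's logic outside MarkLogic and serializes the result for employee 114 as N-Triples. It is a hedged illustration, not toolkit output. The function names and the Literal marker class are invented for the sketch. The employee IRIs under http://example.org/employee/ are hypothetical, because the model's concatenation rule for empIRI is not shown here. The department IRI follows the deptIRI rule described earlier. Unlike Figure 2, the sketch skips the memberOf or reportsTo triple when that value is missing, as it would be for an employee at the top of the tree.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
ORG = "http://www.w3.org/ns/org#"
FOAF = "http://xmlns.com/foaf/0.1/"


class Literal(str):
    """Marks a triple object as a plain string literal rather than an IRI."""


def employee_triples(options):
    """Mirror of Figure 2: label, type, relationships, then foaf:name."""
    subject = options["empIRI"]
    triples = [
        (subject, RDFS_LABEL, Literal(options["empLabel"])),
        (subject, RDF_TYPE, FOAF + "Agent"),
    ]
    # Relationship triples; skipped here (not in Figure 2) when the value is absent.
    for key in ("memberOf", "reportsTo"):
        if options.get(key):
            triples.append((subject, ORG + key, options[key]))
    triples.append((subject, FOAF + "name", Literal(options["empLabel"])))
    return triples


def term(value):
    """Serialize one RDF term in N-Triples syntax."""
    if isinstance(value, Literal):
        escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'
    return f"<{value}>"


def to_ntriples(triples):
    return [f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples]


earl = {
    "empIRI": "http://example.org/employee/114",   # hypothetical IRI
    "empLabel": "Earl Garza",
    "memberOf": ORG + "d4",                         # deptIRI of department 4
    "reportsTo": "http://example.org/employee/1",  # hypothetical IRI for Ruth Shaw
}
for line in to_ntriples(employee_triples(earl)):
    print(line)
```

Marking literals with a str subclass keeps each triple a plain tuple while still telling the serializer which objects to quote and which to wrap in angle brackets.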
To learn more about semantics and the MarkLogic Data Hub Framework, refer to the following resources: