Part 1: How to Model and Manage Entities with UML
Part 2: Introduction to the UML-to-Entity Services Toolkit: UML Modeling with MarkLogic’s Entity Services
Welcome to the third post in my series on modeling for MarkLogic’s Entity Services using the Unified Modeling Language (UML). In Part 1, I introduced the concept of using UML notation to visually depict the model and explained how to transform the UML model into MarkLogic’s Entity Services model descriptor format. In Part 2, I demonstrated that transformation using movies as an example and introduced my UML-to-Entity Services toolkit for model-driven MarkLogic development.
Here, we will examine modeling for a mixed document/semantic database. Besides showcasing semantics, my example also demonstrates how to use UML modeling with MarkLogic’s Data Hub Framework (DHF). The source code for this UML modeling example is on GitHub.
Our data model (Figure 1) describes employees and departments in a company’s human resources repository. The company is the fictional GlobalCorp. Our model is based on an example included in DHF’s GitHub repository.
Figure 1: Sample data model with departments and employees
The two main classes are Department and Employee. A department has an ID (departmentId) and a name (departmentName). An employee has an ID (employeeId), a name (including lastName), a salary (including bonus), hire status and dates, and a title (title), plus addresses, phone numbers, and emails. The latter are complex types, hence the four additional classes: one each for addresses, phone numbers, and emails, plus GeoCoordinates (a type within Address). These are datatypes used by Employee.
Relationships are particularly significant in this example: an employee reportsTo another employee, and an employee is memberOf a department. We might physically represent these relationships by using document references or containment. The employee document, for example, could contain an attribute called memberOf, whose value is the ID of the corresponding department. However, GlobalCorp has decided to represent these relationships using semantic triples instead for the following reasons:
If you look carefully at the model, you will see it is peppered with semantic, or “sem”, stereotypes. The model makes use of the Entity Services UML profile included in the toolkit. The profile defines semantic and other stereotypes used to map UML to Entity Services. Using these stereotypes, GlobalCorp is able to describe in the model the IRIs, RDF types, and RDFS labels of employees and departments. It also relates these entities using predicates defined by the W3C Organization Ontology.
Here is a breakdown of the semantic stereotypes used in GlobalCorp’s model:
The Department and Employee classes bear the stereotype semType. This stereotype associates an RDF semantic type with each class. Department’s RDF type is http://www.w3.org/ns/org#OrganizationalUnit. Thus, from a semantic perspective, a department is an organizational unit as defined by the W3C Organization Ontology. Employee’s RDF type is http://xmlns.com/foaf/0.1/Agent, from the friend-of-a-friend (FOAF) ontology.
The Department and Employee classes also define an IRI. The purpose of the IRI is to uniquely identify a department or employee when we use it as the subject or object of a semantic triple. In each class we nominate one attribute to serve as the IRI and mark it with the profile’s IRI stereotype. In Department, that attribute is deptIRI; in Employee, it is empIRI. Notice that each of the IRI attributes is also stereotyped as calculated and as exclude. Thus, these IRI attributes are merely calculated fields, used to help construct triples; they will not be included in the XML document representation of the department or employee. The concat tag indicates how the IRI’s value is calculated. For example, deptIRI is the concatenation of “http://www.w3.org/ns/org#d” and the department ID.
Each class likewise nominates one attribute to serve as its RDFS label. In Department, that attribute is departmentName; in Employee, it is empLabel. Notice that departmentName is not a calculated field; it is a full-fledged attribute that will also appear in the department’s XML document. empLabel, on the other hand, is an excluded field whose value is calculated from the employee’s name.
The relationship reportsTo, which relates one employee to another, is a semProperty with the predicate http://www.w3.org/ns/org#reportsTo. Thus, if employee A reports to employee B, we construct a triple whose subject is the IRI of employee A (employee A’s empIRI), whose predicate is the one given, and whose object is the IRI of employee B (employee B’s empIRI). Notice the exclude stereotype; the XML representation of an employee will not contain the reportsTo element. We will maintain the relationship solely using a triple.
The relationship memberOf, which relates an employee to a department, is likewise a semProperty, with the predicate http://www.w3.org/ns/org#memberOf. The triple we create has the employee’s empIRI as subject, the predicate given, and the department’s deptIRI as object. This relationship is also excluded from the document.
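To make the idea of calculated, excluded attributes more concrete, here is a minimal sketch in plain Python (not the XQuery the toolkit actually generates). It separates the attributes that would be stored in the document from those that exist only to feed triple construction. The department IRI prefix comes from the concat example above; the employee IRI prefix, the firstName attribute, the label formula, and the function names are assumptions made purely for illustration.

```python
# Illustrative sketch only: the toolkit generates XQuery, not Python.
DEPT_IRI_PREFIX = "http://www.w3.org/ns/org#d"  # stated above for deptIRI
EMP_IRI_PREFIX = "http://www.w3.org/ns/org#e"   # assumption: not stated in the article


def department_fields(department_id, department_name):
    """Return (stored attributes, calculated-and-excluded attributes)."""
    stored = {"departmentId": department_id, "departmentName": department_name}
    excluded = {"deptIRI": DEPT_IRI_PREFIX + str(department_id)}
    return stored, excluded


def employee_fields(employee_id, first_name, last_name, department_id, manager_id=None):
    """Same split for an employee; firstName is a hypothetical attribute name."""
    stored = {"employeeId": employee_id, "firstName": first_name, "lastName": last_name}
    excluded = {
        "empIRI": EMP_IRI_PREFIX + str(employee_id),
        "empLabel": f"{first_name} {last_name}",           # assumed label formula
        "memberOf": DEPT_IRI_PREFIX + str(department_id),  # kept out of the document
    }
    if manager_id is not None:
        excluded["reportsTo"] = EMP_IRI_PREFIX + str(manager_id)
    return stored, excluded


stored, excluded = employee_fields(114, "Earl", "Garza", department_id=4, manager_id=1)
print(stored)
print(excluded["empIRI"], excluded["memberOf"])
```

The point of the split is that only the first dictionary would be persisted as the document; the second holds the values that end up as triple subjects, objects, and labels.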
Specifying the stereotypes in the model is beneficial because the toolkit’s transform module, which maps the UML model to Entity Services, understands these semantic stereotypes and generates code to create triples based on the content of the document. For example, Figure 2 shows the code the toolkit generates to create employee triples; every aspect of this code arises from the semantic stereotypes:
let $semIRI := map:get($options, "empIRI")
return (
  sem:triple(sem:iri($semIRI), sem:iri("http://www.w3.org/2000/01/rdf-schema#label"), map:get($options, "empLabel")),
  sem:triple(sem:iri($semIRI), sem:iri("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"), sem:iri("http://xmlns.com/foaf/0.1/Agent")),
  sem:triple(sem:iri($semIRI), sem:iri("http://www.w3.org/ns/org#memberOf"), sem:iri(map:get($options, "memberOf"))),
  sem:triple(sem:iri($semIRI), sem:iri("http://www.w3.org/ns/org#reportsTo"), sem:iri(map:get($options, "reportsTo"))),
  sem:triple(sem:iri($semIRI), sem:iri("http://xmlns.com/foaf/0.1/name"), map:get($options, "empLabel"))
)
Figure 2: Auto-generated code creating triples based on semantic stereotypes
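For readers less familiar with XQuery, the logic of Figure 2 can be approximated in a few lines of Python. This is a rough sketch rather than anything the toolkit produces: the Triple type and the employee_triples function are hypothetical names, the sample employee IRIs are assumed, and, unlike Figure 2, the sketch skips the reportsTo triple when no manager is supplied.

```python
from typing import NamedTuple

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_AGENT = "http://xmlns.com/foaf/0.1/Agent"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
ORG_MEMBER_OF = "http://www.w3.org/ns/org#memberOf"
ORG_REPORTS_TO = "http://www.w3.org/ns/org#reportsTo"


class Triple(NamedTuple):
    """Hypothetical stand-in for a semantic triple."""
    subject: str
    predicate: str
    obj: str
    obj_is_iri: bool  # True for IRI objects, False for plain literals


def employee_triples(options):
    """Build employee triples from an options map, loosely following Figure 2."""
    subject = options["empIRI"]
    triples = [
        Triple(subject, RDFS_LABEL, options["empLabel"], False),
        Triple(subject, RDF_TYPE, FOAF_AGENT, True),
        Triple(subject, ORG_MEMBER_OF, options["memberOf"], True),
    ]
    # Departure from Figure 2: only emit reportsTo when a manager IRI is present.
    if options.get("reportsTo"):
        triples.append(Triple(subject, ORG_REPORTS_TO, options["reportsTo"], True))
    triples.append(Triple(subject, FOAF_NAME, options["empLabel"], False))
    return triples


# Employee IRIs below use an assumed "#e" prefix; the department IRI follows deptIRI.
sample = {
    "empIRI": "http://www.w3.org/ns/org#e114",
    "empLabel": "Earl Garza",
    "memberOf": "http://www.w3.org/ns/org#d4",
    "reportsTo": "http://www.w3.org/ns/org#e1",
}
for triple in employee_triples(sample):
    print(triple)
```

Each line of the generated code traces back to one stereotype: the label and name triples to the label attribute, the type triple to semType, and the two relationship triples to the semProperty predicates.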
Figure 3 shows some example triples describing employee 114, his superior, and his department. He is a FOAF agent named Earl Garza who reports to Ruth Shaw (employee 1) and is a member of R&D (department 4).
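As a rough illustration of how such triples can be followed from one entity to another, the sketch below holds the facts just described in a small in-memory list and walks them with a simple pattern match. It is plain Python rather than a MarkLogic query; the employee IRIs assume an "#e" prefix that the article does not state, and match, label_of, and describe are hypothetical helper names.

```python
ORG = "http://www.w3.org/ns/org#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_AGENT = "http://xmlns.com/foaf/0.1/Agent"

EARL = ORG + "e114"  # assumed employee IRI form
RUTH = ORG + "e1"    # assumed employee IRI form
RND = ORG + "d4"     # follows the deptIRI concat rule

TRIPLES = [
    (EARL, RDFS_LABEL, "Earl Garza"),
    (EARL, RDF_TYPE, FOAF_AGENT),
    (EARL, ORG + "reportsTo", RUTH),
    (EARL, ORG + "memberOf", RND),
    (RUTH, RDFS_LABEL, "Ruth Shaw"),
    (RND, RDFS_LABEL, "R&D"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def label_of(triples, iri):
    """Look up the RDFS label of an IRI, or None if it has no label."""
    found = match(triples, subject=iri, predicate=RDFS_LABEL)
    return found[0][2] if found else None


def describe(triples, employee_iri):
    """Follow reportsTo and memberOf from an employee and resolve the labels."""
    summary = {"name": label_of(triples, employee_iri)}
    for relation in ("reportsTo", "memberOf"):
        hits = match(triples, subject=employee_iri, predicate=ORG + relation)
        summary[relation] = label_of(triples, hits[0][2]) if hits else None
    return summary


print(describe(TRIPLES, EARL))
```

Because the relationships live in triples rather than inside the employee document, the same pattern match works in the other direction too, for example finding everyone whose reportsTo object is Ruth Shaw.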
To learn more about semantics and the MarkLogic Data Hub Framework, refer to the following resources: