TQ logo

Complete TopQuadrant Documentation

Reshaping Relational Data using SPINMap

Last edited September 19, 2014

When integrating multiple sources of data or federating queries across multiple relational sources, it is rare for different schemas to align as well as you might like. In most cases, you’ll need some transformations. There are many ways to reshape relational data, and each presents its own advantages and difficulties. This tutorial shows how to use SPINMap to reshape the most typical relational patterns and transform relational data.

Overview of SPINMap

The SPINMap vocabulary (http://spinrdf.org/spinmap) is a collection of reusable design patterns that reflects typical best practices in Ontology Mapping.

Ontology Mapping is the process of establishing links between RDFS/OWL classes so that instances of the target classes are created from the values of instances of the source classes. Quite often, the source ontology might be automatically generated from legacy data sources such as spreadsheets, XML files and databases. And, accordingly, the source data will be in a spreadsheet, XML or relational format. In our example we are focusing specifically on transforming relational data.

SPARQL is a rich language that can be used for many purposes. The SPARQL CONSTRUCT keyword is particularly useful to define rules that map from one graph pattern (in the WHERE clause) to another graph pattern. This makes it possible to define sophisticated rules that map instances from one class to instances of another one. The SPIN framework provides several mechanisms that make the definition of such SPARQL-based mapping rules easier. In particular, SPIN makes it easy to associate mapping rules with classes, and SPIN templates and functions can define reusable building blocks for typical modeling patterns.

Once developed, mappings in the SPINMap models can be executed by a SPIN engine together with any other SPIN rules. TopBraid Composer provides a visual editor that makes it easy to establish ontology mappings by dragging and dropping and filling in forms. It also includes the TopSPIN engine for executing transformations in a local client environment. In production deployments, SPINMaps may run on a TopBraid Live server, which includes a server-based version of the TopSPIN engine. TopBraid Insight also uses SPINMaps to dynamically bring together and merge data from diverse data sources.

To follow this tutorial, you will need to have TopBraid Composer (TBC) and be familiar with using it. You should also read the introductory material on SPINMap in TopBraid Composer's on-line Help under Application Development Tools > SPIN > Ontology Mapping with SPINMap and SPINMap: SPARQL-Based Ontology Mapping With a Graphical Notation on TopQuadrant's blog. You may also want to review the following:

Introducing the HR database Example

We will demonstrate the relevant SPINMap mechanisms using the Oracle HR sample schema. The files to create the database are included with the Oracle distribution; a MySQL version is also available, as are instructions for porting it to SQL Server.

To follow this tutorial, you may either instantiate an RDBMS, generate an ontology representing its schema and define transformations for data stored in RDBMS or by-pass the steps of working directly with the RDBMS and download the TopBraid project spinmap.rdbexamples.topbraid.org.zip that contains the ontology representing the schema and a data dump converted into RDF. After downloading the project, unzip it into any directory and then use Import -> General -> Existing Projects into Workspace to add this project to your TBC workspace.

If you do use the RDBMS as your starting point, after using the downloaded DDL statements to create a database you will need to create a connection to the database using the D2RQ import wizard in TBC. You can find a tutorial for this on the TBC Help page "Importing Relational Databases with D2RQ."

Once you establish connection with the database, you will see its schema represented as an ontology. As shown in the diagram below, each class, such as jobs and employees, represents a table from the original RDBMS.

We will now define mappings between the ontology representing the HR database and our target ontology.

Exercise 1: Directly Mapping Literal Values and Relationships

Before we start reshaping data, let’s look at a simple case where we map some source classes and their relationships directly to some target classes with parallel properties.

For this example we will work with the hr:employees and hr:jobs classes created from the relational database's employees and jobs tables. First, we'll create our target ontology. Open a new RDF/OWL File in a project directory. Give the file a name and base URI. Once it opens, create target:Employee and target:Job classes and properties in the target namespace that directly correspond to properties in the source ontology. You may also add a target:Department class and related properties, because these will be used later in this tutorial. Or, you could just use the target-schema.ttl file provided in the TopBraid project.

To start mapping, we'll create a new SPINMap file with TopBraid Composer. If using version 4.6 or higher, right click on a project, select New, and then SPINMap Mapping File…. (If using 4.5 or earlier, right click on a project, select New, and then RDF/OWL/Spin File…., selecting SPINMap Ontology Mapping Vocabulary as one of the Initial Imports on the Create RDF/OWL/SPIN File dialog box. - TopBraid/SPIN/spinmap.spin.ttl). Now, import our source and target ontologies.

If you are using the provided TopBraid project, the source ontology is in the hr-schema.ttl file. Source data for the HR model is in hr-data.ttl.

Note: We also recommend that you import a graph with the source data, even if it is just a small subset of data. Since SPINMap provides a preview of the transformations, as you define mappings, you will be able to immediately see what the target data will look like.

In TopBraid Composer, select a source class, such as hr:employees, switch to the Diagram view, and drag and drop a target class into the panel. Connect the source class with the target class by drawing a line between them (a line should start from a triangle in the right top corner of the hr:employees class to a corresponding triangle in the target:Employee class) and then selecting a target (mapping) function. In this example we use the spinmapl:changeNamespace target function. This function simply replaces the namespace of the source resource with whatever argument you provide to create URIs for the instances of the target class—in this case, the target namespace.

Note that mapping two classes creates something called a mapping context. In this case, we created an hr:employees-Employee mapping context. Drag and drop hr:jobs, hr:departments and their corresponding target classes to the model, repeat the steps and create an hr:jobs-Job mapping context and an hr:departments-Department mapping context. A mapping context can be also used as input to functions that work in another context. We will see how this is done in the next section.

A diagram below shows a Create dialog box for the hr:jobs-Job mapping context. Note that the dialog box has a “Preview samples” window; when source data is imported, this shows a preview of the transformation. This is why we recommend that you include some sample data when developing mappings. This immediate feedback will greatly simplify your mapping activities and will help you to better understand the various mapping functions.

We can distinguish two types of properties in the model: datatype and object properties. In our model we map the datatype properties using the spinmap:equals mapping function as shown below. This dialog box opens once a connection between appropriate properties is created.

Note that the preview in a dialog box shows three columns: Source Instance column displays id of the source resources; arg1 shows the mapped-from property values for the source resource; Result shows generated property values of the target resources.

To map the object type properties we use the spinmap:targetResource function. This function takes the resource value for the property you are mapping from and uses the mapping context provided as a parameter to find the resource value for the target property. For example, when mapping hr:employees_job_id to target:employeeJob we provide as a parameter hr:jobs-Job mapping context as shown below.

To see all transformations, run inferencing, making sure that the inference engine is configured to be TopSPIN. (See Running SPINMap Transformations for additional information on this.) At the bottom of the screen in the Inferences tab you will see that for each instance of hr:employees, hr:jobs and hr:departments, mappings generated an instance of the corresponding target classes with the mapped property values. Target resources are connected to each other in the way that corresponds to how the source resources are connected, as shown below for one instances of target:Employee class.

Exercise 2: Inverting a Relationship

In Exercise 1, the normal interpretation of the relationship between job and employee would be expressed as “Employee performs Job”. Let’s suppose that a more relatable expression for our audience would be “Job is performed by Employees”. SPINMap provides function that allows us to do this. We start by adding the new object property target:jobEmployee to our target:Job class. Declare the domain of this property as target:Job and its range as target:Employee.

We then create a mapping context that maps the hr:employees class to the target:Job class. This may seem counterintuitive, since we are not going to create jobs from employees. Jobs will only be created from the source hr:jobs class. However, we need this in order to find the employees associated with a given job. To do this, we need to create a mapping from the source class to the target class that corresponds to the relationship we need to invert–from employees to jobs.

The mapping uses the spinmapl:relatedObjectContext function. Draw a line from the hr:employees class to the target:Job class. In the Create Mapping Context dialog, select spinmapl:relatedObjectContext as the target function. For the predicate argument, enter the hr:employees_job_id property. We are using the spinmapl:relatedObjectContext function because jobs are objects in the {?employee hr:employees_job_id ?job} triples. There is also a spinmapl:relatedSubjectContext function that we will use later in this tutorial.

For the context input variable use the jobs-Job context.

The next step is to create the mapping that will generate the values of the target:jobEmployee property. To do this we draw a line from the hr:employees class to the target:jobEmployee property. The Create new Mapping dialog box will appear. Select Use the instances as arguments for the mapping and click OK. Then, use the spinmapl:targetResource function with the jobs-Job context to create the mapping.

When the SPINMap engine encounters an instance of hr:employees, it will:

Below is the finished mapping diagram. A completed example of this exercise is provided in the project file exercise2.ttl.

If we run inferences, we will see how target jobs are linked to the target employees.

Exercise 3: Flattening an Object Property (One-to-Many) Relationship

Suppose we want to see the names of all of the employees in a department associated directly with the department. In the source database, names of employees are part of the employees table (class) since there is one-to-many relationship between a department and its employees. We again start by defining our target department class and mapping the hr:departments class to it.

Now add the hr:employees class to the diagram and create a context between it and target:Department. This time we are going to use the spinmapl:relatedObjectContext function. For the context argument, use the context just created for hr:departments to target:Department. For the predicate, use hr:employees_department_id.

In this case, the SPINMap engine is looking at all the instances of hr:employees. It then takes the value of hr:employees_department_id for each of those instances and applies the departments-Department mapping context to find the target:Department instance. Each new instance will get target:departmentEmployee values, which we’ll create by concatenating the first and last name of the source instance of hr:employees. In order to do so, we need to map the hr:employees_first_name property to the target:departmentEmployee property. To do so, select the spinmapl:concatWithSeparator function and then use hr:employees_last_name as the second argument and a space as the separator.

Since multiple employees are working for a given department, the mapping will find the same target:Department multiple times and create multiple values for its target:departmentEmployee property–one for each of the employees that work in a department. By running inferences, we will see that we now have created only target departments and that each department has employee name properties.

A completed example of this exercise is provided in the project file exercise3.ttl. The mapping should look as follows:

Exercise 4: Flattening a Many-to-One Relationship

Let’s say you wanted to simplify the view of the hr:employees class. You don’t need all the detail in the existing class, but the employee’s job title is an important property that you want to display for an employee without having to click through to the jobs class. In the source database, job titles are held in a separate table (class) because they are shared across employees.

We’ll start by redefining our version of what employees should look like. For this exercise, we create a target:Employee class with only 4 properties. Note that the value for one of them (job title) must be obtained from the source class for jobs.

First, we map the employee classes and the datatype properties of hr:employees that we want to have in the target class.

To get the data from the hr:jobs class, we drag it onto the diagram and draw a line from it to the target:Employee class to create a new mapping context. In the Create Mapping Context dialog box select the spinmapl:relatedSubjectContext target function. For the context argument, use the context that was just created to map hr:employees to target:Employee. For the predicate, use the object property that relates hr:employees to hr:jobs. Click OK to close the dialog box.

This creates a new mapping context that the SPINMap engine can use to find a target:Employee instance when given an hr:jobs instance (resource). We can then map any property of the hr:jobs resource to the target:Employee resource—in this case, the job title.

A completed example of this exercise is provided in the project file exercise4.ttl. The final mapping should look as follows:

If we run inferences, we will see that only instances of target:Employee are created and that they have the job title property.

Exercise 5: Flattening a Many-to-Many Relationship

As you may have already guessed, we can combine spinmapl:relatedObjectContext and spinmapl:relatedSubjectContext to flatten a many-to-many relationship. For example, maybe we are more concerned about the job functions performed in a department than the actual employee in the job role. We can add a job property to our target:Department class by creating a new mapping context to map hr:jobs to target:Department. This is a related subject context (spinmapl:relatedSubjectContext) that references the related object context created earlier that links hr:employees to target:Department. It specifies the hr:employees_job_id property of the hr:employees as the predicate.

When the SPINMap rules run, the SPINMap engine finds all the hr:employees subjects that have the current instance of hr:job as the value of its hr:employees_job_id property. It then applies the hr:employees to target:Department mapping context to that hr:employees instance to find the ultimate target:Department to map to. By running inferences, we will see that we now have created only target departments and that each department has job title properties.

The completed example of this exercise is provided in exercise5.ttl. The mapping should look as follows:

Exercise 6: Conditional Mapping

Working with conditional mappings requires version 4.6 of TopBraid Composer. To explain conditional mapping, we will use the simple database shown below designed to store the contact information for people.

A person can have many different contact types. Rather than to have a separate column in the Person table for each type of contact, there is a table called Contact. This table stores contact information in a generic string column contactInfo. Data in this column is then qualified with a type in the contactType column. The values in this column may be values such as “Email”, “home phone”, “skype id”, or “twitter id”, each describing a different type of contact information.

Suppose we want to flatten this schema and create specific properties for a subset of ways in which a person may be contacted in the person class itself. Here is how our target Person class might look.

We begin the mapping as usual by creating a context to map people:Person to target:Person. Next, create a context to map people:Contact to target:Person using spinmapl:relatedObjectContext.

In the mapping diagram, we then connect the people:contactInfo property to each of the contact properties that we want to populate in the target class.

Left as is, this mapping will populate all the target properties with all the people:contactInfo values where the contact person matches. This is not what we want.

Let’s open the people:Contact class in Form view. Here we see the spinmap:rule values for the lines that were just drawn on the diagram view. Expand the target:email mapping rule by mousing over the star icon next to it and clicking on a plus sign to show the nested form.

Change the type of the mapping from spinmap:Mapping-1-1 to spinmap:Conditional-Mapping-1-1. When we right click in the nested form (although not on a specific field) and select “Add all matching properties” to the mapping we see the spinmap:condition property appear. This property takes an ASK SPARQL query as its value. The spinmap:Conditional-Mapping-1-1 template uses this query to determine if the mapping should be performed. If the query yields true then the source is mapped to the target as specified by the expression. If the query yields false then no mapping is done. You reference the source resource being mapped with the variable “?source”. In this case, we will provide a query that only returns true if the people:contactType property of the people:Contact instance being mapped has a value of “Email”.

You can repeat this for each of the mapped properties, providing the appropriate contactType value in each condition query. Once you have the model done you can run inferences to see the results.

A completed example of this exercise is provided in the file contact-info.ttl.

Running SPINMap Transformations

To run SPINMaps in TopBraid Composer, select "Run Inferences" from the "Inferences' menu. (Make sure that Composer is configured to use the TopSPIN engine. For more on this, see the Composer online help page "Run and Configure Inference Engines.") Source resources and all information about them will be created as inferred statements.

TopBraid Insight will automatically execute SPINMaps when users query for information across data sources. For a tutorial on TopBraid Insight, see http://topquadrant.com/docs/tbi/tutorial/.

In TopBraid Live, SPINMaps will typically run as a service. They take care of the transformation part of the ETL (extract-transform-load) process. The service could be put together using SPARQLMotion or SPARQL Web Pages. For an example of a SPARQLMotion script that runs SPINMap transformations, see the short video at http://www.topquadrant.com/sparqlmotion/Videos/Code-hierarchy-transform/Code-hierarchy-transform.html.

Conclusion

This tutorial demonstrated common mapping patterns for the data originating from a relational database. You have learned how to use SPINMap to align the source data with the target model by:

By reading the TopBraid Composer help page introducing SPINMap you have also learned how to create relationships from the flat data. This pattern is more common when your source is a spreadsheet, but could also happen with the source data coming from a relational database.

With this, you should be well-equipped to do your own SPINMap project for data transformation and to develop data mappings needed to support federated queries with TopBraid Insight. For even more advanced exercises, you may want to explore how to extend SPINMap with your own mapping functions.