RDF-Star – Why, how and when should you use it?

In the previous blog, we discussed what reification is and how you can implement it when working with knowledge graphs.

In this blog, we will focus on one approach, RDF-star, and its support in TopBraid EDG. As mentioned before, RDF-star is a proposed extension to RDF that lets us treat a triple as a resource without using the RDF reification vocabulary to declare it. When a product supports RDF-star, a triple can simply become the subject (or object) of another triple.

Key questions we will address in this blog are:

  • How to create data that is reified using RDF-star
  • What happens on import or export of such data
  • How do you query this data
  • When to use this approach to reification

How to create RDF-star data in TopBraid EDG

When you build ontologies in TopBraid EDG and define property constraints, you will see a built-in option to make statements about statements. For example, let’s look at the simple model we created for the previous blog:

The property shape for “works for” is defined as follows:

When you add values for a constraint such as the class constraint (a constraint that says that values of “works for” must be organizations), you will see the + icon to the left of the value. If you expand it, as we have done in the image above, you will be able to enter 1) the severity level to be raised if a value is not an organization and 2) a custom message you want EDG to return if this happens.

Let’s select Warning as the severity level, save our edit and take a look at the RDF in the Source Code panel. The information in the square brackets is stated using the RDF-star style of reification. It captures the fact that the severity level for violating this constraint on “works for” property values is Warning.

Note that we use [[ … ]] to surround nested information, while the RDF-star specification draft currently uses {| … |}. RDF-star is still in development, and its Turtle syntax and other aspects are not yet finalized. There are active discussions about which characters to use in the syntax. Once the syntax is finalized, TopBraid EDG will get an updated implementation that supports it. Your current use of RDF-star in TopBraid EDG will not be impacted by this change.

OK, the option for adding information to a constraint is already built into EDG, but how do you do the same for your own data? As a reminder, our data looks as shown below, and we want to add facts to the “works for” relationship between Irene and TopQuadrant.

You can always use the Turtle syntax in the Source Code panel to do this. Simply put square brackets next to the triple you want to add information to, and place predicates and values between the brackets. However, we would like a more convenient, less technical way to edit and view such information. To make it possible to enter and view nested information in the forms, do the following in the ontology:

  • Create a Node Shape (not a class) called Duration. (Take a look at this blog to understand differences between classes and node shapes.)
  • Give it two properties: start date and end date
  • Connect it to the “works for” relationship between a person and an organization, as shown in the screenshot below

We will now be able to enter start and end dates for the “works for” relationship between Irene and TopQuadrant.

What happens when you import or export

If we export the above data in the Turtle serialization of RDF, we will see the following in the exported file:

example:Irene rdfs:label "Irene" ;
    rdf:type example:Person ;
    example:worksFor example:TopQuadrant .

<urn:triple:%3Chttp%3A%2F%2Fexample.org%2Freification%23Irene%3E:%3Chttp%3A%2F%2Fexample.org%2Freification%23worksFor%3E:%3Chttp%3A%2F%2Fexample.org%2Freification%23TopQuadrant%3E>
    example:startDate "2001-12-01"^^xsd:date .

Currently, the default Turtle export does not use the extended Turtle syntax we see in the Source Code panel in TopBraid EDG. Since RDF-star is a work in progress, support for it varies across tools, and some tools do not yet support any version of it. The Turtle syntax for RDF-star is also not yet final. However, EDG already gives you the option to use Turtle-star on export.

To ensure that the exported RDF serialization is understood by any RDF-compliant tool, EDG generates on export a URI representing each triple that was reified. This way, the exported RDF is fully standards-compliant. This is the strategy used for the default Turtle export as well as for the other RDF serializations for which a “-star” extension is not yet defined.

The generated URI is the subject of the second statement in the Turtle snippet above. It is constructed from the reified statement using the following pattern: urn:triple:<uri of the subject>:<uri of the predicate>:<uri or literal value of the object>. As the snippet shows, each URI is wrapped in angle brackets and then percent-encoded (URL-encoded), e.g., every “/” is replaced by “%2F”.
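The pattern can be reproduced outside EDG to check your understanding of an exported file. The following is a minimal Python sketch, not TopBraid's implementation: the function name reification_urn is hypothetical, the angle-bracket wrapping is inferred from the exported snippet above, and the sketch assumes that all three terms are URIs (how literal objects are encoded is not covered here).

```python
from urllib.parse import quote


def reification_urn(subject: str, predicate: str, obj: str) -> str:
    """Build a urn:triple URI for a triple whose three terms are all URIs.

    Hypothetical helper that mirrors the pattern seen in the exported Turtle.
    """
    # Wrap each URI in angle brackets, then percent-encode it.
    # safe="" makes quote() encode "/" and ":" as well.
    parts = [quote(f"<{term}>", safe="") for term in (subject, predicate, obj)]
    return "urn:triple:" + ":".join(parts)


ns = "http://example.org/reification#"
print(reification_urn(ns + "Irene", ns + "worksFor", ns + "TopQuadrant"))
```

For the Irene example, the printed value should match the subject URI of the exported startDate statement. Because every colon inside a term is encoded as %3A, the only literal colons left in the result are the separators between the terms.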

If you import such data into TopBraid EDG, it will recognize which statement each generated URI refers to. Alternatively, if you prepare data for import into EDG, you can also use the Turtle-star syntax with square brackets.

How to query RDF-star data

The RDF-star specification is still being developed. In addition to defining the extension to RDF, it will also define the extension to SPARQL – SPARQL-star. This will allow users to query both standard and nested triples in SPARQL. Once the specification is complete and there is a reference implementation, TopBraid EDG will support SPARQL-star. In the meantime, you already have the following query options:

  • GraphQL – GraphQL in TopBraid EDG lets you query nested information
  • SPARQL function – TopBraid EDG implements a function that lets you query nested information

In GraphQL, you can formulate the query as shown below and get JSON results. For properties that are defined in an ontology as “reifiable by”, you can use <property>_reif in GraphQL to access the available information about statements with the <property> predicate. In our case, we can use worksFor_reif.

For more on GraphQL, take a look at our blog on using GraphQL with TopBraid EDG.

The SPARQL function provided in TopBraid EDG for accessing or creating nested data is called tosh:reificationURI. It should be used in a BIND statement with three parameters for the subject, predicate and object of the reified triple statement. This function returns a URI we can use to refer to the statement we want to retrieve nested facts about – or to add nested facts to.

The example below shows how to compose and run a SPARQL query using this function.

This query returns people who work for an organization, provided a start date is recorded for that relationship.

There are two more useful SPARQL functions:

  • tosh:reificationURIOf can be used to convert a reification URI (e.g., produced by tosh:reificationURI) back into subject, predicate and object components.
  • tosh:reifiedValue provides direct access to a nested value of a reified triple, e.g. the start or end dates.

tosh:reificationURIOf is a property function. This means it is used in the triple patterns of the WHERE clause rather than in BIND statements. It requires the URI of the reified statement on the left-hand side and three unbound variables for subject, predicate and object on the right-hand side. For example:

SELECT *
WHERE {
    ?uri example:startDate ?start .
    ?uri example:endDate ?end .
    ?uri tosh:reificationURIOf ( ?s ?p ?o ) .
}

In this query, we first find all reification URIs that carry both a start date and an end date, and then disassemble each one into its subject, predicate and object components to learn which original triples were reified.
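For data that has been exported with generated urn:triple URIs, the same decomposition can be imitated outside EDG. The following Python sketch only illustrates the idea and is not how tosh:reificationURIOf is implemented: the helper name split_reification_urn is hypothetical, and it assumes the exported urn:triple pattern described earlier, with all three terms being URIs.

```python
from typing import Tuple
from urllib.parse import unquote


def split_reification_urn(urn: str) -> Tuple[str, str, str]:
    """Recover (subject, predicate, object) from a urn:triple URI.

    Hypothetical helper; handles only triples whose terms are all URIs.
    """
    prefix = "urn:triple:"
    if not urn.startswith(prefix):
        raise ValueError("not a urn:triple URI")
    # Colons inside each term are encoded as %3A, so the remaining literal
    # colons are exactly the separators between the three encoded terms.
    encoded_terms = urn[len(prefix):].split(":")
    if len(encoded_terms) != 3:
        raise ValueError("expected exactly three encoded terms")
    # Decode each term and drop the surrounding angle brackets.
    subject, predicate, obj = (unquote(t).strip("<>") for t in encoded_terms)
    return subject, predicate, obj


urn = (
    "urn:triple:"
    "%3Chttp%3A%2F%2Fexample.org%2Freification%23Irene%3E:"
    "%3Chttp%3A%2F%2Fexample.org%2Freification%23worksFor%3E:"
    "%3Chttp%3A%2F%2Fexample.org%2Freification%23TopQuadrant%3E"
)
print(split_reification_urn(urn))
```

Inside EDG, the property function remains the right tool, since it works directly on the stored data and needs no string handling.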

tosh:reifiedValue is primarily a convenience function for looking up values when you already know the subject, predicate and object. It is used in BIND statements and takes four parameters: the first three are the subject, predicate and object of the statement you want to look up nested values for, and the fourth is the predicate of the nested value. For example:

BIND (tosh:reifiedValue (example:Irene, example:worksFor, example:TopQuadrant, example:startDate) AS ?start)

This will get the start date for Irene at TopQuadrant, if such a date exists.

When to use RDF-star

The RDF-star approach offers a convenient way to add statements about statements. However, other approaches also exist, as discussed in the previous blog. TopBraid EDG supports all of them. The decision on which one to use depends on your data and context.

There are two main indications for selecting the RDF-star approach:

1. Only some of the statements will have facts added to them.

For example, additional information about the “works for” relationships between people and organizations may exist only for a relatively small subset of people. Otherwise, as discussed in the previous blog, you will be better off adding a class Employment, creating instances of it and capturing employment relationships between people and organizations through those instances instead of direct “works for” statements.

The example in the beginning of this blog (where we added severity level to the sh:class constraint of a property shape) fits this pattern. It is more typical to not have additional information about a constraint. By default, when a constraint is not met, the severity is assumed to be Violation. Because of this, we only need to specify a severity if it is different from Violation.

2. You must have the underlying statement.

This is the case for our sh:class constraint example. According to the SHACL standard, to specify an sh:class constraint, we must have a triple with sh:class as the predicate. Thus, it is best to use some form of RDF reification if we need to add information to a constraint.

If you must comply with some ontology and have to use certain predicates, but still want to capture information about statements that use them, RDF-star is an attractive option.

