Why I Use SHACL For Defining Ontology Models
This is the second blog on which modeling language we recommend for creating ontologies. In the previous blog, I talked about what I do not use and why. In this blog, I will talk about what I do use. The idea for these two blogs came from a question asked by a customer. They said:
“When do you use OWL and when do you use SHACL? Isn’t OWL about semantics and SHACL about data validation? And what about RDFS?”
My answer was that I no longer used RDFS/OWL (besides declaring classes and subclasses). I now use SHACL for everything. My previous blog was about why I no longer use OWL. This blog is about why I use SHACL.
Why I Like Using SHACL
As I said in the previous blog, I like to keep things as simple as possible. I prefer to use one language rather than two because it is simpler. If using two languages had some advantages, I would consider using OWL in addition to SHACL based on the evaluation of benefits versus costs. However, I see no real benefits in adding OWL to the mix.
SHACL delivers everything I need nicely. I work a lot with different customers of TopQuadrant. SHACL also delivers everything they need nicely. Let me go into more detail as to why I say this.
It Means Exactly What You Think It Means
There are no unexpected impacts of the open world assumption. As expected, SHACL gives you results based on the data you have, not based on what may or may not in principle exist elsewhere.
Anything you say about a class applies to its subclasses – just as you expect. As with OO classes, subclasses can only narrow down the inherited constraints or add additional properties. This is not the case in, for example, RDFS.
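To make these two points concrete, here is a minimal sketch in plain Python. It is a toy model, not a SHACL engine: a set of tuples stands in for an RDF graph, and the names ex:Person, ex:Employee and ex:name are hypothetical. It mimics two behaviors described above: a cardinality check (in the spirit of sh:minCount) is evaluated only against the data you have, and a constraint stated for a class also reaches instances of its subclasses.

```python
# Toy model only: a set of tuples stands in for an RDF graph.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical data: ex:Employee is a subclass of ex:Person.
triples = {
    ("ex:Employee", SUBCLASS_OF, "ex:Person"),
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", RDF_TYPE, "ex:Employee"),  # no ex:name recorded for bob
}


def subclasses_of(cls):
    """Return cls plus every class that reaches it through rdfs:subClassOf."""
    found, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == SUBCLASS_OF and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def instances_of(cls):
    """Instances of cls, including instances of all its subclasses."""
    classes = subclasses_of(cls)
    return {s for s, p, o in triples if p == RDF_TYPE and o in classes}


def min_count_violations(cls, path, min_count):
    """Instances of cls with fewer than min_count values for path."""
    violations = []
    for node in sorted(instances_of(cls)):
        values = [o for s, p, o in triples if s == node and p == path]
        if len(values) < min_count:
            violations.append(node)
    return violations


print(min_count_violations("ex:Person", "ex:name", 1))  # ['ex:bob']
```

The constraint is declared once, for ex:Person, yet ex:bob is reported because he is an ex:Employee. He is reported because no name is present in the data at hand, regardless of whether one might exist somewhere else.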
It Is Easy To Learn and Understand
Mastering any data modeling language has some learning curve. SHACL is no different. However, it works exactly as one expects, and modeling skills from another language (e.g., UML or XSD) are reusable. The main difference is that it is a modeling language for graph data.
When you create a model in SHACL, you do not need to think about whether you should use rdfs:subClassOf or owl:equivalentClass in defining restrictions on property values. You simply define what the values should be. The definitions are simpler and easier to understand.
It Is Expressive
There are many pre-built constraints – cardinalities, min and max values, min and max length, property comparison, regex, etc. They cover the majority of typical requirements. And they are all used in the same, easy-to-understand way. For example, you don’t need to define special custom datatypes just to say that the height of something needs to be in a certain range.
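As an illustration of how uniform these constraints are, the sketch below models a property shape as a Python dict keyed by SHACL Core parameter names (sh:path, sh:minCount, sh:maxCount, sh:minInclusive, sh:maxInclusive, sh:pattern). This is a hand-rolled toy checker under simplifying assumptions, not how a real SHACL engine is implemented, and the properties ex:height and ex:code and their limits are made up for the example.

```python
import re

# Hypothetical property shapes, keyed by SHACL Core parameter names.
height_shape = {"sh:path": "ex:height", "sh:minInclusive": 0.3,
                "sh:maxInclusive": 2.8, "sh:maxCount": 1}
code_shape = {"sh:path": "ex:code", "sh:minCount": 1,
              "sh:pattern": r"^[A-Z]{3}-\d{4}$"}


def check(node, shape):
    """Return a list of violation messages for one node.

    node is a dict mapping a property name to its list of values.
    """
    path = shape["sh:path"]
    values = node.get(path, [])
    problems = []
    if len(values) < shape.get("sh:minCount", 0):
        problems.append(f"{path}: fewer than {shape['sh:minCount']} value(s)")
    if "sh:maxCount" in shape and len(values) > shape["sh:maxCount"]:
        problems.append(f"{path}: more than {shape['sh:maxCount']} value(s)")
    for value in values:
        if "sh:minInclusive" in shape and value < shape["sh:minInclusive"]:
            problems.append(f"{path}: {value} is below sh:minInclusive")
        if "sh:maxInclusive" in shape and value > shape["sh:maxInclusive"]:
            problems.append(f"{path}: {value} is above sh:maxInclusive")
        # Like SPARQL REGEX, the pattern may match anywhere unless anchored.
        if "sh:pattern" in shape and not re.search(shape["sh:pattern"], str(value)):
            problems.append(f"{path}: {value!r} does not match sh:pattern")
    return problems


widget = {"ex:height": [3.1], "ex:code": ["abc-12"]}
for shape in (height_shape, code_shape):
    for problem in check(widget, shape):
        print(problem)
```

The height range is stated directly on the property, next to its cardinality, with no custom datatype involved.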
You can use property paths (just as in SPARQL), you can use value lists – without creating classes. You can use SHACL to define multiple views upon the same data via not just classes but also (node) shapes. And so on.
It Is Unapologetically About RDF Graphs
When working with RDF graphs, you frequently need to express or use some things specific to RDF graphs. For example:
- Say how URIs of resources should be formed
- Say whether a value could be a blank node
- Use SPARQL, the RDF query language, in your definitions
Unlike RDFS/OWL, all of this is possible with SHACL.
It Is Extensible
As rich and expressive as SHACL is, sometimes you may find that you need to express something it does not support out of the box. In these cases, you can use SPARQL directly within your constraints. If you identify a pattern you need to use often and/or you want to be more declarative in your modeling, SHACL offers a standard mechanism for declaratively defining new types of constraints. In other words, you can build your own domain-specific language with additional constraints, specifying parameters, validation algorithm (e.g., SPARQL expression), error message, etc. New constraint components are expressed in RDF, making them easy to share with others.
Over the last few years of working with SHACL, TopQuadrant has identified some additional constraint types we believe to be useful. These are published at http://datashapes.org/constraints.html and can be downloaded in the following formats: Turtle, JSON-LD and RDF/XML.
It Is Broadly Supported
All major RDF stores and platforms today support SHACL. And this support is consistent across different implementations. Yes, some may not support custom extensions, but all support SHACL Core. There is no need to guess how statements in your SHACL ontology may be interpreted or what they would mean in one implementation versus another. It is all very clear.
It Offers Integrated Support For Rules
Admittedly, SHACL Rules are not as broadly supported as SHACL Shapes and Constraints. It is a pity that the SHACL Working Group wasn’t chartered to deliver this – there certainly was enough interest. In the limited time available, the Working Group did produce a Note describing SHACL Rules and other advanced features.
The reality is that many RDF products already have a proprietary rules language and they don’t feel enough pressure from customers to switch to SHACL Rules. If you believe this to be important, tell your vendor about it.
In the meantime, SHACL Rules are smoothly integrated with constraints and can significantly enrich your models. One approach would be to create your rules in SHACL and translate them for execution into a database specific rules language.
And It Offers Quite a Few Other Advantages
Here are just a few examples of what else is supported by SHACL. It:
- Lets you say things about how information should be presented on a form
- Provides flexible mechanisms for re-use
- Offers a way to say that there should be no other properties besides those you defined
- Allows you to say how a specific non-compliance to the model should be treated
- Defines the structure and meaning of the validation report you can expect from a SHACL-compliant system
- And more
Moving From OWL to SHACL
Increasingly, people with long experience in OWL are coming to the same conclusion as me. See, for example, https://twitter.com/AlanMorrison/status/1216759180905742339 and https://twitter.com/semantifyit/status/1146375779829321728.
If you decide to follow me and switch to using SHACL, you may have questions like:
- What do I need to be aware of as I start developing ontologies in SHACL?
- Are there any “gotchas” or significant differences that OWL users, like me, would experience in switching to SHACL?
I have some relevant experiences and resources to share. The SHACL specification is an excellent reference. It has useful examples and is relatively easy to understand. I am saying “relatively” because, like any standard specification, it is written in a dry formal language. TopQuadrant also has a SHACL web page with links to tutorials, webinars and other useful information.
Below are a few additional tips, hints and considerations.
Classes in SHACL
When modeling in OWL, we create classes and subclasses. We then typically define classes by adding restrictions on property values. In SHACL, this works similarly:
- We create classes and subclasses (using RDFS subClassOf statements – just like in OWL)
- We then add restrictions on property values – using property shapes
- To do this, the class you are defining needs to be a Node Shape as well as a class
A new concept in SHACL is that in addition to classes, there are also Node Shapes. I wrote a blog on the differences between classes and node shapes and when to use what.
Properties in SHACL
Another common question is whether you should use RDFS domains and ranges in addition to property constraints. Personally, I typically avoided using RDFS domains and ranges even when modeling in OWL, because there can be unexpected interplay between what you say with domain/range and what you say with restrictions. This is not an issue with SHACL. SHACL engines will not consider any RDFS domain and range statements.
Remember that RDFS is NOT about specifying what values class instances should have, it is ONLY about classifying resources based on the values they do have. Given this, I recommend using domains and ranges only if you want to discover class membership of your resources based on their data. And, of course, only if the environment you use the ontology in offers RDFS reasoning. SHACL itself offers only one aspect of RDFS reasoning: members of a class include members of all its subclasses. This means that any constraints you specify for a class also apply to all its subclasses.
In OWL, there are properties and restrictions on properties. In SHACL, there are property shapes. While you could use types from OWL to say that a property is an Object or a Datatype property, SHACL does not consider this. It will be looking only at the constraints specified in the shape, e.g., datatype and class constraints. This blog goes into more detail about property shapes.
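The difference between classifying and constraining can be sketched in a few lines of plain Python. This is a toy model under stated assumptions: tuples stand in for triples, the names ex:worksFor and ex:Organization are hypothetical, and subclass handling is left out for brevity. The first function reads a range statement the way RDFS does, as a license to infer a type. The second reads the same requirement the way a SHACL class constraint (sh:class) does, as a check on the data.

```python
RDF_TYPE = "rdf:type"

# Hypothetical data: alice "works for" bob, who is typed only as a person.
data = {
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:worksFor", "ex:bob"),
}


def rdfs_range_inference(triples, prop, range_class):
    """RDFS reading: every value of prop is inferred to be a range_class."""
    inferred = {(o, RDF_TYPE, range_class) for s, p, o in triples if p == prop}
    return inferred - triples  # only the newly derived statements


def class_constraint_violations(triples, prop, required_class):
    """SHACL-style reading: report values not typed as required_class."""
    return sorted(
        o for s, p, o in triples
        if p == prop and (o, RDF_TYPE, required_class) not in triples
    )


# RDFS silently classifies bob as an organization; nothing is flagged.
print(rdfs_range_inference(data, "ex:worksFor", "ex:Organization"))
# The constraint reading reports bob as a problem instead.
print(class_constraint_violations(data, "ex:worksFor", "ex:Organization"))
```

The same data yields a new fact under one reading and a validation result under the other, which is why mixing the two in one model can produce surprises.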
Pre-existing RDFS/OWL Ontologies and SHACL
As I write this blog, there are still many more OWL ontologies on the public web than there are SHACL ontologies. This does not mean that OWL is better or that it is used more broadly. OWL had a head start of more than a decade. If SHACL and OWL had started at the same time, the situation would be rather different. Furthermore, availability of these ontologies does not really mean that people are using them as-is in the applications and systems they build. Many OWL ontologies were developed just because they could be developed and/or bulk converted from other knowledge representation languages like, for example, OBO.
If your starting point is an existing RDFS/OWL ontology, what can you do? First, OWL and SHACL can co-exist. SHACL engines will simply ignore OWL constraints. You do not need to remove them. All the subclass declarations will still be directly used by SHACL, so you would typically use the existing class hierarchy and add SHACL Node Shape as a type to classes. You can then create property definitions (property shapes) from domains, ranges and restrictions. Much of the translation can be done automatically as described in this blog.
Before you start the translation, think about what you want to use the ontology for. There are many ontologies in OWL which are really glossaries or taxonomies of terms. They do not define classes in terms of properties of their members. In other words, they do not contain schema or data definitions. They simply describe classes as vocabulary terms. You can easily identify such “ontologies” because you will see no OWL restrictions or domain/range statements. Instead, you will see synonyms and other annotation properties. In this case, recognize them for what they are and translate them to SKOS and/or develop a small ontology of your own to express this information. There is no value in keeping these terminologies in OWL nor in having them in SHACL. If the terms are instances instead of classes, they will be easier to use and you will not lose any semantics.
Response to Some Criticism of SHACL
There are still people who remain very vested in OWL. As mentioned in the previous blog, the reason for this commitment may be grounded in goals that are different from mine. The topic of comparing OWL and SHACL could result in a never-ending and, possibly, heated discussion. While I was working at IBM with methodologists, a popular joke was “What is the difference between a methodologist and a terrorist? You can negotiate with a terrorist”. The same could be said about ontologists.
Speaking more seriously, I am not interested in religious wars. I am interested in meeting practical needs the best way available. In the end, what wins is what gets used. However, it is important that people understand the pros and cons when they consider options. Selecting a modeling language for RDF that will prove to be frustrating to use may result in their abandoning RDF altogether. With this, let me focus on addressing this set of concrete objections:
https://twitter.com/us2ts/status/1237448273540943872
Claim 1: SHACL blurs the distinction between what is true in the real world and what is true for a specific application
The argument is that what is true in the real world is relatively stable and what is true for a given application can change frequently.
What is true in the real world is often a complex philosophical question. The real world is complex and nuanced. What is true in the real world is a matter of opinion, point of view and context. There are a lot of “yes, it is mostly so, but sometimes …” and “it depends on how you look at it”. Changes in applications are typically driven by changes in the world and/or changes in the application’s world perspective. Describing the real world as an ontology model without a context or boundaries of its intended use in an application would be a never-ending exercise – and the ontology produced would still be only one projection of reality. A projection reflecting some internal beliefs of the author.
Ultimately, we develop models to be used in software applications and software applications use these models against some data. A real world person cannot be processed by an application. An application can only process their “digital twin” as expressed in available data about the person.
Useful models are typically developed by focusing on the needs of a specific *family of applications* (that happen to agree on the same data models), considering what data is of interest, what questions the applications need to answer, etc. A model that tries to cover a broad domain of potential use cases without understanding them typically delivers little or no benefit in practical use. As a result, one struggles with such a model more than one benefits from it.
Claim 2: SHACL limits reuse/sharing of an ontology for different applications
The opposite is true. SHACL offers strong support for re-use. The SHACL Working Group discussed known practical issues with reusing OWL ontologies and took them into account when designing SHACL.
SHACL supports layering of models using owl:imports. This, for example, lets you combine a model that describes what is believed to be true across many application scenarios with models that extend it with definitions that apply more narrowly. SHACL uses the RDFS subclassing mechanism. When you create SHACL models, you define classes and more specific subclasses. Additionally, you can separate Node Shapes from classes to define application-specific views. Take a look at this blog to understand the differences between classes and node shapes.
SHACL recognizes that one often comes across an ontology that mostly meets their needs. They could extend it to meet the needs more fully. However, there may be a statement or two that one can’t agree with. Or that are simply outside of their scope of interest and they do not want this to complicate their use cases. Until SHACL, they would be faced with an obstacle and a dilemma – should they re-use or should they not. How can an ontology be re-used in these circumstances?
SHACL offers a solution to this. With SHACL, you can deactivate constraints that you don’t need or can’t agree with. This is done declaratively, using the standard, so it is unambiguously clear what aspects of the models you are using and what aspects you are not using.
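In SHACL itself, deactivation is expressed by stating sh:deactivated true about a shape. The sketch below imitates that idea in plain Python as a toy model only: dicts stand in for shapes, the overlay stands in for statements you add in your own model on top of an imported one, and the shape names ex:PersonNameShape and ex:PersonSsnShape are hypothetical.

```python
# Shapes from a hypothetical imported ontology, keyed by shape identifier.
imported_shapes = {
    "ex:PersonNameShape": {"sh:path": "ex:name", "sh:minCount": 1},
    "ex:PersonSsnShape": {"sh:path": "ex:ssn", "sh:minCount": 1},
}

# Local statements layered on top of the import; the import is not edited.
local_overlay = {"ex:PersonSsnShape": {"sh:deactivated": True}}


def effective_shapes(imported, overlay):
    """Merge local statements into imported shapes and drop deactivated ones."""
    active = {}
    for shape_id, shape in imported.items():
        combined = {**shape, **overlay.get(shape_id, {})}
        if not combined.get("sh:deactivated", False):
            active[shape_id] = combined
    return active


def violated_shapes(node, shapes):
    """Identifiers of shapes whose minimum cardinality the node fails."""
    return sorted(
        shape_id for shape_id, shape in shapes.items()
        if len(node.get(shape["sh:path"], [])) < shape.get("sh:minCount", 0)
    )


alice = {"ex:name": ["Alice"]}
print(violated_shapes(alice, imported_shapes))  # ['ex:PersonSsnShape']
print(violated_shapes(alice, effective_shapes(imported_shapes, local_overlay)))  # []
```

Because the opt-out lives in a separate, declarative layer, anyone reading the combined model can see exactly which parts of the reused ontology are in force.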
SHACL also lets you specify reasoning profiles, if any, that are required in order to use your model.
Claim 3: SHACL undermines the core purpose of having an ontology
I think the validity of this statement depends on what one believes to be the core purpose of having an ontology, and on what is needed to serve this core purpose. I hold the opposite opinion.
In my previous blog, I discussed why I came to a view that RDFS/OWL undermines the core purpose of having an ontology – in a context of practical real world applications.
Claim 4: SHACL encourages paving the cowpaths with Semantic Silos
This is not a new independent claim, but rather a consequence of the previous claims if one believes them to be true. I do not believe them to be true. Therefore, I don’t think this point requires a separate response.
In Conclusion
In writing this blog, I was primarily thinking of answering questions of our customers, especially customers of TopBraid EDG. EDG offers many ways to bootstrap your use of SHACL:
- There are many pre-built models in SHACL, starting with SKOS in SHACL and ending with extensive TopBraid EDG ontologies
- Building your own ontologies in SHACL is very easy in TopBraid EDG and does not require deep knowledge of RDF, Turtle, etc.
- TopBraid EDG can auto-generate SHACL from an ontology in RDFS/OWL, from spreadsheets and from RDF data
A demo version of TopBraid EDG is available within TopBraid Composer Maestro Edition. Even if you are not using TopBraid EDG, you can download TopBraid Composer Free Edition. While it has fewer options and a more technical user interface, it will offer you a capable SHACL editor.