Q&A from the EDW 2021 Presentation: “Data Cataloging with Knowledge Graphs” – Part 1
TopQuadrant participated in the recent Enterprise Data World conference, where our CEO, Irene Polikoff, gave a presentation on “Data Cataloging with Knowledge Graphs”. We had many interactions with the talk attendees and received interesting questions from them. If you attended the conference but missed the talk, recordings are available at the conference site. If you did not have a chance to attend, we invite you to listen to our recent webinar on the same topic. The webinar was not identical to the talk, but it covered some of the same content.
Questions we received during the conference ranged from “What are open semantics and APIs, and how do they make a difference?” to “What is supported by TopBraid EDG?”. We believe they offer insight into the kinds of topics people think about when considering and implementing data catalogs. This is part 1 of a two-part blog that explores these questions and how we think about the answers. It includes some of the questions Irene answered directly after the talk, as well as those that were left unanswered because of time constraints. Part 2 is available here.
We would like to hear your thoughts on these questions and our answers – as well as any additional questions you may have. Write to us; we always appreciate your input! With that said, let's move on to the questions:
1. Can you also restate the knowledge graph formats that are available?
The two most commonly used graph data models are Property Graphs and RDF (Knowledge) Graphs. TopBraid EDG uses the latter. Other graph data models are also possible (e.g., hypergraphs), but these two are what is used today in more than 95% of cases.
See this white paper for a comparison between the Knowledge Graphs and Property Graphs.
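To make the difference between the two data models a little more concrete, here is a minimal sketch in plain Python. It is an illustration only: the `ex:` identifiers, node IDs and property names are made up for this example and are not part of any EDG vocabulary. In the RDF style every fact is a subject-predicate-object triple, while in the property graph style nodes and edges are records that carry their own key/value properties.

```python
# RDF-style graph: every fact is a (subject, predicate, object) triple.
# All identifiers below are hypothetical examples, not EDG vocabulary.
rdf_triples = {
    ("ex:CustomerTable", "rdf:type", "ex:DatabaseTable"),
    ("ex:CustomerTable", "ex:hasColumn", "ex:EmailColumn"),
    ("ex:EmailColumn", "ex:mapsToTerm", "ex:EmailAddress"),
}

# Property-graph style: nodes and edges are records with key/value properties.
pg_nodes = {
    "n1": {"label": "DatabaseTable", "name": "CustomerTable"},
    "n2": {"label": "Column", "name": "EmailColumn"},
}
pg_edges = [
    # The edge itself carries a property ("since"), which plain triples do not.
    {"from": "n1", "to": "n2", "type": "HAS_COLUMN", "since": "2021-01-01"},
]


def objects(triples, subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


columns = objects(rdf_triples, "ex:CustomerTable", "ex:hasColumn")
```

A real deployment would use a graph database or an RDF library rather than Python sets and dictionaries; the sketch only shows the shape of the data in each model.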
2. How are knowledge graphs positioned against data catalogs? Do they enrich the data catalogs or do they replace data catalog vendors like Collibra?
Knowledge Graphs are a technology. Our presentation focused on why this particular technology fits the requirements for data catalogs so well. For example, a lot of the value of data catalog information lies in its graph structure – how assets are connected to each other, to the broader organizational context and to the vocabularies describing them.
As described in the talk, Knowledge Graphs readily support evolution. They can grow and evolve very flexibly as the scope of your data catalog evolves. Some of the information in a catalog can be gathered directly from a source; this is primarily technical metadata. At the same time, a lot of business and operational information needs to be either collected from subject matter experts or inferred from already known facts and rich ontologies. Knowledge Graphs excel at supporting reasoning. As a result, we believe that a data cataloging solution implemented using Knowledge Graph technology is more powerful and a better fit for today’s enterprises. Many industry analysts (e.g., Gartner) agree.
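As a rough illustration of what inferring new information from known facts and an ontology can look like, the sketch below applies two RDFS-style rules to a small set of triples: subclass relationships are transitive, and an instance of a subclass is also an instance of its superclasses. The class and asset names are hypothetical, and this naive loop is only meant to show the idea, not how a production reasoner (or EDG) is implemented.

```python
def infer(triples):
    """Apply two RDFS-style rules repeatedly until no new triples appear."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in inferred:
            if p != "rdfs:subClassOf":
                continue
            for s2, p2, o2 in inferred:
                # Rule 1: subClassOf is transitive.
                if p2 == "rdfs:subClassOf" and s2 == o:
                    new.add((s, "rdfs:subClassOf", o2))
                # Rule 2: instances of a subclass are instances of the superclass.
                if p2 == "rdf:type" and o2 == s:
                    new.add((s2, "rdf:type", o))
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred


# Hypothetical ontology fragment plus one asserted fact about a catalog asset.
facts = {
    ("ex:DatabaseTable", "rdfs:subClassOf", "ex:DataContainer"),
    ("ex:DataContainer", "rdfs:subClassOf", "ex:DataAsset"),
    ("ex:CustomerTable", "rdf:type", "ex:DatabaseTable"),
}

closure = infer(facts)
is_data_asset = ("ex:CustomerTable", "rdf:type", "ex:DataAsset") in closure
```

Only the fact that the customer table is a database table was entered; that it is also a data asset follows from the ontology.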
3. Can you clarify what open APIs are and how they differ from regular APIs?
This is a great question. There are multiple aspects to the answer:
The first aspect is open APIs that can evolve as the model of information evolves. In a traditional solution that is not model driven and is not based on extensible models, the API is predetermined by the vendor. If a user adds a new type of asset or adds a new field to an existing asset type, the APIs do not change. They are closed because they do not reflect the evolution of the information managed by the solution. A solution based on Knowledge Graph technology can provide APIs (both read and write) that evolve as the underlying model evolves. For more specific details, see https://www.topquadrant.com/querying-topbraid-edg-with-graphql/.
Another aspect is open APIs in the sense of being standards compliant as opposed to proprietary. Knowledge Graphs are defined and accessed using a rich set of underlying standards, e.g., modeling languages like SHACL and query languages like SPARQL and GraphQL.
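The first aspect, an API that follows the model, can be illustrated with a small sketch. Here the "model" is just a Python dictionary of asset types and their fields, and `build_schema` is a hypothetical helper that renders GraphQL-style type definitions from it. None of these names come from TopBraid EDG; the point is only that when the API is derived from the model, adding a field to the model changes the API without any vendor involvement.

```python
def build_schema(model):
    """Render a GraphQL-style type definition for each asset type in the model."""
    blocks = []
    for type_name in sorted(model):
        fields = "\n".join(
            f"  {name}: {field_type}"
            for name, field_type in sorted(model[type_name].items())
        )
        blocks.append(f"type {type_name} {{\n{fields}\n}}")
    return "\n\n".join(blocks)


# Hypothetical model with one asset type and two fields.
model = {"DatabaseTable": {"name": "String", "columnCount": "Int"}}
schema_before = build_schema(model)

# A user extends the model with a new field; the derived API picks it up.
model["DatabaseTable"]["steward"] = "String"
schema_after = build_schema(model)

print(schema_after)
```

In a closed API, the equivalent of `schema_before` would be fixed by the vendor, and the new `steward` field would not be reachable through it.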
4. Using your APIs – do we have the facility to ingest custom metadata and connect the dots to form the lineage, in addition to building a catalog?
Example – I want to connect my data vendors delivering files to orchestrations to ETL loaders to storage objects to APIs reading the data to apps consuming the data to the users using those apps… and on and on.
Yes, you can ingest any metadata. You will need to extend the EDG models if they do not already define the metadata you want to ingest. As described in the answer to the ontology question below, a user can readily add new classes and properties to the pre-built ontologies in EDG. Additionally, as described in the Querying TopBraid EDG with GraphQL blog, you can use the APIs to dynamically add new properties as they are identified at the time of ingestion.
Having said this, the connections you have described are already defined in the EDG ontologies. We further recommend watching our webinar “Semantic Knowledge Graphs are the Governance Architecture of the Future”. It explores how “to be governed” assets can be dynamically modeled and the necessary metadata collected at any time as an ongoing part of any enterprise system – as it evolves from design and in use.
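The chain of connections in the question (vendor to file to orchestration to ETL loader to storage object to API to app to user) is a graph traversal problem once the metadata is ingested. The sketch below shows one way to think about it in plain Python. The asset identifiers and relationship names are invented for illustration and are not EDG's ontology terms, and `downstream` is a hypothetical helper, not an EDG API.

```python
from collections import deque

# Hypothetical lineage facts: (source asset, relationship, target asset).
lineage_edges = [
    ("vendor:AcmeData", "delivers", "file:daily_prices.csv"),
    ("file:daily_prices.csv", "triggers", "orchestration:nightly_load"),
    ("orchestration:nightly_load", "runs", "etl:price_loader"),
    ("etl:price_loader", "writesTo", "storage:prices_table"),
    ("storage:prices_table", "readBy", "api:pricing_service"),
    ("api:pricing_service", "consumedBy", "app:trading_dashboard"),
    ("app:trading_dashboard", "usedBy", "user:analyst_team"),
]


def downstream(edges, start):
    """Return every asset reachable from start, in breadth-first order."""
    adjacency = {}
    for source, _relationship, target in edges:
        adjacency.setdefault(source, []).append(target)
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


impacted = downstream(lineage_edges, "etl:price_loader")
```

Asking what is downstream of the ETL loader returns the storage object, the API, the app and the users, which is the kind of impact analysis that lineage makes possible.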
5. How much knowledge of ontology development is needed to use your platform?
TopBraid EDG comes complete with ontologies describing data assets and associated information, e.g., technology assets. You can start using it immediately without doing any ontology development.
You can also expand and modify these pre-built models as needed. For more information on how to do this “manually” in EDG, see this recent blog that offers a summary overview of EDG ontologies.