Mapping Data Elements to Business Terms
A business glossary is a collection of terms important to a business. Unlike dictionaries, which are more general collections of words, business glossaries only concern themselves with terms that are specific to a particular business and its area of operation. All organizations, industries and business areas tend to have their own jargon. Business terms capture this jargon.
Like dictionaries, business glossaries contain a verbal definition for each term. A common vocabulary of terms with well-written definitions is important for ensuring clear communication among all stakeholders. Business glossary content, however, goes beyond a list of terms and their verbal definitions. For example:
- If a business term represents a set of controlled values, such as Gender or Preferred Customer Status, one would want to associate it with its possible values (e.g., Gold, Silver) in addition to defining it. A business glossary is a place to capture the definitions of valid values and business rules so that they can be managed and made available across the organization.
- Linking IT assets such as data elements to the terms that describe the data connects business and IT and enhances organizational collaboration and productivity. Numerous data elements across different data sources may hold similar information, for example customer information such as customer ID, contact details, and status. Data elements in different systems are often named differently, yet they hold the same type of data. Linking data assets to business glossary terms creates an important reference for business analysts, data analysts, data scientists, data governance managers, and other stakeholders.
The value is clear. However, establishing such links is a challenge for many organizations. With hundreds of data sources and many thousands of data elements, it is a large undertaking. In this blog post, we describe how TopBraid EDG can help you automate this process.
Using Data Element Names to Map to Business Terms
One approach for automating connections between data elements (such as database columns) and business terms is to use column names and match them to similar term names. TopBraid offers such auto-mapping features for building crosswalks.
However, this approach has limited reliability. Fields in different systems often have different names for the same thing, and those names can be quite obscure, bearing little similarity to the business term name. Further, as systems evolve, data elements sometimes get co-opted to store data different from what one would surmise from their names.
Thus, the usefulness of this approach will differ between systems. Nevertheless, there are situations when this is useful and TopBraid EDG can offer mapping suggestions based purely on name matching.
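To make the idea of name-based matching concrete, here is a minimal Python sketch. It is not TopBraid EDG's algorithm, which this post does not describe; it only illustrates the general technique using the standard library's `difflib`. The function names and the 0.6 similarity threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a name and treat underscores and hyphens as spaces."""
    return " ".join(name.lower().replace("_", " ").replace("-", " ").split())


def suggest_terms_by_name(column_name, term_names, threshold=0.6):
    """Return (term, score) pairs whose name similarity meets the threshold.

    The threshold is an illustrative assumption, not a product setting.
    """
    column_key = normalize(column_name)
    scored = [
        (term, SequenceMatcher(None, column_key, normalize(term)).ratio())
        for term in term_names
    ]
    matches = [(term, score) for term, score in scored if score >= threshold]
    # Best match first.
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


terms = ["Employee ID", "Gender", "Preferred Customer Status"]
print(suggest_terms_by_name("EMPLOYEE_ID", terms))  # name is close to a term
print(suggest_terms_by_name("EIN", terms))          # obscure name, nothing close
```

The second call shows the limitation discussed above: a column named EIN may well hold employee IDs, but its name alone gives a matcher very little to work with.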
Using Data to Map to Business Terms
TopBraid EDG lets users define precise rules about business term values. For example, the rule shown below describes the structure of employee IDs.
As TopBraid EDG catalogs data sources, it can profile them to capture important metadata about their data elements. This metadata includes the number of unique values, the minimum and maximum values, the minimum and maximum length, the physical datatype and so on. EDG can also collect some data samples.
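The kind of profile described here can be sketched in a few lines of Python. This is an illustration of the concept only, not EDG's profiler: the function name, the dictionary keys and the sample size are hypothetical, and the physical datatype is left out because it would normally come from the database catalog rather than from the values themselves.

```python
def profile_column(values, sample_size=5):
    """Summarize a column's values: counts, min/max, lengths and a few samples.

    Hypothetical helper for illustration; key names are assumptions.
    """
    present = [v for v in values if v is not None]
    if not present:
        return {"row_count": len(values), "non_null_count": 0}

    as_text = [str(v) for v in present]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    distinct = list(dict.fromkeys(present))

    return {
        "row_count": len(values),
        "non_null_count": len(present),
        "distinct_count": len(distinct),
        "min_value": min(present),
        "max_value": max(present),
        "min_length": min(len(t) for t in as_text),
        "max_length": max(len(t) for t in as_text),
        "samples": distinct[:sample_size],
    }


# Made-up values for illustration.
profile = profile_column(["E00123", "E00456", None, "E00123"])
print(profile)
```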
TopBraid EDG then compares the collected metadata and data samples to the data rules defined for business terms in order to identify potential mappings. For example, the screenshot below shows a recommendation to map the EMPLOYEE_ID column to the Employee ID business term. In this case, the column name matches the term name well, but the same recommendation would be generated if the column were named EMP_NO, EMPLOY_CDE, EIN or anything else, since the recommendation is based on matching the required field length and the Employee ID pattern.
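A rough Python sketch of this kind of data-driven check follows. The post does not spell out the actual Employee ID rule, so the pattern used here (the letter E followed by five digits, length 6) is purely hypothetical, as are the class, the function name and the 0.95 match ratio. In EDG itself such rules are defined on the business term, not in Python.

```python
import re
from dataclasses import dataclass


@dataclass
class TermRule:
    term: str      # business term the rule belongs to
    pattern: str   # regular expression each value must match in full
    length: int    # required field length


# Hypothetical rule: the letter E followed by five digits.
EMPLOYEE_ID_RULE = TermRule(term="Employee ID", pattern=r"E\d{5}", length=6)


def recommend_mapping(column_name, profile, rule, min_match_ratio=0.95):
    """Return a recommendation if the profiled data satisfies the rule, else None.

    The column name is only carried along for reporting; it plays no part
    in the decision.
    """
    if profile["min_length"] != rule.length or profile["max_length"] != rule.length:
        return None
    samples = profile["samples"]
    if not samples:
        return None
    regex = re.compile(rule.pattern)
    hits = sum(1 for value in samples if regex.fullmatch(str(value)))
    ratio = hits / len(samples)
    if ratio < min_match_ratio:
        return None
    return {"column": column_name, "term": rule.term, "match_ratio": ratio}


# Made-up profile of a column with an unhelpful name.
emp_no_profile = {"min_length": 6, "max_length": 6,
                  "samples": ["E00123", "E00456", "E10001"]}
print(recommend_mapping("EMP_NO", emp_no_profile, EMPLOYEE_ID_RULE))
```

Because the decision depends only on the field length and the sampled values, renaming the column leaves the outcome unchanged.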
The next recommendation suggests mapping the GENDER column to the Gender term. It is also based on the data values rather than on a name match. When the number of unique values in a column is relatively small compared to the total number of records, TopBraid EDG recognizes that the column may contain reference data and collects frequency statistics. It now knows that values in the GENDER column are either F or M. This matches the permissible values for the business term. Hence, we get a recommendation.
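The reference-data check can be sketched in Python as follows. This is an illustration under stated assumptions, not EDG's implementation: the 5% distinct-value ratio is an arbitrary cutoff standing in for "relatively small", the function names are hypothetical, and a match is taken to mean that every observed value is one of the term's permissible values.

```python
from collections import Counter


def value_frequencies(values, max_distinct_ratio=0.05):
    """Return value counts if the column looks like reference data, else None.

    The ratio cutoff is an illustrative assumption.
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None
    counts = Counter(non_null)
    if len(counts) / len(non_null) > max_distinct_ratio:
        return None  # too many distinct values to be a controlled list
    return counts


def matches_permissible_values(frequencies, permissible):
    """True when every observed value is one of the term's permissible values."""
    return frequencies is not None and set(frequencies) <= set(permissible)


# Made-up column contents for illustration.
gender_column = ["F"] * 60 + ["M"] * 40
frequencies = value_frequencies(gender_column)
print(frequencies)
print(matches_permissible_values(frequencies, {"F", "M"}))
```

A column of mostly unique values, such as employee IDs, would fail the first check and would never be compared against a list of permissible values.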
In Conclusion
TopBraid EDG can help users automate the tedious task of identifying connections between data elements and business terms. Automation makes it easier to “connect the dots” across the many enterprise data sources. It simplifies the process of adding new data sources to the enterprise knowledge graph managed by EDG. Automation also makes it easier to keep up with changes to data assets, since the automated process can (and should) be run periodically.
The business glossary becomes a key reference for business analysts, data analysts, data scientists, data governance managers, and other stakeholders. It saves employees time and increases consistency, standardization and data integrity across the entire enterprise.