
Advanced Join Speeds Data Model Development for Digital Twins or Advanced Analytics

The most essential operation when combining data from multiple sources is the Join. Used to combine rows from different tables, the Join operation relies on matching values in specified fields or columns. Real-world data originating in separate systems often contains field values that do not match exactly even though they refer to the same object. These differences can arise from system differences, different naming conventions, or weak compliance with standards. Data engineers building data models from disparate sources must spend significant effort wrangling the data to account for such differences, which can entail complicated manipulations to join records on values that are “near” rather than exact matches.
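To make the problem concrete, here is a minimal sketch of a conventional exact-match join in plain Python. The records, the tag names, and the `exact_join` helper are all hypothetical and only illustrate the general behavior, not how Element Unify works. Because "PUMP-101" and "PUMP 101" differ by a single character, the exact join treats them as different objects and drops that record.

```python
# Hypothetical asset records from two systems; both lists describe the same two pumps.
maintenance = [
    {"tag": "PUMP-101", "last_service": "2021-03-02"},
    {"tag": "PUMP-102", "last_service": "2021-04-15"},
]
historian = [
    {"tag": "PUMP 101", "sensor": "flow_rate"},  # same pump, different naming convention
    {"tag": "PUMP-102", "sensor": "pressure"},
]


def exact_join(left, right, key):
    """Inner join that pairs rows only when their key values are exactly equal."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l_row, **r_row} for l_row in left for r_row in index.get(l_row[key], [])]


joined = exact_join(maintenance, historian, "tag")
print(joined)  # only the PUMP-102 pair survives; PUMP-101 is silently dropped
```

Working around this usually means hand-written normalization rules for each source, which is the wrangling effort described above.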

Advanced Joins to the Rescue

The new Advanced Join transformation in Element Unify enables users to quickly combine data from various sources by matching on multiple relevant data fields, using matching approaches that include “fuzzy” and “contains” matching.

Advanced Join speeds the data wrangling effort, giving the modeler a convenient and flexible way to build more useful and precise joins. Other benefits include reduced record duplication and the ability for customers to deploy terminology standards for common assets across their business.

The Advanced Join transformation supports three joining methods:

  • Exact Matching: This method matches strings only when they are identical, that is, when the contents of the text are the same.

  • Fuzzy Matching: This method uses several techniques to match strings within a configurable similarity threshold. The user can select a value between 70 and 100 to define how similar two strings must be to match: 70 accepts strings with lower similarity, 99 requires strings to be almost identical, and 100 requires an exact match.

  • Contains Matching: This method matches strings if the contents of one string are contained in the second string.
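The three methods can be pictured as simple string predicates. The sketch below is an illustration under stated assumptions, not Element Unify's implementation: the function names are hypothetical, the fuzzy score uses Python's `difflib.SequenceMatcher` ratio scaled to 0-100 as a stand-in for the unspecified techniques Unify applies, and the contains check is assumed to work in either direction.

```python
from difflib import SequenceMatcher


def exact_match(a: str, b: str) -> bool:
    """Exact matching: the two strings must be identical."""
    return a == b


def fuzzy_score(a: str, b: str) -> float:
    """Similarity on a 0-100 scale.

    Stand-in scorer (difflib ratio * 100), not Unify's own algorithm.
    Identical strings score exactly 100.
    """
    return SequenceMatcher(None, a, b).ratio() * 100


def fuzzy_match(a: str, b: str, threshold: float = 90) -> bool:
    """Fuzzy matching: accept pairs whose score reaches the threshold (70-100)."""
    if not 70 <= threshold <= 100:
        raise ValueError("threshold must be between 70 and 100")
    return fuzzy_score(a, b) >= threshold


def contains_match(a: str, b: str) -> bool:
    """Contains matching; assumption: either string may contain the other."""
    return a in b or b in a


print(fuzzy_score("PUMP-101", "PUMP 101"))                  # 87.5
print(fuzzy_match("PUMP-101", "PUMP 101", threshold=85))    # True
print(fuzzy_match("PUMP-101", "PUMP 101", threshold=90))    # False
print(contains_match("P-101", "AREA2/P-101/DISCHARGE"))     # True
```

With this stand-in scorer, lowering the threshold from 90 to 85 is enough to pair "PUMP-101" with "PUMP 101", while a threshold of 100 behaves like exact matching.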

Figure 1: Advanced Join Dialog Box

Using Advanced Join, the engineer can easily refine a join in successive passes, choosing a similarity level based on how much data is matched at that level. Furthermore, the Advanced Join transformation allows the engineer to create joins on multiple columns and to use a different joining method for each column as needed, e.g., ExactJoin (Column A, Column B), FuzzyJoin (Column C, Column D), and ContainsJoin (Column E, Column F).
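A multi-column join with a different method per column, plus a sweep over similarity thresholds for successive refinement, might look roughly like the sketch below. Everything here is hypothetical: the column names, the records, and the `advanced_join` and `match_counts` helpers are illustrative, and the fuzzy scorer is `difflib.SequenceMatcher` scaled to 0-100 rather than Unify's own technique. A pair of rows is kept only if every column rule matches.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Matcher = Callable[[str, str], bool]
Rule = Tuple[str, str, Matcher]  # (left column, right column, matcher)


def exact(a: str, b: str) -> bool:
    return a == b


def contains(a: str, b: str) -> bool:
    # Assumption: match if either value is a substring of the other.
    return a in b or b in a


def fuzzy(threshold: float) -> Matcher:
    # Stand-in scorer: difflib ratio scaled to 0-100, not Unify's own algorithm.
    def match(a: str, b: str) -> bool:
        return SequenceMatcher(None, a, b).ratio() * 100 >= threshold
    return match


def advanced_join(left: List[Row], right: List[Row], rules: List[Rule]) -> List[Tuple[Row, Row]]:
    """Nested-loop inner join: keep a pair only if every column rule matches."""
    return [
        (l_row, r_row)
        for l_row in left
        for r_row in right
        if all(m(l_row[lc], r_row[rc]) for lc, rc, m in rules)
    ]


def match_counts(left: List[Row], right: List[Row], thresholds: List[float]) -> Dict[float, int]:
    """Count matched pairs at each fuzzy threshold to guide successive refinement."""
    return {
        t: len(advanced_join(left, right, [
            ("site", "site", exact),     # exact match on site
            ("tag", "tag", fuzzy(t)),    # fuzzy match on equipment tag
            ("desc", "area", contains),  # contains match on description
        ]))
        for t in thresholds
    }


# Hypothetical records from two source systems.
left = [
    {"site": "Plant A", "tag": "PUMP-101", "desc": "Feed pump"},
    {"site": "Plant A", "tag": "PUMP-102", "desc": "Booster pump"},
    {"site": "Plant B", "tag": "VALVE-7", "desc": "Inlet valve"},
]
right = [
    {"site": "Plant A", "tag": "PUMP 101", "area": "Feed pump skid 1"},
    {"site": "Plant A", "tag": "PUMP-102", "area": "Booster pump house"},
    {"site": "Plant B", "tag": "VLV-7", "area": "Inlet valve station"},
]

print(match_counts(left, right, [100, 90, 85, 80, 70]))
# {100: 1, 90: 1, 85: 2, 80: 3, 70: 3}
```

Reading the counts from strictest to loosest shows where additional records start to join, which is the kind of evidence an engineer can use to settle on a threshold. The nested loop compares every pair of rows, which is fine for a sketch but would need indexing or blocking for large tables.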

Want to Learn More?

Register to view a live demo of Unify or sign up for a free trial to use the software for 30 days. You can also purchase a Personal License from the AWS Marketplace or Azure Marketplace.

Questions? Please contact us.