Structure Preserving Semantic Matching (SPSM)
Structure Preserving Semantic Matching is a type of semantic matching that takes two tree-like structures and produces a global similarity score between the trees and a set of correspondences between the nodes. The correspondences preserve a set of structural properties, namely:
- one-to-one correspondences between semantically related nodes,
- leaf nodes are matched to leaf nodes and internal nodes are matched to internal nodes.
A set of mappings produced by SPSM when comparing two extract of University course catalogs is presented in Figure 1. Note how the set of mappings preserve the above mentioned structural properties. A comparison with the other semantic approaches which do not preserve these set of structural properties can be found in the Examples and comparison page.
The main functionality of this type of semantic matching technique is to compare function definitions. Functions can be represented as tree like structures where each function name is an internal node, and each parameter is represented as a leaf node. The goal of SPSM when comparing functions (for example, web services) is to establish whether two functions are similar enough (given by the global similarity score) and to create an adaptor that maps the components of these functions. The structural properties of SPSM guarantees that functions are mapped only to functions (internal nodes) and parameters are mapped only to parameters (leaf nodes). The additional property of generating only one-to-one correspondences between nodes allows us to generate an adaptor where one function or parameter in one tree is mapped only to one function call or parameter in the other tree respectively.
Let us consider an example of approximate SPSM between the following web services: T1:get_wine(Region, Country, Color, Price, Number_of_bottles) and T2:get_wine(Region(Country,Area), Colour, Cost, Year, Quantity), see Figure 2. In this case the first web service description requires the fourth argument of the get_wine function “Color” to be matched to the second argument “Colour” of the get wine function in the second description. Also, “Region” in T2 is defined as a function with two arguments “Country” and “Area”, while in T1, “Region” is an argument of get_wine. Thus, “Region” in T1 must be passed to T2 as the value of the Area argument of the Region function. Moreover, Year in T2 has no corresponding term in T1. Notice that detecting these correspondences would have not been possible in the case of exact matching by its definition.
In order to guarantee successful web service integration, we are only interested in the correspondences holding among the nodes of the trees underlying the given web services in the case when the web services themselves are similar enough. At the same time the correspondences have to preserve two structural properties of the descriptions being matched: (i) functions have to be matched to functions and (ii) variables to variables. Thus, for example, Region in T1 is not linked to Region in T2. Finally, let us suppose that the correspondences on the example of Figure 2 are aggregated into a single similarity measure between the trees under consideration, e.g., 0.62. If this global similarity measure is higher than empirically established threshold (e.g., 0.5), the web services under scrutiny are considered to be similar enough, and the set of correspondences showed in Figure 2 is further used for the actual web service integration.
The SPSM approach follows the work in (Giunchiglia et al., 2008). The matching process is organized in two steps: (i) node matching and (ii) tree matching. Node matching solves the semantic heterogeneity problem by considering only labels at nodes and domain specific contextual information of the trees. To match nodes, SPSM approach uses the S-Match system as proposed in (Giunchiglia et al., 2007). Tree matching, in turn, exploits the results of the node matching and the structure of the trees to find if these globally match each other. We are mainly interested in approximate matching, since two web service descriptions may only rarely match perfectly in open environments, see (Giunchiglia et al., 2008) for details. Technically, two trees T1 and T2 approximately match if there is at least one node n1i in T1 and node n2j in T2 such that: (i) n1i approximately matches n2j, (ii) all ancestors of n1i are approximately matched to the ancestors of n2j. Notice that the horizontal order of siblings is not preserved being not a desirable property for the data translation purposes.
The implementation of approximate structure preserving semantic matching is based on (i) the theory of abstraction (Giunchiglia and Walsh, 1992) and (ii) the tree-edit distance. Specifically, the work in (Giunchiglia and Walsh, 1992) categorizes the various kinds of abstraction operations. The key idea is to use abstractions/refinements as tree-edit distance operations in order to estimate the similarity of two tree structures.