25th International Conference on Database Systems for Advanced Applications

Sep. 24-27, 2020, Jeju, South Korea


Paper details

Title: String Joins with Synonyms

Authors: Gwangho Song, Hongrae Lee, Kyuseok Shim, Yoonjae Park and Wooyeol Kim

Abstract: String matching is a fundamental operation in many applications such as data integration, information retrieval and text mining. Since users express the same meaning in a variety of ways that are not textually similar, existing work has proposed variants of Jaccard similarity that use synonyms to capture semantics beyond textual similarity. However, because these variants employ set semantics, they may produce a non-negligible number of false positives in some applications, and because they rely on approximations, they may miss some true positives. In this paper, we define new match relationships between a pair of strings under synonym rules and develop an efficient algorithm to verify these match relationships for a pair of strings. In addition, we propose two filtering methods to prune non-matching string pairs. We also develop join algorithms with synonyms based on the filtering methods and the match relationships. Experimental results with real-life datasets confirm the effectiveness of our proposed algorithms.
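The following Python sketch illustrates the baseline the abstract criticizes, namely Jaccard similarity over token sets after applying a synonym rule, and shows how set semantics can yield a false positive. It is a hypothetical illustration, not the authors' match relationships or join algorithm. The whitespace tokenizer, the functions `tokens`, `jaccard` and `apply_rule`, and the example rule "nyc → new york" are all assumptions made for this example.

```python
# Hypothetical illustration only: NOT the paper's algorithm. It shows plain
# Jaccard similarity on token sets, optionally after rewriting with a synonym rule.


def tokens(s: str) -> list[str]:
    """Lowercase whitespace tokenization (an assumption for this sketch)."""
    return s.lower().split()


def jaccard(a: list[str], b: list[str]) -> float:
    """Jaccard similarity |A & B| / |A | B| over token SETS (order and duplicates ignored)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def apply_rule(toks: list[str], lhs: list[str], rhs: list[str]) -> list[str]:
    """Rewrite every non-overlapping occurrence of lhs with rhs, scanning left to right."""
    if not lhs:
        raise ValueError("rule left-hand side must be non-empty")
    out, i, n = [], 0, len(lhs)
    while i < len(toks):
        if toks[i:i + n] == lhs:
            out.extend(rhs)  # replace the matched span with the synonym
            i += n
        else:
            out.append(toks[i])
            i += 1
    return out


if __name__ == "__main__":
    s, t = tokens("NYC pizza"), tokens("New York pizza")
    print(jaccard(s, t))  # textual similarity only
    s2 = apply_rule(s, ["nyc"], ["new", "york"])  # hypothetical synonym rule
    print(jaccard(s2, t))  # similarity after the synonym rewrite
    # Set semantics ignore word order, so a reordered string also scores 1.0.
    print(jaccard(tokens("york new pizza"), t))
```

Running it prints 0.25, 1.0 and 1.0. Applying the synonym rule raises the similarity of the first pair from 0.25 to 1.0, but the reordered string "york new pizza" also scores 1.0 against "new york pizza", because token sets discard order and multiplicity. This is the kind of false positive under set semantics that motivates the stricter match relationships described above.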

