25th International Conference on Database Systems for Advanced Applications

Sep. 24-27, 2020, Jeju, South Korea

Click the following URL to visit the DASFAA 2020 Online Event Site:

http://dasfaa2020.sigongji.com

Paper details

Title: A Deep-Learning-based Blocking Technique for Entity Linkage

Authors: Fabio Azzalini, Marco Renzi and Letizia Tanca

Abstract: Nowadays, data integration must often handle noisy data, including attribute values written in natural language such as product descriptions or book reviews. Entity Linkage is the task of identifying records that contain information referring to the same object. To reduce the size of the problem, modern Entity Linkage methods partition the initial search space into "Blocks" of records that are considered similar according to some metric, greatly reducing the overall complexity of the algorithm.

We propose a Blocking strategy that, unlike traditional methods, aims to capture the semantic properties of data by means of recent Deep Learning frameworks. This paper is mainly inspired by recent work on Entity Linkage whose authors were among the first to investigate the application of tuple embeddings to data integration problems. We extend their method by adopting an unsupervised approach: our blocking model is trained on an external corpus and then applied to new datasets, following a "transfer learning" paradigm. This choice is motivated by the fact that, in most data integration scenarios, no training data is actually available. Using a semi-automatic approach to blocking, our model, once trained on an external corpus, can be applied directly to any data integration problem.

We tested our system on six popular datasets and compared its performance against five traditional blocking algorithms. The results show that our deep-learning-based blocking solution outperforms standard blocking algorithms, especially on textual and noisy data.
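The general idea of embedding-based blocking can be illustrated with a minimal Python sketch. It is not the authors' model. As an assumption, a toy embedding built by hashing lowercase character trigrams (`embed`) stands in for the pre-trained deep-learning tuple embeddings, and records are grouped greedily by cosine similarity to each block's first member, a simple threshold scheme similar in spirit to canopy clustering. All names (`embed`, `cosine`, `block_records`, `candidate_pairs`, `DIM`, the 0.5 threshold) and the sample product strings are hypothetical. Only records that share a block become candidate pairs for the later matching step, which is where the reduction in comparisons comes from.

```python
import math
import zlib
from itertools import combinations

DIM = 512  # hypothetical embedding size for this toy example


def embed(text: str) -> list[float]:
    """Toy stand-in for a learned tuple embedding (an assumption, not the
    paper's model): hash lowercase character trigrams into a fixed-size
    vector and L2-normalise it. crc32 keeps hashing stable across runs."""
    padded = f"  {text.lower()} "
    vec = [0.0] * DIM
    for i in range(len(padded) - 2):
        gram = padded[i:i + 3]
        vec[zlib.crc32(gram.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Vectors from embed() are unit-length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def block_records(records: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Greedy blocking: each record joins the first block whose seed
    (first member) is at least `threshold`-similar, else starts a new block."""
    vectors = [embed(r) for r in records]
    blocks: list[list[int]] = []
    for idx, vec in enumerate(vectors):
        for block in blocks:
            if cosine(vectors[block[0]], vec) >= threshold:
                block.append(idx)
                break
        else:
            # No sufficiently similar block found: this record seeds a new one.
            blocks.append([idx])
    return blocks


def candidate_pairs(blocks: list[list[int]]) -> list[tuple[int, int]]:
    # Only records inside the same block are passed on to the matcher.
    return [pair for block in blocks for pair in combinations(block, 2)]


if __name__ == "__main__":
    products = [
        "Apple iPhone 11 64GB black",
        "apple iphone 11 black 64 gb",
        "Samsung Galaxy S10 128GB",
        "samsung galaxy s10 128 gb prism white",
        "Harry Potter and the Philosopher's Stone (hardcover)",
    ]
    blocks = block_records(products)
    total = len(products) * (len(products) - 1) // 2
    print("blocks:", blocks)
    print(f"candidate pairs: {len(candidate_pairs(blocks))} of {total}")
```

In a transfer-learning setup like the one described in the abstract, the trigram `embed` function would be replaced by a model trained on an external corpus, while the blocking and candidate-generation steps would stay the same. The greedy scheme compares each record only with block seeds, so its cost depends on the number of blocks rather than on all record pairs, at the price of block assignments that depend on record order.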
