25th International Conference on Database Systems for Advanced Applications

Sep. 24-27, 2020, Jeju, South Korea

Click the following URL

http://dasfaa2020.sigongji.com

to visit the DASFAA 2020 Online Event Site

Paper details

Title: Efficient Source Selection for Error Detection via Matching Dependencies

Authors: Lingli Li, Sheng Zheng, Jingwen Cai and Jinbao Li

Abstract: Data dependencies have been widely used in error detection. However, errors might not be detected when the target data set is sparse and no conflicts occur. With a rapid increase in the number of data sources available for consumption, we consider how to apply both external data sources and matching dependencies (a general form of functional dependencies, FDs) to detect more potential errors in target data. However, accessing all the sources for error detection is impractical when the number of sources is large. In this demonstration, we present an efficient source selection algorithm that selects a proper subset of sources for error detection. A key challenge of this approach is how to estimate the gain of each source without accessing its dataset. To address this problem, we develop a two-level signature mechanism for estimating the gain of each source. Empirical results on both real and synthetic data show that our algorithm is both effective and efficient.
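The idea of choosing sources by estimated gain can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not describe the two-level signature mechanism, so the code assumes a much simpler single-level signature (a map from a hashed key to a hashed dependent value), replaces similarity matching with exact matching on normalized keys, and uses a plain greedy selection under a source budget. All function names and the sample data are hypothetical.

```python
import hashlib


def h(value: str) -> str:
    """Short stable hash of a normalized (trimmed, lowercased) string."""
    return hashlib.sha1(value.strip().lower().encode("utf-8")).hexdigest()[:8]


def make_signature(records):
    """Assumed signature format: hashed key -> hashed dependent value."""
    return {h(key): h(val) for key, val in records}


def flagged_by(target_sig, source_sig):
    """Target keys that the source also holds, but with a different value."""
    return {
        k for k, v in target_sig.items()
        if k in source_sig and source_sig[k] != v
    }


def select_sources(target, signatures, budget):
    """Greedily pick up to `budget` sources by estimated marginal gain.

    Gain is the number of not-yet-flagged target records that a source's
    signature would flag; only signatures are read, never the source data.
    """
    target_sig = make_signature(target)
    chosen, detected = [], set()
    remaining = dict(signatures)
    while remaining and len(chosen) < budget:
        gains = {
            name: len(flagged_by(target_sig, sig) - detected)
            for name, sig in remaining.items()
        }
        best = max(gains, key=gains.get)
        if gains[best] == 0:
            break  # no remaining source would reveal a new conflict
        chosen.append(best)
        detected |= flagged_by(target_sig, remaining.pop(best))
    return chosen, detected


# Hypothetical data: (name, city) pairs, where name should determine city.
target = [("Alice Smith", "Boston"), ("Bob Lee", "Seoul"), ("Carol Wu", "Paris")]
signatures = {
    "A": make_signature([("alice smith", "Austin"), ("Bob Lee", "Seoul")]),
    "B": make_signature([("Alice Smith", "Austin"), ("Carol Wu", "Rome")]),
    "C": make_signature([("Bob Lee", "Busan")]),
}
chosen, detected = select_sources(target, signatures, budget=2)
```

In this sketch a source is only worth selecting if its signature disagrees with target records that no previously chosen source has already flagged, which is why the gain is recomputed after every pick.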
