25th International Conference on Database Systems for Advanced Applications

Sep. 24-27, 2020, Jeju, South Korea

Visit the DASFAA 2020 online event site: http://dasfaa2020.sigongji.com

Paper details

Title: Incorporating Concept Information into Term Weighting Schemes for Topic Models

Authors: Huakui Zhang, Yi Cai, Bingshan Zhu, Changmeng Zheng, Kai Yang, Raymond Chi-Wing Wong and Qing Li

Abstract: Topic models demonstrate an outstanding ability to discover latent topics in text corpora. A coherent topic consists of words or entities related to similar concepts, i.e., abstract ideas of categories of things. To generate more coherent topics, term weighting schemes have been proposed for topic models that assign weights to terms in text, for example by promoting informative entities. However, current term weighting schemes do not distinguish entities by their concepts, which may produce incoherent topics containing entities from unrelated concepts. To solve this problem, in this paper we propose two term weighting schemes for topic models, the CEP scheme and the DCEP scheme, which improve topic coherence by incorporating the concept information of entities. More specifically, the CEP term weighting scheme gives more weight to entities from the concepts that reveal the topics of the document. The DCEP scheme further reduces the co-occurrence of entities from unrelated concepts by separating them into different duplicates of a document. We develop the CEP-LDA and DCEP-LDA term weighting topic models by applying the two proposed schemes to LDA. Experimental results on two public datasets show that the CEP-LDA and DCEP-LDA topic models can produce more coherent topics.
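
The abstract does not give the CEP or DCEP formulas. The sketch below is therefore only a hypothetical illustration of the two ideas it describes: concept-aware weighting and concept-based document duplication.

It rests on three assumptions that do not come from the paper:
- the entity-to-concept table `CONCEPT_OF` is made up;
- the weighting rule (1 plus the share of the document's entities that fall under the same concept) is invented for illustration;
- the helper names `concept_weights` and `split_by_concept` are hypothetical.

```python
from collections import Counter

# Hypothetical entity-to-concept lookup (illustrative only); a real system
# would obtain concepts for entities from a knowledge base.
CONCEPT_OF = {
    "apple": "company",
    "microsoft": "company",
    "google": "company",
    "banana": "fruit",
}


def concept_weights(tokens, concept_of=CONCEPT_OF):
    """Return one weight per token, promoting entities of dominant concepts.

    Assumed rule (NOT the paper's CEP formula): non-entity words keep
    weight 1.0; an entity gets 1.0 plus the fraction of the document's
    entities that share its concept.
    """
    concepts = [concept_of[t] for t in tokens if t in concept_of]
    counts = Counter(concepts)
    total = len(concepts)  # > 0 whenever at least one entity is present
    weights = []
    for t in tokens:
        if t in concept_of:
            weights.append(1.0 + counts[concept_of[t]] / total)
        else:
            weights.append(1.0)
    return weights


def split_by_concept(tokens, concept_of=CONCEPT_OF):
    """Split a document into one duplicate per concept (DCEP-like idea).

    Assumption: each duplicate keeps all non-entity words but only the
    entities of a single concept, so entities from unrelated concepts
    no longer co-occur in the same duplicate.
    """
    words = [t for t in tokens if t not in concept_of]
    groups = {}  # concept -> entities, in order of first appearance
    for t in tokens:
        if t in concept_of:
            groups.setdefault(concept_of[t], []).append(t)
    if not groups:
        return [list(tokens)]  # no entities: keep the document as-is
    return [words + ents for ents in groups.values()]
```

Consider `["apple", "microsoft", "earnings", "banana"]`. Here the two "company" entities receive a larger weight (5/3) than the single "fruit" entity (4/3). `split_by_concept` then yields two duplicates, `["earnings", "apple", "microsoft"]` and `["earnings", "banana"]`.

In a term weighting topic model, such weights would typically replace the raw integer counts that LDA accumulates. That is a general design pattern, not a detail taken from this paper.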
