Journal of Hebei University (Natural Science Edition), 2016, Vol. 36, Issue (1): 106-112. DOI: 10.3969/j.issn.1000-1565.2016.01.017

Chinese news topic detection based on LDA and T-OPTICS

LI Cong1, YUAN Fang2, LIU Yu2, LI Xinyu1

  1. College of Computer Science and Technology, Hebei University, Baoding 071002, China; 2. College of Mathematics and Information Science, Hebei University, Baoding 071002, China
  • Received: 2015-09-20  Online: 2016-01-25  Published: 2016-01-25

Abstract: A method for topic detection in large-scale news datasets is proposed. First, latent Dirichlet allocation (LDA) is used to reduce the dimensionality of the data by representing each news article as a probability distribution over a set of topics. Then the T-OPTICS algorithm, an improved algorithm based on OPTICS (ordering points to identify the clustering structure), is used to cluster the news articles into topics. Because the OPTICS algorithm is not sensitive to parameter variation, the influence of parameter choice is reduced. The text similarity calculation is improved by taking the effect of time parameters into account. The experimental results show that the algorithm can detect the topics in the TDT4 dataset quickly and effectively.

Key words: LDA model, T-OPTICS, cluster, dimensionality reduction
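
The abstract does not give the formula used for the time-aware text similarity, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes each news article is already represented as an LDA topic distribution, measures content similarity with cosine similarity, and damps it with an exponential decay in the publication-time gap. The function names, the exponential form, and the `scale_days` parameter are all hypothetical.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two topic distributions of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    if norm == 0.0:
        raise ValueError("topic distributions must not be all zeros")
    return dot / norm


def time_aware_distance(p, q, day_p, day_q, scale_days=7.0):
    """Hypothetical time-aware distance between two news articles.

    p, q         : LDA topic distributions (lists of probabilities)
    day_p, day_q : publication times in days
    scale_days   : assumed decay scale; larger values weaken the time effect
    """
    gap = abs(day_p - day_q)
    # Content similarity is damped as the publication dates drift apart.
    similarity = cosine_similarity(p, q) * math.exp(-gap / scale_days)
    return 1.0 - similarity


# Two articles with similar topic mixtures, published one day or one month apart.
article_a = [0.70, 0.20, 0.10]
article_b = [0.65, 0.25, 0.10]
near = time_aware_distance(article_a, article_b, day_p=0, day_q=1)
far = time_aware_distance(article_a, article_b, day_p=0, day_q=30)
print(near < far)
```

A matrix of such pairwise distances could then be handed to a density-based clustering routine that accepts precomputed distances; how T-OPTICS itself combines time and content is not stated in the abstract.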
