Journal of Hebei University (Natural Science Edition) ›› 2010, Vol. 30 ›› Issue (1): 97-101. DOI: 10.3969/j.issn.1000-1565.2010.01.021


An Improved Feature Word Extraction Method Based on Quantified Synonym Relations

徐建民1,刘清江1,付婷婷1,戴旭2   

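  1. College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002; 2. Media Experimental Teaching Center, Hebei University, Baoding, Hebei 071002
  • Published: 2010-01-25 Online: 2010-01-25
  • Funding:
    China Postdoctoral Science Foundation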

Improved Feature Selection Method Based on Similarity of Synonymous

XU Jian-min1,LIU Qing-jiang1,FU Ting-ting1,DAI Xu2   

  • Online:2010-01-25 Published:2010-01-25

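Abstract: An improved TF-IDF method for extracting text feature words, based on quantified synonym relations, is proposed. The method treats the synonyms of a word that occur in the same text as a set. Starting from the term weights computed by the traditional TF-IDF method, it adjusts the weights of the words in the synonym set and of their related words, and it merges and weights the words in the synonym set according to their similarity. Experiments show that the method handles synonyms and their related words effectively and improves the accuracy of feature word extraction.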

Keywords: feature extraction, TF-IDF, synonyms, HowNet, co-occurrence probability

Abstract: An improved feature extraction method based on synonyms is proposed. The method collects the synonyms that occur in a text into a set, adjusts the TF-IDF weights of the synonyms in the set and of their related words, and merges the synonyms according to their similarity. Experimental results show that the new method improves the accuracy of feature extraction.
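
The abstract does not state the exact weighting formulas, so the following Python sketch only illustrates the general idea under stated assumptions: TF-IDF weights are computed with a smoothed IDF, and each synonym set found in the document is folded into its highest-weighted member, with every other member contributing its weight scaled by a similarity score. The function names `tf_idf` and `merge_synonyms`, the `similarity` callback (standing in for a HowNet-style similarity measure), and the additive merge rule are all hypothetical and are not taken from the paper. The adjustment of related words by co-occurrence probability is not covered here.

```python
import math
from collections import Counter


def tf_idf(doc_tokens, corpus):
    """Plain TF-IDF weights for one tokenized document against a corpus."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    n_docs = len(corpus)
    weights = {}
    for term, count in counts.items():
        df = sum(1 for doc in corpus if term in doc)
        # Smoothed IDF (an assumption): stays positive even when a term
        # appears in every document of the corpus.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        weights[term] = (count / total) * idf
    return weights


def merge_synonyms(weights, synonym_sets, similarity):
    """Fold each synonym set into its highest-weighted member.

    `similarity(a, b)` is a hypothetical callback returning a value in
    [0, 1]; the additive merge rule is an illustrative assumption.
    """
    merged = dict(weights)
    for syn_set in synonym_sets:
        present = [t for t in syn_set if t in merged]
        if len(present) < 2:
            continue  # nothing to merge for this set
        rep = max(present, key=lambda t: merged[t])
        for term in present:
            if term != rep:
                # The synonym's weight is scaled by its similarity to the
                # representative, then the synonym is dropped.
                merged[rep] += similarity(rep, term) * merged.pop(term)
    return merged


# Toy data, for illustration only.
corpus = [["car", "automobile", "road"], ["road", "map"]]
weights = tf_idf(corpus[0], corpus)
merged = merge_synonyms(weights, [("car", "automobile")], lambda a, b: 0.9)
print(sorted(merged.items(), key=lambda kv: -kv[1]))
```

Choosing a single representative per set keeps the feature vector shorter and concentrates the evidence for one concept in one term, which is the effect the abstract attributes to merging synonyms by similarity.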

Key words: feature extraction, TF-IDF, synonyms, HowNet, co-occurrence

CLC number: