Journal of Hebei University (Natural Science Edition) ›› 2020, Vol. 40 ›› Issue (3): 322-327. DOI: 10.3969/j.issn.1000-1565.2020.03.014

Improved vector space model based on document relationships

HE Dandan1, WU Shufang2, XU Jianmin1

  1. College of Cyberspace Security and Computer, Hebei University, Baoding 071002, China; 2. School of Management, Hebei University, Baoding 071002, China
  • Received: 2020-01-12 Online: 2020-05-25 Published: 2020-05-25
  • Corresponding author: XU Jianmin (b. 1966), male, from Handan, Hebei; professor at Hebei University, PhD; research interests: information retrieval and online social networks. E-mail: hbuxjm@hbu.edu.cn
  • About the author: HE Dandan (b. 1991), female, from Handan, Hebei; master's student at Hebei University; research interest: information retrieval. E-mail: 1326320110@qq.com
  • Supported by:
    National Social Science Fund of China, Post-funded Project (17FTQ002)

Abstract: Because user queries often carry too little information, the retrieval results of the traditional vector space model are not accurate enough. To address this problem, an improved vector space model based on document relationships is proposed. The improved model forms a benchmark set from the top-ranked, highly relevant documents in the initial retrieval results. It then computes the similarity between each document in the initial result set and the benchmark set, uses that similarity to correct the document-query similarity of the original model, and re-ranks the retrieval results accordingly. Experimental results show that, compared with the traditional vector space model, the improved model ranks relevant documents more reasonably and improves precision while maintaining recall.

Key words: document relationship, vector space model, document similarity, information retrieval
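
The re-ranking idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so the details here are assumptions made for illustration only: smoothed TF-IDF term weights, cosine similarity, a benchmark set made of the top `k` initial results, a document's benchmark similarity taken as its mean cosine similarity to the benchmark documents, and a linear blend controlled by a hypothetical weight `alpha`. The function names `tfidf_vectors`, `cosine`, and `rerank` are likewise hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Build sparse TF-IDF vectors (term -> weight dicts) from whitespace-tokenized texts."""
    tokenized = [t.lower().split() for t in texts]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF: an assumption, the paper's weighting scheme is not stated in the abstract.
    idf = {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(toks).items()}
            for toks in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rerank(query, docs, k=2, alpha=0.5):
    """Re-rank docs by blending query similarity with similarity to a benchmark set.

    k and alpha are hypothetical parameters, not values taken from the paper.
    Returns (ranking as a list of document indices, dict of final scores).
    """
    vectors = tfidf_vectors(docs + [query])
    doc_vecs, q_vec = vectors[:-1], vectors[-1]

    # Step 1: initial retrieval with the plain vector space model.
    base = [cosine(q_vec, d) for d in doc_vecs]
    initial = sorted(range(len(docs)), key=lambda i: base[i], reverse=True)

    # Step 2: the top-k initial results form the benchmark set.
    benchmark = initial[:max(1, min(k, len(docs)))]

    # Step 3: correct each score with the mean similarity to the benchmark set
    # (the linear blend is an assumed correction rule).
    scores = {}
    for i, vec in enumerate(doc_vecs):
        rel = sum(cosine(vec, doc_vecs[j]) for j in benchmark) / len(benchmark)
        scores[i] = alpha * base[i] + (1 - alpha) * rel

    # Step 4: re-rank by the corrected score.
    return sorted(scores, key=scores.get, reverse=True), scores


if __name__ == "__main__":
    corpus = [
        "vector space model retrieval",
        "vector space model ranking documents",
        "cooking pasta recipes",
        "document ranking retrieval model",
    ]
    ranking, final_scores = rerank("vector space model", corpus, k=2, alpha=0.5)
    for idx in ranking:
        print(f"{final_scores[idx]:.3f}  {corpus[idx]}")
```

In this sketch, setting `alpha=1.0` reproduces the ranking of the plain vector space model, while smaller values give more weight to how closely a document resembles the top-ranked results. Benchmark documents are compared with themselves as well, which slightly favors them; whether the paper handles this case differently is not stated in the abstract.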
