Journal of Hebei University (Natural Science Edition), 2025, Vol. 45, Issue (5): 520-529. DOI: 10.3969/j.issn.1000-1565.2025.05.008


Hybrid distance measurement method and application based on conditional probability distribution

HU Guikai, YANG Peirong

  1. School of Science, East China University of Technology, Nanchang 330013, China
  • Received: 2024-10-15 Published: 2025-09-18
  • About the author: HU Guikai (born 1977), male, PhD, professor at East China University of Technology; his research covers machine learning and its applications and statistical inference for regression models.
    E-mail: hgk1997@163.com
  • Supported by:
    National Natural Science Foundation of China (11661003); Natural Science Foundation of Jiangxi Province (20192BAB201006)

Abstract: To identify differences between instances with nominal attributes more accurately, and thereby improve the performance of classification algorithms, a hybrid distance measure based on conditional probability distributions is proposed that takes both the attribute and the class characteristics of instances into account. First, the method computes the differences between conditional probability distributions among attributes and between attributes and the class. Second, the two differences are combined with weights derived from mutual information to give the new hybrid distance measure. Finally, simulation experiments are run with the K-nearest neighbor algorithm on 20 UCI (University of California, Irvine) datasets, and the measure is applied to the diagnosis and treatment of appendicitis in children. The results show that, compared with three other measures including the overlap metric, the proposed distance measure significantly improves classification accuracy and has good application prospects.

Key words: conditional probability distribution, hybrid distance metric, mutual information, nominal attribute
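
The three steps named in the abstract can be illustrated with a short sketch. It is not the authors' formulation: the abstract gives no formulas, so the L1 difference between conditional distributions, the averaging over the other attributes, and the normalized mutual-information weight `alpha` are assumptions made here for illustration, as are all function names (`cond_dist`, `l1_diff`, `mutual_information`, `hybrid_distance`, `knn_predict`). Each training row is assumed to be a tuple of nominal attribute values with the class label in the last position.

```python
# Illustrative sketch only: every formula here is an assumption, not the paper's definition.
from collections import Counter
from math import log


def cond_dist(rows, given_idx, given_val, target_idx):
    """Empirical P(target column | given column == given_val) as {value: probability}."""
    counts = Counter(r[target_idx] for r in rows if r[given_idx] == given_val)
    total = sum(counts.values())
    return {v: n / total for v, n in counts.items()} if total else {}


def l1_diff(p, q):
    """L1 difference between two discrete distributions stored as dicts."""
    return sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in set(p) | set(q))


def mutual_information(rows, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(rows)
    pi = Counter(r[i] for r in rows)
    pj = Counter(r[j] for r in rows)
    pij = Counter((r[i], r[j]) for r in rows)
    return sum((c / n) * log(c * n / (pi[a] * pj[b])) for (a, b), c in pij.items())


def hybrid_distance(rows, x, y):
    """Assumed hybrid distance between attribute tuples x and y (class label excluded)."""
    m = len(rows[0]) - 1  # number of attributes; the class label sits in column m
    total = 0.0
    for i in range(m):
        if x[i] == y[i]:
            continue  # identical values contribute no distance
        others = [j for j in range(m) if j != i]
        # Step 1a: difference of class distributions conditioned on the two values.
        d_class = l1_diff(cond_dist(rows, i, x[i], m), cond_dist(rows, i, y[i], m))
        # Step 1b: mean difference of the other attributes' conditional distributions.
        d_attr = (
            sum(l1_diff(cond_dist(rows, i, x[i], j), cond_dist(rows, i, y[i], j))
                for j in others) / len(others)
            if others else 0.0
        )
        # Step 2: mutual-information weights, normalized to sum to 1 (assumed scheme).
        w_class = mutual_information(rows, i, m)
        w_attr = (
            sum(mutual_information(rows, i, j) for j in others) / len(others)
            if others else 0.0
        )
        z = w_class + w_attr
        alpha = w_class / z if z > 0 else 0.5
        total += alpha * d_class + (1 - alpha) * d_attr
    return total


def knn_predict(rows, x, k=3):
    """Step 3: majority vote among the k training rows nearest to x."""
    nearest = sorted(rows, key=lambda r: hybrid_distance(rows, r[:-1], x))[:k]
    return Counter(r[-1] for r in nearest).most_common(1)[0][0]
```

Calling `knn_predict(rows, x, k)` with a list of labeled tuples and an unlabeled attribute tuple returns the majority class among the `k` nearest rows. The sketch recomputes the conditional distributions on every call to keep the logic visible; on real data these would normally be cached per attribute value.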
