Journal of Hebei University (Natural Science Edition), 2020, Vol. 40, Issue (2): 193-199. DOI: 10.3969/j.issn.1000-1565.2020.02.012

Improved K-nearest neighbor algorithm and its application in learning and warning

ZONG Xiaoping, TAO Zeze

  1. College of Electronic Information Engineering, Hebei University, Baoding 071002, China
  • Received: 2019-05-07 Online: 2020-03-25 Published: 2020-03-25
  • Corresponding author: TAO Zeze (born 1991), male, from Shijiazhuang, Hebei; master's student at Hebei University, working mainly on educational data mining. E-mail: 462129970@qq.com
  • About the author: ZONG Xiaoping (born 1964), female, from Yuxian, Hebei; professor at Hebei University, Ph.D., working mainly on pattern recognition and robot visual servo control. E-mail: 769085906@qq.com
  • Supported by:
    Hebei Province Higher Education Teaching Reform Research and Practice Project (2016GJJG016)

Abstract: As big data plays an increasingly prominent role in education, large amounts of data are being applied to teaching research, teaching evaluation, and behavior prediction. Educational data such as students' grades, behavioral records, and records of interaction with teachers have begun to show their value. To address the problem of low course pass rates, an improved K-nearest neighbor (K-NN) algorithm is applied to learning early warning. First, the model parameters are tuned by combining grid search with cross validation. Second, during the construction of a decision tree, the Gini gain is used to determine a weight coefficient for each feature, and features are selected according to these coefficients. The weight coefficients are then introduced into the distance calculation, so that each feature is constrained by its weight. Experiments on one public data set and one real data set show that the improved K-nearest neighbor algorithm significantly outperforms the traditional K-NN algorithm.

Key words: educational data mining, grid search, K-nearest neighbor, cross validation, Gini gain
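
The three steps named in the abstract (Gini-gain feature weights, a weighted distance, and grid search with cross validation) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the sketch assumes that each feature's weight is its best single-threshold Gini gain normalized to sum to 1, that the distance is a weighted Euclidean distance, and that only the neighbor count k is tuned. All function names and the toy data are hypothetical, and the feature-selection step is omitted.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_gain(column, labels):
    """Largest impurity reduction from one threshold split on a feature.

    Assumption: a single-split stump stands in for the decision tree.
    """
    parent, n, best = gini(labels), len(labels), 0.0
    for t in sorted(set(column))[:-1]:  # every value but the largest
        left = [y for x, y in zip(column, labels) if x <= t]
        right = [y for x, y in zip(column, labels) if x > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def feature_weights(X, y):
    """Per-feature Gini gains, normalized so the weights sum to 1."""
    gains = [gini_gain([row[j] for row in X], y) for j in range(len(X[0]))]
    total = sum(gains)
    if total == 0:  # no feature separates the classes: fall back to uniform
        return [1.0 / len(gains)] * len(gains)
    return [g / total for g in gains]


def weighted_distance(a, b, w):
    """Euclidean distance with each squared difference scaled by its weight."""
    return math.sqrt(sum(wj * (aj - bj) ** 2 for aj, bj, wj in zip(a, b, w)))


def knn_predict(X_train, y_train, x, k, w):
    """Majority vote among the k training points closest to x."""
    ranked = sorted(zip(X_train, y_train),
                    key=lambda pair: weighted_distance(pair[0], x, w))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def cv_accuracy(X, y, k, folds):
    """Cross-validated accuracy; weights are refit on each training fold."""
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        w = feature_weights(X_tr, y_tr)
        correct += sum(knn_predict(X_tr, y_tr, X[i], k, w) == y[i] for i in test)
    return correct / len(X)


def grid_search_k(X, y, candidates, folds=3):
    """Return the candidate k with the best cross-validated accuracy."""
    return max(candidates, key=lambda k: cv_accuracy(X, y, k, folds))


# Toy data (hypothetical): feature 0 is informative, feature 1 is mostly noise.
X = [[1.0, 5.0], [2.0, 1.0], [1.5, 9.0], [8.0, 4.0], [9.0, 8.0], [8.5, 2.0]]
y = ["fail", "fail", "fail", "pass", "pass", "pass"]

best_k = grid_search_k(X, y, candidates=[1, 3])
weights = feature_weights(X, y)
print(best_k, weights, knn_predict(X, y, [1.2, 8.5], best_k, weights))
```

Refitting the weights inside each fold keeps the held-out samples from influencing the weights used to classify them. On real data, the grid would typically cover more than one parameter.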
