Journal of Hebei University (Natural Science Edition), 2017, Vol. 37, Issue (3): 309-315. DOI: 10.3969/j.issn.1000-1565.2017.03.014

Big data modeling of ball mill based on distributed support vector machine on Hadoop platform

GAO Xuewei (高学伟)1,2, FU Zhongguang (付忠广)2, SUN Li (孙力)1, ZHANG Gang (张刚)1

  1. Simulation Center, Shenyang Institute of Engineering, Shenyang 110136, China; 2. Key Laboratory of Condition Monitoring and Control for Power Plant Equipment of Ministry of Education, North China Electric Power University, Beijing 102206, China
  • Received: 2016-04-05 Online: 2017-05-25 Published: 2017-05-25
  • About the author: GAO Xuewei (born 1982), male, from Shijiazhuang, Hebei; engineer at Shenyang Institute of Engineering and doctoral candidate at North China Electric Power University. His research covers operation optimization of power plant units and modeling and simulation of complex thermal systems. E-mail: gxw82425@163.com
  • Supported by:
    Science and Technology Fund of Shenyang Institute of Engineering (LGQN-1051); Innovation Team Project of the Education Department of Liaoning Province (LT2015018)

Abstract: In the big data era, thermal power plants store large volumes of operating data in databases without making full use of them. Because the double inlet and double outlet ball mill system is complex, an accurate mechanistic mathematical model of it is difficult to build. A modeling method based on big data mining is therefore proposed. The factors that affect the mill material level are analyzed first, and a large volume of actual operating data is extracted from the plant. On the Hadoop platform, K-Means clustering is used to remove outliers, and principal component analysis (PCA) is used to reduce dimensionality and complete attribute reduction. A distributed support vector machine (D_SVM) is then used to build the model on the MapReduce framework so that the computation runs in parallel. The results show that the method improves modeling efficiency by greatly reducing modeling time, and that the resulting model is highly accurate and generalizes well. The model can therefore be used to characterize the actual material level.

Key words: double inlet and double outlet mill, Hadoop platform, distributed support vector machine (D_SVM), K-Means clustering, principal component analysis

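The outlier-removal step named in the abstract can be illustrated with a minimal single-machine sketch. The abstract does not state the outlier criterion, the number of clusters, or any thresholds, so everything here is an assumption made for illustration: the function names `kmeans` and `remove_outliers`, the deterministic initialization from the first `k` points, and the rule that a sample is an outlier when its distance to its own centroid exceeds the mean distance plus `z` standard deviations. It does not reproduce the Hadoop implementation, the PCA step, or the D_SVM training.

```python
import math
import statistics


def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm. Centroids start at the first k points
    (an illustrative, deterministic choice, not taken from the paper)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    # Final assignment against the final centroids.
    labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
              for p in points]
    return centroids, labels


def remove_outliers(points, k=2, z=2.0):
    """Drop points whose distance to their own centroid exceeds
    mean + z * std of all such distances (assumed criterion)."""
    centroids, labels = kmeans(points, k)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    cutoff = statistics.mean(dists) + z * statistics.pstdev(dists)
    return [p for p, d in zip(points, dists) if d <= cutoff]


# Hypothetical two-feature samples: two operating regimes and one stray reading.
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
           (30.0, 30.0)]
cleaned = remove_outliers(samples, k=2, z=2.0)
```

A global mean-plus-deviation cutoff is only one of several plausible rules; dropping very small clusters or using a per-cluster threshold would fit the abstract's description equally well.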