Journal of Hebei University (Natural Science Edition) ›› 2010, Vol. 30 ›› Issue (2): 211-215. DOI: 10.3969/j.issn.1000-1565.2010.02.021


An Incremental Clustering Method for Data Warehouse Environments (一种适用于数据仓库环境的增量聚类方法)

王春才,杨华民,张彩虹,郭威,韩贵东   

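  1. School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, Jilin 130022
  • Published: 2010-03-25 Online: 2010-03-25
  • Funding:
    Key Project of the Jilin Province Science and Technology Development Plan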

An Incremental Clustering Algorithm in Data Warehouse Environment

WANG Chun-cai,YANG Hua-min,ZHANG Cai-hong,GUO Wei,HAN Gui-dong   

  • Online: 2010-03-25 Published: 2010-03-25

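Abstract (translated from the Chinese): Cluster analysis demands both high clustering quality and fast response, and the large volumes of high-dimensional data held in data warehouses across industries pose an even greater challenge to algorithm efficiency. The CURE algorithm delivers high-quality clustering results but does not meet the requirements of online clustering. Exploiting the fact that data warehouses are updated in irregular, incremental batches, a new incremental CURE clustering algorithm, InCURE, is proposed. It uses the interconnectivity and closeness of objects to preserve the dynamic clustering characteristics of the original algorithm while greatly shortening clustering time. Tests on large volumes of 5-, 20-, and 50-dimensional data show that InCURE is more efficient than CURE on both low- and high-dimensional data and is well suited to incremental cluster analysis in a data warehouse environment.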

Keywords (translated from the Chinese): clustering, data warehouse, incremental clustering, CURE

Abstract: Data warehouses are a challenging application area for data mining tasks such as clustering. Online clustering requires both high-quality results and fast response. The CURE algorithm produces high-quality clusters, but its efficiency is relatively low. This paper proposes a novel incremental CURE algorithm, InCURE, designed after examining CURE and the update patterns of data warehouses. InCURE preserves the dynamic clustering characteristics of the original algorithm while considerably shortening clustering time by reusing historical clustering results and processing newly added items separately. A performance evaluation of InCURE on multidimensional data demonstrates that it is well suited to incremental clustering in a data warehouse.

Key words: clustering, data warehouse, incremental clustering, CURE
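
The general idea of reusing historical clusters and handling newly added items separately can be sketched as follows. This is only an illustration under stated assumptions, not the InCURE algorithm itself, whose steps are not given in the abstract. The function names `representatives` and `absorb_batch` and the parameters `n_rep`, `shrink`, and `max_dist` are hypothetical. The sketch summarizes each existing cluster by a few CURE-style representative points (well-scattered points shrunk toward the centroid), attaches each new item to the nearest cluster when it lies within a distance threshold, and returns the remaining items so they can be clustered on their own.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def representatives(points, n_rep=4, shrink=0.3):
    """CURE-style summary of one cluster (assumed parameters n_rep, shrink).

    Picks up to n_rep well-scattered points with a farthest-point heuristic,
    then moves each one a fraction `shrink` of the way toward the centroid.
    """
    c = centroid(points)
    # Start with the point farthest from the centroid.
    chosen = [max(points, key=lambda p: math.dist(p, c))]
    while len(chosen) < min(n_rep, len(points)):
        # Next: the point whose nearest already-chosen point is farthest away.
        chosen.append(
            max(points, key=lambda p: min(math.dist(p, q) for q in chosen))
        )
    return [[x + shrink * (ci - x) for x, ci in zip(p, c)] for p in chosen]


def absorb_batch(clusters, batch, max_dist, n_rep=4, shrink=0.3):
    """Attach new points to existing clusters; return the points left over.

    `clusters` is a list of lists of points and is modified in place.
    Representatives are computed once from the historical clusters, so the
    old data is not re-clustered. `max_dist` is a hypothetical threshold.
    """
    reps = [representatives(c, n_rep, shrink) for c in clusters]
    leftovers = []
    for p in batch:
        # Distance from p to each cluster = distance to its nearest representative.
        dists = [min(math.dist(p, r) for r in rs) for rs in reps]
        best = min(range(len(clusters)), key=dists.__getitem__)
        if dists[best] <= max_dist:
            clusters[best].append(p)
        else:
            leftovers.append(p)  # to be clustered separately
    return leftovers
```

In this sketch only the new batch is scanned, which is where the time saving of an incremental approach would come from. The leftover points could then be clustered by themselves and merged with the historical clusters if they turn out to be close enough.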

CLC number: