Improved data stream clustering algorithm based on a dynamic sliding window
Abstract: An optimized algorithm, DCluStream, is proposed that processes data over a sliding window. The method builds on the two-layer (online-offline) clustering framework of CluStream. It introduces into the cluster feature the actual times at which data objects flow into and out of the sliding window, and it dynamically adjusts the window size to fit limited memory. A time decay mechanism is applied to historical data to reduce its influence on new data objects, which improves the clustering results. Experimental results show that, compared with CluStream, the algorithm processes data more efficiently and uses relatively less memory.
Key words: sliding window; data stream clustering algorithm; time decay mechanism
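The abstract names three ingredients (a cluster feature that records window entry and exit times, time decay of historical data, and a window size that adapts to limited memory) but does not give their definitions. The following is only a minimal sketch under stated assumptions, not the DCluStream implementation: the exponential decay form `2^(-rate * elapsed)`, the constant `DECAY_RATE`, the class `DecayedClusterFeature`, the function `prune`, and the memory budget expressed as `max_clusters` are all hypothetical choices made for illustration.

```python
DECAY_RATE = 0.1  # hypothetical decay constant; the abstract gives no value


class DecayedClusterFeature:
    """Illustrative cluster feature: decayed count, linear sum, square sum,
    plus the time the feature entered the window and its last update time."""

    def __init__(self, point, now):
        self.n = 1.0                          # decayed number of points
        self.ls = list(point)                 # decayed linear sum per dimension
        self.ss = [x * x for x in point]      # decayed square sum per dimension
        self.t_in = now                       # time the first point entered
        self.t_last = now                     # time of the most recent update

    def _fade(self, now):
        # Assumed decay: scale all statistics by 2^(-rate * elapsed time).
        factor = 2.0 ** (-DECAY_RATE * (now - self.t_last))
        self.n *= factor
        self.ls = [v * factor for v in self.ls]
        self.ss = [v * factor for v in self.ss]
        self.t_last = now

    def absorb(self, point, now):
        # Fade the history first so older points weigh less than the new one.
        self._fade(now)
        self.n += 1.0
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss = [a + b * b for a, b in zip(self.ss, point)]

    def centroid(self):
        return [v / self.n for v in self.ls]


def prune(clusters, now, window, max_clusters):
    """Drop features whose last update fell out of the window, then halve the
    window while the number of live features exceeds the (assumed) budget."""
    live = [c for c in clusters if now - c.t_last <= window]
    while len(live) > max_clusters and window > 1:
        window /= 2
        live = [c for c in live if now - c.t_last <= window]
    return live, window
```

In this sketch a feature updated long ago contributes little to the centroid, and shrinking the window is one simple way to stay within a fixed memory budget; the paper may use a different decay function and adjustment rule.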