JOURNAL OF LIGHT INDUSTRY

CN 41-1437/TS  ISSN 2096-1553

Volume 33 Issue 5
September 2018
Citation: ZHU Haodong, XUE Xiaobo, LI Hongchan, et al. Distributed FP-Growth algorithm based on Hadoop under massive data[J]. Journal of Light Industry, 2018, 33(5): 97-102,108. doi: 10.3969/j.issn.2096-1553.2018.05.013

Distributed FP-Growth algorithm based on Hadoop under massive data

  • Received Date: 2018-05-16
  • To address the large-data problem in association rule mining, the traditional FP-Growth algorithm was redesigned for distributed execution: the database is scanned twice and each transaction is assigned to an independent data partition. On this basis, a distributed FP-Growth algorithm built on the Hadoop framework was proposed to mine frequent patterns (FP) from massive data. Simulation results showed that, as the volume of data processed increased, the algorithm's advantages over the traditional algorithm in running time and memory consumption became increasingly pronounced. When the data volume reached 700,000 items, the algorithm saved about 2/3 of the running time of the traditional algorithm, while its memory consumption was only 1/5 of that of the traditional algorithm. This indicates that the algorithm can significantly improve FP mining efficiency and reduce memory consumption when dealing with massive data.
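
The two-scan, partition-based idea described in the abstract can be illustrated with a small single-machine sketch. This is not the authors' implementation: it assumes the group-dependent partitioning scheme commonly used in parallel FP-Growth work, and the function names `count_items` and `partition_transactions`, the `num_groups` parameter, and the rank-modulo grouping rule are all hypothetical choices made for illustration. In a Hadoop job, the first function would correspond roughly to a counting MapReduce pass and the second to the map side of a second pass, with each partition mined by an independent local FP-Growth run.

```python
from collections import Counter, defaultdict


def count_items(transactions):
    """Scan 1: count the number of transactions in which each item occurs."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # count each item once per transaction
    return counts


def partition_transactions(transactions, counts, min_support, num_groups):
    """Scan 2: route frequent-item prefixes of each transaction to partitions.

    Assumption: items are grouped by (frequency rank % num_groups); a real
    system may use any other balanced assignment.
    """
    # Keep only frequent items, ordered by descending count (ties by name).
    frequent = sorted(
        (item for item, c in counts.items() if c >= min_support),
        key=lambda item: (-counts[item], item),
    )
    rank = {item: r for r, item in enumerate(frequent)}
    group_of = {item: r % num_groups for item, r in rank.items()}

    partitions = defaultdict(list)
    for transaction in transactions:
        # Drop infrequent items and sort the rest by global frequency rank.
        ordered = sorted((i for i in set(transaction) if i in rank), key=rank.get)
        seen_groups = set()
        # Walk from the least frequent item backwards; each group receives
        # the longest prefix that ends at one of its own items.
        for pos in range(len(ordered) - 1, -1, -1):
            group = group_of[ordered[pos]]
            if group not in seen_groups:
                seen_groups.add(group)
                partitions[group].append(ordered[: pos + 1])
    return dict(partitions)
```

Because each partition holds every prefix needed for the items in its group, the partitions can be mined independently, which is what allows the work and the memory footprint to be spread across nodes.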
