JOURNAL OF LIGHT INDUSTRY

CN 41-1437/TS  ISSN 2096-1553

Volume 26 Issue 3
May 2011
SUN Nan and ZHANG Hua-wei. A new algorithm of Web page purification for data mining tools[J]. Journal of Light Industry, 2011, 26(3): 85-87,91. doi: 10.3969/j.issn.1004-1478.2011.03.021

A new algorithm of Web page purification for data mining tools

  • Received Date: 2010-12-29
    Available Online: 2011-05-15
  • To eliminate Web page noise more effectively and extract the topic content of Web pages efficiently, an algorithm for Web page purification is presented. The algorithm assumes that the topic content of a Web page is contained mainly in <table> and <p> elements, so Web noise can be filtered out in a preprocessing step. Then, by matching the content against related Web pages, the topic content of the page is obtained by calculating the importance of each node. The algorithm achieves highly precise results, correctly extracting the topic content of 98.2% of a set of 6,318 pages from portal sites. When used in data mining tools, it outperforms other similar algorithms and eliminates noise efficiently.
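The three steps in the abstract (keep only <table> and <p> content, match against a related page, then rank nodes by importance) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the matching rule nor the importance formula, so both are assumptions here. Blocks whose text also appears on a related page from the same site are treated as template noise, and the hypothetical importance() function scores a block by its text length discounted by link density. The names extract_blocks, importance, purify and the threshold parameter are illustrative only. Only the Python standard library is used.

```python
from html.parser import HTMLParser

# Per the abstract, topic content is assumed to sit mainly in <table> and <p>.
CONTENT_TAGS = {"table", "p"}
# Hypothetical: tags whose text is always treated as noise.
SKIP_TAGS = {"script", "style", "noscript"}


class BlockExtractor(HTMLParser):
    """Collect one text block per outermost <table>/<p>, with its link-text length."""

    def __init__(self):
        super().__init__()
        self.blocks = []       # list of (text, link_chars)
        self._depth = 0        # nesting depth inside content tags
        self._skip = 0         # nesting depth inside skipped tags
        self._in_link = 0      # nesting depth inside <a>
        self._text = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag in CONTENT_TAGS:
            self._depth += 1
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip:
            self._skip -= 1
        elif tag in CONTENT_TAGS and self._depth:
            self._depth -= 1
            if self._depth == 0:   # outermost content element closed
                self._flush()
        elif tag == "a" and self._in_link:
            self._in_link -= 1

    def handle_data(self, data):
        # Text outside <table>/<p>, or inside script/style, is preprocessed away.
        if self._skip or not self._depth:
            return
        text = " ".join(data.split())  # normalize whitespace
        if text:
            self._text.append(text)
            if self._in_link:
                self._link_chars += len(text)

    def _flush(self):
        text = " ".join(self._text)
        if text:
            self.blocks.append((text, self._link_chars))
        self._text, self._link_chars = [], 0


def extract_blocks(html):
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def importance(text, link_chars):
    # Hypothetical importance measure: text length discounted by link density.
    return len(text) * (1.0 - link_chars / len(text))


def purify(page_html, related_html, threshold=0.3):
    """Return topic-content blocks of page_html in document order."""
    related_texts = {text for text, _ in extract_blocks(related_html)}
    # Content match: blocks repeated on a related page of the same site are
    # assumed to be template noise (navigation bars, footers, copyright lines).
    unique = [(t, n) for t, n in extract_blocks(page_html) if t not in related_texts]
    if not unique:
        return []
    scored = [(importance(t, n), t) for t, n in unique]
    best = max(score for score, _ in scored)
    # Keep blocks scoring at least `threshold` times the best score.
    return [t for score, t in scored if score >= threshold * best]


if __name__ == "__main__":
    page = ('<table><tr><td><a href="/">Home</a> <a href="/news">News</a></td></tr></table>'
            "<p>A new purification method extracts the topic content of a page.</p>"
            '<p><a href="/ad">Buy now</a></p>'
            "<p>Copyright 2011 Example Portal</p>")
    related = ('<table><tr><td><a href="/">Home</a> <a href="/news">News</a></td></tr></table>'
               "<p>A different article on the same site.</p>"
               "<p>Copyright 2011 Example Portal</p>")
    print(purify(page, related))
```

In this sketch the navigation table and copyright line are removed by the content match because they also appear on the related page, while the advertisement link survives the match but scores zero importance (all of its text is link text) and falls below the threshold. A real system would need a better-founded importance measure and a more robust parser for malformed HTML, such as unclosed <p> tags.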


