SUN Nan and ZHANG Hua-wei. An new algorithm of Web page purification for data mining tools[J]. Journal of Light Industry, 2011, 26(3): 85-87,91. doi: 10.3969/j.issn.1004-1478.2011.03.021
A new algorithm of Web page purification for data mining tools
Received Date: 2010-12-29
Available Online: 2011-05-15
Abstract
To eliminate noise more effectively and extract topic content from Web pages efficiently, an algorithm for Web page purification is presented. The algorithm assumes that the topic content of a Web page is mainly contained in table and p elements, so Web noise can be filtered in a preprocessing step. Then, by matching content against relevant Web pages, the topic content of the page is obtained by calculating the importance of each node. The algorithm achieved very precise results, correctly extracting the topic content of 98.2% of a set of 6,318 pages from portal sites. When used in data mining tools, it performs better than other similar algorithms and eliminates noise efficiently.
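The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the importance formula, so the score used here (the share of a block's words that do not appear in a related page) is an assumption chosen only for illustration. The names BlockCollector, extract_blocks, importance, purify and the threshold value are likewise hypothetical. The sketch also assumes reasonably well-formed HTML in which p and table elements are explicitly closed.

```python
from html.parser import HTMLParser


class BlockCollector(HTMLParser):
    """Collect the text found inside <p> and <table> elements (preprocessing step)."""

    KEEP = {"p", "table"}        # elements assumed to carry topic content
    SKIP = {"script", "style"}   # never treated as text

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished text blocks, one per outermost kept element
        self._depth = 0    # nesting depth inside kept elements
        self._skip = 0     # nesting depth inside script/style
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip += 1
        elif tag in self.KEEP:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip:
            self._skip -= 1
        elif tag in self.KEEP and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Normalize whitespace and store the completed block.
                text = " ".join(" ".join(self._buf).split())
                if text:
                    self.blocks.append(text)
                self._buf = []

    def handle_data(self, data):
        if self._depth and not self._skip:
            self._buf.append(data)


def extract_blocks(html):
    """Return the text blocks of all outermost <p> and <table> elements."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    return parser.blocks


def importance(block, related_blocks):
    """Hypothetical score: fraction of the block's words absent from the related page."""
    words = block.lower().split()
    if not words:
        return 0.0
    seen = set()
    for other in related_blocks:
        seen.update(other.lower().split())
    novel = [w for w in words if w not in seen]
    return len(novel) / len(words)


def purify(page_html, related_html, threshold=0.5):
    """Keep blocks whose importance reaches the (assumed) threshold."""
    related = extract_blocks(related_html)
    return [b for b in extract_blocks(page_html)
            if importance(b, related) >= threshold]
```

Calling purify(page_html, related_html) with two pages from the same site would tend to drop blocks that both pages share, such as navigation tables, and keep blocks unique to the first page. A real system would need a more robust importance measure than simple word overlap.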