Study on a word frequency statistics algorithm for scenic spot reviews
Abstract: Before a trip, people who read online reviews of a scenic spot find it difficult to form an overall evaluation of that spot. To address this problem, a word frequency statistics algorithm suitable for massive data, TF-CT, is proposed. The algorithm uses cosine similarity to classify the large volume of text data, so that items expressing the same attitude are grouped into one class. It then uses the TextRank algorithm to extract keywords from one item in each class, and uses an improved TFIDF algorithm to count the frequency of the extracted keywords and obtain the attitude expressed by the text data. Experimental results show that, compared with the TFIDF algorithm, the TF-CT algorithm has advantages in both result accuracy and time complexity.
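The first and last stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' TF-CT implementation: the greedy single-pass grouping, the `threshold` value, and the function names `cosine_similarity`, `group_reviews`, and `tf_idf` are assumptions made for illustration. The reviews are assumed to be already tokenized, the TF-IDF shown is the plain textbook form rather than the improved variant, and the TextRank step is omitted.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_reviews(reviews, threshold=0.5):
    """Greedy grouping (an assumption, not the paper's procedure):
    each review joins the first group whose representative, i.e. the
    group's first member, is at least `threshold` similar to it."""
    groups = []  # list of (representative_vector, member_indices)
    for i, tokens in enumerate(reviews):
        vec = Counter(tokens)
        for rep, members in groups:
            if cosine_similarity(vec, rep) >= threshold:
                members.append(i)
                break
        else:
            groups.append((vec, [i]))
    return [members for _, members in groups]


def tf_idf(docs):
    """Plain TF-IDF: (term count / doc length) * log(N / document frequency)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return scores


# Hypothetical, pre-tokenized reviews used only for demonstration.
reviews = [
    ["beautiful", "scenery", "great", "view"],
    ["great", "view", "beautiful", "scenery", "today"],
    ["long", "queue", "expensive", "ticket"],
    ["expensive", "ticket", "long", "queue", "crowded"],
]

groups = group_reviews(reviews)
print(groups)  # [[0, 1], [2, 3]]

# Score only one representative review per group, mirroring the idea of
# extracting keywords from a single item in each class.
representatives = [reviews[members[0]] for members in groups]
print(tf_idf(representatives))
```

Scoring one representative per group rather than every review is what would reduce the work on large inputs, which is consistent with the time-complexity advantage the abstract reports, though the abstract does not give the details of how TF-CT achieves it.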
Key words: word frequency; text data; scenic evaluation; TF-CT algorithm; TFIDF algorithm