Research of a New Algorithm for Web Structure Mining

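【Author】 Liu Wangfeng (刘王峰)

【Advisor】 Zheng Youcai (郑有才)

【Author information】 Xidian University (西安电子科技大学), Computer Software and Theory, 2010, Master's thesis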

【Abstract】 Web data mining combines data mining technology with research on Internet applications, and it has become a key research direction in the data mining field. Web structure mining is an important part of Web data mining; its classic algorithms are HITS and PageRank. Although both algorithms have achieved some success, both also have shortcomings, such as topic drift. Based on an in-depth study and analysis of HITS and PageRank, and targeting some of their shortcomings, this thesis proposes a new algorithm, ANWSMA, that integrates hyperlinks, hyperlink weights and time weights. The algorithm first obtains a directed graph using the base-set construction idea from HITS. It then replaces the damping factor of PageRank with a time weight, assigns different hyperlink weights according to the importance of the linked web pages, computes a rank value for each page, and finally outputs the pages in sorted order. Simulation experiments and a comparison with the classical algorithms verify the rationality and effectiveness of ANWSMA.

【Key words】 Web Structure Mining; PageRank; HITS; Time Weight; ANWSMA
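
The abstract describes the ranking step only in outline, so the following is a minimal illustrative sketch of the general idea, not the thesis's ANWSMA formulas. It assumes the directed graph has already been built (for example from a HITS-style base set), that each page carries a time weight in (0, 1) that takes the place of PageRank's single damping factor, and that a link's weight is proportional to the in-degree of the page it points to. The function name `rank_pages`, the weighting rule, and the fallback value 0.85 are all assumptions introduced for illustration.

```python
from collections import defaultdict


def rank_pages(links, time_weight, iterations=100):
    """Rank pages with a PageRank-style iteration (illustrative sketch only).

    links:       dict mapping a page to the pages it links to.
    time_weight: dict mapping a page to a value in (0, 1); it is used where
                 PageRank would use one global damping factor (assumption).
    Returns a list of (page, rank) pairs sorted by descending rank.
    """
    # Deduplicate targets so every link is counted once.
    out_links = {p: set(targets) for p, targets in links.items()}
    pages = set(out_links) | {q for ts in out_links.values() for q in ts}

    in_degree = defaultdict(int)
    for targets in out_links.values():
        for q in targets:
            in_degree[q] += 1

    # Hypothetical hyperlink weight: the share of p's rank passed to q is
    # proportional to q's in-degree among all pages that p links to.
    link_weight = {}
    for p, targets in out_links.items():
        total = sum(in_degree[q] for q in targets)
        for q in targets:
            link_weight[(p, q)] = in_degree[q] / total

    rank = {p: 1.0 for p in pages}
    for _ in range(iterations):
        incoming = defaultdict(float)
        for (p, q), w in link_weight.items():
            incoming[q] += rank[p] * w
        # Per-page time weight replaces the damping factor (0.85 fallback).
        rank = {
            q: (1 - time_weight.get(q, 0.85)) + time_weight.get(q, 0.85) * incoming[q]
            for q in pages
        }

    return sorted(rank.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    weights = {"a": 0.85, "b": 0.85, "c": 0.85}
    for page, score in rank_pages(graph, weights):
        print(page, round(score, 3))
```

With a uniform time weight the sketch reduces to a weighted PageRank; giving recently updated pages a different time weight than stale ones is where a time-aware variant would diverge from it. How the thesis actually derives its time and hyperlink weights is not stated in this record.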