Research on a New Algorithm for Web Structure Mining

【Author】 Liu Wangfeng

【Author information】 Xidian University, Computer Software and Theory, 2010, Master's degree

【Abstract】 Web data mining combines data mining technology with research on Internet applications, and it has become a key research direction within data mining. Web structure mining is an important branch of Web data mining; its classic algorithms are HITS and PageRank. Both algorithms have achieved a measure of success, but both also have shortcomings, such as topic drift. Building on an in-depth study and analysis of HITS and PageRank, this thesis addresses some of these shortcomings by proposing a new algorithm, ANWSMA, which integrates hyperlinks, hyperlink weights, and time weights. The algorithm first obtains a directed graph using the base-set construction idea from HITS. It then replaces the damping factor of PageRank with a time weight, assigns different hyperlink weights according to the importance of the linked-to pages, computes a rank value for each page, and outputs the pages in sorted order. Finally, testing and analysis, including simulation experiments and comparison with the classical algorithms, verify the rationality and effectiveness of ANWSMA.
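The abstract does not give the ANWSMA formulas, so the following is only a minimal sketch of the general idea it describes: a PageRank-style iteration in which a per-page time weight stands in for the damping factor and each hyperlink carries its own weight. The function name `weighted_pagerank`, the parameters `link_weight` and `time_weight`, the fallback value 0.85 (the conventional PageRank damping factor), and the way the weights enter the update are all assumptions made for illustration, not the thesis's definitions. The HITS-style base-set construction is omitted; the directed graph is taken as given.

```python
def weighted_pagerank(links, link_weight=None, time_weight=None, iterations=50):
    """Rank pages with a PageRank-style iteration (illustrative sketch only).

    links:       dict mapping a page to the list of pages it links to.
    link_weight: dict mapping (src, dst) to a positive weight; missing
                 entries default to 1.0 (hypothetical parameter).
    time_weight: dict mapping a page to a value in (0, 1) used where
                 PageRank would use its damping factor; missing entries
                 default to 0.85 (hypothetical parameter).
    Returns a list of (page, score) pairs sorted by descending score.
    """
    link_weight = link_weight or {}
    time_weight = time_weight or {}

    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {dst for outs in links.values() for dst in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        incoming = {p: 0.0 for p in pages}
        for src in pages:
            outs = links.get(src, [])
            total = sum(link_weight.get((src, dst), 1.0) for dst in outs)
            if total == 0:
                continue  # dangling page: its rank is not redistributed here
            for dst in outs:
                # Split the source's rank among its links in proportion to weight.
                share = link_weight.get((src, dst), 1.0) / total
                incoming[dst] += rank[src] * share
        rank = {
            p: (1 - time_weight.get(p, 0.85)) / n + time_weight.get(p, 0.85) * incoming[p]
            for p in pages
        }

    return sorted(rank.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for page, score in weighted_pagerank(graph):
        print(page, round(score, 4))
```

With no weights supplied, the sketch reduces to ordinary PageRank with damping factor 0.85. Passing a larger `link_weight` for a link shifts more of the source page's rank to that target, and a smaller `time_weight` for a page reduces how much of its score comes from incoming links.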

【Key words】 Web Structure Mining; PageRank; HITS; Time Weight; ANWSMA

【Online publication contributor】 Xidian University

【Classification number】 TP311.13
【Download count】 44