åŸºäºŽWEBæŒ–æŽ˜æŠ€æœ¯çš„ç½‘é¡µè‡ªåŠ¨åˆ†ç±»å’Œèšç±»çš„ç ”ç©¶

Research of Automatic Web Page Categorization and Cluster Based on Web Mining Technology

【Author】 Xie Zhenliang (谢振亮)

【Author information】 Tianjin University, Computer Applications, 2004, Master's degree

【Abstract, translated from the Chinese】 Text classification and text clustering are two important tasks in information processing. Traditional classification and clustering algorithms mainly target plain-text files. With the rapid growth of the Internet, semi-structured Web data has gradually become the main object of information processing, which has extended and advanced text classification and clustering algorithms. This thesis studies how to use Web mining, combined with existing classification and clustering techniques, to classify and cluster Web text data with high accuracy. Its starting point is that a page's position in the site topology, and the anchor text of links pointing to it from other pages, both reflect how the site's administrator positions the page's content and category; making full use of this information helps classify and cluster the page. The thesis proposes extracting, through Web content mining and structure mining, each page's hierarchical category information within the whole site, and using that information to classify and cluster pages.

【Abstract】 Text classification and text clustering are two important tasks in information processing. Traditional classification and clustering algorithms target plain-text files, but with the development of the Internet, semi-structured Web data has become the main object of information processing, which has driven the evolution of these algorithms. This thesis focuses on how to achieve high-precision classification and clustering by combining Web mining technology with existing techniques. Its premise is that a page's position in the site topology reflects the site manager's view of the page's content and category, and that this information is very helpful for classification and clustering. We extract the hierarchical category information of pages through Web content mining and Web structure mining, and use this information to classify and cluster the pages.

【Keywords】 Text Classification; Text Clustering; Web Mining; Anchor Text

【Online publication contributor】 Tianjin University

【Classification number】 TP393.092
【Times cited】 8
【Downloads】 480
