
Research on the CLIQUE Clustering Algorithm and Its Parallelization

CLIQUE Algorithm and Parallelize


【Author】 Zu Feng (祖峰)

【Author information】 Chongqing University, Computer System Architecture, 2003, Master's degree

【Abstract, translated from the Chinese】 Data mining is a tool that helps people discover information and knowledge in massive amounts of data. In recent years it has become the core technology of business intelligence, has been applied widely across many fields, and has attracted great attention from the academic community. Cluster analysis is an important research area within data mining: it looks for similarities among the data in a database in order to optimize queries over large-scale databases and to uncover useful information or knowledge hidden in the data. How to cluster quickly and how to obtain better clustering results have become both the focus and the main difficulty of research on clustering algorithms. The CLIQUE algorithm combines density-based and grid-based clustering and has the advantage of speed, but because the method is oversimplified, the accuracy of the clustering result may suffer. In-depth study and analysis showed that CLIQUE does not exploit the characteristics of the data being mined and instead imposes a rigid grid partition. This increases the computational complexity, and the complexity can then only be reduced at the cost of clustering accuracy. To address this problem, the thesis introduces an adaptive grid-partitioning method: each single dimension is first pre-divided into intervals, the dense intervals are identified and their boundaries adjusted to obtain dense regions, and these dense regions then serve as the basis for partitioning the grid. This method makes good use of the characteristics of the data to be mined, reduces both the number of grid cells and the number of candidate dense units, and greatly reduces the computational complexity. As a result, computation in every subspace becomes practical and the accuracy of the clustering result improves substantially. The time complexity of the algorithm is still exponential, but because the exponent is the number of dimensions, it remains much lower than that of many other clustering algorithms. To further improve execution efficiency, the thesis also studies a parallel CLIQUE algorithm. PCs connected by a commodity network, together with the Parallel Virtual Machine (PVM) and the Linux operating system, form a cluster that serves as the parallel computing platform. The parallel program follows the Master/Slave model. The parallel algorithm distributes the data set across the node machines to achieve data parallelism; on top of this, task parallelism is used when generating candidate dense units and when verifying dense units. Because the algorithm is predominantly data-parallel, it achieves a near-linear speedup. The time complexity of each node's computation has two parts: an exponential term for verifying dense units and a linear term for communication. Finally, experiments verify the feasibility of the parallel CLIQUE algorithm, and the measured speedup agrees with the theoretical analysis. The experiments show that the parallel CLIQUE algorithm improves the accuracy of the clustering result while achieving high efficiency; and because it was developed on a PVM-based cluster system, the algorithm is fairly general.
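
The one-dimensional partitioning step described in the abstract (pre-divide a dimension into intervals, find the dense ones, then adjust their boundaries) could be sketched roughly as follows. This is an illustrative reading of the abstract only, not the thesis's procedure: the function name `dense_regions_1d` and the parameters `n_bins` and `density_threshold` are hypothetical, and the boundary adjustment is simplified here to shrinking each merged region to the outermost points it contains, rather than subdividing the boundary intervals.

```python
from typing import List, Tuple


def dense_regions_1d(values: List[float], n_bins: int,
                     density_threshold: float) -> List[Tuple[float, float]]:
    """Return dense regions (lo, hi) along a single dimension.

    Hypothetical sketch: fixed-width pre-division, merging of adjacent
    dense bins, then boundaries tightened to the points inside each region.
    """
    if not values:
        return []
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal

    # 1. Pre-divide the dimension into n_bins equal intervals and count points.
    #    The maximum value is clamped into the last bin.
    bin_index = [min(int((v - lo) / width), n_bins - 1) for v in values]
    counts = [0] * n_bins
    for b in bin_index:
        counts[b] += 1

    # 2. A bin is dense if it holds at least density_threshold of all points.
    min_count = density_threshold * len(values)
    dense = [c >= min_count for c in counts]

    # 3. Merge runs of adjacent dense bins (stored as inclusive index ranges).
    runs = []
    start = None
    for i, is_dense in enumerate(dense + [False]):  # sentinel closes the last run
        if is_dense and start is None:
            start = i
        elif not is_dense and start is not None:
            runs.append((start, i - 1))
            start = None

    # 4. Simplified boundary adjustment: shrink each region to its own points.
    regions = []
    for first, last in runs:
        inside = [v for v, b in zip(values, bin_index) if first <= b <= last]
        regions.append((min(inside), max(inside)))
    return regions
```

For example, `dense_regions_1d([0.0, 0.1, 0.2, 0.3, 5.0, 9.7, 9.8, 9.9, 10.0], n_bins=10, density_threshold=0.2)` returns `[(0.0, 0.3), (9.7, 10.0)]`: the isolated point at 5.0 does not form a dense interval, so only two regions are left to contribute grid boundaries in that dimension.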

【Abstract】 Data mining helps people find information and knowledge in data. It has become the core technology of business intelligence, has been widely used in many areas, and has drawn the attention of the academic community. Clustering is one of the most important areas of data mining. It finds similarities among the data and uses them to optimize queries over large-scale databases and to discover hidden, useful information and knowledge. Making clustering faster and its results more accurate is both the most important and the hardest problem. CLIQUE integrates density-based and grid-based methods and has the advantage of speed, but because the procedure is simplified, the accuracy of the clustering may be degraded. After in-depth investigation and analysis, we found that the drawback of CLIQUE lies in its failure to consider the characteristics of the data being processed. It partitions the data with a predefined grid, which adds to the complexity of the computation; the accuracy of the result must then be sacrificed to bring that complexity down. We introduce an adaptive-grid method to solve this problem. We divide each dimension into fixed intervals and join the dense intervals into dense regions. At the boundary of each dense region, the boundary is adjusted by dividing it into smaller intervals. Finally, the adaptive grid is produced from the dense regions. This method makes full use of the characteristics of the data being processed. The number of dense units and candidate dense units is greatly reduced, and the complexity of the computation decreases accordingly, so computation in each subspace becomes feasible and the accuracy of the clustering improves. The computational complexity of the algorithm is still exponential, but because the exponent is the number of dimensions, it remains lower than that of many other clustering algorithms. To make the algorithm more efficient, it was parallelized.
The hardware platform is a set of PCs connected by a LAN. The software platform is PVM and Linux. Together they form the PC cluster system. The parallel program follows the master/slave model. The algorithm assigns the data set to the nodes to achieve data parallelism, and task parallelism is used when generating and verifying dense units. Because the algorithm is predominantly data-parallel, its speedup is nearly linear. The time complexity at each node is composed of exponential computation time and linear communication time. Finally, experiments demonstrate the feasibility of the algorithm, and the speedup measured in the experiments agrees with the theoretical one. The experiments also show that the parallel algorithm improves the accuracy of the clustering result while being more efficient. Because the algorithm is based on a PVM cluster, it is broadly applicable.
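
The master/slave data-parallel pattern described above might look roughly like the sketch below. It is an assumption-laden illustration, not the thesis's PVM program: Python threads from `concurrent.futures` stand in for PVM slave tasks, a fixed cell width replaces the adaptive grid for brevity, and the names `count_cells`, `parallel_dense_cells`, `n_slaves`, `cell_width`, and `min_count` are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
Cell = Tuple[int, ...]


def count_cells(points: List[Point], cell_width: float) -> Counter:
    """Slave task: count how many of the local points fall in each grid cell."""
    return Counter(tuple(int(x // cell_width) for x in p) for p in points)


def parallel_dense_cells(points: List[Point], n_slaves: int,
                         cell_width: float, min_count: int) -> Dict[Cell, int]:
    """Master: split the data, gather per-slave counts, keep the dense cells."""
    # Data parallelism: each slave receives a disjoint slice of the data set.
    chunks = [points[i::n_slaves] for i in range(n_slaves)]

    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=n_slaves) as pool:
        # Threads are only a stand-in here for slave processes on cluster nodes.
        for partial in pool.map(lambda chunk: count_cells(chunk, cell_width), chunks):
            total.update(partial)  # master sums the partial counts per cell

    # A cell is dense when its global count reaches min_count.
    return {cell: n for cell, n in total.items() if n >= min_count}
```

Because each point is counted by exactly one slave and the master only adds the partial counts, the result is the same for any number of slaves. Only the per-cell counts are sent back to the master, not the points themselves.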

【Keywords】 data mining; clustering; parallel algorithm; network of workstations (NOW)

【Online publication submitter】 Chongqing University

【Classification number】 TP311.13
【Citation count】 3
【Download count】 382
