åŸºäºŽGPUçš„ç¨‹åºåˆ†æžä¸Žå¹¶è¡ŒåŒ–ç ”ç©¶

Research of Programming Analysis and Parallelism Based on Graphics Processing Unit

【Author】 Wang Tao

【Author information】 PLA Information Engineering University, Computer Software and Theory, 2010, Master's thesis

【Abstract (translated from the Chinese)】 High-performance computers are a combined expression of a country's economic and technological strength and an important tool for promoting economic and technological development, social progress, and national security; they have become a strategic high ground that countries compete for. While people pursue cost-effective parallel computer systems, special-purpose computing components in many specialized fields also deliver strong parallel computing capability. The graphics processing unit (GPU) is one such special-purpose accelerator that is also used for general-purpose computation. With the development of microelectronics, the GPU has far surpassed the general-purpose processor in both integration density and data-processing capability. Its programmability, parallel processing capability, and range of applications in particular have kept growing, making it a high-performance processing component of current computer systems.

At present, research on GPU-based parallelization, in China and abroad, generally starts from an existing serial program, which computer professionals familiar with the GPU hardware structure rewrite. Because of the various overheads that parallelization introduces, however, the parallelized program may run less efficiently than the serial one. It is therefore especially important to analyse the serial program properly and to assess how efficiently it would run on the GPU once parallelized. This thesis studies how to make that assessment. Its main contents are as follows:

1. It studies the multithreaded hardware architecture and programming model of GPUs that support the CUDA architecture. Building on an analysis of the current state of high-performance computing and general-purpose GPU computing, it describes in detail the advantages of the GPU in general-purpose computation and examines the GPU hardware structure and programming model in depth, providing the theoretical basis for the cost model.
2. To compute loop-body workload precisely, and building on a study of traditional data-dependence analysis methods, it proposes an improved method for the case that SUIF cannot handle accurately: computing the iteration count when the upper and lower bounds of the loop are not fixed.
3. To predict how efficiently a serial program would run on the GPU after parallelization, it proposes a GPU parallel cost model based on the CUDA architecture. The model accounts for the various overheads of parallelization (device startup, data transfer, and GPU execution). It predicts the time cost of accelerating a serial program on the GPU; comparing that with the cost of serial execution shows whether GPU acceleration should be used, which in turn guides the parallelization of the serial program.
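
To illustrate why non-fixed loop bounds matter for the workload estimate in point 2, here is a minimal Python sketch. It is not the thesis's improved method, which the abstract does not spell out; the function name `count_inner_iterations` and the triangular example loop are assumptions made purely for illustration. The sketch counts the exact number of inner-body executions when the inner bounds depend on the outer index, and compares that with the estimate obtained by treating the bounds as fixed.

```python
from typing import Callable


def count_inner_iterations(
    outer_lo: int,
    outer_hi: int,
    inner_lo: Callable[[int], int],
    inner_hi: Callable[[int], int],
) -> int:
    """Count executions of the body of the (hypothetical) loop nest

        for i in range(outer_lo, outer_hi):
            for j in range(inner_lo(i), inner_hi(i)):
                body

    without running the body itself.
    """
    total = 0
    for i in range(outer_lo, outer_hi):
        # An empty inner range contributes zero, never a negative count.
        total += max(0, inner_hi(i) - inner_lo(i))
    return total


n = 100

# Triangular nest: the inner lower bound is the outer index i.
exact = count_inner_iterations(0, n, lambda i: i, lambda i: n)

# Estimate that assumes fixed bounds 0..n for both loops.
rectangular_guess = n * n

print(exact, rectangular_guess)
```

For this triangular nest the exact count is n(n+1)/2, roughly half of the rectangular estimate, so a workload figure that assumes fixed bounds would overstate the work by about a factor of two.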

【Abstract】 High-performance computers are an integrated expression of a country's economic and technological strength and an important tool for economic promotion, technology development, social progress, and national security. They have become a strategic high ground. While people pursue cost-effective parallel supercomputer systems, dedicated computing components also deliver powerful parallel computing capability in many special areas. The Graphics Processing Unit (GPU) is one of them, used for image processing and general-purpose computation. With the development of microelectronics technology, the GPU has far surpassed the general-purpose processor in integration and data-processing capability, and it has become a high-performance component of computer systems.

At present, research on GPU parallelism mostly starts from an original serial program, which a professional familiar with the GPU architecture transforms into a parallel one. Because of the various costs that the parallel implementation brings, however, the parallel program can be less efficient than the serial program, which is a great waste of manpower and financial resources. It is therefore particularly important to analyse the serial program properly and to predict the efficiency of the parallel program on the GPU. This thesis studies how to make the use of the GPU in general-purpose computation more reasonable and effective. The main research contents and innovations are as follows:

1. The thesis analyses the current status of high-performance computing, points out from several perspectives the difficulties and challenges that traditional high-performance computers face, and studies the hardware architecture and programming model of the GPU, which form the theoretical foundation of the cost model that follows.
2. The thesis studies data-dependence analysis techniques and adopts an improved method to determine accurately the number of iterations used in calculating loop-body workload, which SUIF cannot do when the upper and lower bounds of the loop are not fixed.
3. To predict the execution efficiency of a parallel program on the GPU, the thesis presents a cost model for GPUs based on the CUDA architecture. The model takes into account the cost of data transfer, the cost of device startup, and the cost of GPU execution. It estimates the total time cost of the parallel program on the GPU, from which one can judge whether GPU acceleration is worthwhile.
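
The cost model in point 3 can be pictured with a small Python sketch. The abstract names only the three cost components (device startup, data transfer, GPU execution) and the comparison against serial cost; the additive form, the parameter names, the `lanes` notion of parallel width, and every number below are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuCostParams:
    """Hypothetical machine parameters; all values are illustrative."""
    startup_s: float              # one-off device startup cost, in seconds
    bandwidth_bytes_per_s: float  # host <-> device transfer bandwidth
    gpu_iter_s: float             # time for one loop iteration on one GPU lane
    lanes: int                    # iterations assumed to run concurrently
    cpu_iter_s: float             # time for one loop iteration on the CPU


def gpu_cost(p: GpuCostParams, iterations: int, bytes_moved: int) -> float:
    """Estimated GPU time: startup + data transfer + execution."""
    transfer = bytes_moved / p.bandwidth_bytes_per_s
    # Iterations run in batches of `lanes`; round the batch count up.
    waves = (iterations + p.lanes - 1) // p.lanes
    execute = waves * p.gpu_iter_s
    return p.startup_s + transfer + execute


def serial_cost(p: GpuCostParams, iterations: int) -> float:
    """Estimated time for the unmodified serial loop."""
    return iterations * p.cpu_iter_s


def worth_offloading(p: GpuCostParams, iterations: int, bytes_moved: int) -> bool:
    """True when the estimated GPU time beats the estimated serial time."""
    return gpu_cost(p, iterations, bytes_moved) < serial_cost(p, iterations)


params = GpuCostParams(
    startup_s=0.1,
    bandwidth_bytes_per_s=1e9,
    gpu_iter_s=1e-6,
    lanes=1000,
    cpu_iter_s=1e-7,
)

small = worth_offloading(params, iterations=1_000, bytes_moved=8_000)
large = worth_offloading(params, iterations=10**9, bytes_moved=8 * 10**9)
print(small, large)
```

With these made-up parameters the fixed startup cost dominates the small loop, so offloading does not pay off, whereas for the large loop the parallel execution outweighs startup and transfer. This is the kind of decision the abstract says the model is meant to support.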

【Key words】 Graphics Processing Unit; General Computation on GPU; Programming Analysis and Parallelism; CUDA (Compute Unified Device Architecture); Cost Model

【Online publication contributor】 PLA Information Engineering University

【Classification number】 TP332
【Times cited】 9
【Downloads】 534