多核结构上高效的线程级推测及事务执行模型研究

Research on the Efficient Thread-level Speculation and Transactional Execution Model on Multi-core Platform

【Author】 Liu Yuan (刘圆)

【Author information】 University of Science and Technology of China, Computer System Architecture, 2007, PhD

【Abstract】 Multi-core architecture has become the mainstream of processor design, but multi-threaded applications are needed to make full use of the parallel computing resources on multi-core platforms. Speculative multithreading has been proposed to simplify parallel programming. Its distinguishing characteristic is that it relaxes the sequential-semantics constraints between threads, which allows programmers or compilers to attempt aggressive optimizations that expose more parallelism even when the validity of a transformation cannot be guaranteed at static compile time.

One difficulty in implementing speculative multithreading is the local buffering of memory accesses. Existing speculative multithreading schemes use very complicated buffering mechanisms, which increase hardware design complexity and, to some extent, reduce the efficiency of multithreaded application development. The other difficulty is how to reduce the uncertainty that mis-speculation introduces into parallel performance. This dissertation uses transactional memory and dynamic profiling to address these two problems, with the goal of finding an efficient, cooperative software-hardware solution for speculatively parallelizing applications on multi-core platforms.

The dissertation studies thread-level speculation based on transactional memory, covering the architecture model, the programming and execution models, and dynamic optimization methods. The main contributions are as follows.

First, a speculative multithreading architecture model based on transactional memory, named SPoTM (Speculative Parallelization on Transactional Memory), is proposed. SPoTM uses transactional memory to isolate the load/store operations of different threads, and supports out-of-order thread execution, in-order commit, conflict (violation) detection, and rollback after speculation failure.

Second, a simple programming model targeting loop parallelization is designed for SPoTM, together with the speculative thread system library and the ISA extensions needed to implement it. Parallelizing a sequential program with this model requires very few code modifications, so it significantly simplifies multithreaded parallel programming.

Third, two simulation tools are developed for verification and experiments: fastTM, which performs function-level simulation, and sim-SPoTM, which supports cycle-accurate simulation. Several typical programs from the SPEC CPU 2000 benchmark are evaluated on these platforms to quantify the effect of various hardware mechanisms and design choices on speculative execution performance and to find cost-effective implementation options. The change in cache locality under speculative execution is also analyzed, and several methods for improving locality are proposed and validated.

Finally, an online profile-guided dynamic optimization framework is proposed and implemented on the SPoTM platform, as the core component of a continuous, gradual, profile-guided software optimization system for speculative execution. Offline profiling cannot guide optimization effectively and accurately without a representative training input, and in most cases no such input is available. The proposed method automatically transforms speculative multithreaded programs at run time according to online profiling results. Because profiling and optimization both happen at run time, it needs neither a separate profiling run nor a general training input set, and it also suits programs whose run-time behavior changes in phases. The evaluation shows that this approach is comparable to the traditional offline approach in two respects: identifying loops suitable for speculative parallelization, and guiding transaction partitioning. It can therefore serve as a stand-alone guide for speculative parallelization when offline profiling is unavailable for lack of general training inputs.

Designing and evaluating the SPoTM architecture and developing the dynamic optimization system also led to some qualitative conclusions about speculative multithreading itself. First, more effort should be devoted to software optimization than to complicated hardware design, because even very aggressive hardware mechanisms achieved only limited performance gains. Second, speculative multithreading cannot make automatic parallelization fully replace manual parallelization; it is better regarded as an assistant tool for sophisticated manual parallelization. Finally, whether parallelization is manual or automatic, a gradual code transformation process is needed, and profile-guided optimization plays a key role in it. Profiling can be done offline or online, but the online way will become increasingly important because of the need for dynamic optimization.
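
The isolation, in-order commit, conflict detection and rollback that the abstract attributes to SPoTM can be illustrated with a small software simulation. The sketch below is not the dissertation's design or its system library. The `Transaction` class, `run_speculative_loop`, the `window` parameter and the string-keyed memory are hypothetical names chosen for illustration, and hardware buffering is replaced by plain Python dictionaries. Iterations in a window run against the same snapshot, buffer their writes privately, and commit in loop order; an iteration that read an address written by an earlier iteration of the same window is squashed and re-executed.

```python
from typing import Callable, Dict, List, Set

Memory = Dict[str, int]


class Transaction:
    """One speculative loop iteration (hypothetical model, not SPoTM's API)."""

    def __init__(self, snapshot: Memory) -> None:
        self.snapshot = snapshot            # memory state seen by this iteration
        self.read_set: Set[str] = set()     # addresses read from the snapshot
        self.write_buffer: Memory = {}      # private, uncommitted writes

    def load(self, addr: str) -> int:
        # A read of our own buffered write does not depend on other threads.
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        self.read_set.add(addr)
        return self.snapshot[addr]

    def store(self, addr: str, value: int) -> None:
        self.write_buffer[addr] = value


Body = Callable[[Transaction, int], None]


def run_speculative_loop(memory: Memory, body: Body, n_iters: int,
                         window: int = 4) -> int:
    """Run body(tx, i) for i in range(n_iters); return the number of squashes."""
    squashes = 0
    for start in range(0, n_iters, window):
        batch = range(start, min(start + window, n_iters))
        snapshot = dict(memory)
        txs: List[Transaction] = []
        for i in batch:                     # conceptually concurrent execution
            tx = Transaction(snapshot)
            body(tx, i)
            txs.append(tx)
        dirty: Set[str] = set()             # addresses committed in this window
        for i, tx in zip(batch, txs):       # commit strictly in loop order
            if tx.read_set & dirty:         # read a stale value: conflict
                squashes += 1
                tx = Transaction(dict(memory))
                body(tx, i)                 # re-execute on committed state
            memory.update(tx.write_buffer)
            dirty.update(tx.write_buffer)
    return squashes


def double_element(tx: Transaction, i: int) -> None:
    """Independent iterations: each touches only its own element."""
    tx.store(f"a{i}", tx.load(f"a{i}") * 2)


def accumulate(tx: Transaction, i: int) -> None:
    """Loop-carried dependence through the shared address 'total'."""
    tx.store("total", tx.load("total") + tx.load(f"a{i}"))
```

With a memory of four elements `a0`..`a3` and a `total` cell, `double_element` commits without any squash, while `accumulate` squashes every iteration after the first in each window. The final memory matches sequential execution in both cases. The second case is a toy version of the mis-speculation cost that the abstract's profiling work is meant to reduce.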

【Key words】 multi-core processor architecture; speculative multithreading; transactional memory; program profiling; dynamic optimization

【Online publication submitter】 University of Science and Technology of China

【Classification number】 TP332
【Times cited】 5
【Downloads】 272
