
Research on Issues Related to the Memory System of the Triplet-Based Architecture

【Author】 Liu Mengxiao (刘梦晓)

【Supervisor】 Shi Feng (石峰)

【Author Information】 Beijing Institute of Technology, Computer Software and Theory, 2010, PhD

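【Chinese Abstract】 Multi-core architecture is widely regarded as the mainstream of next-generation microprocessor architecture, and one concrete realization of it is the chip multiprocessor (CMP), also called the multi-core microprocessor. The first question in designing a CMP is the choice of program execution model: only a suitable execution model can fully exploit the parallelism in programs and make the most of the CMP's multiple processing nodes. Object orientation is a methodology grounded in how humans reason, and the principle of layered abstraction that it advocates is key to solving complex problems. After several decades of development, the object-oriented model has shown promise as an architecture for concurrent and distributed systems. However, the widening gap between computer architecture and programming languages makes programs run less and less efficiently, whereas a program whose structure is close to the multi-core architecture of the target machine can exploit the machine's capability fully. Therefore, if the object-oriented model is combined with multi-core processor architecture, following the way complex problems are decomposed, to design a multi-core processor that supports object execution, this can both address parallel programming on multi-core processors and improve the running efficiency of object-oriented programs. In addition, because the traditional shared-bus communication structure suffers from latency, communication bottlenecks, and low design efficiency, the network on chip (NoC) is considered a more suitable way to build multi-core systems. This dissertation is based on TriBA (Triplet-based Architecture), a triplet-based multi-core architecture that supports the object-oriented paradigm and uses an on-chip triplet-based interconnection network as its topology; it focuses on the memory architecture of TriBA and on the key techniques for supporting object orientation. The research content and results mainly include: (1) A hierarchical distributed shared memory architecture for TriBA, HSMA (Hierarchical Shared Memory Architecture), is proposed, together with a partially inclusive memory mapping strategy suited to it. Supported by parallel multi-port shared memory, HSMA combines distributed memory with shared memory; it reflects and supports the computation and communication locality of TriBA and provides TriBA with efficient memory access. A concrete design and implementation strategy for HSMA is given, which offers useful guidance for the memory architecture design of distributed systems. In addition, drawing on TriBA's support for object orientation, a new partially inclusive cache mapping strategy is designed; the study shows that this method focuses on the system's hierarchical characteristics, object-oriented characteristics, and cooperative working characteristics, and that it adapts well and flexibly to HSMA. (2) The methods and mechanisms by which TriBA supports object orientation are studied. The software execution model of object-oriented programs in TriBA is introduced, and concrete schemes are proposed from the perspectives of implementing object-oriented features and supporting object-oriented syntax. Because the object is the basic processing unit in TriBA, a globally unique object identifier and a classified multi-level object table structure are designed for storing, accessing, and managing objects in the system. In addition, a method is designed for representing the continually generated objects in the system with a limited number of object identifier bits, namely a reusable object table and an object number reuse algorithm. Finally, the object addressing process is given, including the partitioning and access of the object address space and object address translation. (3) A message mechanism for system communication in the TriBA on-chip multi-core system is designed. The concrete formats of message requests and message objects are given, along with how the message mechanism operates. After comparing the topological characteristics of the triplet-based network and 2D-Mesh, the advantages of the triplet-based network for building the inter-core interconnection network of a multi-core processor are established. To address the transmission bottleneck of the TriBA triplet-based network, two measures for improving system communication efficiency are proposed that make full use of the network's locality, and the auxiliary role of TriBA's hierarchical distributed shared memory structure in system communication is examined in detail. Based on the communication modes and message contents, six types of communication messages are designed for the system, and the operation and applicable domain of each are studied. Finally, the efficiency of the message mechanism and its applicable conditions are demonstrated both theoretically and through simulation results. (4) A fair memory access scheduling mechanism that dynamically time-division multiplexes shared memory bandwidth, DBTDMA (Dynamic Bandwidth Time Division Multiplex Access), is proposed to address the aggravation of the memory wall caused by processing cores competing for shared resources in multi-core computer systems. As processor performance keeps improving, the speed of the memory system has gradually become the performance bottleneck of the whole system. As the number of cores in a processor and the number of threads in applications grow, the performance of more and more applications will be limited by the bandwidth of the processor's memory system, which is a shared resource; in TriBA this problem appears as contention among the cores of one group for the data channel between intra-group shared memory and inter-group shared memory. A new scheduling mechanism for responding to memory access requests is designed for the shared memory, and a variable-priority arbitration and adjustment strategy is proposed, achieving fair response to and dynamic planning of memory access requests from multiple cores. Experimental results show that DBTDMA prevents memory access requests from waiting unpredictably long or starving, and shortens the average latency of memory access responses.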
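The abstract mentions a reusable object table and an object number reuse algorithm but gives no details of either. The following is only a minimal sketch of the general idea, assuming that numbers released by destroyed objects are recycled before never-used numbers are handed out. The class name `ReusableObjectTable`, its methods, and the flat single-level table are hypothetical and are not taken from the dissertation, which describes a classified multi-level table.

```python
from collections import deque


class ReusableObjectTable:
    """Hypothetical sketch: a fixed-width object number space with recycling."""

    def __init__(self, id_bits: int):
        self.capacity = 1 << id_bits   # how many distinct object numbers fit in id_bits
        self.next_fresh = 0            # next object number that has never been used
        self.free_ids = deque()        # numbers released by destroyed objects
        self.table = {}                # object number -> object address

    def allocate(self, address: int) -> int:
        """Bind a new object at `address` to an object number."""
        if self.free_ids:
            obj_id = self.free_ids.popleft()   # recycle a released number first
        elif self.next_fresh < self.capacity:
            obj_id = self.next_fresh
            self.next_fresh += 1
        else:
            raise MemoryError("object table full")
        self.table[obj_id] = address
        return obj_id

    def release(self, obj_id: int) -> None:
        """Destroy an object; its number becomes available for reuse."""
        del self.table[obj_id]         # raises KeyError if obj_id is not live
        self.free_ids.append(obj_id)

    def translate(self, obj_id: int) -> int:
        """Indirect addressing: object number -> current address."""
        return self.table[obj_id]
```

With a 2-bit identifier the table holds at most four live objects, yet an unbounded sequence of objects can be created over time as long as old ones are released. A real design would also need some guard against stale references to a recycled number, which this sketch omits.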

【Abstract】 Multi-core processors are widely regarded as the mainstream of next-generation computer architecture, and the chip multiprocessor (CMP) will be the dominant design paradigm in this area. To exploit the potential performance of multiple cores and expose more parallelism in programs, the programming model becomes the most important part of CMP design. The object-oriented paradigm (OOP) is based on the way humans understand the world, and its principles of information hiding and data abstraction are key to solving complex problems and building parallel programs. After several decades of development, it is widely accepted that OOP improves code reusability and eases code maintenance, and software engineers and developers have embraced object-oriented programming for these benefits. However, the performance of object-oriented programs running on non-object-oriented processors is always lower than that of procedure-oriented programs. Because objects are independent of each other and naturally parallel, a multi-core processor that supports object-oriented computing can both ease the burden of parallel program design and accelerate the execution of object-oriented programs. On the other hand, the classical on-chip communication architecture uses a traditional Time-Division Multiplexed (TDM) bus. Bus-based architectures suffer from the clear bottleneck of the shared medium used for transmission. Network on Chip (NoC), a new chip design paradigm, is expected to be an important architectural choice for CMPs. Replacing global wiring with a network has advantages in structure, performance, and modularity. A novel architecture that is based on NoC and supports object-oriented technology will therefore become a major trend in the design of future generations of microarchitectures. The research in this dissertation is based on a novel object-oriented multi-core architecture named TriBA (Triplet-based Architecture). 
This dissertation focuses on the key aspects of memory architecture and the object-oriented scheme. The main research work and contributions are as follows: (1) A novel Hierarchical Shared Memory Architecture (HSMA) suited to TriBA is proposed, together with its partially inclusive memory mapping scheme. With the support of multi-port shared storage, HSMA combines distributed memory and shared memory to build a high-efficiency memory system for TriBA. This work presents the design and implementation of HSMA, and analysis of the structure shows that HSMA fully exploits the computation and communication locality of TriBA. HSMA is shown to offer good organization, adaptability, and flexibility. In addition, the partially inclusive memory mapping scheme used in TriBA is well suited to object-oriented memory management. (2) The support methods and implementation schemes for the object-oriented paradigm in TriBA are studied. As an object-oriented processor, TriBA can realize object properties and object operations at both the software and hardware levels. This work proposes an object support scheme that covers object mapping, object denotation, object realization, and object management. An object identifier serves as the reference to an object, and indirect object addressing is achieved through a multi-level object table. The object address space and the address mapping process are also explained. (3) A communication scheme for TriBA based on message passing is proposed. Message passing is the only means of communication among cores and objects in TriBA. First, the detailed message format, the definition of the message class, and the message scheme are presented. 
Second, after comparing the topological properties of the triplet-based network with those of the 2D-Mesh interconnection, a novel strategy for high-performance communication in TriBA is introduced, which uses the data channels along with the on-chip inter-core channels to transfer messages and data. Third, messages are classified into six kinds according to the transaction mode. Simulation results show that this strategy can improve the efficiency of communication in TriBA. (4) A novel fair Dynamic Bandwidth Time Division Multiplex Access (DBTDMA) scheme for memory access scheduling is proposed. Because of the gap between processor speed and memory system speed, the performance of the whole processor is limited by the efficiency of the memory system. In object-oriented on-chip multi-core systems, memory bandwidth is the key resource shared among cores, and memory access becomes the performance bottleneck. To improve the efficiency of the memory system, this work puts forward a scheme that lets multiple cores share memory bandwidth dynamically. With the aid of variable per-core access priorities, DBTDMA provides fair memory access among cores and shortens memory access latency.
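The abstract does not specify how DBTDMA arbitrates or adjusts priorities. As a rough illustration of variable-priority arbitration in general, the sketch below grants each time slot to the requesting core with the highest priority, raises the priority of every core that had to wait, and resets the winner. The class `VariablePriorityArbiter` and this particular aging rule are assumptions made for illustration and are not the dissertation's algorithm.

```python
from typing import Iterable, Optional


class VariablePriorityArbiter:
    """Hypothetical sketch: one shared-memory slot per call, priorities age while waiting."""

    def __init__(self, num_cores: int):
        self.priority = [0] * num_cores   # larger value = served sooner

    def grant(self, requesting: Iterable[int]) -> Optional[int]:
        """Return the core that gets this slot, or None if nobody is requesting."""
        waiting = set(requesting)
        if not waiting:
            return None
        # Highest priority wins; ties go to the lowest core index.
        winner = max(waiting, key=lambda c: (self.priority[c], -c))
        for core in waiting:
            if core != winner:
                self.priority[core] += 1  # cores left waiting move up
        self.priority[winner] = 0         # the served core drops to the back
        return winner


arbiter = VariablePriorityArbiter(num_cores=3)
# All three cores request in every slot: the slots rotate among them.
print([arbiter.grant({0, 1, 2}) for _ in range(6)])  # [0, 1, 2, 0, 1, 2]
```

Under this rule a slot is never reserved for an idle core, so a lone requester receives every slot, while under full contention each core is served within a bounded number of slots, which is the kind of starvation-free behavior the abstract attributes to DBTDMA.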
