
高性能计算机可扩展并行调试技术的研究与实现

Research and Implementation of Scalable Parallel Debug Technique for High-Performance Computer

【Author】 Sun Guohua (孙国华)

【Advisor】 Song Junqiang (宋君强)

【Author Information】 National University of Defense Technology, Computer Science and Technology, 2007, Master's degree

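【Abstract (translated from the Chinese)】 Parallel debuggers play an important role in parallel program development. How to catch the exceptions of a parallel program effectively and correct its errors, and thereby improve development quality, has long concerned researchers. Besides the errors common in serial programs, such as pointer errors, variable errors, and syntax errors, parallel programs also suffer from errors specific to parallelism, such as deadlock, livelock, and race conditions, so debugging a parallel program is far more complex than debugging a serial one. Compared with serial debugging, parallel debugging faces problems of its own: insufficient scalability, nondeterminism, and large time overhead. This thesis focuses on the scalability of parallel debuggers based on the MPI programming model. Traditional parallel debuggers use a flat Client/Server communication pattern in which the front end (the master controller) carries a heavy load; when thousands of processes must be debugged, the front end often cannot work properly because its communication and computation load is too great. To meet the scalability requirements set at system design time, this thesis proposes TreeNet, a tree-structured communication protocol. With TreeNet, the system spreads the front end's load across intermediate communication nodes, which greatly shortens both the system startup time and the time taken by each command and improves scalability. Using the tree communication protocol, the scalable parallel debugger can debug parallel programs of more than 512 processes. The debugger uses group debugging: the processes of a parallel job are divided as needed into logical process sets, and every operation can be applied simultaneously to all processes in the currently active set. It supports source-level debugging through breakpoints, single-stepping, and similar means. It supports two debugging modes, Load and Attach. In Load mode, the parallel job is under the debugger's control for its whole lifetime, until the debugging session ends. In Attach mode, the parallel job is launched independently; the user can start a debugging session when needed and bring the whole job under the debugger's control, and after debugging the user can detach the debugger so that the job continues to run. The debugger is implemented on the Eclipse platform, using the Eclipse plug-in architecture to extend user-interface elements such as views and perspectives and provide a good graphical user interface.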
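
The load argument behind the tree-structured protocol can be illustrated with a small sketch. This is not the TreeNet protocol itself, whose message format and topology rules the abstract does not give. It assumes a fixed fan-out (8 here, chosen only for illustration), treats some debug servers as relays as well, and uses hypothetical function names. It counts how many messages each node sends when the front end broadcasts one command.

```python
def tree_layout(n_servers, fanout):
    """Arrange node 0 (front end) and servers 1..n_servers as a k-ary tree.

    Assumption: heap-style numbering, so the parent of node j is (j - 1) // fanout.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    children = {node: [] for node in range(n_servers + 1)}
    for node in range(1, n_servers + 1):
        children[(node - 1) // fanout].append(node)
    return children


def depth(node, fanout):
    """Number of hops from the front end (node 0) to the given node."""
    hops = 0
    while node != 0:
        node = (node - 1) // fanout
        hops += 1
    return hops


def broadcast(children, node=0, sent=None):
    """Forward one command down the tree; return messages sent per node."""
    if sent is None:
        sent = {}
    for child in children[node]:
        sent[node] = sent.get(node, 0) + 1
        broadcast(children, child, sent)
    return sent


layout = tree_layout(512, 8)
sent = broadcast(layout)
```

Under these assumptions, with 512 debug servers the front end sends 8 messages per command instead of the 512 a flat Client/Server layout would need, and no node sends more than 8. The price is latency: the farthest server is 3 hops from the front end.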

【Abstract】 A parallel debugger greatly helps programmers find problems such as incorrect results or abnormal termination when writing parallel programs. To improve the quality and efficiency of parallel programs, researchers have therefore focused on how to catch exceptions and correct programs early. In addition to the errors that occur in sequential programs, such as mishandled pointers, misused variables, and syntax errors, parallel programs have distinctive problems of their own, such as deadlock, livelock, and race conditions. Compared with sequential debugging, parallel debugging also faces problems of its own, such as non-deterministic execution, low scalability, and high time overhead. This thesis concentrates on the research and implementation of the Scalable Parallel Debugger. Our aim is to develop an efficient parallel debugger based on the MPI model. A traditional debugger architecture consists of a root debugger connected directly to, and controlling, the debug servers. This approach runs into many problems as the number of application processes scales. We therefore designed the TreeNet protocol to improve the scalability of the parallel debugger. Using the TreeNet protocol, the Scalable Parallel Debugger can debug more than 512 processes. The Scalable Parallel Debugger uses the process group concept to manipulate parallel processes: groups partition the processes logically. The basic methods for source-level debugging are breakpoints and stepping. The Scalable Parallel Debugger supports load and attach debugging. In load mode, the parallel tasks are under the control of the debugger from the moment they start. In attach mode, the debugger can attach to the parallel tasks at any time while they are running. The Scalable Parallel Debugger is based on the Eclipse platform. We use the Eclipse plug-in architecture and extend the view and perspective extension points to develop a user-friendly debugger.
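
The process group concept can be sketched as follows. This is a minimal illustration, not the debugger's actual interface: the class `GroupDebugger`, its method names, and the breakpoint location string are all hypothetical. It shows only the idea stated above, that processes are partitioned into named logical sets and a command applies to every process in the currently active set.

```python
class GroupDebugger:
    """Toy model of group debugging over MPI ranks 0..n_procs-1 (hypothetical API)."""

    def __init__(self, n_procs):
        # The set "all" always exists and is active by default.
        self.sets = {"all": frozenset(range(n_procs))}
        self.active = "all"
        self.breakpoints = {rank: set() for rank in range(n_procs)}

    def define_set(self, name, ranks):
        """Register a named logical process set."""
        ranks = frozenset(ranks)
        if not ranks <= self.sets["all"]:
            raise ValueError("rank outside the parallel job")
        self.sets[name] = ranks

    def focus(self, name):
        """Make a previously defined set the active one."""
        if name not in self.sets:
            raise KeyError(name)
        self.active = name

    def set_breakpoint(self, location):
        """Apply one breakpoint command to every rank in the active set."""
        targets = self.sets[self.active]
        for rank in targets:
            self.breakpoints[rank].add(location)
        return sorted(targets)


dbg = GroupDebugger(8)
dbg.define_set("even", range(0, 8, 2))
dbg.focus("even")
hit = dbg.set_breakpoint("solver.c:42")
```

In this sketch a single `set_breakpoint` call reaches ranks 0, 2, 4, and 6 and leaves the other ranks untouched, which is the behavior the group concept is meant to give the user.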
