Research on Fault-Tolerance Techniques of Shipborne Distributed Component-Based Systems (舰载分布式构件系统的容错技术研究)

【Author】 Chen Yunlin (陈昀林)

【Advisor】 Cao Wanhua (曹万华)

【Author Information】 China Ship Research and Development Academy (中国舰船研究院), Computer Software and Theory, 2011, Master's degree

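【Abstract, translated from the Chinese】 The shipborne combat command system is the core of the shipborne combat system and a typical distributed real-time embedded application. It faces a complex underlying computing platform, diverse system functions, and changing user requirements. As combat requirements change, the software of shipborne combat command systems keeps growing, and portability, software reusability, and integrability become increasingly important. Traditional structured software development methods are ill-suited to the development model of new-generation shipborne combat command systems; component-based software development (CBSD) is an effective way to address these problems. Adding redundancy and fault-tolerance functions during component development is one way to ensure system reliability. In the traditional approach, component developers write dedicated fault-tolerance management code for whatever redundancy and fault-tolerance control scheme is required, which increases the development workload and reduces component reusability. Research on fault-tolerance techniques suited to component systems is therefore worthwhile. The work in this thesis is part of the Eleventh Five-Year Plan national defense advance research project "Research on Service Integration Technology for Sea-Battlefield Integrated Electronic Information Systems" (海战场综合电子信息系统服务集成技术研究). It studies fault-tolerance techniques for shipborne distributed component systems and their implementation and, following the project's specific development requirements, designs and implements a fault-tolerance module for such systems. The main contributions are: (1) Drawing on the project's research content and background, the thesis analyzes the state of the art and development trends of fault-tolerance techniques in China and abroad, and discusses component-based shipborne command systems and their characteristics. (2) It gives the overall design of the fault-tolerance module for shipborne distributed component systems and implements each sub-module. The design and implementation take full account of the real-time and reliability demands of the shipborne computing environment and, while ensuring reliability, also address its availability requirements. (3) It presents a checkpoint-based backward recovery mechanism suited to distributed component systems. Tailored to the application environment, the mechanism simplifies the failure-detection and error-diagnosis sub-modules and separates the storage sub-module from the system, which reduces runtime overhead and makes it suitable for embedded platforms with limited resources. (4) It reports experimental results for the fault-tolerance module, testing its basic functions, error recovery time, and checkpoint storage time. The results show that the module achieves good error recovery time and checkpoint saving speed.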

【Abstract】 The shipborne combat command system is an essential part of shipborne combat systems. As a typical distributed, embedded real-time system, it now faces several problems: complicated basic computing platforms, varied system functions, changing requirements, and so on. In addition, the software scale of shipborne combat command systems keeps increasing as combat requirements change, which makes the portability, reusability, and integration capability of system software more and more important. Traditional structured software development methods are difficult to adapt to the development of the new generation of shipborne combat command systems, and component-based software development (CBSD) is an effective way to solve this problem. Adding redundancy and fault-tolerance functions in the component development process is one way to ensure system reliability. The traditional way to achieve fault tolerance in a component system is for every developed component to include dedicated fault-tolerance management code. This increases the component development workload and reduces component reuse. To deal with these issues, research on fault-tolerance techniques and mechanisms for component systems is of great significance. This thesis takes the "Eleventh Five-Year" national defense advance research topic "Research on General Services of sea battlefield Electronics Information System Integration Technology" as its background and carries out in-depth research on the design, techniques, and implementation of fault tolerance in shipborne distributed component systems. The work completed in the thesis mainly includes the following. (1) It analyzes the research status of fault-tolerance technology and its development trends and, on this basis, studies the features of component-based shipborne command systems. (2) It completes the overall design of the fault-tolerance module for shipborne distributed component systems and implements each sub-module. In the process, the real-time and reliability needs of the shipborne computing environment are taken into account, and its availability requirement is also considered while reliability is ensured. (3) It presents a checkpoint-based rollback recovery mechanism for distributed component systems. Tailored to the system's application environment, the mechanism simplifies the failure detection and error diagnosis sub-modules and isolates the storage sub-module from the system. This reduces system operating costs, so the mechanism suits embedded systems with limited resources. (4) It verifies by experiment the basic functions, fault recovery time, and checkpoint storage time of the fault-tolerance module. The results show that the fault recovery time and checkpoint storage time of the fault-tolerance module for shipborne distributed component systems are acceptable.
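
The abstract does not give the design of the recovery mechanism, so the following is only a minimal illustrative sketch of the general idea of checkpoint-based rollback recovery with a storage sub-module kept separate from the component. It is not the thesis's implementation: the names `CheckpointStore` and `Counter`, the in-memory dictionary used as storage, and the use of a `ValueError` to simulate a fault are all assumptions made for this example.

```python
import copy
from typing import Any, Dict, Optional


class CheckpointStore:
    """Hypothetical storage sub-module, kept separate from the components."""

    def __init__(self) -> None:
        self._saved: Dict[str, Any] = {}

    def save(self, component_id: str, state: Any) -> None:
        # Deep-copy so later changes to the live state cannot alter the checkpoint.
        self._saved[component_id] = copy.deepcopy(state)

    def load(self, component_id: str) -> Optional[Any]:
        state = self._saved.get(component_id)
        return copy.deepcopy(state) if state is not None else None


class Counter:
    """Toy component (an assumption for illustration) holding a running total."""

    def __init__(self, component_id: str, store: CheckpointStore) -> None:
        self.component_id = component_id
        self.store = store
        self.state = {"total": 0}
        self.store.save(component_id, self.state)  # initial checkpoint

    def process(self, value: int) -> None:
        try:
            self.state["total"] += value  # state is modified before the fault shows up
            if value < 0:
                raise ValueError("negative input simulates a component fault")
            self.store.save(self.component_id, self.state)  # checkpoint after success
        except ValueError:
            # Backward recovery: drop the corrupted state, restore the last checkpoint.
            self.state = self.store.load(self.component_id)


store = CheckpointStore()
counter = Counter("counter-1", store)
counter.process(5)
counter.process(-3)  # simulated fault: state rolls back to the checkpoint taken after 5
counter.process(2)
print(counter.state["total"])  # prints 7
```

Keeping the store behind a small `save`/`load` interface is what lets the component stay free of storage details; in a real embedded deployment the dictionary would be replaced by whatever persistent or remote storage the platform provides.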
