High-Availability Management in Cluster Operating Systems

【Author】 Liu Jianhua (刘建华)

【Advisor】 Meng Dan (孟丹)

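【Author information】 Graduate School of the Chinese Academy of Sciences (Institute of Computing Technology), Computer Architecture, 2004, Master's thesis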

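【Abstract (translated from the Chinese)】 Cluster systems scale well, but as a cluster grows and its node count increases, the reliability of the system as a whole falls correspondingly. Software that improves cluster availability will therefore become an indispensable part of the cluster operating system, and fault-recovery mechanisms are especially important for large-scale systems and long-running applications. In addition, having each subsystem, sub-service, and third-party application in the cluster operating system maintain its own high availability independently adds system complexity, wastes resources at run time, and wastes staff effort and creates difficulties during development and maintenance. The cluster operating system therefore needs independent high-availability management software that maintains the high availability of the other subsystems and applications.
The Dawning 4000 cluster operating system is an integrated, unified cluster middleware system. The high-availability management software, the HA trigger, is an important component of this middleware, which we call an important "service" of the cluster operating system. It is one of the shareable services extracted from the earlier cluster system software, and it is responsible for high-availability management of small-scale applications and services. The HA trigger is designed around service-based and integrated-component ideas and is implemented as CORBA-based distributed components; it offers good scalability, high availability, and system inclusiveness.
With the aim of improving the availability of applications and services in cluster systems, and with the Dawning 4000 cluster operating system as its engineering context, this thesis examines the key problems faced in designing and implementing high-availability management software for a cluster operating system, together with their solutions. It first introduces the project background, the goals of high-availability research, and the basic theory of high availability. It then presents the high-availability design of the Dawning 4000 cluster operating system and identifies the key problems that high-availability management faces within it. Next, it designs and implements the cluster high-availability management software, the HA trigger, around these problems. Finally, it builds a quantitative model to analyze the effect of high-availability management on the availability of applications and services.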

【Abstract】 The advantage of a cluster system is its scalability. But as the number of nodes in a cluster increases, the dependability of the cluster as a whole decreases. High-availability software will therefore be an indispensable part of the cluster operating system. Recovery methods are especially important for applications that are large in scale and run for a long time. On the other hand, having every subsystem or sub-service in the cluster operating system take care of its own high availability is complex and wastes system resources and labor during development and operation. Independent management software is therefore needed to maintain the high availability of the subsystems.
The "Dawning 4000" cluster operating system is an integrated middleware system. The HA trigger belongs to it and is called a service. Extracted from the earlier cluster system software, this service can be shared by different system components. The HA trigger manages the high availability of small-scale applications and services. The author adopts service-based design and component-based development to design the HA trigger and implements it with CORBA. The HA trigger offers high availability, scalability, and compatibility.
The purpose of this thesis is to increase service availability in cluster systems, with the Dawning 4000 cluster operating system as the project context. It discusses the key problems in designing and developing high-availability management software for cluster operating systems and the methods that solve them.
The thesis first introduces the background, the purpose of the research, and the basic theory of high availability. It then describes the high-availability design of the Dawning 4000 cluster operating system and puts forward the key problems of high-availability administration in a cluster operating system. Next, the HA trigger architecture is designed around these problems. Finally, a mathematical model is used to analyze quantitatively the effect of the HA trigger on service availability.

【Keywords (translated from the Chinese)】 cluster; cluster operating system; component; high availability; modeling
【Key words】 cluster; high availability; component; model
  • 【Classification code】 TP316.4
  • 【Times cited】 1
  • 【Downloads】 68