Research on Accompanied Monitoring Techniques for Large-scale Distributed Software Systems

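【Author】 Liu Donghong (刘东红)

【Advisor】 Zou Peng (邹鹏)

【Author information】 National University of Defense Technology, Computer Science and Technology, 2011, Ph.D.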

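【Chinese Abstract】 As software grows in scale and complexity, software faults and failures are becoming increasingly serious, with a great impact on daily life and on the functioning of society and the economy. Ensuring the trustworthy operation of large-scale distributed software systems has become a practical challenge. Such systems are large, widely dispersed in deployment, built from heterogeneous components, run in open environments, and evolve continuously, so their trustworthiness is hard to establish during development through testing, formal verification, and similar means. Keeping a large distributed software system running continuously and reliably requires "grasping" and "adjusting" it at runtime, and one of the key enabling technologies for this is software monitoring. Software monitoring has developed over several decades, and researchers have studied it from the perspectives of software debugging, maintenance, tuning, quality assessment, and real-time fault tolerance. For supporting the trustworthy operation of large-scale distributed software systems, however, the characteristics of these systems still pose considerable challenges to existing work, chiefly: (1) a lack of research on monitoring architectures for large-scale distributed software systems, particularly on the adaptability, flexibility, and non-invasiveness of such architectures; (2) a lack of research on monitoring-enabling mechanisms and methods for these systems, particularly on the completeness and systematicness of monitoring requirements and on the efficiency of monitoring design; and (3) a lack of research on general-purpose development methods for monitoring mechanisms in distributed systems. Approaching these challenges at the software engineering level, this dissertation carries out a preliminary exploration from three angles: the monitoring architecture, monitoring-enabling mechanisms, and monitoring development methods. Its main contributions are:

(1) An adjoin monitoring architecture and methodology for large-scale distributed software systems. "Adjoin" (伴随式, literally "accompanying") means that the development and operation of the monitoring mechanism are essentially independent of the monitored system. The architecture achieves adjoining at development time by applying Aspect-Oriented Programming (AOP) and related design patterns, at runtime through a monitoring-enabled distributed software runtime framework, and along the evolution dimension through dynamic deployment of the monitoring system. The goals are to support multiple software development paradigms during development, to reduce the impact on the monitored system's behavior at runtime, and to let the monitoring mechanism evolve independently online.

(2) Design-constraint-driven monitoring-enabling mechanisms and methods. The dissertation analyzes three main kinds of distortion in software design: computational-expression distortion, physical-digital mapping distortion, and human factors. It analyzes the basic objects and elements of software monitoring from the standpoint of the origins of computing systems, and on that basis proposes an invariant monitoring-enabling mechanism based on design constraints and an enforcement mechanism for operating procedures. Driven by the adjoin idea, the two mechanisms supervise at runtime the design constraints present in computational expressions and the operating procedures that operators are expected to follow. The corresponding methods and tools broaden the sources of monitoring requirements and raise the degree of automation and the design efficiency of monitoring enablement.

(3) A meta-information-based method for injecting monitoring capability into software systems. Mixing business logic with monitoring logic increases the complexity of large-scale distributed software systems. The method achieves the adjoin effect by "divide and conquer": monitoring code is developed and maintained independently of the business logic, and once written it is injected into the software system by extracting meta-information from the source code and combining it with reflection and tool support. The method consists of source code analysis, monitoring requirement modeling, monitoring code generation, and monitoring code weaving, each supported by software tools. It can inject monitoring capability both during development and after release, and it supports multiple development methods and environments.

(4) Multi-purpose induction and synthesis of monitoring information. The ultimate goal of software monitoring is to serve users' requirements on the system. These requirements come from two spaces, the digital space and the physical space, and from three kinds of users: maintenance personnel, assurance personnel, and operational personnel. From the standpoint of getting the most value out of monitoring, the dissertation proposes a trace tracking method that uses monitoring synthesis in the digital space to resolve the runtime "black-box" problem, a monitoring customization method for debugging and defect localization, and a monitoring-based method for collaborative analysis across the digital and physical (virtual and real) spaces, exploring ways to build new capabilities by inducing and synthesizing monitoring information.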
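The "divide and conquer" idea behind contributions (1) and (3), where monitoring logic is written separately and then injected using meta-information and reflection, can be illustrated with a minimal sketch. It assumes Python and uses runtime method wrapping via `getattr`/`setattr` as a stand-in for AOP weaving; it is not the dissertation's toolchain. `Account`, `Monitor`, `candidate_join_points`, and `inject_monitoring` are hypothetical names. Meta-information here is limited to class and method names.

```python
from __future__ import annotations

import functools
import inspect
import time
from dataclasses import dataclass, field
from typing import Any, Callable


# Hypothetical business class: it contains no monitoring code at all.
class Account:
    def __init__(self, balance: int) -> None:
        self.balance = balance

    def withdraw(self, amount: int) -> int:
        self.balance -= amount
        return self.balance


@dataclass
class MonitorEvent:
    target: str      # "Class.method", built from meta-information
    args: tuple      # positional arguments, excluding the target instance
    result: Any
    elapsed_s: float


@dataclass
class Monitor:
    """Separately developed monitoring logic (hypothetical); it only records events."""
    events: list = field(default_factory=list)

    def wrap(self, target: str, func: Callable) -> Callable:
        @functools.wraps(func)  # keep the original name and docstring visible
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            # Assumes an instance method: args[0] is the monitored object, so skip it.
            self.events.append(
                MonitorEvent(target, args[1:], result, time.perf_counter() - start)
            )
            return result
        return wrapper


def candidate_join_points(cls: type) -> list[str]:
    """Extract meta-information: names of the public methods defined on cls."""
    return [name for name, _ in inspect.getmembers(cls, inspect.isfunction)
            if not name.startswith("_")]


def inject_monitoring(cls: type, method_names: list[str], monitor: Monitor) -> None:
    """Weave monitoring into cls by method name via reflection, without editing its source."""
    for name in method_names:
        original = getattr(cls, name)
        setattr(cls, name, monitor.wrap(f"{cls.__name__}.{name}", original))


monitor = Monitor()
inject_monitoring(Account, candidate_join_points(Account), monitor)
acct = Account(100)
acct.withdraw(30)
event = monitor.events[0]
print(event.target, event.args, event.result)  # Account.withdraw (30,) 70
```

The business class stays untouched, and the monitoring code can change independently. The dissertation's method additionally derives join points from source-code analysis and generates the monitoring code from a requirement model. This sketch replaces both steps with simple runtime introspection.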

【Abstract】 As software systems grow in scale and complexity, software faults and failures are becoming increasingly severe, with a tremendous impact on people's lives and on social and economic functioning. Ensuring the trustworthiness of large-scale distributed software systems has become a real challenge. Such systems are characterized by immense size, decentralized deployment, heterogeneous components, open environments, and continuous evolution, so their trustworthiness cannot be ensured by development-phase means such as software testing and formal verification alone. To guarantee their reliable operation, we have to "inspect" and "adjust" these systems at runtime, and one of the key enabling technologies for doing so is software monitoring.

Software monitoring technology has undergone decades of development, and researchers have studied it from different points of view, such as software debugging, maintenance, tuning, quality assessment, and real-time assurance and fault tolerance. However, supporting the trustworthiness of large-scale distributed software systems still poses major challenges. The most prominent ones are: (1) a lack of research on monitoring architectures for large-scale software systems, especially on the adaptability, flexibility, and non-invasiveness of such architectures; (2) a lack of research on monitoring-enabling mechanisms and methods for large-scale systems, especially on the completeness of monitoring requirements and the efficiency of monitoring design; and (3) a lack of a general-purpose development methodology for designing the monitoring systems of large systems. This dissertation addresses these challenges from a software engineering perspective.

Our work covers three major topics: the monitoring architecture, the monitoring-enabling mechanism, and the monitoring development method. The main contributions of this dissertation are as follows:

(1) The Adjoin Monitoring Architecture for large-scale distributed software systems

We present the Adjoin Monitoring Architecture (AMA) for large-scale distributed software systems. Here, "adjoin" means the separation of the monitoring mechanisms from the system being monitored. AMA supports adjoin development by applying Aspect-Oriented Programming (AOP) and related design patterns, adjoin operation through a monitoring-enabled distributed software runtime framework, and adjoin evolution through dynamic deployment of the monitoring system. As a result, AMA supports a variety of software development paradigms, reduces the impact on the behavior of the monitored system, and allows the monitoring mechanisms to evolve independently.

(2) Monitoring-enabling mechanisms and methods driven by design constraints

This dissertation analyzes three main types of distortion in the software design process: computational-expression distortion, physical-digital mapping distortion, and human-related factors. We identify the basic objects and elements of software monitoring from the perspective of the origins of computing systems. Building on this analysis, we propose an invariant monitoring-enabling mechanism based on design constraints and an enforcement mechanism for operating specifications. Both mechanisms follow the adjoin monitoring idea and supervise, at runtime, the design constraints embedded in computational expressions and the operating specifications that operators should follow. The corresponding methods and tools broaden the sources of monitoring requirements and improve both the automation and the design efficiency of monitoring enablement.

(3) The monitoring capability injection method based on meta-information

Intermixing business logic and monitoring logic increases the complexity of large-scale distributed software systems. The meta-information-based injection method realizes the idea of "divide and conquer": monitoring code is developed and maintained independently of the business code and, once written, is woven into the target system based on its meta-information (such as class names, function names, and variable types in the code) with the aid of reflection and a set of tools. The approach comprises four steps: source code analysis, monitoring requirement modeling, monitoring code generation, and monitoring code weaving. It supports injecting monitoring capability both before and after software release, as well as a variety of development methods and environments.

(4) The multi-purpose induction and synthesis method of monitoring information

The ultimate goal of software monitoring is to serve users' requirements on the system, which come from two spaces and three types of users. The two spaces are the digital space and the physical space; the three types of users are maintenance personnel, assurance personnel, and operational personnel. Traditional software monitoring mainly targets fault detection. To make monitoring more effective, this dissertation proposes a trace tracking method that addresses the runtime "black-box" problem through monitoring synthesis in the digital space, a monitoring customization method for debugging and fault localization, and a monitoring-based approach to joint digital-physical space analysis.
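The two mechanisms of contribution (2) can be sketched roughly as follows. The first checks design constraints, restated as invariants, after each operation. The second rejects operator actions that break a prescribed operating procedure. This Python sketch is only illustrative. It makes two assumptions: the constraint is written by hand rather than derived from design artifacts, and the procedure is modeled as a simple finite-state machine. `Tank`, `Invariant`, `InvariantMonitor`, `ProcedureEnforcer`, and the step names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Invariant:
    """A design constraint restated as a predicate over observable state."""
    name: str
    predicate: Callable[[object], bool]


@dataclass
class InvariantMonitor:
    invariants: list
    violations: list = field(default_factory=list)

    def check(self, state: object, after: str) -> bool:
        """Evaluate every invariant after an operation; record any violations."""
        ok = True
        for inv in self.invariants:
            if not inv.predicate(state):
                self.violations.append((after, inv.name))
                ok = False
        return ok


class ProcedureViolation(Exception):
    """Raised when an operator step breaks the prescribed operating procedure."""


class ProcedureEnforcer:
    """Operating procedure as a finite-state machine; state = last completed step."""

    def __init__(self, start: str, transitions: dict[str, set[str]]) -> None:
        self.state = start
        self.transitions = transitions

    def step(self, action: str) -> None:
        allowed = self.transitions.get(self.state, set())
        if action not in allowed:
            # Reject the step before it takes effect; the state is left unchanged.
            raise ProcedureViolation(
                f"{action!r} not allowed after {self.state!r}; expected one of {sorted(allowed)}"
            )
        self.state = action


# Hypothetical monitored state; the monitor never modifies it.
@dataclass
class Tank:
    level: float
    capacity: float


monitor = InvariantMonitor([
    Invariant("level within capacity", lambda t: 0 <= t.level <= t.capacity),
])
tank = Tank(level=80.0, capacity=100.0)
tank.level += 30.0
monitor.check(tank, after="fill")
print(monitor.violations)  # [('fill', 'level within capacity')]

procedure = ProcedureEnforcer("idle", {
    "idle": {"check_pressure"},
    "check_pressure": {"open_valve"},
    "open_valve": {"close_valve"},
    "close_valve": {"check_pressure"},
})
try:
    procedure.step("open_valve")  # skips the required pressure check
except ProcedureViolation as exc:
    print("rejected:", exc)
procedure.step("check_pressure")
procedure.step("open_valve")
print(procedure.state)  # open_valve
```

In keeping with the adjoin idea, both checkers sit beside the monitored state and operator actions rather than inside them. Only the check points would need to be woven into the running system.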
