基于支持向量机的软件可靠性模型分类及失效分析

The Classification of Software Reliability Models and Failure Analysis of Software Reliability Based on Support Vector Machines

【Author】 Chao Bing (晁冰)

【Advisor】 Xu Renzuo (徐仁佐)

【Author information】 Wuhan University, Computer Software and Theory, 2010, PhD

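【Summary】 Software reliability is one of the most important measures in the software quality system. It determines whether software can work stably and dependably. The concept of software reliability engineering was proposed nearly 40 years ago, and many research institutions, universities, companies, and scholars have worked hard to popularize and implement it. Yet compared with the rapid progress in software development practice and methods, the implementation of software reliability engineering has neither advanced substantially nor changed fundamentally. Today, never mind fully achieving its stated goals: merely putting some of its guiding ideas and methods into practice in a software project is already a headache for decision makers and developers.

Why does this happen? First, software reliability engineering is part of software engineering and, at the same time, its most specialized part. However much work its implementation requires, all of it ultimately serves software reliability analysis. That analysis uses mathematics as its tool, drawing on probability theory, mathematical statistics, and data analysis to evaluate or predict the various reliability indicators of software. The people who carry out software reliability engineering therefore need a solid grounding in these areas of mathematics, which is hard to achieve in today's software development organizations. Second, although several hundred software reliability models already exist, none of them is truly suited to common software projects, that is, none is universally applicable, because every model imposes requirements or assumptions of one kind or another when it is applied. Finding a suitable model among so many is therefore tedious and may even prove fruitless. Third, implementing software reliability engineering is not only a matter of mathematical analysis. It must also put the guiding ideas and methods of software engineering into practice, which is one of its heavy burdens.

These three reasons show that truly implementing software reliability engineering requires, on the one hand, putting the guiding ideas of software engineering into practice and, on the other and more importantly, a simple, practical, and effective method for evaluating software reliability. This dissertation proposes a method for classifying and selecting software reliability models and a method for analyzing software reliability, both based on the theory of least squares support vector machines (LSSVM). The former addresses the problem of choosing among the many existing reliability models; the latter applies support vector machine theory directly to software reliability analysis.

The support vector machine (SVM) is a machine learning method based on the structural risk minimization induction principle of statistical learning theory, and is the youngest and most practical method in that theory. Vapnik gave the mathematical justification for this approach together with a complete set of solution steps. The basic idea of SVM is to construct, in the sample space or feature space, an optimal hyperplane that maximizes the distance between the hyperplane and the sample sets of different classes, thereby achieving the greatest generalization ability. An SVM is simple to construct, its construction process is highly general, its solution is globally optimal, and it generalizes well. It is an effective tool for pattern recognition (classification) and function estimation (regression analysis and function approximation). LSSVM is an improvement on the standard SVM that simplifies the underlying optimization problem and so makes it easier to implement.

Because classification is the theoretical foundation of SVM, the dissertation starts with the classification of software reliability models and, taking two-class classification as an example, derives a least squares support vector classification machine (LSSVCM) method for multi-class classification of software reliability models. It also summarizes several other methods of classifying software reliability models. Those methods essentially use mathematical concepts or the mathematical characteristics of the models as the criterion, which inevitably reduces their practical usefulness considerably. Building on LSSVCM, the dissertation proposes using principal component analysis and cluster analysis to reduce and combine the application conditions and assumptions of existing reliability models, and selects and classifies models on that basis. Because application conditions and assumptions are practical factors that are easy to understand and realize, and need no precise mathematical definition or mathematical characteristics, the LSSVCM-based classification method better matches practical needs. It helps people choose a suitable reliability model quickly and accurately from simple software properties or fault characteristics, which lowers the difficulty of applying reliability models in practice.

Although LSSVCM-based classification greatly improves the efficiency of choosing a reasonable model, it does not solve the problem that applying a reliability model still needs many assumptions and a large amount of mathematical computation. Moreover, the selected model may not suit the software project, or may not suit it throughout. The dissertation therefore goes further and proposes a method for constructing the software reliability failure function based on least squares support vector regression machines (LSSVRM), which gives the best regression fit to the software's failure behavior and so makes it easier to evaluate reliability appropriately. To address the problem that LSSVRM learns similar data repeatedly during training, cluster analysis is applied to the failure data to delete data with close features, which improves the learning efficiency of LSSVRM.

LSSVRM-based reliability regression analysis has the following advantages. First, the method is in effect an algorithm that can be implemented in various programming languages; it is general and, once written, can be reused across projects. That is, it can be applied to the reliability analysis of any type of software without being constrained by any external environment or assumptions. Second, it needs no parameter estimation and can approximate the function directly from existing data, which is fundamentally different from other reliability model analysis methods. Third, it can be applied across the whole software life cycle without reselecting a model or re-estimating model parameters for each development stage. Taken together, the most important point is that the method can be applied without extensive mathematical knowledge and is highly general.

To use SVM-based software reliability analysis more effectively, and to address the remaining influence of human factors on certain characteristic parameters of the failure analysis model, the dissertation proposes using the simulated annealing (SA) algorithm to optimize certain free parameters of the SVM automatically. While the SVM is being constructed, optimal values can be given to those parameters that are not part of the optimal solution of the support vector regression machine, which avoids human influence and reduces the work of constructing the model. Considering that real software reliability is not affected by only one or a few factors, the dissertation also discusses a method for constructing multivariate software reliability models: state analysis determines the main reliability factors, and kernel principal component analysis together with LSSVRM constructs the multivariate model.

The dissertation has studied, to a certain extent, LSSVM-based methods for classifying and selecting software reliability models and for analyzing software reliability. This work still needs to be deepened and refined to achieve better practical results, and that remains to be done after the completion of this dissertation.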
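
The least squares support vector regression named in the abstract above can be stated compactly. The block below is the textbook LSSVM regression formulation from the general literature, offered only as a reading aid; it is not taken from the dissertation. The symbols (regularization constant \gamma, feature map \varphi, kernel K, multipliers \alpha_i, bias b, errors e_i) are assumptions introduced here, with (x_i, y_i) standing for failure observations such as test time and cumulative failure count.

```latex
% Primal problem: equality constraints and squared errors replace
% the inequality constraints of the standard SVM.
\min_{w,\,b,\,e}\ \frac{1}{2}\,w^{\top}w + \frac{\gamma}{2}\sum_{i=1}^{n} e_i^{2}
\quad \text{s.t.}\quad y_i = w^{\top}\varphi(x_i) + b + e_i,\qquad i = 1,\dots,n

% The optimality conditions reduce to one linear system in (b, \alpha),
% with K_{ij} = K(x_i, x_j) = \varphi(x_i)^{\top}\varphi(x_j):
\begin{bmatrix} 0 & \mathbf{1}^{\top} \\ \mathbf{1} & K + \gamma^{-1} I \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 0 \\ y \end{bmatrix}

% Fitted failure function:
f(x) = \sum_{i=1}^{n} \alpha_i\, K(x, x_i) + b
```

Solving a linear system instead of a quadratic program is what makes the least squares variant simpler to implement than the standard SVM.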

【Abstract】 Software reliability is one of the most important software quality factors. It determines whether software can work stably and reliably. Software reliability engineering (SRE) was introduced nearly 40 years ago, and many institutions, universities, companies, and scholars have worked hard to practice and popularize it. Compared with the rapid progress of software development methods, however, SRE has seen neither major advances nor radical change. Leaving aside the ideal goals of SRE, merely following some of its guiding ideas and applying some of its methods is already troublesome for project decision makers and software developers.

Three reasons explain this situation. First, SRE is the most specialized part of software engineering. Its main purpose is to support the analysis of software reliability indexes. That analysis relies on mathematical tools, combining probability theory, statistics, and data analysis to evaluate and predict the reliability indexes, so the analyst needs a solid foundation in mathematics, which is a difficult requirement to meet. Second, there are hundreds of software reliability models, but none of them is universal for common software projects, because the requirements or hypotheses of each model restrict its application. Finding the most suitable model for a software project therefore becomes complicated and may even be fruitless. Third, practicing SRE involves not only mathematical analysis but also implementing the guiding ideas and methods of software engineering, which is a heavy burden in itself.

Addressing these difficulties requires, on the one hand, applying the guiding ideas of software engineering thoroughly and, on the other, a simple and easy way to compute software reliability. This dissertation presents a method for classifying software reliability models and a method for computing software reliability, both based on least squares support vector machines (LSSVM). The classification method addresses the problem of selecting a software reliability model, and the computation method uses LSSVM theory to analyze and evaluate software reliability.

The support vector machine (SVM) is a machine learning method based on the structural risk minimization (SRM) principle of statistical learning theory (SLT), and is the youngest and a very practical method in SLT. Vapnik provided its mathematical foundation and derived a way to evaluate its risk performance along with a series of solution steps. The basic idea of SVM is to construct, in the sample space, an optimal hyperplane with the largest distance to the sample sets belonging to different classes, so that the generalization ability is also optimal. The structure of an SVM is simple, its solution is globally optimal, and it generalizes well, so it has been studied extensively since the 1990s. SVM is an effective tool for problems such as classification and function regression.

Because the main application of SVM is data classification, the dissertation begins with the classification of software reliability models and, starting from the binary classification problem, introduces least squares support vector classification machines (LSSVCM) to solve the multi-class problem. It also summarizes other methods of classifying software reliability models. Those methods classify models by mathematical concepts or mathematical characteristics, which reduces their practicality. The LSSVCM-based classification method uses cluster analysis and principal component analysis to reduce and combine the application conditions and hypotheses on which the models rest. These conditions and hypotheses are practical factors that are easy to understand and realize, and they need no precise mathematical definition or mathematical characteristics. The LSSVCM-based method is therefore better suited to real situations: it helps the analyst select a reliability model quickly from simple software properties or failure characteristics, and it lowers the difficulty of applying a model.

LSSVCM-based classification greatly improves the efficiency of model selection, but it does not solve the problem that applying a reliability model needs many hypotheses and a large amount of mathematical computation, and the selected model is not certain to suit the current software project. The dissertation therefore presents a method for constructing the failure function for software reliability analysis based on least squares support vector regression machines (LSSVRM), which fits the software failure data accurately and supports an appropriate evaluation of software reliability. A shortcoming of LSSVRM is that it learns similar data repeatedly during training. To overcome this, cluster analysis is used to delete data with similar characteristics, which improves the learning efficiency of LSSVRM.

Using an LSSVRM-based reliability model in reliability analysis has several advantages. First, LSSVRM is an algorithm that can be implemented in any programming language and reused across projects. In other words, the method can analyze the reliability of different software projects and is not restricted by any environmental factors or hypotheses. Second, the LSSVRM-based model has no parameters to estimate and can fit the failure data directly, which is the main difference from other software reliability analysis methods. Third, the method can be used throughout the software life cycle, which avoids reselecting a model and re-estimating parameters at each stage of development. The most important point about these advantages is that the method does not demand extensive mathematical knowledge and is highly general.

Some parameters in an SVM-based failure model still need to be set by hand. To use the model effectively, the simulated annealing (SA) algorithm is introduced to optimize these parameters automatically. With SA, the parameters that are not part of the optimal solution of the SVM optimization problem can be optimized automatically while the best failure model based on the optimal SVM is constructed, so human interference is avoided and the workload of constructing the model is reduced.

Because many factors influence software reliability, the dissertation also discusses a method for constructing a multivariable software reliability model. The key idea is to determine the major factors by state analysis and to build the model with kernel principal component analysis and LSSVRM.

The dissertation has studied, in some depth, LSSVM-based methods for classifying software reliability models and analyzing software reliability. Further and more thorough study is still needed.
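
To make the regression idea concrete, the following is a minimal sketch of least squares support vector regression in Python with NumPy. It follows the textbook LSSVM formulation, not the dissertation's own implementation. The function names `rbf_kernel`, `lssvr_fit`, and `lssvr_predict`, the choice of an RBF kernel, the values of `gamma` and `sigma`, and the failure counts are all illustrative assumptions. Here `gamma` and `sigma` are fixed by hand; the abstract describes tuning such free parameters with simulated annealing instead.

```python
import numpy as np


def rbf_kernel(a, b, sigma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances, clipped at 0 against round-off.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def lssvr_fit(x, y, gamma, sigma):
    """Solve the LSSVM linear system for the bias b and the multipliers alpha."""
    n = len(y)
    system = np.zeros((n + 1, n + 1))
    system[0, 1:] = 1.0                      # first row: sum(alpha) = 0
    system[1:, 0] = 1.0                      # bias column
    system[1:, 1:] = rbf_kernel(x, x, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(system, rhs)
    return solution[0], solution[1:]


def lssvr_predict(x_train, b, alpha, x_new, sigma):
    """Evaluate f(x) = sum_i alpha_i * K(x, x_i) + b at the rows of x_new."""
    return rbf_kernel(x_new, x_train, sigma) @ alpha + b


# Hypothetical failure data: cumulative failures observed after each test week.
t = np.arange(1.0, 11.0).reshape(-1, 1)
failures = np.array([5, 9, 13, 16, 18, 20, 21, 22, 23, 23], dtype=float)

# gamma (regularization) and sigma (kernel width) are assumed values.
b, alpha = lssvr_fit(t, failures, gamma=100.0, sigma=3.0)
fitted = lssvr_predict(t, b, alpha, t, sigma=3.0)
```

Training reduces to a single call to `np.linalg.solve`, which is the practical simplification that LSSVM offers over the quadratic program of a standard SVM. The fitted curve can then be evaluated at later times with `lssvr_predict` to extrapolate the failure trend, subject to the usual caution about extrapolating kernel models beyond the observed range.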

  • 【Online publication contributor】 Wuhan University
  • 【Online publication year and issue】 2011, Issue 08