节点文献

消息传递并行程序测试数据进化生成理论与应用

Theory of Evolutionary Generation of Test Data for Message Passing Parallel Programs and Applications

【作者】 田甜

【导师】 巩敦卫;

【作者基本信息】 中国矿业大学 , 控制理论与控制工程, 2014, 博士

【摘要】 随着并行计算技术的发展,越来越多的大规模高精度的科学问题,如图像处理、生物信息、分子动力学模拟,以及天气预报等,可以通过并行程序得到有效解决。并行程序的多个进程并行执行,进程之间通过共享存储或者消息传递的方式进行交互,协同完成计算任务。与串行程序相比,并行程序更加复杂,可靠性要求更高,这意味着并行程序的测试尤为关键。已有学者提出了多种方法,着重测试并行程序进程之间的交互序列,检测死锁、资源冲突和数据竞争等问题。在正确通信的前提下,为了测试程序的其他内部细节,有必要研究并行程序的结构覆盖,以进一步提高并行程序的可靠性。而结构覆盖的关键是使用有效的理论和方法,生成满足一定准则的测试数据。在软件测试领域,尽管遗传算法已经成为一种生成测试数据的重要方法,但是,已有工作主要面向串行程序。本文针对路径覆盖问题,研究消息传递并行程序测试数据进化生成理论及其应用。首先,研究了用于路径覆盖测试数据生成的协同进化解决方案。针对一类确定执行的并行程序,建立测试数据生成问题的数学模型;根据某一进程路径与程序输入的相关性,提出一种新的协同进化遗传算法。该方法中,包含多个子种群和一个合作团体群,每个子种群用来优化与某一进程路径相关的输入分量,基于这些种群的优良个体,合作团体群优化所有的输入分量。通过多个子种群与合作团体群的交替协同进化,求解上述优化模型,进而高效生成期望的测试数据,同时也为协同进化方法在软件测试中的应用提供新的思路。其次,考虑程序执行过程的不确定性,研究了并行程序单路径覆盖测试数据进化生成方法。当程序中使用一些不确定的通信语句时,由于任务划分、进程调度和网络延迟等原因,并行程序的执行过程也具有的一定的不确定性,这给并行程序的测试数据生成带来很大挑战。对于此类程序,为了降低不确定性执行给测试数据生成带来的影响,定义目标路径的等价路径,并给出判定方法,基于目标路径及其等价路径,建立有针对性的测试数据生成问题的数学模型,并采用遗传算法求解该模型。为不确定执行并行程序的测试数据生成问题,提供合理的解决途径。为了减少测试数据生成过程中的计算量,进一步提出了一种基于覆盖难度选择待覆盖路径的测试数据进化生成方法。该方法首先通过影响路径执行的变量、路径关键条件的复杂度,以及Halstead测度等,在目标路径和等价路径中,选择待覆盖路径,以降低路径覆盖的难度;然后,采用遗传算法生成覆盖相应路径的测试数据,以提高测试数据生成的效率。再次,为了有效解决多路径覆盖问题,研究了多路径覆盖测试数据进化生成方法。建立多路径覆盖测试数据生成问题的数学模型,并提出有针对性的进化求解方案。首先,基于每条给定路径及其等价路径,将多路径覆盖测试数据生成问题建模为一个包含多个目标的优化问题,每一个目标与一条目标路径及其等价路径相关;然后,设计评价个体的适应度函数,提出采用遗传算法求解该模型的方法。在测试数据生成过程中,同时考虑所有的目标路径和等价路径,使得一次运行遗传算法生成所有期望的测试数据。最后,针对很多路径覆盖的测试数据生成问题,研究了基于路径分组的多种群进化求解方案。该方法高效生成测试数据的前提是对目标路径合理分组。为此,首先,研究了用于串行程序很多路径覆盖测试数据并行生成的目标路径分组问题,提出一种新的目标路径分组方法。该方法根据可以利用的计算资源和路径相似程度,将待覆盖的目标路径分成若干组,使得不同组包含的目标路径条数相差很小,且属于同一组的目标路径具有很大的相似程度。以此为基础的测试数据并行生成方法能够充分利用并行计算资源,改进测试数据的生成效率。然后,在该分组方法的基础上,考虑并行程序多进程并行执行的特性,根据计算资源和程序包含的进程数两个因素,确定路径的分组数和每组包含的路径数,综合考虑程序所有进程路径的相似度,将目标路径进行分组。基于每一组路径,建立很多路径覆盖测试数据生成问题的数学模型,并采用多种群遗传算法进行求解,进一步提高了测试数据的生成效率。本文的研究工作在一定程度上,丰富了并行程序测试理论,大大提高了测试数据的生成效率,拓宽了进化算法的应用范围,具有重要的理论意义和应用价值。该论文有图18幅,表31个,参考文献132篇。

【Abstract】 Due to the development of parallel computing techniques, more and morescientific problems, such as image processing, weather forecast, and moleculardynamic simulation, can be solved by parallel programs. Multiple processes in aparallel program are executed in parallel and cooperate with each other bycommunication or shared memory to solve practical problems. Compared to serialprograms, parallel programs are more complicated and require higher reliability,suggesting that the testing of parallel programs is necessary. Many scholars haveproposed various methods which focus on testing interleaving sequence to detectdeadlocks, unintentional races, and resources competition. On the premise of thenormal communication, it is necessary to investigate the structure coverage of parallelprograms for testing other internal details, further improving the reliability of parallelprograms. The key of the structure coverage is to generate test data satisfying theadequacy criteria. In the field of software testing, although genetic algorithms havebecome an important strategy of generating test data, existing methods are mainlysuitable for serial programs. Considering the path coverage, this issue investigatestheories and applications of evolutionary test data generation for message passingparallel programs. The main studies include:A co-evolutionary solution for generating test data to cover the path of parallelprograms is investigated. Aiming at parallel programs with determinacy, in whichexecution paths of each progress are determined uniquely by input components relatedto it, the mathematical model of generating test data is built and a novelco-evolutionary genetic algorithm is proposed. According to the correlation between aprocess path and its associated input components, a set of sub-populations and acooperative population are introduced. Each sub-population independently evolves tooptimize a part of the input associated to the specific process path, whereas thecooperative population based on excellent individuals from all sub-populations is usedto optimize the whole input. These two kinds of populations alternatively evolve toeffectively solve the above model and generate test data covering the target path,while providing new ideas for employing co-evolutionary methods to software testing.Considering the non-determinacy of parallel programs, a method of evolutionarygeneration of test data covering one path is presented. When some non-deterministiccommunication statements are employed in a parallel program, it may be executed in a non-deterministic way because of the task partition, process schedule andcommunication delay, which brings a great challenge for generating test data. Forreducing the influence of non-deterministic execution on generating test data, anequivalent path is defined and a method of searching for it is given. Based on thesepaths, a mathematical model of generating test data for path coverage of this kind ofprograms is formulated, and a strategy of solving the model by using a geneticalgorithm is presented. This provides a reasonable way to solve the problem ofgenerating test data for the programs with non-deterministic feature. In order todecrease the computation cost required by generating test data, a method ofevolutionary test data generation through selecting target paths is further proposedbased on the coverage difficulty. First, the method selects the path to be covered toreduce the coverage difficulty in the light of variables affecting a path’s execution,complexities of a path’s crucial conditions, and Halstead’s metric. Second, the geneticalgorithm is employed to generate test data covering the path, so as to improve theefficiency of generating test data.For the problem of multiple paths coverage, an evolutionary solution of generatingtest data covering multiple paths is studied. A mathematical model is built and anevolutionary method is presented against it. The problem of test data generation formultiple paths coverage is first formulated into a multi-objective optimizationproblem, where each objective is corresponding with a target path and its allequivalent paths. An appropriate fitness function is designed when employing agenetic algorithm to solve the above problem. The proposed method can synthesizemultiple test data to cover the given paths or their equivalent ones in one run.Finally, a multi-population parallel genetic algorithm based on grouping thetarget paths is employed to tackle the many paths coverage of parallel programs. Thepremise on which the above method is efficient is appropriately grouping target paths.The problem of grouping target paths is first investigated for the generation of testdata covering many paths of serial programs, and a novel method of grouping targetpaths is presented. In this method, target paths are divided into several groupsaccording to calculation resource available and the similarities among target paths,making a small difference in the number of target paths belonging to different groups,and a great similarity among target paths in the same group. The parallel geneticalgorithm based on this method can make full use of the calculation resources,improving the efficiency of generating test data. Then, based on this, a grouping method suitable for parallel programs is proposed by considering their features ofparallel execution. The number of groups and paths contained in them are determinedaccording to the processes and calculation resource available. The paths are groupedby leveraging all process paths’ similarities. According to the paths of each group, amathematical model is built for parallel generation of test data covering many paths,and a strategy of solving the above model by using a multi-population geneticalgorithm is presented.The research work of this issue enrich the theory of parallel program testing tosome extent, improve the efficiency of generating test data, and extend the applicationdomain of the evolutionary methods, thus having important theoretical and practicalvalues.

节点文献中: 

本文链接的文献网络图示:

本文的引文网络