用于软件故障定位的差异比较方法及其改进

Fault Localization Method Based on Difference Comparison and Its Improvement

【Author】 Hong Lina (洪丽娜)

【Advisor】 Chen Rong (陈荣)

【Author Information】 Dalian Maritime University, Computer Science and Technology, 2010, Master's thesis

【Abstract】 Software fault localization aims to locate faults in a program quickly and accurately; it is a complex and time-consuming process. Testing-based fault localization (TBFL) uses test coverage information to localize program errors (also called faults). TBFL techniques fall into two groups: those based on difference measures and those based on feature statistics. Difference-measure methods compute the minimal difference between a failing run and all successful runs, but the actual fault location may well lie outside that minimal difference, and these methods do not rank suspicious code by its likelihood of being faulty. Feature-statistics methods locate faults by analyzing statistical information about dynamic program behavior; they do rank suspicious code, but the ranking often depends on the quality of the test cases. TBFL research also requires a large number of failing and successful runs, and this data inevitably contains many redundant execution paths, which lowers both the efficiency and the accuracy of localization. To address these problems, we first propose a run reduction algorithm based on control flow alignment, and then a fault localization method based on difference statistics over the reduced runs. We first reduce the runs, partitioning them into classes to eliminate redundancy; we then compute the differences between each class of successful runs and each class of failing runs; finally, we analyze these differences statistically to produce a fault report. We rank not only suspicious code by its likelihood of being faulty, but also some unsuspicious code by its likelihood of not containing the fault.
The experimental results show that our algorithm localizes branch errors significantly better than the other algorithms compared and also performs well on other error types. It greatly improves the efficiency and accuracy of fault localization and helps users find faults quickly and accurately.
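
The reduce, compare, and rank pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: it assumes each run is recorded as a tuple of executed statement IDs, it uses exact path equality as a simple stand-in for control flow alignment, and it scores a statement by how many failing-class versus passing-class differences contain it. The names `reduce_runs` and `rank_suspicious` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: a run is the sequence of statement IDs it executed.
Path = Tuple[int, ...]


def reduce_runs(runs: Iterable[Path]) -> List[Path]:
    """Keep one representative per distinct path, in first-seen order.

    Exact equality is a stand-in for control flow alignment here.
    """
    return list(dict.fromkeys(runs))


def rank_suspicious(passing: Iterable[Path],
                    failing: Iterable[Path]) -> List[Tuple[int, int]]:
    """Rank statements by how many failing/passing class pairs have the
    statement in the failing run but not in the passing run."""
    pass_classes = reduce_runs(passing)
    fail_classes = reduce_runs(failing)
    votes: Counter = Counter()
    for f in fail_classes:
        for p in pass_classes:
            # Statements executed by the failing class only.
            votes.update(set(f) - set(p))
    # Highest vote count first; ties broken by statement ID.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    passing_runs = [(1, 2, 4), (1, 2, 4), (1, 3, 4)]
    failing_runs = [(1, 2, 5), (1, 2, 5)]
    for stmt, score in rank_suspicious(passing_runs, failing_runs):
        print(f"statement {stmt}: score {score}")
```

In this toy input, reduction collapses five runs into three classes before any comparison, which is the redundancy-elimination step the abstract motivates. A real implementation would replace the equality test with an alignment of control flow traces.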