节点文献

基于进化计算的行为模型自动精化和排序学习方法的研究

Research on Evolutionary Computation Based Approaches for Automatic Behavioral Refinement and Learning to Rank

【作者】 王帅强

【导师】 马军;

【作者基本信息】 山东大学 , 计算机系统结构, 2009, 博士

【摘要】 进化计算是研究仿照生物进化自然选择过程中所表现出来的优化规律和方法,以解决复杂的工程技术领域或其他领域的优化问题的一种计算方法。随着进化计算自身的发展,一些新的进化计算方法在不断的提出,而一些旧的方法也在焕发着新的活力。近些年,在一些领域中的某些问题的研究中,传统的方法遇到了很大的困难,例如在软件工程领域中的模型精化问题;而在一些领域中,某些问题利用传统的人工智能方法不能得到很理想的结果,例如信息检索领域中的排序学习问题。这些问题都促使我们从进化计算的角度重新审视,以期利用自然进化的思想对他们进一步处理,从而得到理想的结果。基于以上的应用背景和需求,本文重点研究了利用进化计算相关算法解决传统方法难以处理的某些问题:并以此为基础,开发了一系列的原型系统对研究成果进行了验证。本文主要的研究内容和创新点包括以下方面:1对软件系统中行为的建模和精化理论的研究。本论文根据自动精化方法必须的条件,提出了一套描述行为模型的形式化符号体系,并据此提出了行为模型的形式化建模方法和行为精化理论。行为建模方法和行为模型的自动精化方法的提出,为推动模型驱动开发理论,尤其是UML和形式化方法集成建模技术的发展和应用,将会起到巨大的推动作用。在模型驱动开发中,行为的建模和精化是一个关键问题,它需要考虑对象的一系列的动作语义,包括动作在何时触发,系统状态如何变化,以及行为执行结束后最终状态如何描述等等。由于UML在模型语义等方面的缺失,以及形式化方法和UML混合建模研究的广泛开展,使得我们不得不采用以集合论和谓词逻辑为基础的形式化建模方法,对行为进行精确的建模并逐步精化至代码。近几十年中,尽管有很多精化理论和精化工具相继提出,但总体而言,传统的关于精化方面的研究主要考虑的是整个软件系统模型的精化定义和验证问题。随着UML和形式化方法集成建模的研究的广泛开展,形式化建模方法所关注的领域转变为软件系统的对象行为的细节。在这方面,国内外并没有相关的专门用于描述行为精化的思想提出,而传统的基于整个系统的精化显得过于庞大,对于我们的问题并不十分合适。因此,行为精化理论的提出,是一个亟待解决的重要问题。根据自动精化方法必须的条件,参照前人的工作,我们提出了一套描述行为模型的形式化符号体系,并据此提出了行为模型的形式化建模方法和行为精化理论。本文根据行为精化理论,从问题求解的角度上,提出了一种实现行为精化的一般性的指导思想,即通过搜索状态空间,从而找到一个可以满足特定条件的状态,进而得到精化结果。通过估计状态空间可以证明,遍历式的穷尽搜索办法是不可行的,必须引入启发式的智能搜索机制。2对基于遗传规划的行为模型自动精化方法的研究。本论文提出了一种基于遗传规划的自动精化方法,它为形式化模型的自动精化方法提供了一个新的思路,同时为UML和形式化方法集成建模方法的推广和应用提供了更加有效的工具支持。软件开发的模型化和自动化是软件技术的发展趋势,而缺乏自动精化方法是阻碍形式化方法在业界应用的一个重要原因。在本文中我们提出了一种基于遗传规划的自动精化方法。由于遗传规划算法源自于遗传算法,它也有其自身的弱点,最突出的问题就是它最适合线性结构的问题求解逻辑,而不能有效地处理显式的循环结构和选择结构。为了解决上述问题,我们提出了一种基于谓词逻辑的遗传规划方法。首先,对抽象模型的后置条件进行自下而上的归约精化,以生成显式的循环结构并将问题简化和分解;然后将生成的子问题采用基于遗传规划的精化方法进一步精化,将行为模型的精化看作一个问题求解最终得到实现模型,通过基于遗传规划的方法最终得到一个由若干基本操作组合而成的具体行为。我们还提出了一种在遗传规划中采用组合终止条件的方法,用以生成选择结构;最后,我们提出一种对实现模型的优化方法,以减少实现模型中操作逻辑的冗余。根据我们提出的方法,成功的演化出了冒泡排序算法。事实上,本方法具有一定的通用性,它适用于任何由若干基本操作组合以完成复杂操作的问题求解过程。3对基于进化计算的排序学习的研究。本论文提出了基于进化计算的排序函数发现算法的框架;并根据该框架,提出一种基于免疫规划的排序学习方法RankIP,并得到了很好的性能,有力的证明了我们提出的基于排序学习算法框架的有效性。本论文的研究将有力的促进进化计算方法在排序学习问题的应用。网页排序问题是Web信息检索领域的一个中心问题,一个好的排序算法能够明显的提高检索质量。传统的网页排序算法分为三类:基于链接的方法、基于内容的方法和混合方法。此外,基于机器学习技术的网页排序算法,即“排序学习”,越来越广泛的用来解决信息检索排序问题。这一问题作为信息检索和机器学习的交叉领域,已经成为非常活跃的一个研究热点,并引起广泛关注。目前提出的各种排序学习算法,其结果都不甚理想。一些学者依据进化计算的思想,提出了基于遗传规划的排序学习方法;而最近包括群智能算法,免疫算法等一系列性能更为优异的进化计算方法的提出,也为基于进化计算的排序学习方法提供了更加有力的工具。在这个基础上,我们介绍了一系列的定义,精确的表达了基于进化计算的排序函数发现算法的一般原理;并且描述了基于进化计算的排序函数发现算法的框架,使得任何的进化计算方法都可以灵活的嵌入算法框架中。算法框架的提出,将有力的促进进化计算方法在信息检索领域,尤其是排序学习问题的应用。根据我们提出的基于进化计算的排序函数发现的算法框架,在本文中我们还提出一种基于免疫规划的排序学习方法RanklP。我们采用微软亚洲研究院提供的LETOR 2.0数据集合作为训练和验证数据集合,实验证明RankIP较之其他排序学习方法在P@n,MAP和NDCG等评价标准上均优于目前提出的著名的排序学习方法,如RankingSVM和RankBoost等。在实验中我们还对比了免疫规划和遗传规划在排序学习中的应用,结果表明,由于免疫规划在多样性方面优于遗传规划,因此在几乎相同的条件下,基于免疫规划的排序学习算法的性能更为优越。

【Abstract】 Evolutionary Computation (EC) is a kind of intelligent computing approach which is based on the evolutionary principles and strategies in the nature. It can be used to cope with the complex problems involving optimization, forecast and so on in the engineering, technology and other fields, which cannot be managed effectively by traditional theory and approaches. Nowadays, a lot of new EC approaches have been proposed, and the original technologies leap forward with renewed energy.In recent years, classical approaches encountered great difficulties in some research issues, e.g., the automatic refinement problem in software engineering field; for some problem, classical approaches cannot gain ideal results, e.g., learning to rank in information retrieval area. These phenomenons drive us to try to think of these problems from the point of view of natural evolution.Based on the application backgrounds and requirements mentioned above, this thesis tries to employ EC principles to resolve these problems which are difficult for the classical approaches. A series of prototype tools are also developed to validate the research results. The main research contents and innovations of this thesis are listed as follows:1 The theories of modeling and refinement for behaviors in software systems are studied.According to the requirements of the automatic refinement technologies, we introduce a series of formal symbol systems to express the behavior models. In addition, based on these symbols, we propose an approach of formal modeling and a refinement theory for behaviors. We believe that our research will definitely promote the theory and application of the modeling theory integrating formal methods into UML.In Model Driven Development, behavior modeling and refinement are critical issues which take into account a series of action sematics. The problem of behavior modeling essentially involves many issues, such as (1) when should a behavior be performed? (2) How will the states of a system be transferred step by step due to the behavior? (3) What will the final state be like?Since UML has a loosely defined semantics, according to the principles of integrating formal methods into UML, we have to employ formal methods to express behaviors accurately. These formal methods are usually based on set theory and predicate logic, and can be used to provide an unambiguous and precise supplement to natural language descriptions. In addition, formal models can be stepwise refined, thus codes can be generated finally.In recent decades, a great variety of refinement theories and tools were proposed. In general, they take into consideration the definition and verification issues for whole software systems. However, as the modeling theory integrating formal methods into UML has been proposed and rapidly developed, nowadays formal models turn their attention to behavior details in software systems. Unfortunately, there has not been any refienment theories focus on behaviors yet, and traditional refinement theory seems too complicated to match this situation. Therefore, it is really an urgent matter to propose a behavior refinement theory.Guided by other refinement theories, we introduce a collection of formal symbols for expressing behavior models. We also provide an approach of formal modeling and a refinement theory for behaviors using these symbols. According to the behavior refinement theory, a refinement process can be thought of as a problem of figuring out a special state satisfying certain formulae, thus a rational refinement result can be constructed. Thus firstly we should estimate the number of the states in the state space, so as to calculate the complexity of the behavior refinement problem. The result shows that an exhaustive algorithm is impractical, thus some effective strategies should be introduced.2 An automatic approach for behavior refinement based on genetic programming is proposed.Software development tends to be modelized and automatical. Actually, lack of automatic refinement mechanisms is one of the largest barriers to use the formal methods into industries. Thus in this dissertation we propose an automatic behavior refinement approach based on Linear Genetic Programming (LGP).However, LGP has its own weakness. The prominent issue is that it cannot evolve explict loop structures and choice structures. In order to resolve these problems, we introduce a linear genetic programming algorithm based on predicate logic. In the beginning, the abstract models are refined by the bottom-up reduction refinement method according to the forms of their post-condition predicates. After the reduction stage, explict loop structures can be generated, together with a few simpler abstract models. These abstract models generated in the reduction process are refined continuously by the search refinement method based on LGP. In this phase behavior refinement can be regarded as a problem of finding out and a series of proper basic operations and coordinating them in reasonable forms. Furthermore, we provide an optimizing method for the refinement results by reducing redundant operations. Finally, we offer the Combinatorial Termination Criterion (CTC) in LGP algorithm. Based on CTC, choice structures can be evolved. As a classical example, the bubble sort algorithm is evolved successfully by our approach.Our approach provides a novel way to aotumatically refine the formal abstract models. Meanwhile, it can be recognized as a tool to support the modeling theory integrating formal methods into UML. In addition, this approach has strong universality and is suitable for evolving any complex operation composed of some basic ones.3 The ranking function discovery based on evolutionary computation is studied.The performance of a search engine is mainly evaluated by the accuracy of its ranking results. As a matter of fact, ranking problem is a central issue in the Information Retrieval (IR) field. There are mainly three types of page ranking solutions: hyperlink base approaches, content based approaches and hybrid approaches. Besides, machine learning techniques, that is, learning to rank, are becoming more widely used for the ranking problem of IR. This task has emerged as an active and growing area of researches both in information retrieval and machine learning. However, none of current approaches is ideal.Recently some Genetic Programming (GP) based ranking function discovery technologies are proposed. Meanwhile, some more excellent EC algorithms have been provided, such as swarm intelligent algorithms, immune algorithms, etc, which are regarded as some effective tools for resolving the learning to rank problem. Since there are quite substantial similarities in these EC algorithms, in this dissertation we provide a series of generic definitions, together with a common algorithm framework, for EC based learning to rank. In this way, any EC algorithm can be integrated into this framework. We believe that this work will be beneficial to the use of EC in the IR field, especially for the learning to rank problem.Furthermore, RankIP, an approach based on Immune Programming (IP) for the discovery of the effective page ranking function, is proposed. To validate RankIP, we perform experiments on LETOR 2.0 data collections provided by Microsoft Research Asia (MSRA). Results indicate that the use of RankIP leads to effective ranking functions that significantly outperform the baselines, including Ranking SVM, RankBoost, etc. we also theoretically study the dissimilarities between IP and GP. In order to validate these arguments, we carry out some experiments. The results show that under the same conditions IP gains the advantage of GP.

  • 【网络出版投稿人】 山东大学
  • 【网络出版年期】2010年 12期
节点文献中: 

本文链接的文献网络图示:

本文的引文网络