节点文献

基因表达式编程理论及其监督机器学习模型研究

Gene Expression Programming: Theory and Supervised Machine Learning

【作者】 张克俊

【导师】 孙守迁;

【作者基本信息】 浙江大学 , 计算机科学与技术, 2010, 博士

【摘要】 基因表达式编程是进化算法的新成员,虽然得到了广泛而深入的应用研究,但至今尚未有系统和完善的理论研究成果,还无法揭示其运行的数学机理,对于一些关键技术的改进也就缺少了理论支撑,这对基因表达式编程和进化算法的研究和发展都是非常不利的。此外,目前经典的机器学习方法在解决分类和复杂函数关系发现问题时所构建的监督机器学习模型,在可理解性、精度以及泛化能力上,还存在诸多问题。有鉴于此,本文对基因表达式编程理论和关键技术进行了系统的研究。首先,通过分析基因表达式编程的运行机理,利用一系列的数学推理,构建了基因表达式编程的模式理论,给出了基因表达式编程的基因块假设,分析了基因表达式编程的模式收敛性,其次,基于模式定理成立的必要条件,提出了自适应遗传算子的设计方案。最后,在理论研究的基础上,对基本基因表达式编程中的染色体生成、个体解码求值等关键技术提出了改进方法,进行了理论和实验分析后给出了改进的基因表达式编程算法。在监督机器学习模型研究方面,为了提高学习模型的噪声数据处理能力和泛化能力本文基于改进的基因表达式编程算法构建了监督机器学习模型,对算法的适应值函数和算法提前终止条件进行了重点研究,并通过两类分类、多维分类、复杂函数关系发现问题等实验对此监督机器学习模型的有效性进行了验证。本文的主要研究成果包括:(1)证明了基因表达式编程有效的同时为更深入地研究基因表达式编程奠定了理论基础。本文通过研究和分析遗传算法、进化规划、进化编程、遗传编程、基因表达式编程各自的特点,对基因表达式编程进行了形式化定义,在此基础上,证明了个体编码的合法性,给出了基因模式和基因表达式编程模式的定义,通过一系列的数学推理过程,构建了基因表达式编程的模式理论,给出了基因表达式编程的模式定理和基因块假设,利用数学分析的方法揭示了基因表达式编程的运行机理,最后,给出了基因表达式编程的模式收敛性证明,指出了采用精英保留策略可以防止基因表达式编程中最优模式丢失的问题。(2)提出了自适应遗传算子方案。本文深入研究了优势模式在相关遗传算子作用下的存活概率,给出了基因表达式编程模式定理成立的必要条件,在满足模式定理成立的必要条件下,对遗传策略和遗传算子的参数设计进行了研究,指出了相关遗传算子的参数选择上限,并提出了自适应遗传算子设计方案,最后就自适应变异算子进行了相关实验研究,验证了自适应遗传算子设计方案的可行性;(3)提出了改进的基因表达式编程算法。本文在理论分析的基础上,结合实验研究,对基本基因表达式编程中的染色体生成、多样化种群、个体解码求值等关键技术提出了改进方法,给出了相异结构染色体生成、个体快速解码求值等算法,并进行了相关理论和实验分析,最后,基于这些改进思想给出了改进的基因表达式编程算法;(4)构建了有效的监督机器学习模型。本文基于改进的基因表达式编程算法构建了监督机器学习模型,用于解决分类和复杂函数关系发现问题。通过构造独特的适应值函数和交叉验证方法来获得改进的基因表达式编程算法的提前终止条件,提高了所构建的机器学习模型的噪声数据处理能力和泛化能力。Monk’s problems、乳腺X光片微钙化点检测、wine recognition、复杂函数关系发现、定量构效关系建模等相关实验结果进一步验证了此机器学习模型的有效性。

【Abstract】 Gene Expression Programming (GEP) is a new member of Evolutionary Algorithms, although it has been widely applied in many fields, there are not systematic and comprehensive theoretical research and mathematic analysis of the operation mechanism, and the improving of the key technologies lack of theoretical basis, which is very unfavorable to the research and development of gene expression programming and evolutionary algorithm. In addition, there are still many problems for the classical machine learning methods in solving classification and complex function finding problems in supervised machine learning field, such as difficult to understand, not very precise and lack of generalization.To bridge the gap, in the paper, we studied the theory of GEP thoroughly. By analyzing the operation mechanism of GEP, using a series of mathematical reasoning, we constructed a theory of GEP schema and presented a building block hypothesis. Through analysis of the convergence of GEP schema and further study of GEP schemas, we obtained the necessary conditions of schema theorem, which conducted to the design of adaptive genetic operators. Finally, some key technologies of GEP, such as the production of chromosome, the chromosome decoding and fitness evaluation had been studied, which will be the mainly part of the Revising Gene Expression Programming (RGEP).In the supervised machine learning, we applied the RGEP to construct the model of supervised machine learning, studied the fitness function and stopping criterion, finally, we provided the experimental results of two types of classification, multi-dimensional classification, and complex function finding which verified the validity of the RGEP model.The main contributions of the paper include:(1) The work proved that the GEP is effective as well as laid a theoretical foundation for further study of GEP. After analysis of genetic algorithms, evolution programming, evolution programming, genetic programming, gene expression programming, we presented a formal definition of GEP. The legitimacy of the individual coding was proposed and the gene schema and GEP schema were also provided. After a series of mathematical reasoning, we presented a GEP schema theorem and building block hypothesis, and proved that the convergence of GEP schema and proposed that elitist strategy of GEP can prevent the loss of the optimal schema. (2) Proposed an adaptive genetic operator algorithm. After studied the survival rate of the better individuals, we presented the requirement of schema theorem and the parameter limit of genetic operator, also, we proposed a adaptive genetic operator algorithm and conduct a experiment for adaptive mutation operator, which shown the algorithm is effective.(3) Proposed a revising gene expression programming. After a fundamental analysis of the GEP, we presented a different structure chromosome generation algorithm and individual quick decoding and evaluating algorithm to improve the population diversity and efficiency. Relevant analysis and experiment were made to verify the validity of the improving, later we proposed a revising gene expression programming based on these two algorithms.(4) Presented an efficient supervised machine learning model. We construct a supervised machine learning model for solving classification and function finding problems based on RGEP. The model used a unique fitness function and cross-validation method to obtain the conditions for early stopping criterion so as to improve the noise immunity and generalization ability. Finally, we verified the validity of the RGEP based model by experiments of monk’s problems, detection of micro-calcification in mammogram, wine recognition, complex function finding and quantitative structure activity relationship modeling.

  • 【网络出版投稿人】 浙江大学
  • 【网络出版年期】2011年 08期
节点文献中: