节点文献

差分演化算法的改进及其在聚类分析中的应用研究

Differential Evolution Algorithm and Its Application in Clustering Analysis

【作者】 龚文引

【导师】 蔡之华; Charles Ling;

【作者基本信息】 中国地质大学 , 地学信息工程, 2010, 博士

【摘要】 在工程设计、运筹学、生物医学、数据挖掘等领域经常遇到优化问题。如何设计有效的算法求解优化问题,一直是研究的热点。近年来,利用演化算法求解优化问题得到越来越多的关注。由于演化算法具有自适应、自组织、隐并行性等特点,以及对所求解问题不需要连续、可导、单峰等要求,因此利用演化算法求解复杂的优化问题是可行的,且具有普适性。差分演化算法是演化算法的一种,它是一种简单、高效、快速的基于群体的全局优化算法,具有结构简单、容易实现、鲁棒性强等特点。尽管该算法已经在很多领域取得了成功的应用,但也存在一些不足之处,主要有:1)缺乏局部搜索能力,使算法在演化后期收敛速度变慢,从而不能满足算法在较少适应值函数评价次数下快速收敛到问题最优解的要求;2)传统差分演化算法对缩放因子F和杂交概率CR设置是很敏感的,对不同的问题需要选择不同的F和CR值;3)在差分演化算法中,虽然提出了多种变异策略,但是,不同策略具有不同的性能,适合求解不同的问题,因此如何根据不同问题选择最优的变异策略具有一定的困难。聚类分析是数据挖掘中一种重要的数据分析方法,它是一种无指导分类算法。它把无标记的数据集通过一定的相似性评价标准分成很多组(簇),即相似性高的个体被分在一组,而不相似的个体则分在不同组。大多数聚类分析算法都可以转化成一个多峰优化问题。从优化的观点来看,聚类问题是一个NP-难问题。由于聚类问题的复杂性,因此,大多数传统的聚类分析算法都是局部最优算法。为了弥补传统聚类算法的不足,把演化算法与传统聚类算法相结合,形成的演化聚类分析算法能增强原有聚类算法的有效性。但是,目前已有的演化聚类算法存在以下两个缺点:1)由于演化算法中存在很多控制参数,而大部分参数对不同数据集是敏感的,因此影响了算法的通用性;2)演化聚类算法的运行时间远大于传统聚类算法。本文在综述了当前差分演化算法的改进以及应用的基础上,针对其存在的不足进行了改进研究,并把差分演化算法应用于多目标优化问题和聚类问题的求解中。本文主要工作概括如下:(1)针对差分演化算法存在利用能力不足的缺点,将其与生物地理学优化(BBO)算法相结合,提出了混合的差分演化算法,DE/BBO算法。算法中设计了一种通用的混合移动算子,该算子把BBO算法的移动算子与差分变异算子相混合,既能利用差分变异算子对搜索空间进行开采(exploration),对新空间进行搜索,增加算法找出全局最优解的可能性,从而增强了算法的鲁棒性;又能结合BBO算法移动算子利用(exploitation)能力强的特点,对当前群体的信息进行有效利用,进而加快算法的收敛速度。基于该算子的DE/BBO算法能有效平衡算法的开采能力和利用能力。通过标准测试函数对算法进行测试,并与其他改进的差分演化算法进行比较,验证了DE/BBO算法的优越性。此外,还对群体大小、问题维数、不同变异策略和自适应参数控制对DE/BBO和DE算法的影响进行了研究,实验结果进一步表明了DE/BBO算法在解的求解精度和收敛速度上优于传统DE算法。(2)针对差分演化算法对不同问题进行策略选择困难的缺点,提出了一种简单、通用的个体策略产生方法,并在此基础之上设计了三种自适应多策略选择技术,从而实现算法对策略库中策略的自适应选取,增强算法的性能。把所提出的自适应多策略选择技术与JADE算法相结合,应用于函数优化问题的求解中,取得了很好的效果。同时,对算法的复杂度和简单性进行了分析。通过实验验证了SaJADE算法的优越性能,表明SaJADE算法对策略选取具有自适应的特征,且该算法增强了JADE算法的性能。(3)为了能使差分演化算法有效求解多目标优化问题,提出了一种基于ε占优的正交多目标快速差分演化算法,ε-ODEMO算法。算法采用正交实验设计方法来产生初始群体,从而使群体中的个体能在决策变量空间上均匀分布,增加找出较好个体的可能性,而这些好的个体可以在演化过程中得到利用,并加快算法的收敛。利用Archive群体来保存演化过程中找出的非劣解,使得算法始终可以保存精英解。为了对Archive群体进行有效更新,保证非劣解集的收敛性和分布的多样性,采用基于ε占优技术的群体更新方法。提出了一种混合选择策略,对差分变异算子中的基向量采用混合选择机制选取。该混合选择策略在演化初期强调算法对搜索空间的开采,寻找新的搜索空间,而在演化中后期则强调对Archive群体中个体信息的利用,加速算法的收敛。通过多个2目标和3目标标准测试函数对算法的有效性进行了验证;同时,对混合选择策略的控制参数和杂交概率进行了初步实验分析。(4)在ε-ODEMO算法的基础上,提出了一种适合求解约束多目标工程优化问题的paε-ODEMO算法。该算法在保留了ε-ODEMO算法优点的基础上,还具有以下特征:1)采用自适应ε占优技术代替ε-ODEMO算法中原有的ε占优技术,从而弥补了传统ε占优技术容易丢失一些重要的非劣解的缺点;2)采用基于约束Pareto占优的约束函数处理技术,增强算法处理带约束的多目标问题的能力。paε-ODEMO算法在约束多目标标准测试函数和工程优化问题中进行测试,并通过与NSGA-Ⅱ算法进行比较,验证了改进算法处理带约束多目标问题的有效性。(5)针对当前演化聚类算法的不足,把简单、高效的差分演化算法用于聚类分析中,提出差分演化聚类分析算法,DEPS-C算法。该算法采用基于点对称距离作为相似性准则来对数据点进行分配,同时考虑了对称的封闭性,以增强对原有点对称距离标准的鲁棒性。采用基于Kd树的最邻近点搜索方法来寻找对称点,从而减少了算法的复杂度;并设计了适合聚类分析的演化群体中个体的表示方法,并对演化中错误个体进行修正。通过标准测试数据集和UCI数据集对算法性能进行了测试。此外,对差分演化算法杂交概率CR的不同取值对算法性能的影响进行了研究,实验结果表明:对于具有对称性的数据集,DEPS-C算法能对其进行正确的聚类划分;且该算法控制参数少,对杂交概率CR的不同取值不敏感,从而增加了该算法的实用性和通用性。(6)尽管DEPS-C算法具有很好的性能,但该算法在对给定数据集进行聚类之前需要用户给定簇的准确个数,这在很多实际问题中是很难的。为此,在DEPS-C算法基础上提出了自动差分演化聚类算法(ACDEPS算法),使算法能通过演化自动确定簇的个数,并得到相应最优的聚类划分。在ACDEPS算法中,采用了有效的个体表示以满足差分演化算法进行自动聚类的需求,同时提出了改进的基于点对称距离的聚类有效性验证索引(CVI)Sym’-index,利用对个体所包含簇个数进行动态惩罚来避免算法在演化初期找出过多的簇数目,从而影响算法的收敛。通过实验验证了:1)采用Sym’-index的ACDEPS算法对于具有对称性特征的数据集能自动确定簇的正确个数,且能找到其对应的最优聚类划分;2)改进的基于点对称的CVI标准Sym’-index在ACDEPS算法中能增强原始Sym-index的性能;3)基于差分演化算法的聚类算法性能优于基于遗传算法的聚类算法。综上所述,本文在分析了基本差分演化算法和已有改进差分演化算法的基础上,指出了差分演化算法的不足;同时,对当前演化聚类算法的关键技术进行了综述,分析了当前演化聚类算法的缺点。在此基础之上,重点研究了基于生物地理学优化算法的混合差分演化算法、自适应多策略差分演化算法、基于ε占优技术的正交多目标快速差分演化算法、改进的正交多目标差分演化算法在工程优化中的应用、给定K值的差分演化聚类算法和自动差分演化聚类分析算法等。在研究中通过实验的方法证明了各算法的有效性。

【Abstract】 Global optimization is a common problem in almost every field, such as engineer design, operational research, bio-medical science, data mining, and so on. How to design an efficient algorithm to solve this problem is essential in science and real applications. Recently, using Evolutionary Algorithms (EAs) to solve the global optimization problem is more popular. Since EAs are self-adaptive, self-organized, parallel, using EAs for global optimization is reasonable and available.Differential evolution (DE), which is an evolutionary algorithm, is a simple, fast, and efficient population-based direct search algorithm for global optimization. The DE algorithm has been widely used in many fields. Among its advantages are its simple structure, ease of use, robustness, and fast convergence. However, the original DE algorithm has some pitfalls:1) DE is good at exploring the search space and locating the region of global minimum, but it is slow at the exploitation of the solution; 2) The parameters of DE are problem dependent and the choice of them is often critical for the performance of DE; 3) There are many strategies in the DE literature, however, choosing the best among different mutation strategies available for DE is also not easy for a specific problem.Clustering analysis is an important technique for the data analysis in data mining and machine learning. Clustering is the unsupervised classification of objects (patterns) into different groups, or more precisely, the partitioning of a dataset into subsets (clusters), so that the data in the same clusters is similar and the data in different clusters is dissimilar according to some defined distance measure. Data clustering is a common technique for statistical data analysis, which is used in many fields, including machine learning, data mining, pattern recognition, image analysis, and bioinformatics. Most of the clustering algorithm can be converted into a global optimization problem. From the optimization point of view, clustering is an NP-hard problem. Due to the complexity of clustering, most of the traditional clustering algorithm is a local optimal algorithm. In order to solve the clustering problem more effectively, one possible way is the hybridization of EAs with the traditional clustering algorithm, so that the evolutionary clustering algorithm is able to enhance the robustness and efficiency of the traditional clustering algorithm. However, there are two main drawbacks in the evolutionary clustering algorithm:1) In EAs, there are many control parameters, such as crossover probability and mutation probability. These parameters are problem-dependent and sensitive. Hence, this drawback may degrade the commonality of the evolutionary clustering algorithm.2) The execution time of EAs is significantly higher than that of other traditional clustering algorithms (e.g. K-means and FCM), especially when applied to large datasets.In this dissertation, firstly, I described the advances of the DE algorithm. Then, to remedy some of DE’s pitfalls, I proposed two variants of DE. Thirdly, theε-domination based orthogonal DE algorithm was proposed for the multi-objective optimization problems and the engineering design problems. Finally, the DE algorithm based on the point symmetry metric was proposed for the clustering problem. The main contributions of this dissertation are as follows.(1) Biogeography-Based Optimization (BBO) is a new biogeography inspired algorithm. It mainly uses the biogeography-based migration operator to share the information among solutions. To enhance the exploitation ability of DE, a hybrid DE with BBO, namely DE/BBO, is proposed for the global numerical optimization problem. DE/BBO combines the exploration of DE with the exploitation of BBO effectively, and hence it can generate the promising candidate solutions. To verify the performance of the proposed DE/BBO approach,23 benchmark functions with a wide range of dimensions and diverse complexities are employed. Experimental results indicate that the proposed approach is effective and efficient. Compared with other state-of-the-art DE approaches, DE/BBO performs better, or at least comparably, in terms of the quality of the final solutions and the convergence rate. In addition, the influence of the population size, dimensionality, different mutation schemes, and the self-adaptive control parameters of DE are also studied.(2) The fusion of multi-methods is an efficient technique to enhance the performance of the single algorithm. As the above-mentioned, the choice of the best mutation strategy is difficult for a specific problem. To alleviate this drawback and enhance the performance of DE, a family of improved DE that attempts to adaptively choose the suitable strategies for different problems is presented. In addition, in the proposed strategy adaptation mechanism (SaM) different parameter adaptation methods of DE can be used for different strategies. In order to test the efficiency of our approach, the proposed SaM is combined with JADE, which is a recently proposed DE variant, for numerical optimization. Twenty widely used scalable benchmark problems are chosen from the literature as the test suit. Experimental results verify the expectation that SaM is able to adaptively choose the suitable strategy for a specific problem without any prior knowledge. Compared with other state-of-the-art DE variants, the proposed approach performs better, or at least comparably, in terms of the quality of the final solutions and the convergence rate.(3) Evolutionary multi-objective optimization (EMO) has become a very popular topic in the last few years. However, to design an efficient and effective EMO algorithm to find the near-optimal and near-complete Pareto front is a challenging task. A novel differential evolution algorithm,ε-ODEMO, is proposed to solve multi-objective optimization problems (MOPs) efficiently. The proposed approach uses an Archive population to retain the obtained non-dominated solutions; also it adopts the orthogonal design method with quantization technique to generate an initial population of points that are scattered uniformly over the feasible solution space, so that the algorithm can evenly scan the feasible solution space once to locate good points for further exploration in subsequent iterations. Moreover, it is based on theε-dominance concept to obtain a good distribution of Pareto-optimal solutions and gets them in a small computational time. To make the algorithm converge faster, the new approach employs a hybrid selection mechanism in which a random selection and an elitist selection are interleaved. Experiments on eight benchmark problems of diverse complexities show that the proposed approach is able to obtain a good distribution in all cases. Compared with several other state-of-the-art evolutionary algorithms, it achieves not only comparable results in terms of convergence and diversity metrics, but also a considerable reduction of the computational effort. Furthermore, the discussion of the influence of different CR value and the parameter value of hybrid selection mechanism to the performance of the algorithm is conducted experimentally.(4) Solving engineering design and resources optimization via multi-objective evolutionary algorithms (MOEAs) has attracted much attention in the last few years. An efficient multi-objective differential evolution algorithm is presented for engineering design. The approach is the extension of s-ODEMO; and it has some modifications:1) An archive (or secondary population) is employed to keep the non-dominated solutions found and it is updated by a new relaxed form of Pareto dominance, called Pareto-adaptiveε-dominance (paε-dominance), at each generation.2) To handle the constraints, a new constraint-handling method is employed, which does not need any parameters to be tuned for constraint handling. The proposed approach is tested on four benchmark-constrained problems to illustrate the capabilities of the algorithm in handling mathematically complex problems. Furthermore, four well-studied engineering design optimization problems are solved to illustrate the efficiency and applicability of the algorithm for multi-objective design optimization. Compared with NSGA-II, one of the best MOEAs available at present, the results demonstrate that my approach is found to be statistically competitive. Moreover, the proposed approach is very efficient and is capable of yielding a wide spread of solutions with good coverage and convergence to true Pareto-optimal fronts.(5) To remedy some of the drawbacks of the evolutionary clustering algorithm, a DE-based clustering approach (DEPS-C), which combines several features of previous clustering techniques in a unique manner, is presented. It is characterized by:(a) employing a recently proposed point symmetry-based distance measure as the similarity measure, (b) incorporating the closure property when assigning a data point to a cluster, and (c) using the Kd-tree based nearest neighbor search to reduce the complexity of finding the closest symmetric point. Experiments have been conducted on 10 artificial data sets and 8 real-life data sets of diverse complexities. The results indicate that DEPS-C is suitable for both the symmetrical intra-clusters and the symmetrical inter-clusters. Compared with GAPS, a recently proposed genetic algorithm with point symmetry distance based clustering algorithm, the proposed approach performs better, or at least comparably, in terms of the quality and stability of the final solutions. Moreover, DEPS-C is faster than GAPS with respect to the execution time.(6) Based on the DEPS-C algorithm, an automatic clustering DE technique (ACDEPS) for the clustering problem is proposed. This approach can be characterized by:(ⅰ) proposing a modified point symmetry-based cluster validity index (CVI) as a measure of the validity of the corresponding partitioning, (ⅱ) using the Kd-tree based nearest neighbor search to reduce the complexity of finding the closest symmetric point, and (ⅲ) employing a new representation to represent an individual. Experiments have been conducted on 6 artificial data sets of diverse complexities. The results demonstrate that ACDEPS is suitable for both the symmetrical intra-clusters and the symmetrical inter-clusters. In addition, ACDEPS is able to find the optimal number of clusters of the dataset. Furthermore, based on the comparison with the original point symmetry-based CVI, the modified point symmetry-based CVI shows better performance in terms of the F-measure and the number of clusters found.In summery, based on the analysis of the original DE algorithm and its variants, this dissertation pointed out the drawbacks of DE. Meanwhile, I described the key techniques of the evolutionary clustering algorithms and analyzed their pitfalls. Based on the above-mentioned analysis, this dissertation is mainly studied:the BBO-based hybrid DE algorithm, DE with the adaptive multi-strategies selection,ε-domination based orthogonal DE for MOPs, paε-domination based constrainedε-ODEMO for engineering design, DE-based clustering algorithm with predefined K, and automatic DE-based clustering algorithm. In addition, for each improved algorithms, the experimental study was conducted to validate its performance.

节点文献中: 

本文链接的文献网络图示:

本文的引文网络