确定性退火技术在数据挖掘中的应用

The Application of the Deterministic Annealing Technique in Data Mining

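【Author】 Huang Ming (黄名)

【Advisor】 Cao Changxiu (曹长修)

【Author information】 Chongqing University, Control Theory and Control Engineering, 2003, Master's thesis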

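【Abstract (translated from the Chinese)】 In recent years, data mining has attracted great attention in the information industry, mainly because large volumes of data are widely available and there is an urgent need to turn them into useful information and knowledge. The main application patterns of data mining are classification, clustering, estimation, and prediction; this thesis studies classification, clustering, and prediction. The power of data mining depends on the effectiveness of the mining tools. At present, three main types of methods are used for machine learning problems: fuzzy-rule learning, neural network learning, and genetic (evolutionary) learning. Neural networks have distinctive advantages, such as nonlinear mapping and fault tolerance, and therefore hold greater promise in data mining than the other methods. They also have problems: typically, when the learning algorithm is poorly chosen the system easily becomes trapped in a local optimum, and there is a tension between network complexity and generalization ability. To address these problems to some extent, this thesis applies deterministic annealing, a computation method modeled on natural laws, to data mining. Following the annealing process, deterministic annealing converts the search for the optimum of an optimization problem into finding the minima of the free-energy function of a sequence of physical systems that vary with temperature, which allows the algorithm to avoid local minima and reach the global minimum. The algorithm uses the maximum entropy principle from information theory, so that it obtains a good solution at a relatively small scale. On the one hand, deterministic annealing is an annealing process and can be used to seek the global optimum. On the other hand, unlike simulated annealing, which uses the Metropolis sampling criterion to simulate the equilibrium state of the system, it simulates the equilibrium state by minimizing the free-energy function at a fixed temperature with a deterministic optimization method. Deterministic annealing is therefore faster than simulated annealing. However, because the link to a physical system is set up differently for each application, the choice of free-energy function also differs from problem to problem, and giving a general method for choosing it would be very difficult. This thesis therefore combines the strengths of the two techniques and proposes a method that joins deterministic annealing with neural networks. RBF neural networks learn and compute faster than other commonly used neural networks. Deterministic annealing is used for clustering and for optimizing the centers and widths of the RBF network, which makes the method insensitive to the choice of initial values; because it works with probabilities, it also overcomes the dead-node problem. The combination preserves the advantages of each method and gives the improved method new properties. Extensive simulation experiments show that the method is highly practical for clustering models, particularly for large-scale data processing, and that it makes prediction results more accurate in prediction models.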

【Abstract】 Data mining has drawn increasing attention from the information industry in recent years. The main reason is that abundant data are widely available and urgently need to be turned into useful information and knowledge. The main application patterns of data mining are classification, clustering, estimation, and prediction; this thesis focuses on classification, clustering, and prediction. The capability of data mining depends on the effectiveness of the mining tools. At present, three kinds of methods are used to solve machine learning problems: fuzzy-rule learning, neural networks, and genetic algorithms. Neural networks have particular advantages, such as nonlinear mapping and fault tolerance, and so hold greater promise than the other methods. They are not ideal, however. For example, the system easily falls into a local optimum when the learning algorithm is unsuitable, and there is a conflict between network complexity and generalization ability. To mitigate these limitations, the deterministic annealing technique, a branch of physics-inspired computation, is applied to data mining. Following the annealing process, deterministic annealing transforms the search for the optimum of an optimization problem into a sequence of minimizations of the free-energy function of a physical system that varies with temperature. This lets the algorithm avoid local minima and reach the global minimum. The algorithm uses the maximum entropy principle from information theory and obtains a good solution at a relatively small scale. On the one hand, deterministic annealing is an annealing process and can be used to seek the globally optimal solution. On the other hand, it differs from simulated annealing, which simulates the equilibrium state of the system with the Metropolis sampling criterion: deterministic annealing simulates the equilibrium state by using a deterministic optimization method to minimize the free-energy function at a given temperature. It can be proved in theory that the global optimal solution is a continuous function of temperature when the free-energy function satisfies certain appropriate conditions. Deterministic annealing is therefore faster than simulated annealing. However, the choice of free-energy function depends on how the physical system is set up for each application, which makes it difficult to give a general method for choosing it. In this thesis, the deterministic annealing technique and RBF neural networks are combined to exploit the advantages of both. RBF neural networks learn and compute faster than other commonly used neural networks. Deterministic annealing is applied to clustering and to optimizing the centers and widths of the RBF network. This reduces the sensitivity to initial values and, because the method works with probabilities, also remedies the dead-node problem. The combination keeps the advantages of each algorithm and gives the improved method new ones. A variety of simulation results show that combining deterministic annealing with RBF neural networks is highly practical for clustering, especially in large-scale data processing, and that it makes the results of prediction models more accurate.
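
The abstract does not give the thesis's algorithm, so the following is only a minimal sketch of deterministic-annealing clustering in its common textbook form, assuming squared Euclidean distortion. At each temperature T, every point is softly assigned to each center with Gibbs (maximum-entropy) probabilities, and the centers are moved to the probability-weighted means, which minimizes the free energy F(T) = -T Σ_x log Σ_j exp(-‖x - y_j‖²/T) for fixed assignments. The function name `da_cluster`, the cooling schedule, the jitter size, and the default starting temperature are all hypothetical choices made for illustration.

```python
import numpy as np


def da_cluster(X, k, t_start=None, t_min=1e-2, cooling=0.9, inner_iters=50, seed=0):
    """Sketch of deterministic-annealing clustering (illustrative, not the thesis's code)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # All centers start at the data mean, so the result does not depend on
    # a hand-picked initialization.
    centers = np.tile(X.mean(axis=0), (k, 1))
    if t_start is None:
        # Assumed heuristic: start above the first critical temperature,
        # which is 2 * (largest covariance eigenvalue) <= 2 * total variance.
        t_start = 4.0 * X.var(axis=0).sum()
    T = t_start
    while T > t_min:
        # Tiny jitter lets coincident centers separate once T is low enough.
        centers = centers + 1e-4 * rng.standard_normal(centers.shape)
        for _ in range(inner_iters):
            # Squared distances between every point and every center: shape (n, k).
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            # Gibbs association probabilities p(j | x) at temperature T.
            logits = -d2 / T
            logits -= logits.max(axis=1, keepdims=True)  # numerical stability
            p = np.exp(logits)
            p /= p.sum(axis=1, keepdims=True)
            # Deterministic update: probability-weighted mean for each center.
            mass = p.sum(axis=0)
            weighted = (p.T @ X) / np.maximum(mass, 1e-12)[:, None]
            # A center with (numerically) no mass keeps its old position.
            new_centers = np.where(mass[:, None] > 1e-12, weighted, centers)
            shift = np.abs(new_centers - centers).max()
            centers = new_centers
            if shift < 1e-8:
                break
        T *= cooling  # lower the temperature and repeat
    return centers


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two synthetic, well-separated blobs (made-up data for the demo).
    X = np.vstack([rng.normal(0.0, 0.3, (100, 2)), rng.normal(5.0, 0.3, (100, 2))])
    print(da_cluster(X, k=2))
```

Because every point contributes to every center with nonzero probability at high temperature, no center is left without data early on, which is one way to read the abstract's remark about dead nodes. Centers found this way could serve as RBF centers, with widths estimated from the spread of each cluster; that step is not shown here because the abstract does not say how the widths are computed.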

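  • 【Online publication contributor】 Chongqing University
  • 【Online publication year and issue】 2004, Issue 02
  • 【Classification code】 TP311.13
  • 【Citation count】 1
  • 【Download count】 167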