
Research and Application of MapReduce and the Quantum Evolutionary Algorithm

【Author】 Liu Fanfan (刘范范)

【Advisor】 Jia Ruiyu (贾瑞玉)

【Author information】 Anhui University, Computer Application Technology, 2012, Master's thesis

【Abstract】 The 21st century is an information age. Information and data are growing rapidly, which places higher demands on computing power; cloud computing emerged in this context and has brought new change. Cloud computing is a commercial computing model that distributes computing tasks across a resource pool made up of a large number of computers, so that application systems can obtain computing power, storage space, and information services on demand. It is a further development of distributed computing, grid computing, and parallel computing, and offers a more effective parallel model, so how to apply existing parallel algorithms in the cloud is the main subject of this study.

Writing parallel programs for a cloud platform differs from writing earlier parallel programs, which relied mainly on multithreading and were confined to a single machine. Parallelization in the cloud focuses on parallelism across multiple machines, or even across machine clusters, and the cloud environment is built from ordinary computers. The main idea is to divide a large task into many small tasks and assign them to a computer cluster for execution, which greatly reduces cost. Data mining is often hampered by massive data volumes, and introducing cloud computing into data mining is bound to bring about a new change.

The MapReduce model was proposed by Google in 2004. It is a programming model developed for computation over massive data on very large clusters, and it is used mainly for tasks that involve large amounts of information and must finish within an acceptable time. MapReduce has already been used to solve some problems in data mining and machine learning. Quantum evolutionary algorithms have also attracted growing attention from researchers in recent years. They are naturally parallel and well suited to implementation on large-scale parallel platforms, and the cloud platform provides the material basis for realizing that parallelism.

The covering algorithm, proposed by Professor Zhang Ling, solves classification problems in data mining. It is a constructive neural network learning algorithm based on the geometric interpretation of the M-P neuron. In essence, it treats the computed covering regions as the hidden layer of a three-layer neural network, the test set as the input layer, and the classification results for the test set as the output layer. The covering algorithm has since been widely adopted.

Taking advantage of the natural parallelism of the quantum evolutionary algorithm and the strengths of the cloud platform, this thesis implements the quantum evolutionary algorithm on a cloud platform; the results show that the algorithm achieves better parallel efficiency there. To study its performance further, the thesis exploits its good diversity, small population size, fast convergence, and strong global search ability: it applies the algorithm to the covering algorithm to optimize the covering centers, uses fitness to evaluate the quality of solutions, and proposes an improved quantum-optimized covering algorithm. Comparative analysis on five data sets shows that the improved algorithm effectively raises classification accuracy and efficiency. Finally, a MapReduce-based parallel platform is used to process and retrieve data from Taobao.
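
As a rough illustration of why a quantum evolutionary algorithm parallelizes naturally, the following is a minimal single-machine sketch and not the thesis's implementation. It assumes an angle encoding of each Q-bit, a toy OneMax fitness (count of 1 bits), and a fixed rotation step; the names `observe`, `rotate`, `onemax`, and `qea` are hypothetical. Each individual is observed and evaluated independently (the part that could be distributed as map tasks), and selecting the best solution plays the role of a reduce step.

```python
import math
import random


def observe(angles, rng):
    """Collapse each Q-bit to a classical bit; P(bit = 1) = sin(theta)^2."""
    return [1 if rng.random() < math.sin(t) ** 2 else 0 for t in angles]


def onemax(bits):
    """Toy fitness (an assumption for this sketch): number of 1 bits."""
    return sum(bits)


def rotate(angles, bits, best, step=0.05 * math.pi):
    """Rotate each Q-bit towards the matching bit of the best solution."""
    updated = []
    for theta, bit, target in zip(angles, bits, best):
        if bit != target:
            theta += step if target == 1 else -step
        # Clamp so that no probability collapses to exactly 0 or 1.
        theta = min(max(theta, 0.05), math.pi / 2 - 0.05)
        updated.append(theta)
    return updated


def qea(n_bits=16, pop_size=4, generations=60, seed=0):
    rng = random.Random(seed)
    # Start every Q-bit at pi/4, i.e. equal probability of 0 and 1.
    population = [[math.pi / 4] * n_bits for _ in range(pop_size)]
    best, best_fit = None, -1
    for _ in range(generations):
        # "Map"-like step: individuals are observed and scored independently.
        observed = [observe(ind, rng) for ind in population]
        fitnesses = [onemax(bits) for bits in observed]
        # "Reduce"-like step: keep the best solution seen so far.
        for bits, fit in zip(observed, fitnesses):
            if fit > best_fit:
                best, best_fit = bits, fit
        # Update every individual towards the current best solution.
        population = [rotate(ind, bits, best)
                      for ind, bits in zip(population, observed)]
    return best, best_fit


if __name__ == "__main__":
    solution, fitness = qea()
    print(solution, fitness)
```

In an actual MapReduce job the observation and fitness evaluation would run as separate tasks on cluster nodes; here a plain list comprehension stands in for that distribution.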

  • 【Online publication contributor】 Anhui University
  • 【Online publication year/issue】 2012, Issue 10
  • 【Classification codes】 O413; TP311.13
  • 【Times cited】 3
  • 【Downloads】 228