节点文献

计算机围棋中的算法研究

The Study of Algorithms in Computer Go

【作者】 岳鹏

【导师】 邱玉辉;

【作者基本信息】 西南大学 , 基础心理学, 2007, 博士

【摘要】 博弈是人工智能的重要研究主题,人工智能的发展在很大程度上得益于博弈研究的发展。作为博弈研究的主要内容之一,棋类博弈得到了满意的解决,唯一的例外的是围棋,目前最优秀的围棋程序的水平还不及人类初级棋手。由于围棋的搜索空间太大、计算机难于处理模糊概念且难于设计学习算法,造成了计算机围棋程序的棋力难于提高。围棋是检验人工智能发展水平的良好环境,如何提高围棋程序的棋力是人工智能领域的一大难题。同时,开发出与人类棋手水平相当的围棋程序也有助于对人类认知能力的理解。所以计算机围棋研究具有重要的理论意义和实用价值。我们首先介绍了国内外计算机围棋研究现状,包括基础算法、搜索算法和学习算法三方面并介绍了部分计算机围棋程序,认为计算机围棋的搜索算法主要有minmax算法、alphabeta算法、failsoft算法、negmax算法、negscout算法和mtdf算法等等,涉及到的学习算法和理论基础主要有组合博弈理论、数学形态学、蒙特卡罗算法、模糊学习、分治法、强化学习算法、遗传算法、神经网络、支持向量机、贝叶斯模式分类、基于解释的泛化和并行算法等等,指出了目前研究中存在的主要不足主要表现为局面表示法欠完善、中盘策略欠完整以及学习算法欠成熟。然后,我们简述了本研究的相关理论基础,包括数学形态学、有限状态机、线性模型、感知机与遗传算法。接着,我们阐明了本研究提出的棋手思维模型、基础算法、搜索算法、学习算法及相应实验结果。具体说来,我们完成的主要工作与创新点包括以下几个方面:一、提出了一个完整的棋手思维模型。这是在提出了领土领海和领空等地域概念、提出了局面的层次表示法、归纳并分类了大量围棋术语、提取了目标概念、建立了目标图、总结了若干目标选择原则和走步属性并分析了棋风概念的基础上完成的。这个模型的特点在于它的完整性和围棋术语的分类、目标选择原则与走步属性的全面性。二、设计了基于数学形态学的局面层次表示法、棋群聚类算法和地域划分算法。这些有统一理论基础的算法计算简单,实验结果表明其效果良好。利用已有的数学形态学理论可以设计更多有意义的启发式策略。三、设计了PEMIS模式编码方法。它基于模式的邻近特征、行列特征和轮廓特征进行编码,其突出优点是与模式的黑白对称性、旋转与翻转对称性以及平移对称性均无关,实验结果表明这种模式编码方法性能良好。在基础算法方面,我们还设计了一种走步增量算法。四、设计了复合目标搜索算法。我们认为复合目标可看作是由“与”或“或”关系构成的单一目标的二维向量。复合目标搜索算法的优点是其调用的基本函数可由单一目标搜索算法的基本函数合成。我们还比较了经典搜索算法的性能。五、设计了PEMIS模式库与定式库学习算法。实验结果表明了其有效性,最终学习到的模式库与定式库占用的空间比较小。另外,还设计了ZOBRIST定式库学习算法,实验结果也表明了其有效性。在学习算法方面,我们还设计了棋形与气术语的示教学习算法和棋风模型的遗传学习算法。六、开发了以此棋手思维模型为核心的计算机围棋程序ShoutGo,实现了上述各算法。ShoutGo认为棋手拥有模式库和定式库,有各自的棋风;棋手在完成棋群聚类和地域划分后,在目标选择原则的指引下以对方最后所下之子为焦点进行目标猜测,同样在目标选择原则及棋风的指引下生成特定目标,继而以目标为导向在各自的模式库和定式库推荐走步的作用下进行搜索发现走步,再根据走步属性选取特定走步,如果目标不成功或无可行走步,则重新进行地域划分或根据其它决策原则生成其它目标,直到发现合适走步;在这一过程中,模式库和定式库影响了走步的推荐,棋风影响了目标之间的跳转。最后,我们探讨了棋手思维模型的评价、走步增量算法与走步扫描算法的关系、数学形态学方法在基础算法中的应用、劫与共活现象对搜索的影响、搜索树特点与心理因素的关系、搜索时间估计、局面评价函数、目标搜索的可学习性以及棋风建模等问题,并探讨了机器学习方法在计算机围棋中的应用可能性,提出了进一步的研究计划。计算机围棋研究作为人工智能领域的一个分支,与心理学有着天然联系。我们在研究过程中,特别强调以人类棋手为本的原则,力求棋手思维模型与人类棋手真实思维过程的高度契合,力求其学习算法的完善。我们今后的研究重点将在学习算法上,能象人类棋手一样地不断地学习,计算机围棋才有希望。

【Abstract】 Game is one of the important subjects of Artificial Intelligence (AI) and AI’s development contributes to the development of the study of game greatly. As one of the main subjects of game, the board game has been studied thoroughly, except for Go. The level of the most excellent Go programs is lower than the level of the primary human player at present. Because of the too large search space in Computer Go, computer is hard to get hold of the fuzzy concept of Go. Therefore, the level of Computer Go is hard to rise. Go is a nicer environment to inspect the development level of AI. It is a puzzle in Artificial Intelligence about how to improve the capability in the Computer Go programs. At the same time, it is helpful to comprehend the human cognition that develop the Computer Go programs, whose level is equal to human’s. Therefore, the study of Computer go is important both in theory and in practice.At first, we review the study of Computer Go in the world from the aspects of basis algorithm, search algorithm and learning algorithm, and then we introduce some programs of Computer Go. Search algorithms in Computer Go chiefly include minmax algorithm、alphabeta algorithm、failsoft algorithm、negamax algorithm、negascout algorithm and mtdf algorithm. Learning algorithms and theories in Computer Go involve combinatorial games theory, mathematic morphology, Monte-Carlo algorithm, fuzzy learning, divide and conquer, reinforcement learning, genetic algorithm, neural network, support vector machine, bayesian pattern classification, generalization by explaining, parallel algorithm, and so on. At present, the mainly defects rest with the deficient method in representation of position, incomplete tactics of middle game, and immature learning algorithm. Subsequently, we introduce the relative theory in this study, including mathematics morphology, finite state machine, linear model, perception, and genetic algorithm.After that, we introduce the player’s thinking model, basis algorithm, search algorithm and learning algorithm in this study, and the result of experiments of these algorithms. In detail, the major contributions of this study are:1. Put forward an integrated player’s thinking model. We do this after putting forward the method of erarchical representation of position, generalizing and classifying plenty of Go terms, extracting target notion, setting up target graph, summarizing some target choices principle and move attributes, and analyzing the notion of player’s style. The model’s characteristics are comprised of integration, and completeness of the classification of Go terms, target choice principles and move attributes.2. Design an erarchical representation of position, Go cluster algorithm and area partitions based on mathematic morphology. These algorithms based on united theory are simple and have nicer effect tested by the experiment. More meaningful heuristic tactics will be introduced by utilizing the existing mathematic morphology methods.3. Design a pattern encoding method independent of symmetry (PEMIS). Its advantages are encoding based on adjacent character, queue character and figure character of pattern, while irrelevant to black-white symmetry, rotation and transposition symmetry, and translation symmetry of the pattern. The experiment shows that the method works well. In addition, we design a move increment algorithm as a basis algorithm.4. Design a compound target search algorithm. We think that a compound target is composed of simplex targets in two dimensions by AND or OR logic. Its advantages lie on that its basic function may be constituted of the basic functions of simplex target search algorithm. Furthermore, we compare the performances of classic search algorithms.5. Design a pattern/joseki library learning algorithm by PEMIS. The experiment shows that it work well, and the learned library occupied in a small storage area. We design a joseki library learning algorithm by ZOBRIST too, and it works well. Moreover, we design a Go shape and liberty term learning algorithm by teaching, and a player’s style learning algorithm by genetic algorithm.6. Develop a Computer Go program named ShoutGo whose core is the player’s thinking model, and put these algorithms into practice. ShoutGo deems that a human player own a pattern/joseki library and respective style. After making Go cluster and partitioning the area, a player can proceed to do target guess, focusing on the last stone of the opponent by target choice principle. Likewise, led by target choice principle and respective style, a player can generate new targets, and then piloted by the target searching move with the effect of recommendable move put by respective pattern library/joseki library, a player can select a special move based on move attributes. If the target is unsuccessful, or there is no feasible move, we need to partition areas again or generate other targets based on other decision principles, until finding suitable move. During this course, the pattern/joseki library will affect the recommendation of move, while the player’s style model affects the switch of targets.Finally, we discuss the evaluation of the human thinking models, the relation between move increment algorithm and move scanning algorithm, the application of mathematics morphology methods in the basis algorithm, the effect on searching by ko and seki phenomenon, the relationship between characters of searching trees and psychology factors, estimation of searching time, position evaluation function, feasibility of the target search and player’s style model building etc.; at the same time, we discuss the application possibility about the method of the machine learning in Computer Go, and put study plan forward.As a branch of AI, Computer Go is related with psychology naturally. In the study, we emphasize the principle of taking human players as model especially, and the identity with human players’ thinking, and the perfection of its learning algorithm. Our future work will focus on the learning algorithm. Only when the Computer Go program can learn something just like a human player, Computer Go is hopeful.

  • 【网络出版投稿人】 西南大学
  • 【网络出版年期】2007年 05期
节点文献中: 

本文链接的文献网络图示:

本文的引文网络