基于增强学习的计算机博弈策略的研究与实现
Research and Implementation of Computer Game Strategy Based on Reinforcement Learning
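【Author】 Gong Ruimin (宫瑞敏);
【Supervisor】 Lü Yanhui (吕艳辉);
【Author information】 Shenyang Ligong University, Computer Application Technology, 2011, Master's thesis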
【Abstract】 Computer game playing is an important branch of artificial intelligence and has developed extremely rapidly. It is a research area concerned with strategy and contests of wits, and it belongs to the problem-solving and search techniques of artificial intelligence. The core of game playing is the combination of two processes: evaluating the nodes of the game tree and searching the game tree. Evaluation is among the most difficult problems in game playing, and the accuracy of position evaluation largely determines the playing strength of a game program. This thesis studies several key techniques of computer game playing on the basis of reinforcement learning. Static evaluation functions depend on the level of human game knowledge and are not accurate enough. To address this problem, the TD(λ) algorithm is combined with a BP (backpropagation) neural network to form the BP-TD(λ) algorithm. The algorithm uses a BP neural network as the position evaluation function and uses TD(λ) to learn directly from raw experience, automatically adjusting the parameters of the evaluation function. This converts the supervised learning of the BP neural network into unsupervised learning and avoids the drawback that parameters tuned under supervised learning are easily influenced by human experience. To further improve training performance, a strategy of setting parameter values in stages is proposed for the opening and the middlegame: when the opening parameters are set, moves are chosen by a random move-selection strategy; when the middlegame parameters are set, moves are chosen by a minimax selection strategy. Using these methods and strategies, with Renju (five-in-a-row, 五子棋) as the model game, TDRenju, a Renju-playing system based on reinforcement learning, was implemented. Improving and strengthening the evaluation component increased its playing strength.
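The abstract does not give the details of BP-TD(λ), so the following is only a minimal sketch of the general idea (a backpropagation network used as the evaluation function, with weights adjusted by TD(λ) eligibility traces), not the implementation used in TDRenju. Everything specific here is an assumption made for illustration: the names `TinyValueNet` and `td_lambda_episode`, the single tanh hidden layer with a sigmoid output, the absence of bias terms, no discounting (γ = 1), online weight updates, the values of the learning rate `alpha` and of `lam`, and the encoding of the game result as a number in [0, 1].

```python
import math
import random


class TinyValueNet:
    """Hypothetical one-hidden-layer evaluation network; V(s) lies in (0, 1)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Small random initial weights (no bias terms, to keep the sketch short).
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]

    def forward(self, x):
        """Return (value, hidden activations) for feature vector x."""
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        z = sum(w * hi for w, hi in zip(self.w2, h))
        return 1.0 / (1.0 + math.exp(-z)), h

    def gradients(self, x):
        """Backpropagate dV/dw for both layers; returns (value, g1, g2)."""
        v, h = self.forward(x)
        dz = v * (1.0 - v)  # derivative of the sigmoid output
        g2 = [dz * hi for hi in h]
        g1 = [[dz * w2j * (1.0 - hj * hj) * xi for xi in x]
              for w2j, hj in zip(self.w2, h)]
        return v, g1, g2


def td_lambda_episode(net, states, outcome, alpha=0.1, lam=0.7):
    """Update net in place from one finished game.

    states  -- feature vectors of the successive positions of the game
    outcome -- final result in [0, 1] (assumed encoding, e.g. 1.0 for a win)
    """
    e1 = [[0.0] * len(row) for row in net.w1]  # eligibility traces, layer 1
    e2 = [0.0] * len(net.w2)                   # eligibility traces, layer 2
    for t, x in enumerate(states):
        v, g1, g2 = net.gradients(x)
        # Trace update: e <- lam * e + grad V(s_t)
        e1 = [[lam * e + g for e, g in zip(er, gr)] for er, gr in zip(e1, g1)]
        e2 = [lam * e + g for e, g in zip(e2, g2)]
        # The target is the next position's value, or the result at the end.
        if t + 1 < len(states):
            target = net.forward(states[t + 1])[0]
        else:
            target = outcome
        delta = target - v  # temporal-difference error
        net.w1 = [[w + alpha * delta * e for w, e in zip(wr, er)]
                  for wr, er in zip(net.w1, e1)]
        net.w2 = [w + alpha * delta * e for w, e in zip(net.w2, e2)]
```

In a sketch like this, self-play would supply `states` (one feature vector per position) and `outcome` after each game, so no human-labelled position scores are needed; how TDRenju actually encodes positions and results is not stated in the abstract.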
【Key words】 Game; TD(λ) Algorithm; Evaluation Function; BP Neural Network; Renju;