基于强化学习的互联电网CPS指令动态优化分配算法

Reinforcement Learning Algorithm Based CPS Order Dynamic Optimal Dispatch Methodology for Interconnected Power Systems

【Author】 王宇名 (Wang Yuming)

【Supervisor】 刘前进 (Liu Qianjin)

【Author Information】 South China University of Technology, Power System and Its Automation, 2010, Master's thesis

【摘要 (Abstract)】 The dynamic optimal dispatch of the automatic generation control (AGC) power orders issued by the dispatch center of an interconnected power grid to the various types of AGC units is a stochastic optimization problem. In 1997 the North American Electric Reliability Council (NERC) formally released the latest Control Performance Standard (CPS) for AGC under the tie-line bias control (TBC) mode. The CPS places more emphasis on the medium- and long-term performance of the AGC system and changes the traditional AGC control philosophy, so designing a dynamic optimal dispatch strategy for the AGC power orders under the CPS (referred to as CPS orders) has become a brand-new topic of theoretical research. First, this thesis outlines the basic principle of CPS order dynamic optimal dispatch, reviews the domestic and international research status of AGC economic dispatch and unit commitment, and studies the statistical characteristics of the CPS assessment indices and the optimal control objectives. On the basis of analysing the characteristics and control objectives of the CPS order dispatch problem, it points out that the CPS-based AGC system can be regarded as an "uncertain stochastic system" whose mathematical model is a Gauss-Markov stochastic process, and that the dynamic load dispatch problem can be treated as a discrete-time Markov decision process (DTMDP); the Q-learning method from reinforcement learning, a stochastic optimal control technique, is therefore introduced into the study of CPS order optimal dispatch strategies. Second, taking the load frequency control (LFC) models of a standard two-area interconnected system and of the Guangdong power grid as study objects, single-step Q-learning and multi-step backtracking Q(λ)-learning are applied and compared systematically in detailed simulations. Different reward functions are designed according to the different optimization objectives and embedded in the algorithms, effectively combining the regulating characteristics of hydro and thermal units and taking the regulating margin of hydro units into account to improve the regulating capability of the AGC system. Statistical simulation comparisons show that introducing Q-learning can realize online self-learning and dynamic optimal decision-making of the dispatch strategy, enhance the robustness and adaptability of the AGC system, and raise the CPS compliance rate. Novelty searches at home and abroad indicate that no optimization or control method based on Markov decision theory had previously appeared in the field of CPS order dynamic optimal dispatch. Finally, the thesis points out that CPS order dispatch algorithms based on classical reinforcement learning inevitably face the "curse of dimensionality", and proposes a hierarchical reinforcement learning approach in which all AGC units in the grid are first classified by their regulation time delays and the CPS orders are dispatched layer by layer, forming a hierarchical task structure. A time-varying coordination factor is introduced between the layers of the hierarchical Q-learning algorithm, and the improved hierarchical Q-learning algorithm effectively accelerates the convergence of the original algorithm. Different linear combinations of weights are designed in the reward function to show how the CPS control performance and the regulation cost of the system vary under conservative and optimistic control. Statistical simulation studies on the China Southern Power Grid show that the improved hierarchical Q-learning shortens the average convergence time by 60% compared with hierarchical Q-learning, and that in an environment with complex stochastic disturbances the improved algorithm can effectively raise the CPS compliance rate and reduce the regulation cost by 4%. The research in this thesis was supported by the National Natural Science Foundation of China general project "AGC Optimal Relaxed Control and its Markov Decision Process under Control Performance Standards" (50807016), the Guangdong Natural Science Foundation project "Stochastic Optimal Relaxed Generation Control and its Semi-Markov Decision Process in Non-Markovian Environments" (9151064101000049), and the Fundamental Research Funds for the Central Universities project "Self-Learning and Self-Evolution Theory of Smart Grid Load Frequency Control Based on Hierarchical Semi-Markov Decision Processes" (No. 2009ZM0251).
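To make the DTMDP formulation above more concrete, the following is a minimal Python sketch of how a single-step Q-learning update could drive CPS order allocation. The state discretization (CPS1 and ACE bands), the action set (the share of the CPS order sent to fast-regulating units), and the reward weights are illustrative assumptions for this sketch, not the thesis's actual settings.

```python
import random
from collections import defaultdict

# Illustrative sketch only: the state/action discretization, reward weights and
# parameter values below are assumptions, not the settings used in the thesis.

ACTIONS = [0.0, 0.25, 0.5, 0.75, 1.0]   # share of the CPS order sent to fast (hydro) units
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1    # learning rate, discount factor, exploration rate

Q = defaultdict(float)                   # Q[(state, action)], zero-initialized

def discretize(cps1, ace):
    """Map instantaneous CPS1 (%) and ACE (MW) measurements into a coarse state label."""
    cps1_band = 0 if cps1 >= 200 else (1 if cps1 >= 100 else 2)   # pass / marginal / fail
    ace_band = 0 if abs(ace) < 50 else (1 if abs(ace) < 150 else 2)
    return (cps1_band, ace_band)

def choose_action(state):
    """Epsilon-greedy selection over the allocation ratios."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def reward(cps1, regulation_cost):
    """Weighted reward: favour CPS compliance, penalize regulation cost (assumed weights)."""
    w_cps, w_cost = 1.0, 0.01
    return w_cps * min(cps1, 200.0) / 200.0 - w_cost * regulation_cost

def q_update(state, action, r, next_state):
    """Single-step (off-policy, TD(0)) Q-learning update."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])
```

In each AGC control cycle, the current CPS1 and ACE measurements would be discretized into a state, an allocation ratio chosen epsilon-greedily, the CPS order split accordingly between the fast and slow unit groups, and the observed CPS1 and regulation cost fed back through q_update.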

【Abstract】 The dynamic optimal dispatch of automatic generation control (AGC) generating commands is a stochastic optimization problem for interconnected power systems. The North American Electric Reliability Council (NERC) formally released new Control Performance Standards (CPS) for AGC of interconnected power systems under tie-line bias control (TBC) mode in 1997. The CPS pays more attention to the medium- and long-term performance of AGC and changes the traditional AGC control philosophy, so how to design a CPS order dynamic optimal dispatch strategy under the CPS has become a brand-new topic of theoretical research. Firstly, this paper briefly states the principle of CPS order dynamic optimal dispatch and introduces the background and current research status of CPS order dispatch at home and abroad. The paper also gives a mathematical analysis of NARI's CPS control rules. On the basis of an in-depth study of the characteristics of CPS order dispatch and the optimal control objective, it suggests that the NERC CPS-based AGC system is a stochastic multistage decision process; the dispatch problem is therefore modeled as a reinforcement learning (RL) problem based on discrete-time Markov decision process (DTMDP) theory, and the Q-learning method, a stochastic optimal control technique, is introduced into the domain of CPS order dispatch for its solution. Secondly, using Matlab/Simulink and DTMDP simulation modeling, the load frequency control (LFC) models of a two-area power system and of the Guangdong power grid are taken as examples for a detailed comparison and analysis of three Q-learning-based CPS order dispatch algorithms. The reward functions in Q-learning are designed according to different optimization objectives. Thermal and hydro units are coordinated, with the regulating margin of hydro units taken into account, to improve the regulating performance of the AGC system. The multi-step Q(λ) method with a backtracking function is also employed to overcome the long control time delay in the AGC control loop. The statistical experiment results show that the proposed dispatch methodology, with its online self-learning and dynamic optimization capability, can obviously enhance the robustness and adaptability of AGC systems while ensuring CPS compliance. Finally, this paper presents an improved hierarchical reinforcement learning (HRL) algorithm to address the curse of dimensionality in the multi-objective dynamic optimization of CPS order dispatch. The CPS order dispatch task is decomposed into several subtasks by classifying the committed AGC units according to their power-regulation response time delays. A time-varying coordination factor is introduced between the layers of the hierarchy, which speeds up convergence by about 60%. A number of linear combinations of weights in the reward function are designed to optimize the hydro capacity margin and the AGC production cost.
The application of the improved hierarchical Q-learning to the China Southern Power Grid model shows that, compared with hierarchical Q-learning and a genetic algorithm, the proposed method can enhance the performance of AGC systems in CPS assessment and save more than 4% of the AGC regulating cost. This work is supported by the National Natural Science Foundation of China project "AGC Optimal Relaxed Control and its Markov Decision Process based on Control Performance Standards" (50807016), the Guangdong Natural Science Foundation project (9151064101000049), and the Fundamental Research Funds for the Central Universities (No. 2009ZM0251).
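The hierarchical dispatch with a time-varying coordination factor can be pictured with the sketch below, in which a high-level Q-table splits the total CPS order between a fast layer and a slow layer of units, and a decaying coordination factor controls how strongly the lower layer's value estimate feeds into the upper layer's update. The two-layer structure, the exponential decay schedule, and all parameter values are illustrative assumptions rather than the thesis's exact formulation.

```python
import math
from collections import defaultdict

# Illustrative two-layer hierarchical Q-learning sketch (assumed structure and parameters).

ACTIONS = [0.0, 0.25, 0.5, 0.75, 1.0]        # share of the CPS order given to the fast layer
ALPHA, GAMMA = 0.1, 0.9

class Layer:
    """One layer of the hierarchy with its own Q-table over (state, action) pairs."""
    def __init__(self):
        self.Q = defaultdict(float)

    def greedy_value(self, state):
        return max(self.Q[(state, a)] for a in ACTIONS)

    def update(self, state, action, target):
        self.Q[(state, action)] += ALPHA * (target - self.Q[(state, action)])

def coordination_factor(step, tau=200.0):
    """Time-varying coordination factor: strong inter-layer coupling early on, fading later."""
    return math.exp(-step / tau)

def hierarchical_update(upper, lower, state, action, r_upper, r_lower, next_state, step):
    """Update both layers; the upper-layer target is blended with the lower layer's value."""
    eta = coordination_factor(step)
    lower.update(state, action, r_lower + GAMMA * lower.greedy_value(next_state))
    upper_target = r_upper + GAMMA * ((1 - eta) * upper.greedy_value(next_state)
                                      + eta * lower.greedy_value(next_state))
    upper.update(state, action, upper_target)
```

Here the coordination factor shrinks over time, so early learning in the lower layer guides the upper layer while later each layer converges on its own value estimates; this is one plausible reading of how an inter-layer coordination factor could accelerate convergence, not the thesis's definitive mechanism.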
