
认知无线电中智能学习技术研究

Research on Intelligent Learning in Cognitive Radio

【Author】 Wu Chun (伍春)

【Advisors】 Yu Quan (于全); Yi Kechu (易克初)

【Author information】 Xidian University (西安电子科技大学), Military Communications, 2014, PhD

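【Abstract (translated from the Chinese)】 As economies and societies worldwide become increasingly information-driven, wireless communication technology has developed rapidly: wireless services and applications are growing at a remarkable rate, and new wireless communication and networking technologies keep emerging. At the same time, the high data rates and wide bandwidths of these systems increase the demand for spectrum, which sharpens the mismatch between spectrum supply and demand. Cognitive radio (CR) proposes a shift from fixed to dynamic spectrum use, which can effectively improve spectrum utilization; the high degree of intelligence that CR emphasizes also matches the direction in which wireless communication systems and networks are developing. This dissertation focuses on learning, the most important element of CR intelligence, and studies both off-line and on-line learning. The main original contributions are as follows:

1. A general off-line learning and decision-making framework for CR is proposed, and the application of two learning methods within it, neural networks (NN) and least squares support vector machines (LSSVM), is studied. For NN, a “direct” radial basis function neural network (RBF-NN) learning and decision-making method is proposed. Compared with the traditional “indirect” methods, it adds a search for optimal cases before training, which reduces the number of input and output neurons and lowers the training complexity; it also decides the configuration parameters directly, which improves the real-time performance of decision making. For LSSVM, several multiclass LSSVM variants are compared in terms of complexity and performance in the CR scenario, and the non-dominated sorting genetic algorithm is used to search the LSSVM hyper-parameters so that the learning algorithm is more generally applicable. Simulation results show that both the RBF-NN and LSSVM methods improve CR system performance, that the non-dominated sorting genetic algorithm finds suitable hyper-parameters within a small number of generations, and that LSSVM-based off-line learning has better decision and generalization performance.

2. A multi-agent reinforcement learning method based on user clustering and variable learning rates is proposed. It promotes the convergence of multiuser learning and addresses channel selection and power allocation for multiple users under underlay spectrum sharing. First, hierarchical processing separates channel selection from power control, and channels are allocated by a fast optimal channel search combined with user-number balancing based on performance prediction. Second, the multiuser power control problem is modeled as a stochastic game; K-means user clustering is introduced to reduce the number of users taking part in the game and the complexity of each user's environment, and variable Q-learning and policy-learning rates are proposed to further promote convergence. Simulation results show that the method makes the users' power states and total reward converge effectively, and that overall performance reaches a sub-optimal level.

3. A method based on the Nash bargaining solution is proposed for multi-channel, multi-user channel selection and power allocation under a total power constraint. A suitable Nash bargaining utility function is designed so that the Nash product explicitly represents the performance metric of the CR system, and the existence and uniqueness of the Nash bargaining solution are proved. A concrete iterative bargaining procedure, based on the idea of gradient descent and on prediction of performance changes, is proposed to allocate channels and power. Theoretical analysis and simulation results show that the power allocation based on Nash bargaining satisfies global proportional fairness, and that the iterative channel and power allocation algorithm performs well, reaching a sub-optimal solution for total system performance.

4. A cooperative decoupling method and a cross-layer joint method are proposed for multi-layer resource allocation in multi-hop cognitive radio networks. The cooperative decoupling method first completes path selection on its own and then allocates channels and power through a game; the cross-layer joint method allocates the three layers of resources (path, channel and power) simultaneously through a single game. Both methods combine heuristic principles from the network, MAC and physical layers: they use each node's received-interference degree and caused-interference degree to assist path selection, use Boltzmann exploration based on the permitted power width to select channels and power, and replace or eliminate long links and bottleneck links to further improve network performance. To promote convergence, a sequential game is chosen and its concrete procedure is designed; the Nash equilibria of the games are analyzed and the complexity of the two algorithms is discussed. Simulation results show that both methods outperform the link game and flow game methods with simple decoupling in the number of successful flows, the achievable flow rate and transmit power consumption.

5. A multiuser independent Q-learning method that requires no information exchange is proposed for multiuser dynamic spectrum access in cognitive radio. The method is self-learning: each CR user spends no communication resources exchanging information with other users and performs reinforcement learning only by observing its own reward, which is defined to reflect channel quality and channel conflict status. A learning strategy combining sufficient exploration, a preference for better channels and a penalty for conflicts is designed to realize multiuser, multichannel dynamic spectrum access. For the scenario with two users and two channels, a fast learning algorithm is proposed and proved to converge to the maximum total reward. Simulation results show that, with this method, multiuser multichannel selection converges to a Nash equilibrium with high probability and achieves a high total reward.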
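Contribution 3 can be made concrete with the standard Nash bargaining formulation. The sketch below is the textbook form, not the utility function designed in the dissertation, which the abstract does not give. The symbols $u_i$ (utility of user $i$), $u_i^{\min}$ (its disagreement point), $p_i$ (its transmit power), $P_{\text{total}}$ and $N$ are assumptions introduced here for illustration.

```latex
\[
\begin{aligned}
\max_{p_1,\dots,p_N}\quad & \prod_{i=1}^{N}\bigl(u_i(\mathbf{p}) - u_i^{\min}\bigr) \\
\text{s.t.}\quad & \sum_{i=1}^{N} p_i \le P_{\text{total}}, \qquad p_i \ge 0, \qquad u_i(\mathbf{p}) \ge u_i^{\min}, \quad i = 1,\dots,N .
\end{aligned}
\]
```

Because the logarithm is strictly increasing, maximizing this product over the points where every $u_i$ exceeds $u_i^{\min}$ is equivalent to maximizing $\sum_i \log\bigl(u_i - u_i^{\min}\bigr)$. With $u_i^{\min} = 0$ that sum is the usual proportional-fairness objective, which is consistent with the fairness property the abstract reports.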

【Abstract】 As the world's economy and society become increasingly information-driven, wireless communication technology has developed rapidly. Wireless services and applications are growing at a remarkable pace, and new wireless communication and networking techniques emerge continuously. At the same time, the high data rates and wide bandwidths of wireless networks increase the demand for spectrum, so the mismatch between spectrum supply and demand is becoming more and more prominent.

Cognitive radio (CR) proposes replacing fixed spectrum use with dynamic spectrum use, which can effectively raise spectrum utilization. Moreover, the high intelligence emphasized by CR accords with the direction in which wireless communication systems and networks are developing.

This dissertation focuses on learning, the part of CR that best exhibits its intelligence. It studies off-line learning and on-line learning, and the major contributions are as follows:

1. A general off-line learning and decision-making framework for CR is proposed, and two learning methods are applied within it: a neural network (NN) method and a least squares support vector machine (LSSVM) method. For NN, a “direct” radial basis function neural network (RBF-NN) learning and decision-making method is proposed. Compared with the traditional “indirect” methods, the “direct” one adds a search for optimal cases before the training stage, which reduces the number of input and output neurons and thus the training complexity, and it decides the configuration parameters directly, which improves real-time performance. For LSSVM, several multiclass classification approaches are compared in terms of complexity and performance under the constructed CR scenario, and the non-dominated sorting genetic algorithm is used to search the LSSVM hyper-parameters so that the learning algorithm is more generally applicable. Simulation results show that both the RBF-NN method and the LSSVM method improve the performance of CR systems, that the non-dominated sorting genetic algorithm finds suitable hyper-parameters within a few generations, and that the LSSVM method has better decision and generalization performance.

2. To solve the problem of channel allocation and power control in spectrum underlay cognitive radio, a multi-agent reinforcement learning method based on user clustering and variable learning rates is proposed, which effectively improves the convergence of multiuser learning. First, hierarchical processing separates channel selection from power control, and channel allocation is performed by a fast optimal search combined with user-number balancing based on performance prediction. Second, a stochastic game framework is adopted to model the multiuser power control problem. In the subsequent multi-agent reinforcement learning, K-means user clustering reduces the number of users in the game and the complexity of each user's environment, and a variable learning rate scheme for Q-learning and policy learning is proposed to promote the convergence of multiuser learning. Simulation results show that the method makes the users' power states and the global reward converge effectively, and that overall performance reaches a sub-optimal level.

3. A method based on the Nash bargaining solution is proposed for multi-channel, multi-user channel selection and power allocation in CR under a total power constraint. A suitable Nash bargaining utility function is designed so that the Nash product explicitly expresses the performance of the CR system, and the existence and uniqueness of the Nash bargaining solution are proved. An iterative bargaining procedure based on the idea of gradient descent and on prediction of performance changes is proposed to allocate channels and power. Theoretical analysis and simulation show that the power allocation based on the Nash bargaining solution satisfies proportional fairness, and that the iterative channel and power allocation algorithm achieves sub-optimal performance for the whole system.

4. A cooperative decoupling method and a cross-layer joint method are proposed for multi-layer resource allocation in multi-hop cognitive radio networks. In the cooperative decoupling method, path selection is completed independently and is followed by a game that allocates channels and power. In the cross-layer joint method, the three layers of resources (path, channel and power) are allocated simultaneously by a single game. Both methods combine heuristic principles from the network layer, MAC layer and physical layer, and both use each node's received-interference degree and caused-interference degree to assist path selection. Boltzmann exploration based on the permitted power width is designed to select channels and power, and long links and bottleneck links are replaced or eliminated to further enhance network performance. To improve convergence, a sequential game is chosen instead of a simultaneous game, since the former behaves better in the current scenario, and its concrete procedure is provided. In addition, the Nash equilibria of the games and the complexity of the two algorithms are analyzed and discussed. Simulation results show that the cooperative decoupling method and the cross-layer joint method outperform the cooperative link game and the local flow game with simple decoupling in the number of successful flows, the achievable data transmission rate and power consumption.

5. A multiuser independent Q-learning method without information exchange is proposed for multiuser dynamic spectrum access in cognitive radio. The method adopts a self-learning paradigm in which each CR user performs reinforcement learning only by observing its own reward, so no communication resources are spent exchanging information with other users; the reward is defined according to channel quality and channel conflict status. A learning strategy for multiuser dynamic spectrum access is designed that combines sufficient exploration, a preference for better channels and a penalty for conflicts. For the scenario with two users and two channels, a fast learning algorithm is proposed and proved to converge to the maximal total reward. Simulation results show that, with the proposed method, multiuser multichannel selection converges to a Nash equilibrium with high probability and achieves a high total reward.
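To illustrate the idea behind contribution 5, here is a minimal sketch of independent Q-learning for channel selection without information exchange. It is not the dissertation's algorithm: the stateless Q-table, the epsilon-greedy exploration schedule, the reward values and every name (`independent_q_learning`, `qualities`, `collision_penalty` and so on) are assumptions chosen for illustration. Each user sees only its own reward, which is the channel quality when it is alone on the channel and a penalty when it collides.

```python
import random


def independent_q_learning(qualities, n_users, steps=3000, alpha=0.1,
                           eps_min=0.05, eps_decay=0.995,
                           collision_penalty=-1.0, seed=0):
    """Return each user's preferred channel after learning (hypothetical sketch)."""
    rng = random.Random(seed)
    n_channels = len(qualities)
    # One private Q-value per (user, channel); users never read each other's rows.
    q = [[0.0] * n_channels for _ in range(n_users)]
    eps = 1.0  # start fully exploratory, then decay toward eps_min

    for _ in range(steps):
        # Every user picks a channel independently (epsilon-greedy).
        choices = []
        for u in range(n_users):
            if rng.random() < eps:
                choices.append(rng.randrange(n_channels))
            else:
                choices.append(max(range(n_channels), key=lambda c: q[u][c]))

        # Each user observes only its own reward and updates its own Q-value.
        for u, c in enumerate(choices):
            alone = choices.count(c) == 1
            reward = qualities[c] if alone else collision_penalty
            q[u][c] += alpha * (reward - q[u][c])

        eps = max(eps_min, eps * eps_decay)

    # Greedy channel of each user once learning stops.
    return [max(range(n_channels), key=lambda c: q[u][c]) for u in range(n_users)]


if __name__ == "__main__":
    # Two users, two channels with assumed qualities 1.0 and 0.6.
    print(independent_q_learning([1.0, 0.6], n_users=2))
```

With two users and two channels, a run in which the users end on different channels corresponds to a collision-free outcome; the sketch makes no claim about how often that happens or how fast, which depends on the assumed parameters.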
