小子样下数据处理的若干问题研究

Research on Some Problems of Data Processing under Small Sample

【Author】 Liang Wu (梁武)

【Advisor】 Wang Jianzhou (王建州)

【Author information】 Lanzhou University, Applied Mathematics, 2008, Master's thesis

【Abstract】 Traditional data processing rests on the assumption of large samples, and the forecasting methods built on it, such as time series forecasting and artificial neural networks, have theoretical performance guarantees only when the sample is large. In most practical situations, however, the number of samples is limited and often very small, so many of these methods fail to give satisfactory results. Moreover, most time series forecasting methods do not account for nonlinear factors, and the solutions found by artificial neural networks easily become trapped in local optima. These shortcomings severely limit the practical use of such methods, and small-sample forecasting has therefore long been a difficult problem in statistics. Grey prediction theory handles small-sample forecasting comparatively well, but a grey model cannot be used when it fails the model check, that is, when the small-error probability P ≤ 0.7 and the posterior variance ratio C ≥ 0.65. The support vector machine (SVM) is used to solve nonlinear function estimation problems. It follows the structural risk minimization principle rather than empirical risk minimization, and its training reduces to a convex quadratic programming (QP) problem, so the solution found is guaranteed to be the global optimum. It therefore copes well with small-sample, nonlinear, and high-dimensional problems. The least squares support vector machine (LS-SVM) is a variant of the SVM that replaces the inequality constraints with equality constraints and uses the sum of squared errors as the empirical loss on the training set, which turns training into the solution of a system of linear equations. The method is designed for small samples, has an algorithmic complexity that does not depend on the sample dimension, and handles nonlinearity.

The main results and contributions of this thesis are as follows:
(1) Small-sample forecasting has long been a difficult problem in statistics. By forecasting peak load and the incidence of infectious diseases, the thesis compares the strengths and weaknesses of several methods and identifies the best-performing forecasting method for each of the two problems.
(2) The LS-SVM method is applied for the first time to forecasting the incidence of infectious diseases from small samples. A comparison with the grey prediction method confirms that LS-SVM is effective and performs better for this task.
(3) A method is proposed that uses particle swarm optimization to optimize the parameters and the input set of the grey prediction model. Simulations show a clear improvement in forecasting accuracy.
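The abstract's remark that LS-SVM training reduces to a linear system can be illustrated with a minimal sketch. It is not the thesis's implementation: the RBF kernel, the hyperparameter names `gamma` and `sigma`, the function names, and the toy data are all assumptions made for illustration, and the solver is a plain Gaussian elimination so that the example needs only the standard library. The system solved is the standard LS-SVM regression form [[0, 1ᵀ], [1, K + I/γ]] [b; α] = [0; y].

```python
import math

def rbf_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def solve_linear_system(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the caller's data is left untouched.
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
    """Return (b, alpha) from the LS-SVM linear system; no QP solver is needed."""
    n = len(X)
    A = [[0.0] + [1.0] * n]  # first row: 0 followed by 1^T
    for i in range(n):
        row = [1.0]
        for j in range(n):
            k = rbf_kernel(X[i], X[j], sigma)
            # The ridge term I/gamma comes from the squared-error loss.
            row.append(k + (1.0 / gamma if i == j else 0.0))
        A.append(row)
    solution = solve_linear_system(A, [0.0] + list(y))
    return solution[0], solution[1:]

def lssvm_predict(x, X, alpha, b, sigma=1.0):
    """Evaluate f(x) = b + sum_i alpha_i * K(x, x_i)."""
    return b + sum(a * rbf_kernel(x, xi, sigma) for a, xi in zip(alpha, X))

# Hypothetical toy data: five samples, one feature each.
X_train = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y_train = [0.0, 0.8, 0.9, 0.1, -0.8]
b, alpha = lssvm_fit(X_train, y_train, gamma=100.0, sigma=1.0)
print(lssvm_predict([2.5], X_train, alpha, b, sigma=1.0))
```

Because of the equality constraints, every training point receives a coefficient, and each training residual equals α_i/γ. The sparseness of the standard SVM is given up in exchange for a single linear solve.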

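  • 【Online publication submitter】 Lanzhou University
  • 【Online publication year/issue】 2009, Issue 01
  • 【Classification number】 O212
  • 【Citation count】 5
  • 【Download count】 417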