
非线性核主成分的神经网络台风强度集合预报建模研究

A Neural Network Ensemble Prediction Model Based on Nonlinear Kernel Principal Component Analysis for Typhoon Intensity

【Author】 肖慧

【Advisors】 金龙; 杨善朝

【Author information】 广西师范大学 (Guangxi Normal University), Probability Theory and Mathematical Statistics, 2010, Master's thesis

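【Abstract, translated from the Chinese】 Based on typhoon intensity data for the South China Sea from 1980 to 2008 published by the China Meteorological Administration, and in view of the nonlinear and time-varying nature of typhoon intensity, this thesis follows the ensemble forecasting idea of numerical weather prediction and uses neural networks, a genetic algorithm, and nonlinear kernel principal component analysis to study a new method of objective forecast modeling for typhoon intensity.

The basic neural network used in the new ensemble forecast model is a feedforward network, which offers adaptive learning, nonlinear mapping, and other desirable properties. In practical forecast modeling, however, its structure is hard to determine objectively and it is prone to over-fitting. The genetic algorithm, a global optimization algorithm based on natural selection and natural genetics, has been widely used in artificial intelligence in recent years. To improve the generalization performance of the neural network, the thesis uses the global search capability of the genetic algorithm to optimize both the network structure and the connection weights. Three genetic operators (selection, crossover, and mutation) exchange information among the individuals of the neural network population and keep producing new, fitter populations. The population of the final generation supplies all the ensemble members, every member receives the same weight, and the members' forecasts are combined, which yields a new neural network ensemble forecast model for typhoon intensity.

As for the predictors, earlier applied studies of typhoon intensity forecasting usually first computed the correlation coefficient between each candidate factor and the predictand to preselect highly correlated factors, and then applied stepwise regression to those factors to screen the predictor combination for modeling. That approach makes no further use of the forecast information contained in the many highly correlated factors that stepwise regression rejects. Feeding all of these remaining factors into the neural network would make the input dimension too high and the training time too long, and would tend to cause over-fitting. To exploit the useful forecast information in all the predictors while keeping the model input simple, the thesis applies a nonlinear principal component method, kernel principal component analysis (KPCA), to extract features from the highly correlated factors rejected by stepwise regression. Taking into account both the cumulative variance contribution of the kernel principal components and their correlation with the predictand, it selects kernel principal components that contain most of the information in the remaining factors and uses them, together with the factors chosen by stepwise regression, as input to the neural network ensemble forecast model. KPCA combines kernel methods with principal component analysis and can extract higher-order nonlinear relationships in data: it maps the input-space data into a feature space through a nonlinear mapping, performs principal component analysis in that feature space, and uses a kernel function to replace dot products in the feature space with kernel evaluations in the input space.

Building on this model construction and predictor treatment, the thesis establishes a neural network ensemble forecast model with nonlinear kernel principal components. Typhoon cases from June, July, August, and September of 1980-2008 that were located in the South China Sea and had a lifetime of more than 48 hours were selected as the forecast objects, with climatology and persistence factors as the candidate predictors. Data from 1980-1999 served as modeling samples and data from 2000-2008 as independent forecast samples, and four typhoon intensity ensemble forecast models with a 24-hour lead time were built, one for each month. Each month has about 28 to 31 preselected highly correlated factors, of which stepwise regression generally retains about 5 to 8. KPCA was therefore applied to the discarded factors, and the kernel principal components containing most of their information were used, together with the factors chosen by stepwise regression, as model input for each monthly model.

In forecast experiments on the 2000-2008 independent samples, the mean absolute errors for June, July, August, and September were 4.58 m/s, 4.52 m/s, 3.13 m/s, and 4.58 m/s, respectively. To test the performance of the ensemble method, forecast equations were also built from the same candidate predictors with the traditional stepwise regression method; their mean absolute errors for the monthly independent samples were 4.84 m/s, 5.58 m/s, 3.68 m/s, and 5.14 m/s. The forecast error of the neural network ensemble model is lower than that of stepwise regression by 5.25%, 19.12%, 14.94%, and 10.81%, respectively (that is, the mean relative error).

Taken together, the forecast experiments and comparisons indicate that the nonlinear kernel principal component neural network ensemble method forecasts better than the traditional stepwise regression method. The main reason is that the new modeling method passes the rejected predictors through nonlinear kernel principal component feature extraction and adds their previously discarded useful information to the forecast model, so the model contains more effective forecast information and its performance improves. This way of constructing the forecast model and treating the predictors offers a useful reference for forecast modeling research in related fields.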
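The kernel principal component step described in the abstract can be sketched as follows. This is only an illustrative sketch, not the thesis's implementation: the Gaussian (RBF) kernel, the value of `gamma`, the synthetic data, and every name in the code (`rbf_kernel_pca`, `rejected`, `predictand`) are assumptions made for the example. It computes the scores of the training samples on the leading kernel principal components, then reports the two quantities the abstract says are weighed when choosing components: the cumulative variance contribution and the correlation with the predictand.

```python
import numpy as np


def rbf_kernel_pca(X, gamma, n_components):
    """Scores of the training samples on the leading kernel principal
    components, plus each component's share of the feature-space variance.
    The RBF kernel is an assumption; the source does not name a kernel."""
    # Squared Euclidean distances between all pairs of samples.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)

    # Kernel matrix: dot products in feature space, evaluated in input space.
    K = np.exp(-gamma * sq_dists)

    # Center the mapped data in feature space.
    n = K.shape[0]
    ones = np.full((n, n), 1.0 / n)
    K_centered = K - ones @ K - K @ ones + ones @ K @ ones

    # PCA in feature space is an eigendecomposition of the centered kernel matrix.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]

    scores = eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])
    variance_share = eigvals[:n_components] / eigvals.sum()
    return scores, variance_share


# Synthetic stand-in for the factors rejected by stepwise regression
# (hypothetical sizes; no real typhoon data is used here).
rng = np.random.default_rng(0)
n_samples, n_rejected = 60, 22
rejected = rng.normal(size=(n_samples, n_rejected))
rejected = (rejected - rejected.mean(axis=0)) / rejected.std(axis=0)
predictand = np.tanh(rejected[:, 0] * rejected[:, 1]) + 0.1 * rng.normal(size=n_samples)

scores, variance_share = rbf_kernel_pca(rejected, gamma=1.0 / n_rejected, n_components=5)
cumulative_share = np.cumsum(variance_share)
correlations = np.array(
    [np.corrcoef(scores[:, k], predictand)[0, 1] for k in range(scores.shape[1])]
)

for k in range(scores.shape[1]):
    print(f"KPC {k + 1}: cumulative variance {cumulative_share[k]:.3f}, "
          f"correlation with predictand {correlations[k]:+.3f}")
```

In a setup like the one the abstract describes, the selected columns of `scores` would be appended to the stepwise-selected factors to form the network input.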

【Abstract】 A new objective prediction model for typhoon intensity has been developed using a neural network, a genetic algorithm, and kernel principal component analysis (KPCA), following the ensemble prediction approach of numerical weather prediction, because typhoon intensity is nonlinear and time-varying. Typhoon intensity data for 1980 through 2008 were taken from the "Typhoon Almanac" published by the China Meteorological Administration.

To construct the new neural network ensemble prediction model for typhoon intensity, a back-propagation (BP) network is used as the basic model. The BP network offers adaptive learning, nonlinear mapping, and other advantages, but its structure is difficult to determine objectively and it is prone to over-fitting. A genetic algorithm, with its global search capability, is therefore applied to optimize both the network structure and the connection weights. Three genetic operators (selection, crossover, and mutation) exchange information among individuals from generation to generation. The last generation of the population is retained as the set of ensemble members, and the members' forecasts are combined with equal weights to form the neural network ensemble prediction model.

In practical typhoon intensity prediction, factors that have high individual correlation coefficients with the predictand are usually screened by stepwise regression to select the predictors for modeling, and the factors that stepwise regression eliminates are discarded even though they carry considerable prediction information. If all the eliminated factors were used as well, the long training time could lead to over-fitting. KPCA is therefore applied to extract features from the eliminated factors, and a few kernel principal components that contain most of the prediction information are used, together with the factors selected by stepwise regression, as input to the ensemble prediction model. KPCA combines the kernel method with principal component analysis (PCA): it maps the data from the input space to a feature space by a nonlinear mapping, applies PCA there, and replaces the dot products in the feature space with kernel evaluations in the input space, so it can extract nonlinear relationships in the data.

Based on this model construction and predictor treatment, a neural network ensemble prediction model based on KPCA has been established. Climatology and persistence factors were taken as the candidate factors, and four typhoon intensity ensemble prediction models with a 24-hour lead time were set up, one each for June, July, August, and September, using data on typhoons with a life history of at least 48 hours from 1980 to 2008. Data from 1980 to 1999 served as modeling samples and data from 2000 to 2008 as independent prediction samples. Because 28 to 31 factors were selected by correlation coefficient for each month but only 5 to 8 were retained after stepwise regression, KPCA was applied to the eliminated factors, and a few kernel principal components containing most of their prediction information were combined with the stepwise-selected factors as input to each monthly model. The statistical results show that the mean absolute errors for June, July, August, and September are 4.58 m/s, 4.52 m/s, 3.13 m/s, and 4.58 m/s, respectively.

To assess the forecasting capability of this model, traditional stepwise regression prediction equations were built from the same data and applied to the same independent prediction samples; the corresponding errors are 4.84 m/s, 5.58 m/s, 3.68 m/s, and 5.14 m/s. The error of the ensemble prediction is lower than that of the regression prediction by 5.25%, 19.12%, 14.94%, and 10.81%, respectively (the mean relative error).

The results show that the neural network ensemble prediction model based on KPCA is more accurate than the traditional stepwise regression method. The reason is that, in the new model, the eliminated predictors are processed with KPCA and their useful information is added to the prediction model. The new model thus contains more effective prediction information, which improves the forecast performance of the ensemble prediction model. Both the model construction approach and the predictor treatment technique are a useful reference for prediction modeling research in related fields.
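The equal-weight combination of ensemble members and the error comparison reported in the abstract can be sketched as follows. This is an illustrative sketch only: the function names and the toy forecasts are hypothetical, and only the eight monthly mean absolute errors come from the abstract. Recomputing the relative reduction from those rounded errors gives values close to, but not identical with, the percentages quoted above; the likely cause is rounding or a per-sample calculation, which is an assumption and not something the source states.

```python
import numpy as np


def ensemble_forecast(member_forecasts):
    """Combine ensemble members with equal weights (a plain average)."""
    return np.asarray(member_forecasts, dtype=float).mean(axis=0)


def mean_absolute_error(forecast, observed):
    """Mean absolute error between forecast and observed values."""
    diff = np.asarray(forecast, dtype=float) - np.asarray(observed, dtype=float)
    return float(np.mean(np.abs(diff)))


def relative_reduction(mae_new, mae_reference):
    """Percentage by which mae_new is lower than mae_reference."""
    return 100.0 * (mae_reference - mae_new) / mae_reference


# Toy example: three hypothetical members forecasting two cases (m/s).
members = [[30.0, 35.0], [32.0, 33.0], [34.0, 37.0]]
observed = [31.0, 36.0]
ensemble = ensemble_forecast(members)
mae = mean_absolute_error(ensemble, observed)
print("ensemble forecast:", ensemble, "MAE:", mae)

# Monthly mean absolute errors (m/s) as reported in the abstract.
mae_ensemble = {"June": 4.58, "July": 4.52, "August": 3.13, "September": 4.58}
mae_stepwise = {"June": 4.84, "July": 5.58, "August": 3.68, "September": 5.14}

reductions = {
    month: relative_reduction(mae_ensemble[month], mae_stepwise[month])
    for month in mae_ensemble
}
for month, value in reductions.items():
    print(f"{month}: error reduced by {value:.2f}%")
```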
