
Time Series Combination Prediction Model Based on Support Vector Machine (基于支持向量机的时间序列组合预测模型)

【Author】 Xiang Changsheng (向昌盛)

【Advisor】 Yuan Zheming (袁哲明)

【Author information】 Hunan Agricultural University, Agricultural Entomology and Pest Control, 2011, PhD

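【Abstract (translated from the Chinese)】 The natural and social sciences have accumulated large amounts of time series data, especially multidimensional time series. Such series are typically influenced by several environmental factors (calling for regression analysis), carry their own dynamic temporal structure (calling for autocorrelation analysis), and are nonlinear (calling for nonlinear analysis). Because prediction underpins understanding and decision-making, developing high-precision nonlinear prediction methods for time series, and for multidimensional time series in particular, is of great significance. Current time series analysis methods develop mainly along two directions: classical time series analysis and phase space reconstruction.

The first part of this thesis pursues the classical direction. The key steps of classical time series analysis are order extension, order determination, variable selection, and choice of regression model; in practice the first three are often coupled with the choice of regression model. Early classical multidimensional models, such as the controlled autoregressive moving average model (Controlled Autoregressive Moving Average, CARMA) and its simplified form, the controlled autoregressive model (Controlled Autoregressive, CAR), are linear, so their practical predictive ability is weak. Artificial neural networks based on empirical risk minimization, such as feedforward networks trained by back-propagation (Back-propagation Neural Networks, BPNN), approximate nonlinear functions well but are prone to local minima, are hard to interpret, and rely heavily on experience. The support vector machine (Support Vector Machine, SVM), based on structural risk minimization and grounded in statistical learning theory, handles local minima, overfitting, and nonlinearity well and generalizes very well. SVM is therefore used as the basic modeling tool for regression in this thesis.

1. SLR-LSSVM combination prediction model. Stepwise linear regression (Stepwise Linear Regression, SLR) is used to screen the factors linearly, and the retained factors are then modeled nonlinearly with the least squares support vector machine (Least Squares Support Vector Machine, LSSVM); this is the SLR-LSSVM multidimensional time series combination prediction model. Fitting and independent prediction of the relationship between the number of second-generation corn borer larvae per hundred plants and eight meteorological factors show that SLR-LSSVM outperforms the reference models SLR-MLR, SLR-BPNN, MLR, BPNN, and LSSVM, indicating that factor screening and nonlinear modeling with the structural-risk-minimizing SVM help improve prediction accuracy.

2. CAR-LSSVM combination prediction model. SLR-LSSVM considers only the influence of environmental factors and ignores the series' own dynamic temporal structure (no order extension), and its variable selection, based on SLR, is linear. CAR considers both environmental factors and the series' own dynamics, but its order extension and order determination are linear (based on multiple linear regression, MLR), as is its variable selection (based on SLR). Borrowing the idea of CAR, this thesis develops the nonlinear CAR-LSSVM multidimensional time series combination prediction model: order extension and order determination are carried out nonlinearly with LSSVM under the minimum mean squared error (Mean Squared Error, MSE) criterion; the variables remaining after order determination are then screened nonlinearly with LSSVM; finally, an LSSVM model is built on the retained variables for prediction. Independent prediction of the relationship between the seed damage rate caused by the soybean pod borer and five influencing factors shows that CAR-LSSVM clearly outperforms the reference models MLR, SNR (an LSSVM-based nonlinear stepwise regression model), LSSVM, SLR-LSSVM, and CAR, indicating that it is necessary to treat environmental influence and temporal structure jointly and nonlinearly, with nonlinear order determination and nonlinear variable selection.

3. GS-LSSVM combination prediction model. Linear order determination by the F test in CAR and nonlinear order determination by minimum MSE in CAR-LSSVM share two defects. First, extending the order step by step from low to high is tedious. Second, extending the dependent variable together with the independent variables easily causes information redundancy and longer variable selection, and also easily terminates order extension prematurely, which lowers prediction accuracy. Based on geostatistics (Geostatistics, GS) and LSSVM, this thesis establishes GS-LSSVM, a high-precision nonlinear time series combination prediction model that determines the order quickly and reflects both the dynamics of the sample set and the influence of environmental factors. The dependent variable is first extended, and its order determined quickly and fully, from the geostatistical after-effect time length (the range of the semivariogram); principal component analysis is then used to remove information redundancy among the independent variables; finally, the effectiveness of GS-LSSVM is tested by one-step-ahead prediction. Independent prediction on a small-sample one-dimensional time series of pine caterpillar occurrence area shows that GS-LSSVM clearly outperforms reference models such as LSSVM and GS-BPNN. Independent prediction on a multidimensional time series of fifth-generation brown planthopper occurrence in late rice with four meteorological factors shows that GS-LSSVM is more accurate than reference models such as GS-BPNN, is the most stable, and determines the order quickly and accurately. GS-LSSVM reflects both the dynamics of the sample set and the influence of environmental factors, avoids overfitting and local minima, is nonlinear, and generalizes very well, so it has fairly broad application prospects in time series prediction.

4. ARIMA-DSVM combination prediction model. As time passes, the training set keeps growing and LSSVM training becomes very time-consuming. More importantly, for a given prediction step it is not necessarily appropriate for all historical samples to take part in training, and individual samples do not influence the prediction equally. Following the principle that nearer samples weigh more and more distant samples weigh less, the dynamic ε-SVM (Dynamic ε-insensitive Cost Function Support Vector Machine, DSVM) adjusts the insensitive loss function parameter (ε) dynamically over time, so that data closer in time to the prediction point influence the prediction more and data farther away influence it less. The autoregressive integrated moving average model (Autoregressive Integrated Moving Average, ARIMA) has excellent linear predictive ability. For systems whose linear or nonlinear character is unknown, this thesis combines the linear ARIMA with the dynamic nonlinear DSVM into the ARIMA-DSVM combination prediction model: ARIMA first extracts and predicts the linear component of the time series, and DSVM then applies a nonlinear dynamic correction to the ARIMA prediction residuals. Independent prediction on the one-dimensional pine caterpillar occurrence area series shows that ARIMA-DSVM outperforms reference models such as ARIMA and DSVM.

The second part of this thesis pursues the phase space reconstruction direction. Time series prediction based on phase space reconstruction and LSSVM involves two key steps: determining the time delay τ and embedding dimension m in the phase space reconstruction, and determining the regularization parameter γ and kernel width parameter σ of the LSSVM model. In earlier studies, phase space reconstruction (determining τ and m) and LSSVM modeling (determining γ and σ) were carried out in separate steps, and the τ and m obtained from phase space reconstruction do not always guarantee the best LSSVM prediction accuracy. Joint, purely data-driven optimization of τ, m, and the LSSVM parameters, without any prior knowledge, is therefore an attractive option. However, exhaustive search over multiple factors at multiple levels is extremely time-consuming.

5. GA-LSSVM combination prediction model. Exhaustive multi-factor, multi-level search is extremely time-consuming, whereas the genetic algorithm (Genetic Algorithm, GA) is a heuristic, fast, parallel search algorithm. The GA-LSSVM combination prediction model developed in this thesis uses LSSVM as the basic modeling tool and GA for the joint optimization of τ, m, γ, and σ. Independent prediction on one-dimensional time series such as Mackey-Glass and noisy Mackey-Glass shows that GA-LSSVM is stable and effective.

6. UD-LSSVM combination prediction model. GA is a heuristic algorithm and is prone to local optima. Uniform design (Uniform Design, UD) arranges the trial points on a good lattice point set with low discrepancy that tends toward a uniform distribution over the experimental range, and can reduce the number of trials substantially to an acceptable level. LSSVM, based on structural risk minimization, handles local minima and nonlinearity well and generalizes very well. For the joint optimization of delay time, embedding dimension, and LSSVM parameters in phase space reconstruction, this thesis combines uniform design with self-calling LSSVM to develop the combination prediction model UD-LSSVM, and runs simulated predictions on the Mackey-Glass, Lorenz, and yearly sunspot number time series. The results show that UD-LSSVM has low computational complexity and high prediction accuracy, better than results reported in the literature, and that it is a fast, effective, data-driven combination prediction model for the joint optimization of delay time, embedding dimension, and support vector machine parameters.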
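Since LSSVM is the basic modeling tool in every model above, a minimal sketch of an LSSVM regressor with a Gaussian kernel may help. It follows the standard LSSVM formulation, in which training reduces to solving one linear system, and is not the thesis's own implementation. The function names (`rbf_kernel`, `solve_linear`, `lssvm_fit`, `lssvm_predict`) and the toy sine data are illustrative assumptions; `gamma` and `sigma` stand for the regularization parameter γ and the kernel width σ.

```python
import math

def rbf_kernel(u, v, sigma):
    """Gaussian (RBF) kernel with width parameter sigma."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def solve_linear(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z

def lssvm_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0]
        for j in range(n):
            k = rbf_kernel(X[i], X[j], sigma)
            row.append(k + (1.0 / gamma if i == j else 0.0))
        A.append(row)
    z = solve_linear(A, [0.0] + list(y))
    return z[0], z[1:]  # bias b, dual weights alpha

def lssvm_predict(X_train, bias, alphas, sigma, x_new):
    """Evaluate f(x) = sum_i alpha_i * K(x, x_i) + b."""
    return bias + sum(a * rbf_kernel(x_new, xi, sigma)
                      for a, xi in zip(alphas, X_train))

# Toy usage (assumed data): fit a sine curve and predict between samples.
X = [[0.5 * i] for i in range(10)]
y = [math.sin(x[0]) for x in X]
bias, alphas = lssvm_fit(X, y, gamma=1000.0, sigma=1.0)
print(lssvm_predict(X, bias, alphas, 1.0, [2.25]), math.sin(2.25))
```

Unlike the standard SVM, which requires a quadratic program, this formulation needs only a linear solve, and every training sample receives a nonzero weight. That is consistent with the abstract's remark that LSSVM training time grows as the sample set grows.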

【Abstract】 A great deal of time series data, especially multidimensional time series data, has accumulated in the natural and social sciences. Such series are affected by environmental factors and also have inherent dynamic and nonlinear features. Because prediction is the foundation of understanding and decision-making, developing high-precision analysis methods for time series, especially nonlinear multidimensional time series, is of great significance. Time series analysis is developing mainly in two directions: classical time series analysis and phase space reconstruction.

The first part of this thesis follows the direction of classical time series analysis. The key steps in classical time series analysis are extending the order, determining the order, filtering variables, and selecting the regression model; the first three are often coupled with the selection of the regression model. Traditional multidimensional time series models, such as the controlled autoregressive moving average (CARMA) model and the controlled autoregressive (CAR) model, are linear, so their prediction ability is poor. The back-propagation neural network (BPNN), which is based on empirical risk minimization, has good nonlinear approximation ability, but it falls into local minima easily, is hard to interpret, and depends strongly on experience. The support vector machine (SVM), which is based on statistical learning theory, largely resolves the problems of local minima, overfitting, and nonlinearity and has the advantages of global optimization and strong generalization ability, so SVM is used as the basic modeling tool in this thesis.

1. Combination prediction model: SLR-LSSVM. This thesis proposes a combination prediction model (SLR-LSSVM) in which the impact factors are filtered by stepwise linear regression (SLR) and the model is then built with the least squares support vector machine (LSSVM).
The simulation experiment is carried out on the occurrence of second-generation corn borer larvae with eight meteorological factors, and the prediction results show that SLR-LSSVM is superior to the reference models, which indicates that factor filtering combined with SVM modeling can improve time series prediction precision.

2. Combination prediction model: CAR-LSSVM. SLR-LSSVM considers only the effects of environmental factors, without considering the inherent dynamic features of the time series (no order extension), and its variables are filtered linearly by SLR. Although CAR considers both the effects of environmental factors and the dynamic features, its order is determined linearly by multiple linear regression (MLR) and its variables are also filtered linearly by SLR. This thesis proposes a nonlinear combination prediction model (CAR-LSSVM). First, the order is extended and the optimal order is determined by minimum mean squared error (MSE) with LSSVM; then the retained variables are obtained by nonlinear filtering of the variables after order determination; finally, the prediction model is built on the retained variables with LSSVM. CAR-LSSVM is tested on data for the damage rate (moth-eaten ratio) caused by Leguminivora glycinivorella Mats with five factors.
The prediction results show that CAR-LSSVM is superior to reference models such as MLR, SNR, LSSVM, SLR-LSSVM, and CAR, which indicates that it is necessary to consider environmental factors and dynamic features together, with nonlinear order determination and nonlinear factor filtering.

3. Combination prediction model: GS-LSSVM. Order determination by the F test in CAR and by minimum MSE in CAR-LSSVM share two defects. First, obtaining the optimal order by extending gradually from low to high is time-consuming. Second, extending the order of the dependent and independent variables together easily causes information redundancy and lengthens variable filtering, and order determination may terminate before the optimum is reached, which can reduce model performance. This thesis proposes a high-precision combination prediction method with fast order determination (GS-LSSVM), which reflects both the dynamic features of the time series and the effects of environmental factors. First, the structure of the time series is analyzed with the semivariogram of geostatistics (GS) and the optimal order is determined quickly from the range; second, redundant information and dimensionality are reduced by principal component analysis; finally, the model is built with LSSVM. GS-LSSVM is applied to predicting the occurrence area of Dendrolimus punctatus and the occurrence of the fifth-generation brown planthopper in late-season rice. The prediction results show that GS-LSSVM is superior to LSSVM and GS-BPNN and determines the order quickly and accurately. GS-LSSVM not only reflects the dynamic features and the effects of environmental factors but also has good generalization ability. GS-LSSVM therefore has a broad range of applications in time series prediction.

4. Combination prediction model: ARIMA-DSVM. As time passes, the training set grows and the training time of LSSVM becomes unacceptably long.
More importantly, it is not necessarily reasonable for all historical samples to take part in training, and each sample affects the prediction results differently. The dynamic ε-insensitive cost function support vector machine (DSVM) adjusts the insensitive loss function parameter (ε) dynamically, so that recent ε-insensitive errors are penalized more heavily than distant ones. This thesis proposes a combination prediction model (ARIMA-DSVM) for time series whose linear or nonlinear character is unknown. First, the linear component of the time series is predicted by ARIMA, and then the ARIMA prediction errors are corrected by DSVM. The results on the Dendrolimus punctatus occurrence area show that ARIMA-DSVM is superior to reference models such as ARIMA and DSVM.

The second part of this thesis follows the direction of phase space reconstruction. A time series prediction model based on phase space reconstruction and LSSVM involves two key steps: determining the time delay (τ) and embedding dimension (m) in phase space reconstruction, and selecting the regularization parameter (γ) and kernel width parameter (σ) of LSSVM. In previous studies, the phase space reconstruction parameters and the LSSVM parameters were determined independently, so the τ and m obtained cannot always ensure that LSSVM reaches its optimal prediction precision. Joint optimization of τ, m, γ, and σ, driven purely by the data and requiring no prior knowledge of the time series, is therefore a very attractive choice. However, multi-factor, multi-level joint optimization by exhaustive search is very time-consuming.

5. Combination prediction model: GA-LSSVM. Multi-factor, multi-level joint optimization by exhaustive search is very time-consuming, whereas the genetic algorithm (GA) is a heuristic algorithm with parallel search ability. This thesis proposes a combination prediction model (GA-LSSVM) in which τ, m, γ, and σ are jointly optimized by GA.
The simulation experiments are carried out on the Mackey-Glass series and the Mackey-Glass series with noise, and the prediction results show that GA-LSSVM is a stable and effective time series prediction model.

6. Combination prediction model: UD-LSSVM. GA is a heuristic algorithm that falls into local optima easily. Uniform design (UD) arranges the trial points on a set of good lattice points that tends toward a uniform distribution with low discrepancy, which greatly reduces the number of experiments. LSSVM, based on structural risk minimization, handles the local minimum and nonlinearity problems and has excellent generalization ability. This thesis proposes a combination prediction model (UD-LSSVM) that solves the joint optimization of τ, m, γ, and σ by UD and self-calling LSSVM. Simulation experiments are carried out on the Mackey-Glass, Lorenz, and yearly sunspot time series, and the results show that UD-LSSVM reduces the computational complexity and obtains high prediction precision, with prediction results superior to those reported in the literature. The results indicate that UD-LSSVM is a fast, efficient, data-driven prediction model for time series with joint optimization of τ, m, and the SVM parameters.
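The phase space reconstruction step in the second part of the abstract can be illustrated with a short sketch. It builds delay vectors from a scalar series for a given time delay τ and embedding dimension m and pairs each vector with a one-step-ahead target, which is the form of training set a regressor such as LSSVM would consume. The function name `delay_embed` and the indexing convention are assumptions for illustration and are not taken from the thesis.

```python
def delay_embed(series, tau, m):
    """Build (inputs, targets) for one-step-ahead prediction.

    Each input is [x(t), x(t - tau), ..., x(t - (m - 1) * tau)]
    and its target is x(t + 1).
    """
    if tau < 1 or m < 1:
        raise ValueError("tau and m must be positive integers")
    start = (m - 1) * tau  # first index with a full delay vector
    inputs, targets = [], []
    for t in range(start, len(series) - 1):
        inputs.append([series[t - k * tau] for k in range(m)])
        targets.append(series[t + 1])
    return inputs, targets

# Toy usage (assumed data): a ramp makes the indexing easy to check.
inputs, targets = delay_embed(list(range(10)), tau=2, m=3)
print(inputs[0], targets[0])  # [4, 2, 0] 5
```

Because the number of usable samples shrinks as (m - 1) * τ grows, the choice of τ and m changes the training set itself. This is one reason the abstract argues for optimizing them jointly with γ and σ rather than in a separate step.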
