统计建模方法的理论研究及应用

Theoretical Studies on the Methods of Statistical Modelling and Their Applications

【Author】 Liu Chunbo (刘春波)

【Supervisor】 Pan Feng (潘丰)

【Author information】 Jiangnan University, Control Theory and Control Engineering, 2011, PhD

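【Abstract (Chinese version, translated)】 In today's information age, statistical methods keep emerging and statistical knowledge is applied ever more widely. For example, statistical multiscale modelling has become an active topic in both theoretical and applied statistics, with an impact on statistical methods themselves and on their development across the applied sciences; kernel-based learning methods have brought about a revolution in data analysis; and the high flexibility of generalized additive models offers an effective way to reveal the relationships hidden in data. In chemical engineering, building an effective process model is of great importance for planning production processes scientifically and thereby optimizing production. Conventional predictive functional models do not consider optimization over the prediction horizon as a whole. To address this shortcoming, following a study of statistical multiscale modelling, a predictive functional model based on wavelet basis functions and the Hammerstein model is proposed, exploiting the multiscale properties of wavelets; its internal model parameters are corrected adaptively through repeated identification. The compact support (locality) and multiscale analysis properties of wavelets ensure that the overall error performance is optimized, emphasize the approximation requirements at important fitting points, and achieve aggregation of the optimization variables. Theoretical analysis and simulation show that the method has better tracking performance and greater robustness to model mismatch. (1) To improve the modelling accuracy of kernel methods without sacrificing modelling speed, a wavelet fusion kernel modelling method is proposed by combining kernel methods with wavelet analysis theory. The method combines wavelet multiresolution analysis with the kernel methods' insensitivity to input dimension and, in theory, models faster while maintaining accuracy. Simulations on a one-dimensional function and on chemical production data confirm the effectiveness of the algorithm. (2) Because a separable Hilbert space is equivalent to L²(R), a linear operator that preserves the inner product can map the wavelet scaling functions of a subspace of L²(R) to wavelet scaling functions of a subspace of the Hilbert space. Based on the conditions for support vector machine kernel functions and on wavelet multiresolution theory, a Morlet wavelet kernel function is constructed in the Hilbert space. Simulation experiments show that, compared with the traditional RBF kernel, this scaling reproducing kernel achieves higher accuracy and better generalization. (3) When fused-kernel support vector machines are used to improve generalization and accuracy, kernel fusion can destroy the sparsity of the support vector machine. To avoid this, mapping the data into a sparse feature space is proposed. Simulation shows that the resulting model improves accuracy while preserving sparsity, which verifies the algorithm and indicates good practical value. Glutamate fermentation is a complex process for which it is hard to build an effective model to guide production optimization. In studying this problem, the generalized additive model (GAM) was found to provide an effective modelling approach for glutamate fermentation. With it, the influence of different modelling variables on glutamate production can be analyzed conveniently and their relationships with production derived. Using experimental data from 15 fermentation batches and an analysis of the candidate factors, three significant factors (time T, dissolved oxygen DO and oxygen uptake rate OUR) were selected to build the GAM, which explains 97% of the variation in the glutamate fermentation process. The successful construction of this model lays a foundation for studying how different factors affect glutamate production during fermentation. It not only offers a feasible and effective way to predict glutamate production from online data, but also suggests a new approach to online fault diagnosis during fermentation. For fault diagnosis of the glutamate fermentation process, a method based on GAMs and the bootstrap is proposed. Relying only on the significant observed variables, it can judge whether the state of the fermentation is normal and give a preliminary indication of the observed variables related to the fault source. The method has only a few parameters to determine and tune. During fermentation it reports fault states promptly and provides reference information for removing the fault source, and so offers a reliable safeguard for normal operation. In summary, with the rapid spread and broad development of computer technology, and in the face of the data and information explosion, research on statistical modelling methods in industry is of great significance for turning data into information, knowledge and intelligence quickly and effectively.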
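
Item (2) above refers to a Morlet wavelet kernel but the abstract does not state its closed form. As a minimal sketch only, the following assumes the commonly used translation-invariant form K(x, y) = ∏ cos(1.75 d_i) · exp(−d_i² / 2) with d_i = (x_i − y_i) / a. The dilation parameter `a` and both function names are illustrative assumptions, not taken from the thesis.

```python
import math


def morlet_wavelet_kernel(x, y, a=1.0):
    """Morlet wavelet kernel between two equal-length feature vectors.

    Assumed form (not given in the abstract):
        K(x, y) = prod_i cos(1.75 * d_i) * exp(-d_i**2 / 2),  d_i = (x_i - y_i) / a
    where a > 0 is a hypothetical dilation (scale) parameter.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if a <= 0:
        raise ValueError("a must be positive")
    k = 1.0
    for xi, yi in zip(x, y):
        d = (xi - yi) / a
        # One-dimensional Morlet mother wavelet evaluated at the scaled difference
        k *= math.cos(1.75 * d) * math.exp(-0.5 * d * d)
    return k


def gram_matrix(samples, a=1.0):
    """Gram matrix K[i][j] = morlet_wavelet_kernel(samples[i], samples[j], a)."""
    return [[morlet_wavelet_kernel(s, t, a) for t in samples] for s in samples]
```

A Gram matrix built this way is symmetric with ones on the diagonal, and it could be passed to any kernel method that accepts a precomputed kernel. Unlike the RBF kernel, this kernel oscillates with distance because of the cosine factor.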

【Abstract】 In today's information age, statistical methods keep emerging and statistical knowledge is applied ever more widely in practice. Multiscale statistical modelling has become an active topic in both theoretical and applied statistics, and it influences both statistical methods and their development across the sciences. Kernel-based learning methods have brought about a revolution in data analysis. The high flexibility of generalized additive models (GAMs) provides an effective way to reveal the relationships implicit in data. A predictive functional model based on wavelet basis functions and the Hammerstein model is proposed, exploiting the multiscale character of wavelets; its internal model parameters are adjusted adaptively through repeated identification. The multiscale analysis and compact support properties of wavelets ensure that the overall error is optimized, emphasize the approximation requirements at important fitting points, and achieve aggregation of the optimization variables. Both theoretical analysis and simulation show that the method tracks better and is more robust to model mismatch. Based on the theories of wavelet analysis and kernel methods, a wavelet fusion kernel modelling method is proposed. The sampled data series is first decomposed at multiple scales with the wavelet transform; the reconstructed approximation and detail series are then each regressed with a kernel method; and the outputs are finally fused. The method combines wavelet multiresolution analysis with the kernel methods' insensitivity to input dimension, and it models faster while maintaining satisfactory accuracy. 
On this basis, simulations were carried out on a one-dimensional function and on data from a chemical process, and the results confirm the effectiveness of the proposed algorithm. Because a separable Hilbert space is isomorphic to L²(R), a linear operator that preserves the inner product can convert the scaling functions of a wavelet subspace of L²(R) into scaling functions of a wavelet subspace of the Hilbert space. A Morlet wavelet kernel function is constructed in the Hilbert space based on the conditions for support vector kernel functions and on wavelet multiresolution theory. Simulation experiments show that this scaling reproducing kernel gives better accuracy and generalization than the traditional support vector machine (SVM) with a radial basis function (RBF) kernel. When fused-kernel SVM modelling is used to improve the generalization ability and precision of models, the sample data are mapped to a sparse feature space to prevent the loss of SVM sparsity that kernel fusion would otherwise cause. Simulation results show that this modelling method improves accuracy while preserving sparsity, which verifies the effectiveness of the method and its practical value. GAMs provide an effective way to model the fermentation process of glutamate (Glu). With such a model, the effect of the different modelling variables on Glu production can be analyzed easily and the relationships between the modelling variables and Glu production can be identified. Based on an analysis of the candidate factors using data from 15 batches of Glu fermentation, three significant factors, fermentation time (T), dissolved oxygen (DO) and oxygen uptake rate (OUR), were selected to construct the GAM. The constructed GAM captures 97% of the variance of Glu production during the fermentation process. 
This model was used to investigate the single and combined effects of T, DO and OUR on Glu production, and conditions for optimizing the fermentation process were proposed on that basis. The results show that Glu production can reach a high level when the DO and OUR levels are controlled during fermentation according to the optimization conditions proposed by GAM-2. The successful construction of this model provides a basis for studying the effects of different factors on Glu production, and for studying fermentation conditions with the specific aim of optimizing Glu fermentation. The study not only provides an effective way to predict Glu production from online data but also suggests a novel approach to online fault diagnosis during Glu fermentation. For fault diagnosis of the Glu fermentation process, an online diagnostic method based on GAMs and the bootstrap is proposed. Relying only on the significant observed variables, it can judge whether the state of the fermentation process is normal and identify the observed variables associated with the fault sources. The method involves only a few parameters to determine and adjust. During fermentation it reports fault states promptly and provides the information needed to trace and remove the fault sources, and thus offers a reliable way to keep the fermentation process running normally. In summary, with the rapid growth and broad development of computer technology and the challenge of the data and information explosion, statistical modelling methods are of great significance in industry for turning data into information, knowledge and intelligence rapidly and effectively.
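
The abstract does not spell out how the GAM predictions and the bootstrap are combined for fault diagnosis. One plausible reading, sketched here purely as an assumption, is to bootstrap the prediction residuals from normal batches to obtain a control limit and then flag observations whose residual exceeds it. All function names, the percentile level `q`, and the use of absolute residuals are hypothetical; the model predictions are taken as given, and no GAM is fitted here.

```python
import random


def percentile(values, q):
    """Linear-interpolation percentile of a non-empty sequence, q in [0, 100]."""
    s = sorted(values)
    if not s:
        raise ValueError("values must be non-empty")
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def bootstrap_control_limit(normal_residuals, q=99.0, n_boot=1000, seed=0):
    """Bootstrap estimate of a control limit on |residual| (hypothetical scheme).

    Each replicate resamples the normal-operation residuals with replacement
    and records the q-th percentile of their absolute values; the returned
    limit is the mean of those percentiles over all replicates.
    """
    abs_res = [abs(r) for r in normal_residuals]
    n = len(abs_res)
    if n == 0:
        raise ValueError("normal_residuals must be non-empty")
    rng = random.Random(seed)  # fixed seed keeps the limit reproducible
    reps = []
    for _ in range(n_boot):
        sample = [abs_res[rng.randrange(n)] for _ in range(n)]
        reps.append(percentile(sample, q))
    return sum(reps) / len(reps)


def is_faulty(observed, predicted, limit):
    """Flag an observation whose prediction error exceeds the control limit."""
    return abs(observed - predicted) > limit
```

In use, `normal_residuals` would hold observed minus predicted Glu values from batches known to be fault-free, and `is_faulty` would be evaluated on each new online measurement. Identifying which observed variable is tied to a fault source, as the abstract describes, would need a further per-variable step that is not attempted here.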

  • 【Online publication contributor】 Jiangnan University
  • 【Online publication issue】 2011, Issue 07