
Research on Gross Error Detection in Data Reconciliation and Support Vector Machines

【Author】 孙少超

【Supervisor】 黄道

【Author Information】 East China University of Science and Technology, Control Science and Engineering, 2012, PhD

【Abstract】 With the development of information technology, process data play an increasingly important role: they support a range of monitoring tasks and are widely used in process control, production scheduling, cost control, and business decision-making. However, process data often contain random errors and gross errors, and gross errors in particular are very harmful to industrial production. This motivates an in-depth study of gross error detection techniques. In addition, as support vector machines have flourished, training samples frequently contain gross errors, which hinders their wider application. Centered on gross errors, this dissertation studies methods for handling them in the fields of data reconciliation and support vector regression. The main contributions are as follows:

1. A hybrid method for simultaneous gross error detection and data reconciliation is proposed. A maximal spanning tree is first used to combine the MT and NT methods to form a set of suspect errors; this suspect set is then fed into the MILP model to reconcile the measurements and compensate the gross errors. By working together, the hybrid method avoids the weaknesses of the MT, NT, and MILP methods while exploiting their strengths. Simulations show that its gross error detection capability and reconciliation accuracy are superior to those of the other methods, and that it is better suited to large-scale problems.

2. The MILP model for simultaneous gross error detection and data reconciliation performs well on linear systems, but its complexity makes it hard to apply to nonlinear or dynamic processes. This dissertation proves, by formula derivation, that the MILP data reconciliation model can be equivalently expressed as a nonlinear programming model, so it can be solved not only by mixed integer linear programming algorithms but also by nonlinear iterative algorithms. Simulations confirm that the two approaches yield identical results. The original model is also improved, and the MILP method is successfully applied to a dynamic CSTR system.

3. With the development of information technology, industrial processes retain large amounts of historical information; exploiting it effectively can raise the probability of detecting gross errors and improve reconciliation accuracy. A new method based on historical prior information is therefore proposed, consisting of two steps: first, prior information and spatial redundancy are used to detect gross errors; second, a weighting scheme is used for data reconciliation. To evaluate how different information affects the method, two criteria are designed to distinguish between kinds of information, and the method is applied to a challenging problem. Monte Carlo simulations demonstrate its effectiveness.

4. An ε-insensitive smooth robust support vector regression machine is proposed, combining the concept of robust estimation with smoothing techniques. Building on the Fair estimator, an ε-insensitive robust loss function is introduced; for ease of solution, smoothing yields an ε-insensitive smooth robust function, which is then embedded in the support vector machine to obtain the ε-insensitive smooth robust support vector regression machine. It possesses good robustness and sparsity, and because the smooth robust function is twice differentiable, it can be solved with the Newton-Armijo algorithm, which converges quadratically and to the global optimum. Simulations show fast learning and good performance whether or not gross errors are present.

5. Support vector regression with a non-convex robust loss function is generally more robust than with a convex one, but non-convex programs are very hard to solve. Two differentiable convex functions are therefore combined into a robust non-convex loss, on which our flat-topped support vector regression is based. For ease of solution, the CCCP technique converts the non-convex problem into a sequence of convex programs. Because data containing gross errors are prevented from becoming support vectors, the method has better sparsity than standard SVR, and it is also very robust. Applied to artificial training sets and real data sets, the proposed method shows better robustness and sparsity than standard SVR and weighted SVR.

Finally, the dissertation concludes with a summary and an outlook on future work.

【Abstract】 Plant data are playing an increasingly important role with the development of information technology. They are used not only for a series of monitoring tasks but also for other plant activities such as process control, process optimization, production scheduling, and cost control. However, these data usually contain random errors and possibly gross errors; the latter in particular cause great harm to industrial production. In addition, support vector machines have developed rapidly in recent years, but the existence of gross errors has hindered their widespread application. This dissertation focuses on gross errors and studies methods of dealing with them in the fields of data reconciliation and support vector regression:

1. An MT-NT-MILP (MNM) combined method is developed for gross error detection and data reconciliation in industrial applications. In the new method, the MT and NT methods are combined via a maximal spanning tree to generate gross error candidates before data rectification. These candidates are then used in the MILP model to improve efficiency by reducing the number of binary variables. The combination helps overcome the defects of the MT, NT, and MILP methods. Simulation results show that the proposed method outperforms the others and is especially effective on large-scale problems.

2. The mixed integer linear programming (MILP) approach to simultaneous gross error detection and data reconciliation has proved an efficient way to adjust process data subject to material, energy, and other balance constraints in linear systems. Under the MILP framework, however, model extension is difficult. This dissertation proves that the MILP model for data reconciliation is equivalent to a nonlinear programming model, so it can be solved not only by mixed integer linear programming but also by iterative methods. The two algorithms yield identical results, which verifies the correctness of the conclusion. Under the nonlinear programming framework an extended model is presented, and the MILP method is successfully applied to a dynamic system (a CSTR).

3. With information technology widely applied in the process industry, a large amount of historical data, from which the prior probabilities of gross error occurrence can be obtained, is stored in databases. To use this historical data to improve the efficiency of gross error detection and data reconciliation, a two-step strategy is proposed. In the first step, a mixed integer programming technique incorporates the prior information to find gross errors. In the second step, all detected gross errors are eliminated and the process data are adjusted subject to material, energy, and other balance constraints; an improved method achieves the same effect as the traditional one by adjusting the covariance matrix. New criteria are designed to differentiate between kinds of prior information. The strategy's performance is compared and discussed on a challenging test problem, and simulation results show its effectiveness.

4. A novel support vector machine for regression, combining a robust estimator (the Fair estimator is chosen here) with a smoothing technique, is proposed, called the smooth ε-insensitive Fair estimator support vector machine for regression (ε-SFSVR). In the ε-SFSVR, a new ε-insensitive loss function, the ε-insensitive Fair estimator, is built from the Fair estimator; with this loss function robustness is improved while the sparseness property is retained. To enhance learning speed, smoothing techniques previously used for support vector regression replace the ε-insensitive Fair estimator with an accurate smooth approximation, which allows the ε-SFSVR to be solved directly as an unconstrained minimization problem. A Newton-Armijo algorithm, shown to converge globally and quadratically, is prescribed to solve it. Simulation results indicate that the proposed approach has fast learning speed and better generalization performance whether or not outliers are present.

5. Because non-convex loss functions hold an advantage over convex ones in robustness and generalization performance for support vector regression, a robust non-convex loss function is constructed by combining two differentiable convex functions. With this loss, a flatheaded support vector regression (FSVR) is proposed, and the concave-convex procedure (CCCP) is used to solve it by transforming the non-convex problem into a sequence of convex ones. The FSVR integrates the advantages of standard SVR and weighted SVR: it obtains better sparseness while restraining outliers in the training samples. Experiments on artificial and benchmark datasets show the effectiveness of the proposed FSVR.

Finally, the dissertation concludes with a summary and a prospect of future research.
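The data reconciliation setting behind contributions 1-3 can be illustrated with a minimal sketch: weighted least-squares reconciliation subject to linear balance constraints, with the classical measurement test (MT) flagging suspect streams. The three-stream network, noise levels, and 95% threshold below are hypothetical, not taken from the dissertation.

```python
import numpy as np

# Hypothetical 3-stream, 2-node flow network with balance constraints A @ x = 0:
# stream 1 -> node A -> stream 2 -> node B -> stream 3
A = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
true_x = np.array([10.0, 10.0, 10.0])
sigma = np.array([0.5, 0.5, 0.5])              # measurement standard deviations
Sigma = np.diag(sigma ** 2)

rng = np.random.default_rng(0)
y = true_x + rng.normal(0.0, sigma)
y[1] += 3.0                                    # inject a gross error on stream 2

# Weighted least-squares reconciliation subject to A x = 0 has a closed form:
# x_hat = y - Sigma A^T (A Sigma A^T)^{-1} A y
V = A @ Sigma @ A.T
a = Sigma @ A.T @ np.linalg.solve(V, A @ y)    # adjustments y - x_hat
x_hat = y - a

# Measurement test: standardize each adjustment by its standard deviation
W = Sigma @ A.T @ np.linalg.solve(V, A @ Sigma)
z = np.abs(a) / np.sqrt(np.diag(W))
suspects = np.where(z > 1.96)[0]               # 95% critical value
print("reconciled flows:", x_hat)
print("suspect streams (0-based):", suspects)
```

Note that the reconciled flows satisfy the balance constraints exactly, and that the measurement test tends to smear a single gross error onto neighbouring streams, which is precisely the weakness that combining MT and NT candidates with an MILP model is meant to mitigate.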
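For contribution 4, one plausible way to assemble a smooth ε-insensitive robust loss can be sketched as follows. The Fair estimator and the log-exp smoothing of the plus function are standard ingredients; the exact combination the dissertation uses may differ, and the parameter values here (`eps`, `c`, `alpha`) are illustrative.

```python
import numpy as np

def plus_smooth(x, alpha=5.0):
    # Smooth approximation of max(x, 0): x + log(1 + exp(-alpha*x)) / alpha
    # (logaddexp keeps the computation numerically stable)
    return x + np.logaddexp(0.0, -alpha * x) / alpha

def fair(u, c=1.0):
    # Fair robust estimator: quadratic near zero, linear growth for large |u|
    a = np.abs(u) / c
    return c ** 2 * (a - np.log1p(a))

def smooth_eps_fair(r, eps=0.1, c=1.0, alpha=5.0):
    # A deadband of width eps built from two smoothed plus functions keeps the
    # loss smooth in r; the result is then passed through the Fair penalty
    s = plus_smooth(r - eps, alpha) + plus_smooth(-r - eps, alpha)
    return fair(s, c)

r = np.linspace(-3.0, 3.0, 601)
loss = smooth_eps_fair(r)
```

Smoothness is what makes a Newton-Armijo solver applicable; the resulting loss is symmetric, nearly zero inside the ε-tube, and grows only linearly for large residuals, which is the robustness property the abstract describes.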
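The difference-of-convex idea behind contribution 5, and the CCCP iteration used to solve it, can be demonstrated on a deliberately tiny problem: robust location estimation with a flat-topped (truncated quadratic) loss written as the difference of two convex functions. The data and the truncation level `t` are hypothetical, and for brevity this sketch uses a plain plus function where the dissertation uses two differentiable convex pieces; the dissertation applies the same machinery to the full SVR problem.

```python
import numpy as np

# Flat-topped loss as a difference of convex functions:
#   loss(r) = r^2 - max(r^2 - t, 0)
# i.e. quadratic near zero, constant (flat) beyond |r| = sqrt(t), so gross
# errors stop influencing the fit once their residual exceeds the threshold.
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(5.0, 0.5, 50),   # 50 inliers around 5.0
                    np.array([50.0, 60.0])])    # 2 gross errors
t = 4.0                                          # flat beyond |r| = 2

mu = y.mean()                                    # plain mean: pulled by outliers
for _ in range(20):
    r = y - mu
    # CCCP step: linearize the concave part -max(r^2 - t, 0) at the current mu.
    # Its (sub)gradient only involves points with r_i^2 > t, and the remaining
    # convex subproblem is quadratic, so the update has a closed form:
    out = r ** 2 > t
    mu = (y.sum() - r[out].sum()) / len(y)
print("plain mean:", y.mean(), " CCCP robust estimate:", mu)
```

Each CCCP iteration solves a convex surrogate and is guaranteed not to increase the non-convex objective; here the iterates move from the contaminated mean toward the inlier mean, mirroring how the FSVR keeps gross-error samples from acting as support vectors.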
