稳健多元线性回归在地理数据处理中的应用

Application of Robust Multiple Linear Regression in Geographic Data Processing

【Author】 韩小慧 (Han Xiaohui)

【Supervisor】 葛永慧 (Ge Yonghui)

【Author Information】 Taiyuan University of Technology, Cartography and Geographic Information Engineering, 2012, Master's thesis

【Abstract】 Multiple linear regression is a mathematical method commonly used to build geostatistical analysis models. Statisticians have pointed out that gross errors occur with a probability of roughly 1%-10% in data collected in production practice and scientific experiments. To weaken or eliminate the influence of gross errors on parameter estimation, G. E. P. Box introduced the concept of robust estimation in 1953. Robust estimation theory is built on the actual distribution of the observed data rather than on some idealized distribution: when gross errors are unavoidable, an appropriate estimation method is chosen so that the parameter estimates escape the influence of gross errors as far as possible and approach the optimal estimates under the normal model. Robust multiple linear regression can therefore effectively eliminate or weaken the influence of gross errors on parameter estimation, but the range of gross errors a method can eliminate varies with the method itself and with the number of observations in the specific problem. Using simulation experiments, this thesis identifies the relatively more effective robust estimation methods for robust multiple linear regression, determines the range of gross errors these methods can eliminate or weaken, and finds the minimum number of observations they need to eliminate gross errors completely. The thesis also proposes an approach, with a concrete computational procedure, for determining the range of gross errors a robust estimation method can eliminate. Taking two- to five-variable linear regression as examples, simulation experiments (1000 runs) compare the ranges of gross errors (up to 8.0σ0) eliminated by 13 commonly used robust estimation methods. The conclusions are as follows: the L1 method, the Geman-McClure method, the IGG III scheme, and the Danish method are the relatively more effective of the 13 methods. When the observations contain one gross error, the minimum numbers of observations with which two-, three-, four-, and five-variable linear regression completely eliminate the influence of 3.0-8.0σ0 gross errors are 7, 8, 10, and 11, respectively; when the observations contain two gross errors simultaneously, the corresponding minimums are 10, 12, 15, and 17.
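
Robust estimators of the kind named above (the Danish method, the IGG III scheme, and similar weight-function methods) are commonly computed by iteratively reweighted least squares (IRLS). The Python sketch below is a minimal illustration assuming a Danish-style weight function; the cutoff c = 2.0, the MAD-based scale estimate, and the simulated data are illustrative assumptions, not the thesis's exact procedure.

```python
import numpy as np

def irls_danish(X, y, c=2.0, tol=1e-8, max_iter=50):
    """Robust multiple linear regression by IRLS with a Danish-style
    weight function: observations whose standardized residual exceeds
    c are exponentially down-weighted, so a gross error loses influence."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])          # design matrix with intercept
    w = np.ones(n)                                # first pass = ordinary least squares
    beta = np.zeros(A.shape[1])
    for _ in range(max_iter):
        W = np.diag(w)
        beta_new = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
        v = y - A @ beta_new                      # residuals
        sigma = 1.4826 * np.median(np.abs(v - np.median(v)))  # robust (MAD) scale
        u = np.abs(v) / max(sigma, 1e-12)         # standardized residuals
        w = np.where(u <= c, 1.0, np.exp(-(u / c) ** 2))      # Danish-style weights
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, w
        beta = beta_new
    return beta, w

# Two-variable regression with 7 observations (the abstract's minimum for
# one gross error) and a planted 8.0*sigma0 outlier, with sigma0 = 0.1.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(7, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.1, 7)
y[3] += 0.8
beta, w = irls_danish(X, y)
print("estimates:", beta.round(2))   # should recover about (1.0, 2.0, -0.5)
print("weights  :", w.round(3))      # the contaminated observation is down-weighted
```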

Simple linear regression is one of the most widely used parameter estimation methods. For it, the thesis proposes a bidirectional golden-section arrangement of the independent variable built on an arithmetic progression. This raises the redundant observation components of the observations at the two endpoints and narrows the differences in redundancy among the observations, and thus improves the ability of robust estimation methods to eliminate or weaken gross errors without increasing the number of observations or changing their accuracy.
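
The abstract does not spell out the bidirectional golden-section construction, so the sketch below is only one hypothetical reading: keep the two endpoints of the interval, place the interior points at golden-section distances φ^k(b − a) measured inward from both ends, and compare the redundancy components r_i = 1 − h_ii (h_ii being hat-matrix leverages) with those of a plain arithmetic progression. The interval, n = 8, and the exponents k are arbitrary choices for the demonstration.

```python
import numpy as np

def redundancy(x):
    """Redundancy components r_i = 1 - h_ii for the simple linear
    regression design matrix [1, x] (h_ii are hat-matrix leverages)."""
    A = np.column_stack([np.ones(len(x)), x])
    H = A @ np.linalg.inv(A.T @ A) @ A.T
    return 1.0 - np.diag(H)

a, b, n = 0.0, 10.0, 8
phi = (np.sqrt(5.0) - 1.0) / 2.0            # golden ratio conjugate, ~0.618

# Baseline: arithmetic progression of the independent variable.
x_arith = np.linspace(a, b, n)

# Hypothetical bidirectional golden-section design: the two endpoints plus
# golden-section points phi**k * (b - a) measured inward from BOTH ends.
k = np.arange(2, 2 + (n - 2) // 2)          # k = 2, 3, 4 for n = 8
inner = (b - a) * phi ** k
x_gold = np.sort(np.concatenate([[a, b], a + inner, b - inner]))

for name, x in [("arithmetic", x_arith), ("golden", x_gold)]:
    r = redundancy(x)
    print(f"{name:10s} endpoint r = {r[0]:.3f}, spread = {r.max() - r.min():.3f}")
# Under these assumptions the golden-section design raises the endpoints'
# redundancy and narrows the spread, the effect the thesis describes.
```
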
The coefficients of a multiple linear regression are usually solved by least squares, but in practice another situation arises: multicollinearity among the independent variables, which often seriously distorts parameter estimation. For a variety of practical problems, Hoerl, Massy, Stein, and Webster respectively proposed ridge estimation, principal component estimation, shrinkage estimation, and latent root estimation of the regression coefficients to weaken the influence of multicollinearity. The thesis accordingly summarizes the commonly used methods for diagnosing and mitigating multicollinearity: the main diagnostics are the tolerance, the variance inflation factor, and the eigenvalues (latent roots) of the correlation matrix, and the main remedies are ridge regression, principal component regression, and partial least squares estimation.
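
The diagnostics and the ridge remedy listed above are standard and easy to demonstrate. In the Python sketch below, the simulated data, the standardization to correlation form, and the ridge parameter k = 0.1 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)    # nearly collinear with x1
x3 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x3 + rng.normal(scale=0.1, size=n)

X = np.column_stack([x1, x2, x3])
Z = (X - X.mean(axis=0)) / X.std(axis=0)    # standardized predictors
R = (Z.T @ Z) / n                           # correlation matrix

# Diagnostics named in the abstract: tolerance, VIF, eigenvalues.
vif = np.diag(np.linalg.inv(R))             # variance inflation factors
tol = 1.0 / vif                             # tolerance = 1 / VIF
print("VIF      :", vif.round(1))           # VIF >> 10 flags collinearity
print("tolerance:", tol.round(4))
print("eigenvalues of R:", np.linalg.eigvalsh(R).round(4))  # near-zero root

# Remedy: ridge regression adds k*I to the near-singular normal matrix.
r_xy = Z.T @ (y - y.mean()) / n
k = 0.1                                     # illustrative ridge parameter
beta_ols = np.linalg.solve(R, r_xy)         # may split the x1/x2 effect unstably
beta_ridge = np.linalg.solve(R + k * np.eye(3), r_xy)
print("OLS  :", beta_ols.round(2))
print("ridge:", beta_ridge.round(2))        # stabilized, shrunk coefficients
```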
