

Some Research on Regression Model with Censored Data

【Author】 Chen Zhongwei (陈中威)

【Supervisor】 Lu Yiqiang (卢一强)

【Author Information】 PLA Information Engineering University, Applied Mathematics, 2010, Master's thesis

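【Chinese Abstract】 Regression analysis is a statistical method for handling correlation between variables. In theory, the regression function is generally unknown; regression analysis estimates it and draws inferences about it from the observed values of the covariates and the response variable, and the methods used depend heavily on the assumptions of the regression model. The linear regression model is the most widely applied and has the richest body of theory, but in practice its assumptions often fail to hold, so nonparametric regression models are frequently needed. Following the development of nonparametric regression, many new models have emerged, such as semiparametric models, additive models, and varying-coefficient models. Regression theory for complete data is relatively mature, but censored data of various forms are common in fields such as the biological sciences, clinical trials, and quality control. This thesis studies estimation theory under censoring for linear regression models, nonparametric models, and semiparametric models with different types of random errors. The main results are as follows:
1. Estimation of the linear regression model with an interval-censored covariate. Two estimation methods are proposed. (i) Based on the maximum likelihood estimator of the parameters of the covariate distribution, the conditional mean of the interval-censored covariate is constructed and used in place of the true value; the linear regression parameters are then estimated by least squares, and the estimators are proved to be asymptotically unbiased and strongly consistent. (ii) A two-step iterative estimation method: when the regression parameters are known, the parameters of the covariate distribution are estimated by maximizing the likelihood function of the interval-censored covariate; when the distribution parameters are known, the conditional expectation of the interval-censored covariate is constructed and the least squares estimator of the regression parameters is obtained. Numerical simulations show that this method performs well.
2. Local linear estimation of the nonparametric regression model with an interval-censored response. By analyzing how the error between the censored values and the true values affects the local linear estimator, the censored data are corrected, and the nonparametric regression function is then estimated by local linear fitting on the corrected values. Numerical simulations show that the estimator performs satisfactorily.
3. Estimation of the semiparametric regression model with a randomly censored response. The nearest-neighbor estimator based on the empirical distribution function is applied to the nonparametric component, and estimators of the parametric and nonparametric components are obtained by two-stage estimation. Under fairly weak assumptions, the asymptotic normality of the parametric estimator and the strong consistency of the nonparametric estimator are proved.
4. The semiparametric model under random censoring when the random errors form a negatively associated (NA) sequence. Using convergence properties of sums of NA sequences, an $o(n^{-1/4})$ convergence rate for the parametric estimator and the strong consistency of the nonparametric estimator are obtained, and the asymptotic normality of the parametric estimator is further proved.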
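
The conditional-mean step of the first method in item 1 can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the covariate is assumed normal, $X \sim N(\mu, \sigma^2)$, with $\mu$ and $\sigma$ treated as known (the maximum likelihood step is skipped), so each interval-censored value is replaced by the truncated-normal mean $E[X \mid L < X \le R] = \mu + \sigma\,(\phi(a) - \phi(b))/(\Phi(b) - \Phi(a))$, with $a = (L-\mu)/\sigma$ and $b = (R-\mu)/\sigma$, before an ordinary least squares fit. The function names, the unit-width censoring intervals, and the simulation settings are hypothetical.

```python
import math
import random

# Minimal sketch; all names are hypothetical. Assumes X ~ N(mu, sigma^2)
# with mu and sigma already known (the thesis estimates them by MLE).


def norm_pdf(z):
    """Standard normal density; evaluates to 0.0 at z = +/-inf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF; erfc keeps the lower tail accurate."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def interval_conditional_mean(lo, hi, mu, sigma):
    """E[X | lo < X <= hi] for X ~ N(mu, sigma^2) (truncated-normal mean).

    lo and hi may be -math.inf and math.inf for one-sided censoring.
    """
    a = (lo - mu) / sigma
    b = (hi - mu) / sigma
    mass = norm_cdf(b) - norm_cdf(a)
    return mu + sigma * (norm_pdf(a) - norm_pdf(b)) / mass


def ols_fit(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def simulate_and_fit(n=20000, b0=1.0, b1=2.0, mu=0.0, sigma=1.0, seed=0):
    """Simulate Y = b0 + b1*X + e, observe X only through a unit-width
    interval, impute X by its interval conditional mean, then fit OLS."""
    rng = random.Random(seed)
    x_imputed, ys = [], []
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        ys.append(b0 + b1 * x + rng.gauss(0.0, 1.0))
        lo = math.floor(x)  # X is only known to lie in [lo, lo + 1)
        x_imputed.append(interval_conditional_mean(lo, lo + 1, mu, sigma))
    return ols_fit(x_imputed, ys)


if __name__ == "__main__":
    print(simulate_and_fit())  # estimates should be close to (1.0, 2.0)
```

The imputation works in this sketch because, when the error is independent of the covariate and of the censoring interval, $E[Y \mid L < X \le R] = \beta_0 + \beta_1 E[X \mid L < X \le R]$, so regressing $Y$ on the conditional mean targets the original coefficients. In the two-step iterative method, this least squares step would alternate with re-estimating $(\mu, \sigma)$ from the interval-censored likelihood.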

【Abstract】 Regression analysis is a statistical method for studying the correlation between variables. In theory, the regression function is usually unknown, and regression analysis estimates it, and draws inferences about it, from the observed values of the covariates and the response variable. The choice of regression method depends largely on the assumptions of the regression model. The linear regression model is the oldest and most widely applied, and it has a large body of mature theory. In practice, however, its assumptions often cannot be satisfied, so nonparametric regression models are frequently needed. Many further models have since appeared, such as semiparametric models, additive models, and varying-coefficient models. Regression theory for complete data is mature, but censored data arise frequently in fields such as the biological sciences, clinical trials, and quality control. This thesis studies estimation under censoring for linear regression models, nonparametric models, and semiparametric models with different kinds of random errors. The main work and contributions of this thesis can be summarized as follows:
1. We propose two methods to estimate the linear regression parameters with an interval-censored covariate. First, based on the maximum likelihood estimator of the parameters of the covariate distribution, we construct the conditional mean of each censored covariate given its interval; the regression parameters are then estimated by least squares, and the resulting estimators are proved to be asymptotically unbiased and strongly consistent. Second, we propose a two-step iterative algorithm: when the regression parameters are given, the parameters of the covariate distribution are estimated by maximizing the likelihood function of the interval-censored covariates; when the distribution parameters are given, the regression parameters are estimated by least squares using the conditional means of the covariates. Simulations show that this method performs well.
2. We propose local linear estimators of the nonparametric regression model with an interval-censored response. We analyze how the deviation between the censored data and the true values affects the estimator and correct the censored data accordingly; the final local linear estimators are then computed from the corrected data. Simulation examples illustrate the good performance of the proposed estimators.
3. For the semiparametric regression model with randomly censored data, we estimate the parametric and nonparametric parts with a two-stage procedure that combines the nearest-neighbor method based on the empirical distribution function with least squares. The asymptotic normality of the parametric estimator and the strong consistency of the nonparametric estimator are proved under fairly weak conditions.
4. For the semiparametric regression model with censored data and negatively associated (NA) errors, we propose estimators of the parametric and nonparametric parts. Using convergence properties of sums of NA sequences, we prove an $o(n^{-1/4})$ convergence rate for the parametric estimator and the strong consistency of the nonparametric estimator. We also obtain the asymptotic normality of the parametric estimator.
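
The two-stage idea in item 3 can be illustrated with a partially linear model $Y = \beta X + g(T) + \varepsilon$. The sketch below is a complete-data simplification built on several assumptions: the censoring adjustment is omitted because the abstract does not specify it, the nearest-neighbor weights are taken to be uniform $1/k$ over the $k$ observations whose empirical distribution values $F_n(T_j)$ are closest to $F_n(t)$, and all names, the choice of $k$, and the simulated design are hypothetical. Stage one removes the nearest-neighbor smooth from $X$ and $Y$ and estimates $\beta$ by least squares on the residuals; stage two estimates $g$ by smoothing the partial residuals $Y - \hat\beta X$.

```python
import bisect
import math
import random

# Minimal complete-data sketch; names and settings are hypothetical and the
# censoring adjustment is deliberately omitted.


def make_nn_smoother(t_obs, k):
    """Return smooth(values, t): the average of `values` over the k
    observations whose empirical-CDF values F_n(T_j) are closest to F_n(t).

    Because F_n(T_j) = rank_j / n, those neighbors form a window of k
    consecutive ranks, so only a sorted copy of T is needed."""
    n = len(t_obs)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    order = sorted(range(n), key=lambda j: t_obs[j])
    sorted_t = [t_obs[j] for j in order]

    def smooth(values, t):
        r = bisect.bisect_right(sorted_t, t)  # r = n * F_n(t)
        start = min(max(r - (k + 1) // 2, 0), n - k)  # window centered on rank r
        return sum(values[j] for j in order[start:start + k]) / k

    return smooth


def two_stage_fit(x, y, t, k):
    """Two-stage estimator for Y = beta * X + g(T) + e.

    Stage 1: remove the k-NN smooth of X and Y, then least squares on the
    residuals gives beta_hat. Stage 2: smooth Y - beta_hat * X to get g_hat."""
    smooth = make_nn_smoother(t, k)
    x_res = [xi - smooth(x, ti) for xi, ti in zip(x, t)]
    y_res = [yi - smooth(y, ti) for yi, ti in zip(y, t)]
    beta_hat = sum(a * b for a, b in zip(x_res, y_res)) / sum(a * a for a in x_res)
    partial = [yi - beta_hat * xi for xi, yi in zip(x, y)]

    def g_hat(t0):
        return smooth(partial, t0)

    return beta_hat, g_hat


def simulate(n=1000, beta=2.0, noise_sd=0.3, seed=1):
    """Hypothetical design: T ~ U(0, 1), X ~ N(0, 1), g(t) = sin(2*pi*t)."""
    rng = random.Random(seed)
    t = [rng.random() for _ in range(n)]
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + math.sin(2 * math.pi * ti) + rng.gauss(0.0, noise_sd)
         for xi, ti in zip(x, t)]
    return x, y, t


if __name__ == "__main__":
    x, y, t = simulate()
    beta_hat, g_hat = two_stage_fit(x, y, t, k=25)
    print(beta_hat, g_hat(0.25), g_hat(0.75))  # roughly 2, 1 and -1
```

Measuring closeness by $|F_n(T_j) - F_n(t)|$ instead of $|T_j - t|$ makes every neighborhood contain exactly $k$ points whatever the density of $T$, which is why the smoother only needs the sorted design points.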
