
Two Classes of Statistical Inference Problems

On Two Types of Statistical Inference

【Author】 Zhang Caiya (张彩伢)

【Advisor】 Lin Zhengyan (林正炎)

【Author information】 Zhejiang University, Probability and Statistics, 2008, Ph.D.

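【Abstract (translated from Chinese)】 This thesis studies two classes of statistical inference methods: parameter estimation for stochastic partial differential equations, and nonparametric tests for distributions.

Partial differential equations are widely used to model complex systems that vary in time and space, and mathematical models in many fields can be described by them. In practice, however, a great many phenomena are random, so it is natural to bring the methods of stochastic analysis into partial differential equations, which gives rise to stochastic partial differential equations (SPDE). SPDE have been applied with notable success to stochastic models of interest rates, interface dynamics, neurophysiology and turbulence, which shows their importance for understanding the basic laws of natural and social phenomena. The probabilistic theory of SPDE dates from the 1970s and covers the existence, uniqueness and properties of solutions for various types of equations; Itô (1984), Rozovskii (1990) and Da Prato & Zabczyk (1992), among others, give detailed treatments. Statistical inference for SPDE, including parameter estimation, began in the early 1990s. Huebner et al. (1993) were the first to discuss parameter estimation for two special classes of SPDE, and Huebner & Rozovskii (1995) later extended the results to the parabolic SPDE (0.0.1), in which A0 and A1 are partial differential operators satisfying certain conditions, θ is an unknown parameter, and W_Q(t,x) is a Wiener process with covariance operator Q. Prakasa Rao (2000, 2003) discussed two SPDE models with partial differential operators of a special form, (0.0.2) and (0.0.3), in which Δ = ∂²/∂x², b(θ) is a function of known form of the unknown parameter θ, and ε is the noise term. The parametric function appearing in models (0.0.2) and (0.0.3) is clearly more general than that in (0.0.1). When solutions of (0.0.2) and (0.0.3) exist, they can be expressed through an expansion in which {e_i(x), i ≥ 1} is an orthogonal system for the operator Q and {U_i(t), i ≥ 1} are the Fourier coefficients of U(t,x).

Parameter estimation for SPDE can be approached from several angles. According to the nature of the data, estimators can be based on continuous sample paths or on discrete observations; according to the estimation method, there are maximum likelihood estimation (MLE) and Bayes estimation; and the asymptotic properties of the estimators, including consistency and asymptotic normality, can be studied in different regimes as the practical problem requires. For example, Huebner & Rozovskii (1995) gave the MLE of θ in (0.0.1) based on continuous sample paths of the Fourier coefficients {U_i(t), i = 1, …, N} on a given time interval [0, T], and studied its asymptotic behavior as the number of Fourier coefficients N → ∞. Prakasa Rao (2000, 2003), by contrast, used discrete observations {U_i(t_j), t_j = jΔ, 0 ≤ j ≤ n, i = 1, …, N} of the Fourier coefficients on a given time interval [0, t] to give the MLE and the Bayes estimator of θ in (0.0.2) and (0.0.3), and studied their asymptotic behavior as the number of discrete observations n → ∞. In Chapter 1 we use continuous sample paths of the Fourier coefficients U_i(t), i = 1, …, N, on a given time interval [0, T] to give the MLE of the parameters in (0.0.2) and (0.0.3). In Section 2 of that chapter we consider a new regime, namely the asymptotic properties of the estimators as the small noise ε → 0. Prakasa Rao (2000, 2003) obtained weak consistency and asymptotic normality of the estimators under the conditions b(θ) < 0 in (0.0.2) and b(θ) > 0 in (0.0.3). We remove these sign conditions on b(θ) and prove strong consistency and asymptotic normality. In Section 3 of the chapter we discuss the estimator for model (0.0.3) further and prove its strong consistency and asymptotic normality as N → ∞.

The other class of inference problems studied here is nonparametric testing for distributions, including goodness-of-fit tests and tests comparing the distributions of several multivariate populations. These topics matter both in theory and in applications. Many statistical methods depend on the assumption of normality, and a large literature studies goodness-of-fit tests for the normal distribution. Chapter 2 considers goodness of fit to a general distribution. The problem is stated as follows: let X_1, …, X_n be a simple random sample with common distribution F(x), x ∈ R^d, and consider the testing problem (0.0.5), in which F0(x) is a distribution function of known form or one containing at most a few unknown parameters. The study of general goodness-of-fit tests in the univariate case began with the χ² test proposed by Pearson in 1900; many methods followed, whose theory is by now fairly mature and which are widely applied. Section 2 of the chapter surveys the various types of univariate goodness-of-fit tests. For the general multivariate goodness-of-fit problem (0.0.5), literature has appeared only in the last twenty to thirty years. Shorack & Wellner (1986) and Einmahl & Mason (1992) each gave tests for (0.0.5) based on empirical process theory; Justel et al. (1997) used the probability integral transform of Rosenblatt (1952) to extend the univariate Kolmogorov-Smirnov test, which is based on the empirical distribution function (EDF), to the multivariate case. Another very important class of univariate goodness-of-fit tests consists of the tests based on univariate spacings given by Greenwood (1946), Weiss (1959), Hall (1986) and Balakrishnan & Kamnan (2004), among others. Because multivariate spacings had never been given a clear definition, spacing-based tests had not been extended to the multivariate case. Recently, Li & Liu (2007) used statistical depth functions to define multivariate spacings and used them to construct multivariate tolerance regions, but did not discuss multivariate goodness-of-fit testing. Research shows that statistical depth functions and copulas have become very important tools in multivariate analysis. In Section 3 of the chapter, with the help of statistical depth functions, we extend the univariate EDF-based tests, including the Kolmogorov-Smirnov test and Cramér-von Mises type tests, to the multivariate case, give a multivariate χ² goodness-of-fit test, and apply the multivariate spacings of Li & Liu (2007) to multivariate goodness-of-fit testing. All of these are affine-invariant nonparametric tests. Compared with the test statistic of Justel et al. (1997), our statistics are easier to compute. In the same section we also discuss how to convert the multivariate goodness-of-fit problem into a test about copulas. Section 4 reports simulations for some of the tests; the results show that the tests have good power.

In Chapter 3 we continue to use statistical depth functions to study the comparison of the distributions of several multivariate populations. When every population is multivariate normal, a test statistic can be obtained from the likelihood ratio principle; when normality fails, nonparametric methods must be considered. A large literature treats multivariate two-sample tests for location; see Puri & Sen (1971), Brown & Hettmansperger (1987), Randles & Peters (1990), Choi & Marden (1997) and Topchii et al. (2003). For multivariate multi-sample location tests, Puri & Sen (1971), Hettmansperger & Oja (1994), Hettmansperger et al. (1998) and Um & Randles (1998) gave sign tests, rank tests and median tests. Consider the following two more general problems: (1) the populations are known to belong to the same location-scale distribution family, and one must test whether their location parameters and scale parameters differ significantly; (2) the shapes of the population distributions are unknown, and one must test whether their distribution functions differ significantly. Few papers have addressed these problems so far. In the multivariate two-sample case, Rousson (2002) studied problem (1), and Liu & Singh (1993) and Zuo & He (2006) studied problem (2). In Chapter 3 we consider the more general multivariate multi-sample case and use statistical depth functions to propose three nonparametric tests for problems (1) and (2).

Let the population X_i be a d-dimensional random vector on the probability space (Ω, F, P) with unknown continuous distribution function F_i(x), x ∈ R^d, i = 1, …, k. Let X_i = {X_{i1}, …, X_{i n_i}} be a simple random sample from population X_i, i = 1, …, k, and assume the samples are mutually independent. Consider the hypothesis

H0: F_1(x) = F_2(x) = … = F_k(x) = F(x) for all x ∈ R^d. (0.0.6)

If the distribution functions are known to have the same form, the alternative corresponding to problem (1) is equivalent to the location-scale model (0.0.7), in which θ_1, …, θ_k are d-dimensional vectors in R^d with θ_i ≠ 0 = (0, 0, …, 0)' for at least one 1 ≤ i ≤ k; σ_1, …, σ_k are d-dimensional vectors in R^d with σ_i ≠ 1 = (1, 1, …, 1)' for at least one 1 ≤ i ≤ k; and x/σ_i = (x_1/σ_{i1}, …, x_d/σ_{id}). If the shapes of the distribution functions are unknown, the alternative corresponding to problem (2) is

H1: there exist i ≠ j and x ∈ R^d such that F_i(x) ≠ F_j(x). (0.0.8)

Rousson (2002) used projections and depth functions to extend the Wilcoxon test for the univariate two-sample problem, and the nonparametric test for the bivariate two-sample problem proposed by Mardia in 1967, to the multivariate two-sample case, giving tests for two multivariate location-scale models. In Section 2 of Chapter 3 we follow the idea of Rousson (2002) and extend the Kruskal-Wallis test for the univariate multi-sample problem, and the nonparametric test for the bivariate multi-sample problem proposed by Mardia in 1970, to the multivariate multi-sample case, giving tests for model (0.0.7). Liu & Singh (1993) defined, on the basis of data depth, a quality index Q(F, G) that measures the difference between two multivariate distribution functions F(x) and G(x), where D(x; F) is the depth of the point x with respect to F. When D(·;·) has a continuous distribution and F(x) = G(x), Q(F, G) = 1/2. When F(x) and G(x) are unknown, Q(F, G) is estimated by Q(F_m, G_n), where F_m and G_n are the empirical distribution functions of the samples from F(x) and G(x). When the depth function D(·;·) is the Mahalanobis depth (MhD) and F(x) = G(x), Liu & Singh (1993) proved that Q(F_m, G_n) is asymptotically normal:

$[(1/m + 1/n)/12]^{-1/2}\,(Q(F_m, G_n) - 1/2) \xrightarrow{d} N(0, 1)$ as min(m, n) → ∞, (0.0.9)

and conjectured that the result also holds for general depth functions. Zuo & He (2006) called Q(F_m, G_n) the Liu-Singh statistic, proved the conjecture of Liu & Singh (1993) when the depth function satisfies certain regularity conditions, and gave the asymptotic distribution of Q(F_m, G_n) when F ≠ G.

In Section 3 of Chapter 3 we propose a new, affine-invariant test for model (0.0.8) based on the Liu-Singh statistic. First, using the quality index of Liu & Singh (1993), we define a new parameter λ that measures the degree of difference among the k multivariate distribution functions F_1(x), …, F_k(x). Clearly λ = 0 when the k distribution functions are all equal, that is, when the null hypothesis (0.0.6) holds. For the general multivariate multi-sample problem, model (0.0.8), a natural test statistic is therefore the sample version λ_n, in which F_{n_i} is the empirical distribution of the sample X_i = {X_{i1}, …, X_{i n_i}} and Q_{ij} = Q(F_{n_i}, F_{n_j}), i, j = 1, …, k. We reject the null hypothesis (0.0.6) when λ_n is large. Under the regularity conditions of Zuo & He (2006), and when the null hypothesis (0.0.6) holds, we prove that λ_n has an asymptotic distribution involving W, a linear combination of r independent χ²(1) random variables; a special form is obtained when the sample sizes are equal, n_1 = n_2 = … = n_k =: m.

In the last section we compare the power of the three tests of this chapter by simulation. The results show that the generalized Kruskal-Wallis test is best suited to multivariate multi-sample problems in which the location parameters change while the scale parameters stay fixed or change relatively little; when the scale parameters change and the location parameters stay fixed or change relatively little, the test based on the Liu-Singh statistic performs best of the three, and is especially powerful when the samples come from heavy-tailed distributions. The power of the generalized Mardia test improves markedly as the sample size grows.

This thesis collects part of the papers written by the author over the past three years; see the list appended to the text. Finally, given the limits of the author's ability, inappropriate statements or errors are hard to avoid, and criticism and corrections are welcome.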
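
The two-sample building block of the Liu-Singh statistic can be sketched in code. The abstract does not reproduce the formula for the quality index, so this sketch assumes the definition usually attributed to Liu & Singh (1993), Q(F, G) = P{D(X; F) ≤ D(Y; F)} with X ~ F and Y ~ G independent, and the common Mahalanobis depth form 1 / (1 + squared Mahalanobis distance) with the sample mean and covariance plugged in. It is restricted to bivariate data so that the covariance inverse can be written out by hand. All function names are hypothetical, and this is an illustration rather than the thesis's implementation.

```python
import random


def mean_and_cov(sample):
    """Sample mean and 2x2 covariance (sxx, sxy, syy) of a list of (x, y) points."""
    n = len(sample)
    mx = sum(p[0] for p in sample) / n
    my = sum(p[1] for p in sample) / n
    sxx = sum((p[0] - mx) ** 2 for p in sample) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in sample) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in sample) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_depth(point, mean, cov):
    """Assumed depth: 1 / (1 + (x - mu)' S^{-1} (x - mu)), bivariate case only."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Quadratic form using the explicit inverse of a 2x2 matrix.
    dist2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return 1.0 / (1.0 + dist2)


def liu_singh_q(xs, ys):
    """Q(F_m, G_n): fraction of pairs (X_i, Y_j) with D(X_i; F_m) <= D(Y_j; F_m)."""
    mean, cov = mean_and_cov(xs)  # depth is always taken with respect to F_m
    depth_x = [mahalanobis_depth(p, mean, cov) for p in xs]
    depth_y = [mahalanobis_depth(p, mean, cov) for p in ys]
    count = sum(1 for b in depth_y for a in depth_x if a <= b)
    return count / (len(depth_x) * len(depth_y))


def standardized_q(xs, ys):
    """(Q - 1/2) scaled by the null standard deviation sqrt((1/m + 1/n) / 12)."""
    m, n = len(xs), len(ys)
    return (liu_singh_q(xs, ys) - 0.5) / (((1.0 / m + 1.0 / n) / 12.0) ** 0.5)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    ys = [(rng.gauss(0, 3), rng.gauss(0, 3)) for _ in range(200)]
    print(liu_singh_q(xs, ys), standardized_q(xs, ys))
```

Values of Q well below 1/2 indicate that the second sample sits in lower-depth (more outlying) regions of the first, which is why a scale increase shows up as a strongly negative standardized value.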

【Abstract】 This thesis studies two types of statistical inference: parameter estimation for some stochastic partial differential equations (SPDE), and nonparametric tests for distribution functions.

Partial differential equations (PDE) are often used to model complex systems that evolve over time and space, and mathematical models in many areas can be expressed as PDE. Because a great many phenomena are in fact random, the methods of stochastic analysis are naturally applied to PDE, which leads to SPDE. SPDE have been used with notable success in interest rate modeling, interface dynamics, neuronal behavior in neurophysiology, and stochastic models of turbulence, which shows their importance for understanding the laws of natural and social phenomena. Research on SPDE dates back to the 1970s. The probabilistic theory of SPDE, including the existence, uniqueness and properties of solutions to different types of SPDE, is discussed in Itô (1984), Rozovskii (1990) and Da Prato and Zabczyk (1992), among others. Research on statistical inference for SPDE, including parameter estimation, began in the 1990s. Huebner et al. (1993) started the investigation of maximum likelihood estimation (MLE) of parameters in two types of SPDE. Later, Huebner and Rozovskii (1995) extended the results to a class of parabolic SPDE in which A0 and A1 are partial differential operators and Q is the covariance operator of the Wiener process W_Q(t, x). Prakasa Rao (2000, 2003) investigated parameter estimation for the models (0.0.13) and (0.0.14) in a different setting, where Δ = ∂²/∂x², b(θ) is a known function of the unknown parameter θ, and ε is the noise level. When solutions to equations (0.0.13) and (0.0.14) exist, they can be written as an expansion in which U_i(t), i = 1, 2, …, are the Fourier coefficients of U(t, x) and {e_i(x), i = 1, 2, …} is a complete orthonormal system, with eigenvalues {q_i}, for the operator Q in L²[0, 1].

Parameter estimation for the same type of SPDE can be investigated from various angles. According to the nature of the sample, the estimator can be based on continuous sample paths or on discrete observations of the Fourier coefficients. According to the method, there are maximum likelihood estimation and Bayes estimation. Furthermore, the asymptotic properties of the estimators can be studied in different regimes. For example, Huebner and Rozovskii (1995) proposed the MLE of the parameter θ in equation (0.0.15) based on continuous sample paths of the Fourier coefficients U_i(t), i = 1, …, N, on the time interval [0, T], and studied the asymptotic properties of the estimator as the number N of Fourier coefficients tends to ∞. Prakasa Rao (2000, 2003), however, proposed the MLE and the Bayes estimator of θ based on the Fourier coefficients U_i(t) observed at discrete times t_j = jΔ, 0 ≤ j ≤ n, and studied the asymptotic properties of the estimators as the number n of discrete observations tends to infinity. In Chapter 1, based on continuous sample paths of the Fourier coefficients U_i(t), i = 1, …, N, on the given time interval [0, T], we give the MLE of the parameters in equations (0.0.13) and (0.0.14). In Section 1.2 we study the asymptotic properties of the estimator in a new regime, as the small noise ε → 0. Under the conditions b(θ) < 0 in equation (0.0.13) and b(θ) > 0 in equation (0.0.14), Prakasa Rao (2000, 2003) obtained weak consistency and asymptotic normality of the estimators. Without these conditions, we establish strong consistency and asymptotic normality of the estimator. In Section 1.3 we obtain the same asymptotic properties as the number N of Fourier coefficients tends to ∞.

The other type of statistical inference concerns nonparametric tests for distributions, including goodness-of-fit tests and multivariate multi-sample tests. Both problems are of great significance in theory and in applications. Many statistical methods hinge on the assumption of normality, and a large number of test statistics are available for checking it. In Chapter 2 we investigate general goodness-of-fit tests. The problem can be stated as follows: assuming that {X_1, …, X_n} is a random sample from a d-variate continuous population with unknown distribution function F(x), x ∈ R^d, how do we test the goodness-of-fit null hypothesis, in which F0(x) is a specified distribution function, possibly with some unknown parameters? Since Pearson proposed the χ² test in 1900, many methods for the univariate goodness-of-fit problem have been developed and widely applied. Section 2.2 surveys the various kinds of univariate goodness-of-fit tests. Work on the general multivariate goodness-of-fit test appeared only twenty to thirty years ago. A general approach based on empirical process theory can be found in Shorack and Wellner (1986) and Einmahl and Mason (1992). Justel et al. (1997) generalized the Kolmogorov-Smirnov test, which is based on the empirical distribution function (EDF), to the multivariate case by using the probability integral transform of Rosenblatt (1952). Among univariate goodness-of-fit tests, those based on univariate spacings, studied by Greenwood (1946), Weiss (1959), Hall (1984, 1986) and Balakrishnan and Kamnan (2004), among others, form an important class. The concept of multivariate spacings, however, was not available until Li and Liu (2007) defined it using the depth function. They investigated multivariate tolerance regions based on multivariate spacings, but did not discuss multivariate goodness-of-fit tests.

Research has shown that the statistical depth function and the copula are increasingly important tools in nonparametric multivariate analysis. In Section 2.3, using the statistical depth function, we generalize the EDF-based tests, including the Kolmogorov-Smirnov test and the Cramér-von Mises test, to the multivariate case. In addition, we propose a multivariate χ² test and a new test based on multivariate spacings. All of these tests are affine invariant and distribution-free. The test statistics can be computed more easily than the statistic proposed by Justel et al. (1997). We also use the concept of the copula to study multivariate goodness-of-fit tests in Section 2.3. In Section 2.4 we run simulations for some of the tests proposed in Section 2.3; the results show that the tests are powerful.

In Chapter 3 we again use the depth function, this time to study the multivariate multi-sample problem. Under the assumption of multivariate normality the likelihood ratio test can be used; otherwise nonparametric tests are required. There is a large literature on the multivariate two-sample location problem (see Puri and Sen (1971), Brown and Hettmansperger (1987), Randles and Peters (1990), Choi and Marden (1997) and Topchii et al. (2003)). For the multivariate multi-sample location problem, Puri and Sen (1971), Hettmansperger and Oja (1994), Hettmansperger et al. (1998) and Um and Randles (1998) proposed sign tests, rank tests and median tests. Consider the following two more general problems: (1) all the distribution functions belong to the same location-scale distribution family, and the differences among both the location parameters and the scale parameters need to be tested; (2) the distribution functions are completely unknown, and the differences among them need to be tested. Few references on these problems can be found even now. For the multivariate two-sample case, Rousson (2002) investigated problem (1), and Liu and Singh (1993) and Zuo and He (2006) investigated problem (2). In this chapter we discuss the general multivariate multi-sample problem and propose three nonparametric tests.

Suppose that X_i = {X_{i1}, …, X_{i n_i}} is a random sample from a d-variate continuous population with unknown distribution function F_i(x), x ∈ R^d, i = 1, …, k. The samples are assumed to be mutually independent. We wish to test the null hypothesis (0.0.17). The alternative hypothesis corresponding to problem (1) is a location-scale model in which θ_1, …, θ_k are d-variate vectors in R^d, at least one of which does not equal 0 = (0, 0, …, 0)', σ_1, …, σ_k are d-variate vectors in R^d, at least one of which does not equal 1 = (1, 1, …, 1)', and x/σ_i = (x_1/σ_{i1}, …, x_d/σ_{id}). The alternative hypothesis corresponding to problem (2) is

H1: there exist i ≠ j such that F_i(x) ≠ F_j(x). (0.0.19)

Using the technique of projection and the concept of the depth function, Rousson (2002) generalized the univariate Wilcoxon test and Mardia's bivariate test (1967) to the multivariate two-sample case. In Section 3.2 we generalize the Kruskal-Wallis test for the univariate multi-sample problem, and the test proposed by Mardia (1970) for the bivariate multi-sample problem, to the multivariate multi-sample case. Liu and Singh (1993) defined a quality index Q(F, G), based on data depth, to measure the difference between two multivariate distribution functions F(x) and G(x), where D(x; F) is the depth of the point x with respect to the distribution function F(x). When the distribution of D(·;·) is continuous and F(x) = G(x), Q = 1/2. Since F(x) and G(x) are unknown, the statistic Q(F_m, G_n) can be taken as the estimator of Q(F, G), where F_m and G_n are the empirical distributions corresponding to F(x) and G(x), respectively. Under the null hypothesis F = G, and with D(·;·) the Mahalanobis depth, Liu and Singh (1993) proved that the suitably standardized statistic Q(F_m, G_n) is asymptotically standard normal, and conjectured that the same limiting distribution holds for general depth functions. Zuo and He (2006) called Q(F_m, G_n) the Liu-Singh statistic, proved this conjecture under some regularity conditions, and generalized the result to the case F ≠ G.

In Section 3.3 we propose a new, affine-invariant test based on the Liu-Singh statistic. Using the quality index of Liu and Singh (1993), we define a new parameter λ to measure the difference among k multivariate distribution functions F_1(x), …, F_k(x). When the null hypothesis (0.0.17) is true, λ = 0. For the general multivariate multi-sample problem we can therefore construct a statistic λ_n to test the hypothesis (0.0.17), where F_{n_i}(x) is the empirical distribution of the i-th sample X_i = {X_{i1}, …, X_{i n_i}} and Q_{ij} = Q(F_{n_i}, F_{n_j}), i, j = 1, …, k. Under the regularity conditions of Zuo and He (2006) and the null hypothesis (0.0.17), we establish the asymptotic distribution of the test statistic λ_n, which involves W, a linear combination of r independent and identically distributed χ²(1) random variables; a special form is obtained when n_1 = n_2 = … = n_k =: m.

In Section 3.4 we investigate by simulation the power of the three methods proposed above. The simulation results show that the generalized Kruskal-Wallis test H(λ) with a large value of λ performs best for the location model, and that the new test Q is the most competitive of the three for models that include a scale change when the distribution is heavy-tailed. The performance of all the tests improves as the sample size increases, especially that of Mardia's test.

This dissertation is composed of some of the papers written by the author over the past three years; details are given in the appendix. Given the author's limited knowledge, errors are inevitable, and criticism would be greatly appreciated.
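
The continuous-path maximum likelihood idea under small noise can be illustrated numerically. The abstract does not reproduce equations (0.0.13) and (0.0.14), so the sketch below assumes, purely for illustration, that a single Fourier coefficient follows an Ornstein-Uhlenbeck-type equation dU = -a U dt + ε dW with unknown drift a. For that assumed model the standard likelihood-based estimator is â = -∫U dU / ∫U² dt, approximated here by sums over a fine time grid. The function names `simulate_path` and `drift_mle` are hypothetical, and this is not the thesis's estimator for b(θ).

```python
import math
import random


def simulate_path(a, eps, u0, T, steps, rng):
    """Euler-Maruyama path of the assumed model dU = -a U dt + eps dW on [0, T]."""
    dt = T / steps
    path = [u0]
    for _ in range(steps):
        u = path[-1]
        noise = eps * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(u - a * u * dt + noise)
    return path, dt


def drift_mle(path, dt):
    """Discretized version of a_hat = -(integral of U dU) / (integral of U^2 dt)."""
    # Ito-type sum: each increment is paired with the left endpoint of its interval.
    numerator = sum(u * (v - u) for u, v in zip(path, path[1:]))
    denominator = sum(u * u for u in path[:-1]) * dt
    return -numerator / denominator


if __name__ == "__main__":
    rng = random.Random(1)
    for eps in (0.5, 0.1, 0.01):
        path, dt = simulate_path(a=2.0, eps=eps, u0=1.0, T=1.0, steps=1000, rng=rng)
        print(eps, drift_mle(path, dt))
```

Running the loop with decreasing ε gives a rough feel for the small-noise regime: the observation window [0, T] stays fixed while the estimate concentrates around the true drift as the noise level shrinks.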

  • 【Online publication contributor】 Zhejiang University
  • 【Online publication year and issue】 2009, Issue 07