基于广义线性模型的森林植物多样性估测的研究

Estimating Forest Plant-diversity Based on Generalized Linear Models

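【Author】 Li Shusheng (李树生)

【Advisor】 Lang Kuijian (郎奎建)

【Author information】 Northeast Forestry University (东北林业大学), Forest Management, 2008, doctoral dissertation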

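【Abstract (translated from the Chinese original)】 Building on a systematic analysis of how biodiversity theory has developed in China and abroad, this dissertation summarizes representative models for measuring and predicting biodiversity, reviews the concept of biodiversity and the applicability of the various biodiversity indices, and describes in detail the basic principles of generalized linear models (GLM) and how they are implemented. Taking the actual conditions of the forest region of Northeast China into account, it constructs a simultaneous-equation model of forest plant diversity. The main points and conclusions are as follows.

First, exploring species richness and diversity has been a central problem of ecology in recent decades, and the measurement of biodiversity will remain a very important area of biodiversity research. Because biodiversity has been defined in more than 12 different ways, numerous concepts and assessment methods have arisen, which inevitably causes confusion and inconvenience. From a practical standpoint, a good definition should cover not only the number of species but also their absolute or relative abundance. A firmer understanding of this point is especially necessary for the future measurement and estimation of biodiversity.

Second, the dissertation reviews a large body of research on biodiversity and its measurement and concludes that biodiversity research is generally divided into four levels: genetic diversity, species diversity, ecosystem diversity and landscape diversity. In terms of scale, diversity indices fall into three broad classes: α, β and γ diversity indices. The α diversity index refers to the species richness of a biological community within a given area and is measured by the number of species. This dissertation concentrates on species diversity, which is commonly used as a diversity index of a community and covers species richness, evenness, dominance, rarity, relative species abundance and so on. Measuring the diversity of a community in a given area starts from species richness, which also forms the basis of many ecological models.

Third, the data come from sample plots surveyed by several forestry bureaus in the forest region of Northeast China. To validate the models (cross-validation is used here), the full sample was divided into subsets: (1) one subset (205 plots) was used for data analysis and modeling; (2) the remaining subsets (98 + 64 = 162 plots) were held back as validation or test subsets and later used to validate and test the models. The survey factors mainly include dominant tree species or species group, stand origin, diameter at breast height, tree height and number of stems. In addition, each sample plot contained one 5 m × 5 m quadrat for the shrub layer and four 2 m × 2 m quadrats for the herb layer. The community-level survey data were studied with several methods. The species diversity of the tree, shrub and herb layers of the main communities was calculated with various diversity index formulas, and the relative species abundance distribution was analyzed in particular. Nine probability distribution models were tried for relative abundance: the beta, Weibull, lognormal, Poisson, binomial, negative binomial, geometric, logarithmic and Neyman Type A distributions. Rigorous chi-square tests rejected eight of them; only the beta distribution passed, and its fit was very good. It is therefore well suited to describing the relative abundance of forest communities in this data set.

Fourth, the dissertation defines the concept of the quasi-maximum number of species. Estimating species richness is a basic goal of biodiversity research. Species richness is a very sensitive index of biodiversity, and the observed value usually underestimates the number of species in the population, so biodiversity indices computed from it form a biased set of values. How to estimate the maximum number of species from a limited sample remains an open problem. Among the available methods, the jackknife and the bootstrap can compensate for the underestimation of species number, because they reduce the bias of the estimate more than the other estimation methods do. Using simulation and resampling, they yield the species number closest to that of the population.

Finally, and this is the focus of the dissertation, the work concentrates on building generalized linear models of biodiversity. Although regression analysis and predictive modeling are increasingly used in biodiversity measurement, the systematic application of newer statistical techniques is still very limited. Applying modern regression methods to biodiversity models of species or communities is highly beneficial. The general procedure is as follows. The generalized linear model is an extension of the linear model: the data need not follow the normal distribution, and the requirement of homogeneity of variance, which is indispensable for hypothesis tests in the traditional general linear model, is relaxed. In hypothesis tests for the generalized linear model the response variable may follow other distributions (for example the Poisson, binomial or multinomial distribution). For instance, count data that do not follow the normal distribution can be analyzed appropriately within the generalized linear model as Poisson random variables. The dissertation describes, through worked examples of increasing difficulty, how a generalized linear model of forest community diversity is built.

Starting from the seemingly unrelated regression model, the dissertation analyzes the interdependence among the equations of the forest plant diversity model. It introduces three endogenous variables, Y1 (diversity index of the tree layer), Y2 (diversity index of the shrub layer) and Y3 (diversity index of the herb layer), and three exogenous variables, X1 (dominant stand), X2 (origin) and X3 (age group). The simultaneous-equation model is then written in matrix form as YB + XΓ = ε, where B and Γ are two structural parameter matrices and ε is a matrix of unobservable residuals. From the constraints among the parameters implied by the equations, the linear restriction HA = L is generated, which is indispensable for solving the simultaneous-equation diversity model. Ordinary least squares (OLS) is widely used to estimate the parameters of a single equation. However, if OLS is applied to each equation separately without regard to the other equations in the system, the correlation among the equations is ignored and the resulting estimates are both biased and inconsistent. In most cases the equations of the simultaneous biodiversity index model depend on one another, and so do the error terms of the structural equations. Other approaches are therefore needed to estimate the parameters of the simultaneous-equation model; the practical ones are generalized least squares, two-stage least squares and three-stage least squares. Three-stage least squares in particular combines the advantages of the various restrictions and of structural-equation estimation and may become a very promising estimation method. The simultaneous-equation model of forest plant diversity was finally obtained by solving for the parameters. Analysis of variance shows that the explanatory variables dominant stand, origin and age group all have significant effects.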
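
As a concrete illustration of two measurement steps summarized in this abstract (a diversity index per layer and a jackknife correction of species richness), here is a minimal Python sketch. It is not the dissertation's code. It assumes Shannon's index as the diversity measure (the English abstract names Shannon's index for Y1 to Y3) and the first-order jackknife richness estimator, S_obs + Q1 (n - 1) / n, where Q1 is the number of species found in exactly one of n plots; the abstract does not say which jackknife variant was used. The function names and the toy inputs are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) from per-species counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Evenness J' = H' / ln(S), where S is the number of species present."""
    s = sum(1 for c in counts if c > 0)
    return shannon_index(counts) / math.log(s) if s > 1 else 0.0


def jackknife1_richness(plots):
    """First-order jackknife richness estimate from presence lists per plot.

    plots: iterable of collections of species names, one per sample plot.
    Returns S_obs + Q1 * (n - 1) / n, with Q1 = species seen in one plot only.
    """
    plots = [set(p) for p in plots]
    n = len(plots)
    occurrences = {}
    for plot in plots:
        for species in plot:
            occurrences[species] = occurrences.get(species, 0) + 1
    s_obs = len(occurrences)
    q1 = sum(1 for k in occurrences.values() if k == 1)
    return s_obs + q1 * (n - 1) / n


# Made-up toy inputs, for illustration only (not data from the dissertation).
even_community = [10, 10, 10, 10]          # four equally abundant species
toy_plots = [{"a", "b"}, {"a", "c"}, {"a", "b", "d"}]

h = shannon_index(even_community)          # equals ln(4) for an even community
j = pielou_evenness(even_community)        # equals 1 for an even community
s_hat = jackknife1_richness(toy_plots)     # 4 observed + 2 uniques * 2/3
```

The jackknife estimate is always at least the observed species count, which is the direction of correction the abstract describes for the underestimation of richness.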

【Abstract】 Based on a systematic analysis of the development of biodiversity theory in China and abroad, this paper summarizes representative models for measuring and estimating biodiversity, discusses the concept of biodiversity and the applicability of several biodiversity indices, and describes the basic principles of Generalized Linear Models (GLM) and how they are implemented. Finally, by combining the theory with the conditions of the northeast forest area of China, simultaneous equation models for forest biodiversity are designed and constructed on the basis of GLM. The main ideas and conclusions are as follows.

Firstly, the search for the abundance and diversity of species has been at the heart of ecology for decades, and the measurement of biodiversity continues to be a core area of biodiversity research. Anyone who wants to measure "biodiversity" has no fewer than 12 different definitions to choose from. The major problem with biodiversity in general is that most people interpret the concept differently; as commonly stated it is too vague and general to be of practical use and needs to be better defined. From a practical viewpoint, a definition of biodiversity should involve a formula that includes not only the number of species in a place but also the abundances (absolute or relative) of those species. Strengthening this understanding is important, because the distribution, abundance and diversity of forest species must be understood before biodiversity can be rigorously measured and estimated.

Secondly, I present a critical review of biodiversity and its measurement.
The main conclusion is that biodiversity is generally recognized on four levels: (a) genetic diversity; (b) species diversity; (c) ecosystem diversity; and (d) landscape diversity. In terms of scale, the measurement of biodiversity involves three aspects: alpha, beta and gamma diversity. This paper focuses on alpha diversity and species diversity. Alpha diversity (α-diversity) is the species richness within a particular area, community or ecosystem, and is measured by counting the number of species. Species diversity is usually used as a community index and incorporates species richness, evenness, dominance, species rarity, relative species abundance, etc. Species richness is a fundamental measure of forest community and regional diversity, and it underlies many ecological models.

Thirdly, forest community-level survey data are analyzed in several ways, with a focus on biodiversity index modeling. The data come from several forest bureaus in the northeast forest area of China (Heilongjiang and Jilin provinces). To validate the models and the initial analysis, this paper adopts cross-validation, the statistical practice of partitioning a sample of data into subsets such that the analysis is initially performed on a single subset, while the other subset(s) are retained for subsequent use in confirming and validating the initial analysis. The initial subset is called the training set (205 field plots); the other subsets are called validation or testing sets (98 plus 64 field plots), for 367 field plots in total.

In each field plot, the dominant stand (or dominant tree species group), the age class, the DBH, the tree height and the number of trees were measured. Each field plot also contained one sample plot (5 m × 5 m) for the shrub survey and four small sample plots (2 m × 2 m) for the herb survey. I calculated the species diversity of three layers (tree, shrub and herb) with various biodiversity index formulas and analyzed relative species abundance using nine probability distribution models: the β distribution (Beta distribution) and the Weibull, lognormal, Poisson, binomial, negative binomial, geometric, logarithmic and Neyman Type A distributions. Pearson's chi-square test was used to determine which distribution fits best. The test rejected the other eight distributions; only the β distribution was accepted, and it fitted the data very closely, so it can be used to describe the relative abundances of species in forest communities in this data set.

Fourthly, this paper defines the concept of the quasi-maximum number of species. Estimating species richness (i.e., the actual number of species present in a given area) is a basic objective of biodiversity studies. Because observed species richness is a sensitive index of biodiversity and usually underestimates the number of species in the population, biodiversity indices computed from it form a biased data set. The basic problem is how to estimate, from a limited sample, the maximum number of species that a complete survey would find, including those not yet observed. Of the available methods, jackknife and bootstrap estimation can be used to compensate for the underestimation associated with simple richness estimation (the sum of species counted in a sample), because in general they reduce the bias of the estimate more than the other estimation methods do.
These methods, which rely on simulation and resampling, can provide an estimate of the maximum number of species that is close to the number in the population.

Finally, this paper focuses primarily on building generalized linear models of biodiversity indices. Regression analyses and predictive models are increasingly used in biodiversity measurement, but systematic applications of novel statistical techniques are still limited. Modern regression approaches are particularly useful for modeling the biodiversity of species and communities. The procedure is as follows.

Generalized Linear Models (GLM) are an extension of the linear modeling process that allows models to be fitted to data that follow probability distributions other than the Normal distribution, such as the Poisson, Binomial and Multinomial distributions. Generalized Linear Models also relax the requirement of equal or constant variances that hypothesis tests in traditional linear models impose. Hypothesis tests applied to the Generalized Linear Model require neither normality of the response variable nor homogeneity of variances. Hence, Generalized Linear Models can be used when response variables follow distributions other than the Normal distribution and when variances are not constant. For example, count data are often poorly represented by the classical Gaussian (Normal) distribution but can be analyzed appropriately as a Poisson random variable within the Generalized Linear Model.

This paper demonstrates the GLM model-building procedure step by step for the biodiversity of forest communities in Northeast China.
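
To make the Poisson case above concrete, the following is a minimal sketch of a Poisson GLM with a log link and a single covariate, fitted by Newton-Raphson (which coincides with iteratively reweighted least squares for this canonical link). It is an illustration under stated assumptions, not the procedure used in the dissertation: the function name `fit_poisson_glm`, the binary covariate `origin` (0 or 1 for two hypothetical stand origins) and the species counts are all invented for the example.

```python
import math


def fit_poisson_glm(x, y, max_iter=50, tol=1e-12):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson on the Poisson likelihood."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2 x 2, symmetric).
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system explicitly.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Made-up toy data: species counts per plot for two hypothetical origins.
origin = [0, 0, 0, 0, 1, 1, 1, 1]
richness = [4, 6, 5, 5, 9, 11, 10, 10]

b0, b1 = fit_poisson_glm(origin, richness)
# With a binary covariate the fitted means equal the group means, so
# exp(b0) is the mean count for origin 0 and exp(b1) is the rate ratio.
```

A binary covariate is used because the maximum-likelihood fit then has a closed form (the two group means), which makes the result easy to check by hand.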
Starting from the seemingly unrelated regression model, this paper analyzes the relationships among the equations of the forest plant-diversity index models and introduces three endogenous variables, Y1 (Shannon's diversity index of the tree layer), Y2 (Shannon's diversity index of the shrub layer) and Y3 (Shannon's diversity index of the herb layer), together with three exogenous variables, X1 (dominant stand), X2 (origin of forest) and X3 (age class group or cohort). The linear simultaneous equation model can be represented by the matrix equation YB + XΓ = ε, where B and Γ are two structural parameter matrices and ε is a matrix of unobserved residuals. Y1, Y2 and Y3 can each appear as explanatory variables in the other equations, which is what makes the system simultaneous. From the latent relations between the structural parameter matrices, the linear constraint equation HA = L is needed; it is a necessary condition for obtaining estimates of the parameters of the structural equations. Ordinary least squares (OLS) is widely used for estimating a single equation, but if the parameters of the simultaneous equation model were estimated equation by equation with OLS, the relationship between the equations would be ignored and the estimates would be biased and inconsistent. In most cases, the species diversity index models are interdependent simultaneous-equation models, i.e., models that incorporate mutual causation, and the errors of the structural equations are also dependent. Appropriate estimation techniques are therefore required. Techniques that can be used include generalized least squares (GLS), two-stage least squares (2SLS) and three-stage least squares (3SLS). 3SLS, which combines the advantages of the various restrictions and of structural-equation estimation, may be particularly useful.

In the end, the final simultaneous equation model for forest plant diversity was obtained by solving for the parameters. According to the results of the analysis of variance, dominant stand, origin of forest and age class group all have significant effects.
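
The contrast between equation-by-equation OLS and two-stage least squares can be sketched on a toy system. The example below is not the dissertation's model: it assumes a hypothetical two-equation system, y2 = x + u and y1 = b * y2 + u, in which y2 is endogenous because it shares the disturbance u with y1, and x is the only exogenous variable. The data are simulated with a fixed seed, and all names (`slope`, `two_stage_least_squares`, `TRUE_B`) are invented for the illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """OLS slope of y on x with an intercept: cov(x, y) / var(x)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_stage_least_squares(y1, y2, x):
    """2SLS estimate of the coefficient of y2 in the equation for y1."""
    # Stage 1: regress the endogenous regressor y2 on the exogenous x.
    gamma = slope(x, y2)
    mx, my2 = mean(x), mean(y2)
    y2_hat = [my2 + gamma * (xi - mx) for xi in x]
    # Stage 2: regress y1 on the fitted values from stage 1.
    return slope(y2_hat, y1)


# Simulated toy data (assumption: not from the dissertation).
rng = random.Random(0)
n = 5000
TRUE_B = 0.5
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
u = [rng.gauss(0.0, 1.0) for _ in range(n)]   # disturbance shared by both equations
y2 = [xi + ui for xi, ui in zip(x, u)]
y1 = [TRUE_B * v + ui for v, ui in zip(y2, u)]

b_ols = slope(y2, y1)                          # ignores that y2 is correlated with u
b_2sls = two_stage_least_squares(y1, y2, x)    # uses only the variation in y2 driven by x
```

In this setup the OLS slope converges to TRUE_B + cov(y2, u) / var(y2) = 0.5 + 0.5, while the 2SLS estimate converges to TRUE_B, which is the bias and inconsistency the abstract attributes to single-equation OLS. 3SLS would add a generalized least squares step across equations and is not shown here.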
