
Models for the Analysis of Non-normal and Nonlinear Repeated Measurement Data and Their Medical Applications

[Author] Luo Tian'e (罗天娥)

[Supervisor] Liu Guifen (刘桂芬)

[Author information] Shanxi Medical University, Epidemiology and Health Statistics, 2007, PhD

[Abstract] Repeated measurement (RM) data are data in which one or more responses of the same subject are observed or measured on multiple occasions. Such data are very common in medical research, and the response may be quantitative, categorical or ordinal. For example, in evaluating treatment for stage II hypertension, patients' blood pressure (systolic and diastolic) is measured at regular intervals, giving a quantitative response. In the treatment of patients with breast hyperplasia, changes are recorded regularly during treatment, and the response is binary (improved or not improved). In a follow-up study of coronary heart disease patients discharged after interventional therapy, efficacy is recorded at discharge and at 3, 6 and 9 months after discharge; the outcome at each time point may be recovered, better, somewhat improved, little change or no change, giving an ordinal response. In other cases the response is a count, such as the number of epileptic seizures per unit time (year or month).

According to the relationship between the response and the parameters of the covariates, models for RM data can be divided into linear and nonlinear models. When the blood pressure of hypertensive patients is monitored regularly, the relationship between blood pressure and time or other explanatory variables can be fitted by a linear model, which is a linear RM model. In pharmacokinetic studies, serial blood samples are collected from each subject after an oral dose and assayed for drug concentration in order to describe the processes of absorption, distribution and elimination; in most cases these processes are nonlinear, as in the two-compartment model. In studies of HIV dynamics, quantifying the concentration of viral particles in the blood ("viral load") is a routine way of monitoring infected individuals, and systems of differential equations are used to describe the production, infection and death of immune cells and the production and clearance of virus. Both settings yield nonlinear RM data, and a nonlinear RM model is needed to describe the nonlinear relationship between the covariates and the parameters of the response.

Recurrent event data arise when the same subject experiences the same type of event repeatedly over a period of time, for example repeated attacks in a patient with coronary heart disease, or repeated relapses in a cancer patient after chemotherapy and radiotherapy. Such data combine features of repeated measurement data and survival data. None of the data described above satisfies the normality and linearity assumptions of the classical linear model.

The theory of linear models for RM data is mature and widely applied, and the linear mixed-effects model is regarded as the method of choice. It assumes a specific structure for the variance-covariance matrix to account for heteroscedasticity and within-subject correlation, so it is neither as restrictive as univariate analysis nor as completely unconstrained in its covariance as multivariate analysis of variance. Observation times may be equally or unequally spaced, data with observations missing completely at random can be fully used, and modeling is flexible. By contrast, the theory and application of models for non-normal and nonlinear RM data are still at an early stage; the theory needs further development, and the methods need to be promoted and applied in medical research. A natural extension of the linear mixed-effects model allows the response to follow any distribution in the exponential family, including discrete distributions (such as the binomial and Poisson) and continuous distributions (such as the normal, beta and chi-square), with a link function connecting the mean response to the subject's linear predictor. This gives the generalized linear mixed-effects model (which reduces to the generalized linear model when there are no random effects) and, further, the nonlinear mixed-effects model, which can handle non-normal, non-independent binary, ordinal and count RM data. The frailty model describes the association between the "survival" of individuals within a subgroup and time; by introducing random effects, association between variables and unobserved heterogeneity into the survival model, it offers a new and efficient approach to recurrent event data.

RM data take many forms and are common in medicine. This thesis examines the corresponding statistical models according to the type of response (quantitative, categorical, ordinal) and compares them, and it explores linear and nonlinear models according to the relationship between the response and the covariate parameters. It has seven parts.

Part 1 introduces the characteristics of RM data and their variance-covariance structures.

Part 2 introduces the basic theory of the linear mixed-effects model for RM data.

Part 3 introduces the theory of generalized estimating equations (GEE) and its application to binary, ordinal and count RM data. GEE is an estimation method for marginal models: a quasi-likelihood method developed from the generalized linear model and quasi-likelihood estimation for longitudinal data, suitable for non-independent RM data. It is semiparametric, because the estimating equations are derived without fully specifying the joint distribution of a subject's observations, using only the likelihood of the (univariate) marginal distributions and a "working" correlation matrix for each subject's vector of repeated measurements. GEE yields consistent and asymptotically normal estimates even when the time dependence is misspecified, and it is among the most suitable and widely used methods when the response is non-continuous (binary, ordinal or count).

Part 4 presents the theory of generalized linear mixed-effects models (GLMMs) and their application to binary, ordinal and count RM data. GLMMs are a natural extension of linear mixed-effects models and can be used for both continuous and categorical longitudinal data. They provide a unified approach to exponential-family regression with random effects: a link function connects the mean response to the subject's linear predictor, and random effects are used to model the various correlation structures in the data. When there are no random effects, the GLMM reduces to the generalized linear model.

Part 5 introduces the theory of nonlinear mixed-effects models (NLMEs) and their application to pharmacokinetic data and to binary, ordinal and count RM data. NLMEs identify and estimate both between-subject and within-subject variability, account for the nonlinear relationship between the covariates and the parameters of the response, and allow fixed and random effects to enter the nonlinear part of the model. The response may follow a normal, binomial or Poisson distribution. NLMEs are commonly used in pharmacokinetics and nonlinear growth curve studies, can also be fitted directly to binary, ordinal and count RM data, and have attracted wide attention in recent years in industrial, agricultural, environmental and medical research.

Part 6 introduces the theory of the conditional frailty model and its application to medical recurrent event data. The frailty model is an extension of the Cox proportional hazards model that aims to account for heterogeneity caused by unobserved covariates. The frailty acts multiplicatively on the baseline hazard function, that is, it multiplies the hazard rate of every individual in a subgroup, so subgroups with larger frailty values tend to experience the event sooner than subgroups with smaller values. Individuals in the same subgroup are usually assumed to share the same frailty, which is why the model is also called the shared frailty model; survival times are assumed to be independent conditional on the shared frailty. The frailty is treated as a random effect following some distribution, most often the gamma distribution. The conditional frailty model combines a random effect, which captures unobserved heterogeneity, with event-based stratification (varying baseline hazards), which captures event dependence. Because it incorporates the key features of the recurrent event process, it is well suited to fitting recurrent event data.

Part 7 draws on the preceding methods for non-normal, non-independent and nonlinear data to discuss further the use of GEE, GLMMs and NLMEs for binary, ordinal, count and nonlinear RM data in medical research, and the use of frailty models for recurrent event data. It discusses analysis and implementation in SAS and R and offers suggestions and assessments on model building, parameter estimation and software implementation in practice.

The analyses mainly used the GENMOD, GLIMMIX and NLMIXED procedures of SAS 9.1.3 to compare methods for categorical and nonlinear RM data in medicine, and the free software R 2.4.0 to analyze recurrent event data from clinical research. By combining model theory with worked examples, and methodological study with software implementation, the thesis systematically presents the models and principles for analyzing non-normal and nonlinear RM data and summarizes practical skills and experience drawn from the examples. It provides a methodological basis for the analysis of medical data and a bridge between theoretical models and software, placing practical application ahead of abstract statistical theory, and it supports the correct use of GEE, GLMMs, NLMEs and frailty models as reliable, accurate, informative and practically feasible multivariate statistical methods.
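
As a notational sketch of the models named in the abstract, the generalized linear mixed-effects model and the two frailty models can be written as follows. The symbols are assumptions introduced here for illustration and are not taken from the thesis: i indexes the subject or subgroup, j the observation, k the event number, g is the link function, b_i and w_i are random effects, and θ is the frailty variance.

```latex
% GLMM: the link function g connects the conditional mean to the linear predictor.
% Without the random effect b_i this reduces to a generalized linear model.
\[
g\bigl(\mathrm{E}[\,y_{ij} \mid b_i\,]\bigr) = x_{ij}^{\top}\beta + z_{ij}^{\top} b_i,
\qquad b_i \sim N(0, G)
\]

% Shared frailty model: the frailty w_i multiplies the baseline hazard h_0(t)
% of every individual j in subgroup i.
\[
h_{ij}(t \mid w_i) = w_i \, h_0(t) \, \exp\bigl(x_{ij}^{\top}\beta\bigr),
\qquad w_i \sim \mathrm{Gamma}\ \text{with mean } 1 \text{ and variance } \theta
\]

% Conditional frailty model: same random effect, but the baseline hazard h_{0k}(t)
% is stratified by event number k to reflect event dependence.
\[
h_{ik}(t \mid w_i) = w_i \, h_{0k}(t) \, \exp\bigl(x_{ik}^{\top}\beta\bigr)
\]
```

In this notation, a larger w_i raises the hazard of every individual in subgroup i by the same factor, which is the multiplicative effect described in Part 6.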

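The multiplicative effect of frailty on the hazard, described in Part 6 of the abstract, can be illustrated with a minimal Python sketch. It assumes a constant baseline hazard (an exponential model chosen only for simplicity, not the Cox-type model of the thesis), and the function names and the default value 0.1 are hypothetical.

```python
import math


def conditional_hazard(frailty, baseline_hazard=0.1, linear_predictor=0.0):
    """Hazard given the frailty: w * h0 * exp(x'beta), with h0 assumed constant."""
    return frailty * baseline_hazard * math.exp(linear_predictor)


def survival(t, frailty, baseline_hazard=0.1, linear_predictor=0.0):
    """Conditional survival probability S(t | w) under the constant hazard."""
    return math.exp(-conditional_hazard(frailty, baseline_hazard, linear_predictor) * t)


def median_time(frailty, baseline_hazard=0.1, linear_predictor=0.0):
    """Time at which S(t | w) falls to 0.5, i.e. ln(2) divided by the hazard."""
    return math.log(2.0) / conditional_hazard(frailty, baseline_hazard, linear_predictor)


if __name__ == "__main__":
    # Larger frailty values give shorter median times to the event.
    for w in (0.5, 1.0, 2.0):
        print(f"frailty={w}: median time to event = {median_time(w):.2f}")
```

Under this simplification, doubling the frailty halves the median time to the event; with a non-constant baseline hazard the relationship is no longer a simple halving, although a larger frailty still shortens the expected time to the event.
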