Simulation Studies of Numerical Data in Single-Case Randomized Controlled Trials (N-of-1)

【Author】 Chen Xinlin (陈新林)

【Supervisor】 Chen Pingyan (陈平雁)

【Author information】 Southern Medical University, Epidemiology and Health Statistics, 2014, PhD

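【Summary】 Background: A single-case randomized controlled trial (N-of-1 trial) is a randomized controlled trial conducted on one patient. It uses a double-blind, randomized, multi-cycle, two-period crossover design. It usually compares two interventions over three or more cycles. Each cycle forms a two-period crossover, and the order of the two interventions within each cycle is randomized. A washout period separates adjacent periods, and another separates adjacent cycles. The design is commonly used to evaluate the efficacy of a drug against a control drug. N-of-1 trials can reveal how individual patients differ in their response to interventions and can uncover patterns specific to particular cases. Because they focus on individualized treatment effects, they meet the requirements of evidence-based medicine and provide strong evidence for decisions about individual patients. N-of-1 trials are increasingly used in sociology and education, and especially in medicine. Medical applications include rheumatism, pediatric rheumatism, arthritis pain, chronic neuropathic pain, insomnia, heart disease, chronic obstructive pulmonary disease and pediatric oncology. Nevertheless, they are still not widely used. One cause is the design itself, which requires relatively stable symptoms or disease and drugs with short half-lives. The complexity of the statistical analysis is also partly responsible. Methods for analyzing numerical data from N-of-1 trials include visual/graphical analysis, non-parametric tests, parametric tests (Z-test, t-test, paired t-test), time series, meta-analysis, fixed/random effects models and Bayesian hierarchical models. Most studies analyze N-of-1 numerical data with simple methods such as non-parametric and parametric tests. Zucker DR et al. proposed a mixed-effects model for N-of-1 trials that accounts for the correlation between observations when comparing two treatments. Mixed-effects models are flexible. For example, they can include other covariates (such as study center, medical setting and participant characteristics) to explore how these covariates affect the treatment effect. However, each of these methods has strengths and weaknesses, and there is no consensus on how to analyze N-of-1 data. It is also unclear which method has the highest statistical power.

Objective: The study simulated normally distributed numerical data from N-of-1 trials, taking into account the correlation between time points (covariance structure), different carryover effects and different effect differences. It compared the statistical performance of four methods: the paired t-test (Model 1), a mixed-effects model of differences (Model 2), a mixed-effects model (Model 3) and meta-analysis of summary data (Model 4).

Methods: The simulations were programmed in SAS 9.1.3 for three-cycle, six-period (3-cycles) and four-cycle, eight-period (4-cycles) N-of-1 trials. Six- or eight-dimensional data were generated from the correlation between time points (covariance structure) and the carryover effects, with random allocation of the interventions. Four models were then fitted: paired t-test (Model 1), mixed-effects model of differences (Model 2), mixed-effects model (Model 3) and meta-analysis (Model 4). The models were compared by type I error, power, mean error (ME, bias), percent error (PE), mean absolute error (AE) and mean square error (MSE). The three-cycle, six-period trial illustrates the steps.

(1) Generating six-variate normal data: a sample of size n was drawn from a six-variate normal distribution with the specified variance-covariance matrix. The means of the six variables were μA, μB, μA, μB, μA, μB, for example 3, 2, 3, 2, 3, 2. Here μA and μB are the effects of interventions A and B, and μA - μB is the difference between the two intervention effects (the effect difference).

(2) Randomly allocating interventions A and B to the periods of each subject: n sets of three random orderings of (1, 2) were generated, one set per subject and one ordering per cycle. The value 1 denotes A and 2 denotes B. For example, the first set 1, 2, 1, 2, 2, 1 covers the six periods and means that the first subject received the sequence ABABBA, with effects μA, μB, μA, μB, μB, μA. The number of subjects receiving A in each period was then counted. Let q be the number receiving A in period i. If q < n/2, B was replaced by A for subsequent subjects until the number receiving A equaled the number receiving B or was one fewer. If q > n/2, A was replaced by B for subsequent subjects until the number receiving A equaled the number receiving B or was one more. In this way the n subjects were allocated evenly in every period.

(3) Adding carryover effects: each intervention was assumed to affect only the next intervention, not later ones. Its carryover equaled a fixed proportion (the carryover rate) of its effect. The carryover effects of A and B are denoted λA and λB. Consider a subject treated in the order ABBAAB. The first period has no carryover. The second period (B) has the B effect plus the carryover λA from the preceding A. The third period (B) has the B effect plus the carryover λB from the preceding B, and so on.

(4) Analysis models:
① Paired t-test (Model 1): the two periods of the same cycle in every subject were treated as a pair, and a paired t-test compared the two interventions.
② Mixed-effects model of differences (Model 2): the A-B difference in each cycle was computed, and the model $y_{ih} = \mu + \tau_h + \gamma_i + \varepsilon_{ih}$ was fitted with SAS PROC MIXED. The variance-covariance matrix G of the three cycles was set to compound symmetry (CS) or first-order autoregressive (AR).
③ Mixed-effects model (Model 3): the values of all six periods were fitted with $y_{ij} = \alpha + \mu \cdot \mathrm{group} + \tau_j + \lambda_A Z_A + \lambda_B Z_B + \gamma_i + \varepsilon_{ij}$ using SAS PROC MIXED, with G set to a CS or AR structure.
④ Meta-analysis (Model 4): for N-of-1 trials with n > 1, each subject was treated as a separate study. The mean and standard deviation of each subject's responses under the two interventions were computed and combined by meta-analysis.

(5) Model evaluation: the null hypothesis was H0: μ = 0 and the alternative was H1: μ ≠ 0. The type I error for the effect difference is the probability of rejecting H0 when the effect difference is 0, that is, when H0 is true. Power is the probability that a model detects the difference when the effect difference is not 0. For each model, the type I error, power, ME, PE, AE and MSE of the effect difference were calculated.

(6) Parameter settings:
- Sample size: 1, 5, 10, 20 and 30 (n = 1, 5, 10, 20, 30).
- Variance-covariance matrix: compound symmetric or autoregressive.
  - Compound symmetry: variances of 1 and covariance ρ of 0.0, 0.2, 0.5 or 0.8, giving the CS1, CS2, CS3 and CS4 structures.
  - Autoregressive: variance of 1 and autocorrelation coefficient of 0.5 or 0.8, giving the AR1 and AR2 structures.
- Carryover rates:
  ① Equal rates of 0%, 10% and 20%.
  ② Unequal rates for A and B of 0% and 10%, 5% and 10%, 10% and 0%, and 10% and 5%.
  ③ Increasing rates starting at 5% or at 10% and rising in steps of 1%.
- Intervention effects: the B effect was 2.0, and the A effect was 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, and so on. The effect differences were therefore 0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, and so on.

The data were fitted with PROC TTEST and PROC MIXED in SAS 9.1.3. Each scenario was simulated 1000 times (m = 1000) with all four models: paired t-test, mixed-effects model of differences, mixed-effects model and meta-analysis.

Results: Regardless of sample size, the three-cycle and four-cycle simulations gave consistent results. Adding cycles is equivalent to increasing the sample size: as the number of cycles increased, power increased and AE and MSE decreased.

(1) n = 1 N-of-1 trials: the paired t-test (Model 1) and the mixed-effects model of differences (Model 2) are equivalent and gave identical results. The type I errors of Models 1, 2 and 3 all fluctuated around 0.05.
① Suppose there was no carryover, or the carryover rate was higher for B (0% vs. 10%, or 5% vs. 10%). The estimates of Models 1 and 3 were then close to the true value (ME and PE close to 0). Compared with Model 1, Model 3's estimates were more variable (larger MSE), its power was lower and its AE was higher.
② Suppose the carryover rates reached 20%, or the carryover rate was higher for A (10% vs. 0%, or 10% vs. 5%). Model 1 then underestimated the effect difference (ME well below 0), whereas Model 3's estimate was close to the true value (ME close to 0). Model 3's power approached or exceeded Model 1's, and its AE and MSE were close to or smaller than Model 1's.
Price JD et al. conducted a three-cycle single-patient N-of-1 trial of 5 mg donepezil versus placebo for non-progressive amnestic syndrome. For the HVLT retention score and the HRSD and SF-36 scores, Model 1 gave smaller P values and narrower 95% confidence intervals than Model 3. Hackett A et al. conducted a three-cycle N-of-1 trial comparing L-arginine capsules with placebo capsules for ornithine enzyme deficiency. For plasma glutamine, Model 1 gave a smaller P value and a narrower 95% confidence interval than Model 3. These two examples are consistent with the n = 1 simulation results.

(2) n > 1 N-of-1 trials:
① Equal and increasing carryover rates:
- Type I error: Models 1 and 3 fluctuated around 0.05, Model 2 was clearly below 0.05, and Model 4 was unstable.
- Bias: the estimates of Models 1 and 2 deviated from the true value, with |ME| about half the carryover effect. Whatever the carryover rate, the mean estimate of Model 3 approximately equaled the effect difference (ME ≈ 0). Model 4's estimates deviated far from the true value (ME > 0).
- Power: highest for Model 1, followed by Models 3 and 4, and lowest for Model 2. At a 20% carryover rate, Model 3's power approached or exceeded Model 1's.
- AE and MSE: smallest for Model 1, followed by Models 3 and 2, and largest for Model 4.
② Unequal carryover rates:
- Type I error: Model 3 fluctuated around 0.05, while the other models were unstable and varied widely.
- Power: highest for Model 1, followed by Model 3, and lowest for Model 2. When the carryover rate was higher for A (10% vs. 0%; 10% vs. 5%), Model 3's power approached or exceeded Model 1's.
- ME and PE: smallest for Model 3, followed by Models 1 and 2, and largest for Model 4.
- AE and MSE: smallest for Model 1, followed by Models 3 and 2, and largest for Model 4.
Baicus C et al. conducted a multi-patient three-cycle N-of-1 trial of Spirulina for chronic fatigue. Model 1 gave the smallest P value, followed by Models 2 and 3. Model 4 gave the widest 95% confidence interval, followed by Models 2 and 3, and Model 1 gave the narrowest. This example is consistent with the n > 1 simulation results.

Conclusion: For numerical data from n = 1 N-of-1 trials, the paired t-test is recommended. When the carryover rates for A and B reach 20%, or the carryover rate is higher for A (10% vs. 0%, or 10% vs. 5%), the mixed-effects model can also be used. For n > 1 N-of-1 trials:
(1) With equal or increasing carryover rates, the paired t-test is recommended. When the carryover rates for A and B reach 20%, the mixed-effects model is also recommended.
(2) With unequal carryover rates, the mixed-effects model is recommended.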
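
Steps (1) to (3) of the simulation can be sketched in Python. This is a minimal, standard-library-only illustration under stated assumptions, not the author's SAS 9.1.3 program.

- All function names are hypothetical.
- Multivariate normal draws use a Cholesky factor of the CS or AR matrix, with unit variances as in the parameter settings.
- The carryover added to a period equals the carryover rate times the effect of the preceding treatment.
- Only constant carryover rates are handled.
- The balancing rule flips the whole cycle (AB to BA, or back) of later subjects, visiting subjects from the last one backwards, so every cycle still contains one A and one B. The source does not say how the other period of a switched cycle is handled, so this choice is an assumption.

```python
import math
import random


def covariance(k, structure, param):
    """k x k covariance matrix with unit variances.

    'CS' (compound symmetry): every off-diagonal entry equals param (rho).
    'AR' (first-order autoregressive): entry (i, j) equals param ** |i - j|.
    """
    if structure == "CS":
        return [[1.0 if i == j else param for j in range(k)] for i in range(k)]
    if structure == "AR":
        return [[param ** abs(i - j) for j in range(k)] for i in range(k)]
    raise ValueError("structure must be 'CS' or 'AR'")


def cholesky(m):
    """Lower-triangular L with L L^T = m (m symmetric positive definite)."""
    k = len(m)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(low[i][p] * low[j][p] for p in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def balanced_orders(n, cycles, rng):
    """first_a[s][c] is True when subject s receives A first in cycle c.

    Assumption: a period is balanced by flipping whole cycles (AB <-> BA) of
    later subjects, visited from the last subject backwards.
    """
    first_a = [[rng.random() < 0.5 for _ in range(cycles)] for _ in range(n)]
    lo, hi = n // 2, (n + 1) // 2  # allowed number of A's in one period
    for c in range(cycles):
        q = sum(first_a[s][c] for s in range(n))
        for s in reversed(range(n)):
            if q < lo and not first_a[s][c]:
                first_a[s][c], q = True, q + 1
            elif q > hi and first_a[s][c]:
                first_a[s][c], q = False, q - 1
    return first_a


def simulate_trial(n, cycles, mu_a, mu_b, rate_a, rate_b, structure, param, seed=None):
    """Return [(sequence, responses)] for n subjects of a 2*cycles-period trial."""
    rng = random.Random(seed)
    k = 2 * cycles
    low = cholesky(covariance(k, structure, param))
    data = []
    for row in balanced_orders(n, cycles, rng):
        seq = "".join("AB" if a_first else "BA" for a_first in row)
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        noise = [sum(low[i][j] * z[j] for j in range(i + 1)) for i in range(k)]
        y = []
        for j, treat in enumerate(seq):
            mean = mu_a if treat == "A" else mu_b
            if j > 0:  # carryover only from the immediately preceding period
                mean += rate_a * mu_a if seq[j - 1] == "A" else rate_b * mu_b
            y.append(mean + noise[j])
        data.append((seq, y))
    return data
```

For instance, `simulate_trial(5, 3, 3.0, 2.0, 0.1, 0.1, "CS", 0.5, seed=1)` returns five (sequence, responses) pairs for a three-cycle trial with μA = 3, μB = 2 and a 10% carryover rate for both interventions. In every period, two or three of the five subjects are on A.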
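
Model 1 pools every (A, B) pair across cycles and subjects and applies a paired t-test. The sketch below is a hypothetical standard-library version for illustration; the original analysis used PROC TTEST in SAS. The two-sided p-value uses the identity P(|T| > |t|) = I_{ν/(ν+t²)}(ν/2, 1/2) for Student's t with ν degrees of freedom. The regularized incomplete beta function I is evaluated with the usual continued-fraction (modified Lentz) method.

```python
import math
import statistics

TINY = 1e-300  # guards against division by zero in the continued fraction


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > TINY else TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even and odd coefficients of the m-th step
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > TINY else TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > TINY else TINY
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def paired_t_test(pairs):
    """Model 1: paired t-test on (A, B) responses, one pair per cycle per subject.

    Returns (mean difference A - B, t statistic, degrees of freedom, two-sided p).
    """
    diffs = [a - b for a, b in pairs]
    k = len(diffs)
    mean = statistics.fmean(diffs)
    t = mean / (statistics.stdev(diffs) / math.sqrt(k))
    df = k - 1
    p = betainc(df / 2.0, 0.5, df / (df + t * t))
    return mean, t, df, p
```

If SciPy is available, `scipy.stats.ttest_rel(a, b)` performs the same two-sided paired test and can be used to cross-check the p-value.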

【Abstract】 Background: Single-case randomized controlled trials are referred to as N-of-1 trials. N-of-1 trials are multi-cycle (three or more cycles), double-blind, controlled crossover trials conducted on individuals. For each subject, the two periods in each cycle are randomly assigned to the two treatments, with a washout period between them. N-of-1 trials are designed to test the effect difference between two treatments, conventionally labeled Group A (test group) and Group B (control group). N-of-1 trials can reveal similarities and differences among individual patients and help identify patterns specific to particular cases. Because they focus on the treatment effect in the individual patient, they meet the requirements of evidence-based medicine and provide strong evidence for decisions about that patient. N-of-1 trials have been increasingly used in the social and educational sciences and in biomedical and clinical research. Medical applications include rheumatism, pediatric rheumatism, arthritis pain, chronic neuropathic pain, insomnia, heart disease, chronic obstructive pulmonary disease and pediatric oncology. However, N-of-1 trials have not been widely used. One reason is that they require relatively stable symptoms or diseases, medications with short half-lives, and rapidly measurable responses. Another important reason is the difficulty of the statistical analysis. A number of methods have been proposed for analyzing numerical data from N-of-1 trials. These include visual analysis, non-parametric tests, parametric tests (z-test, two-sample t-test, paired t-test), time series, meta-analysis of summary data (meta-analysis for short), mixed-effects models and Bayesian hierarchical models. Among them, non-parametric and parametric tests have been used most frequently. Zucker DR used a mixed-effects model, which offers greater flexibility, to analyze data from N-of-1 trials.
However, mixed-effects models have not been widely used. Each of these methods has advantages and disadvantages, and there is no consensus on how to analyze N-of-1 data. It remains unclear which method gives more robust inferences for numerical data from N-of-1 trials.

Objective: A simulation study of three-cycle and four-cycle N-of-1 trials was conducted to compare the performance of four methods: paired t-test, mixed-effects model of differences, mixed-effects model, and meta-analysis of summary data. The comparison covered various variance-covariance structures, carryover effects and effect differences.

Methods: The simulation of three-cycle and four-cycle N-of-1 trials was conducted in SAS 9.1.3, with sample sizes of 1, 5, 10, 20 and 30 under a normal distribution assumption. The data were generated from different variance-covariance matrices and carryover effects. The four models were evaluated by the type I error, power, mean error (ME) and mean square error (MSE) of the effect difference between the two groups. μA and μB denote the effects of interventions A and B, and μA - μB denotes the difference between them (the effect difference). The three-cycle N-of-1 trial illustrates the simulation.

(1) Generating six-dimensional normal data: the data were generated by a multivariate normal random number generator (a SAS macro) based on the mixed-effects model.

(2) Randomly assigning the two interventions and subjects: the order of A and B within each cycle of each subject was randomized with PROC PLAN in SAS. The subjects in each period were also allocated so that half of them received A and half received B in every period. The response of each subject was produced according to these allocations.
For example, the allocation sequence of the first subject over the six periods was BABAAB.

(3) Adding the carryover (residual) effect: the carryover effect was assumed to arise only from the immediately preceding period and to equal a fixed percentage of the preceding treatment effect. The first period had no carryover effect; the other five periods did.

(4) Analysis models:
① Paired t-test (Model 1): the two periods of each cycle, assigned to Group A and Group B, were treated as a pair, and a paired t-test compared the two interventions.
② Mixed-effects model of differences (Model 2): the difference between the two groups in the same cycle was calculated and modeled as $y_{ih} = \mu + \tau_h + \gamma_i + \varepsilon_{ih}$. The variance-covariance matrix was set to compound symmetry (CS) or first-order autoregressive (AR).
③ Mixed-effects model (Model 3): using the responses of all six periods, Model 3 was $y_{ij} = \alpha + \mu \cdot \mathrm{group} + \tau_j + \lambda_A Z_A + \lambda_B Z_B + \gamma_i + \varepsilon_{ij}$, with the variance-covariance matrix set to a CS or AR structure.
④ Meta-analysis (Model 4): each subject in an N-of-1 trial was considered a separate trial (study), a typical way to analyze n > 1 N-of-1 trials. The meta-analysis combined the summary data from each subject into a weighted average using the method of DerSimonian and Laird.

(5) Assessment of models: model performance was assessed by type I error, power, ME, percent error (PE), mean absolute error (AE) and MSE. PE was the absolute value of ME divided by the true effect difference.

(6) Parameter settings:
- Sample size: 1, 5, 10, 20 or 30.
- Compound symmetry structure: all variances were 1, and the covariances were 0.0, 0.2, 0.5 or 0.8, corresponding to the CS1, CS2, CS3 and CS4 structures.
- Autoregressive structure: the variance was 1, and the autocorrelation coefficient was 0.5 or 0.8 (AR1 and AR2).
- Carryover rates:
  ① Equal rates of 0%, 10% and 20%.
  ② Unequal rates for A and B of 0% and 10%, 5% and 10%, 10% and 0%, and 10% and 5%.
  ③ Increasing rates starting at 5% or at 10% and rising in steps of 1%.
- Intervention effects: the effect of B was 2.0, and the effect of A was 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, and so on.

The data were fitted with PROC TTEST and PROC MIXED in SAS 9.1.3, and each simulation was repeated 1000 times (m = 1000).

Results: The results for three-cycle and four-cycle N-of-1 trials were consistent. As the number of cycles increased (equivalent to increasing the sample size), the power of the four models increased and their AE and MSE decreased.

(1) n = 1 N-of-1 trials: the paired t-test (Model 1) and the mixed-effects model of differences (Model 2) gave identical results. Models 1, 2 and 3 yielded type I errors near the nominal 5% level.
① Suppose there was no carryover, or the carryover rates for A and B were 0% vs. 10% or 5% vs. 10%. The estimates of Models 1 and 3 were then close to the true value (ME and PE close to 0). Compared with Model 1, Model 3 had a greater MSE, lower power and a higher AE.
② Suppose the carryover rate was 20% for both A and B, or the rates were 10% vs. 0% or 10% vs. 5%. Model 1 then underestimated the true value (ME below 0), while the estimate of Model 3 was close to the true value (ME close to 0). The power of Model 3 was close to or even greater than that of Model 1, and its AE and MSE were close to or even smaller than those of Model 1.
Price JD et al. conducted a three-cycle N-of-1 trial of 5 mg donepezil versus placebo in non-progressive amnestic syndrome.
The P values for the Hopkins Verbal Learning Test (HVLT), the Hamilton Rating Scale for Depression (HRSD) and the SF-36 were smaller under Model 1 than under Model 3, and the 95% confidence intervals were narrower for Model 1. Hackett A et al. carried out a three-cycle N-of-1 trial of L-arginine versus placebo in ornithine enzyme deficiency. For plasma glutamine, Model 1 gave a smaller P value and a narrower 95% CI than Model 3. These two examples are consistent with the simulation results.

(2) n > 1 N-of-1 trials:
① Equal and increasing carryover rates:
- Type I error: the paired t-test (Model 1) and the mixed-effects model (Model 3) were near the nominal 5% level. The mixed-effects model of differences (Model 2) was clearly below 0.05. The meta-analysis (Model 4) was unstable and slightly above 0.05.
- Bias: the estimates of Models 1 and 2 deviated from the true values, with |ME| about half the carryover effect. Whatever the carryover rate, the estimates of Model 3 were approximately equal to the true effect (ME ≈ 0). The estimates of Model 4 deviated far from the true effect.
- Power: greatest for Model 1, followed by Models 3 and 4, and least for Model 2. With a 20% carryover rate for both A and B, the power of Model 3 was close to or even greater than that of Model 1.
- AE and MSE: smallest for Model 1, followed by Models 3 and 2, and largest for Model 4.
② Unequal carryover rates:
- Type I error: the mixed-effects model (Model 3) was near 5%, whereas the other models were unstable and usually above 0.05.
- Power: greatest for Model 1, followed by Models 3 and 4, and least for Model 2. With carryover rates for A and B of 10% vs. 0% or 10% vs. 5%, the power of Model 3 was close to or exceeded that of Model 1.
- ME and PE: smallest for Model 3, followed by Models 1 and 2, and largest for Model 4.
- AE and MSE: smallest for Model 1, followed by Models 3 and 2.
The AE and MSE of Model 4 were the largest.
Baicus C et al. conducted an N-of-1 trial in four physicians who complained of chronic fatigue. Each physician received three pairs of treatments: 4 weeks of Spirulina platensis (test group) and 4 weeks of placebo (control group), with a 2-week washout. Fatigue severity was rated daily on a 10-point scale during the second half (weeks 3 and 4) of each period. Model 1 gave the smallest P value and the narrowest 95% CI. The next narrowest 95% CI was that of Model 3, followed by Models 2 and 4. These results are consistent with the simulation study.

Conclusion: The paired t-test is recommended for numerical data from n = 1 N-of-1 trials. When the carryover rate is 20% for both A and B, or the rates are 10% vs. 0% or 10% vs. 5%, the mixed-effects model can be used as an alternative. For n > 1 N-of-1 trials:
(1) With equal or increasing carryover rates, the paired t-test is recommended. With a 20% carryover rate for both A and B, the mixed-effects model can be used as an alternative.
(2) With unequal carryover rates, the mixed-effects model is recommended.
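
Model 4 treats each subject as a separate study and pools the per-subject mean differences with the DerSimonian and Laird random-effects method. The sketch below is a hypothetical standard-library version for illustration. It assumes that the variance of each subject's mean difference is var(A)/k_A + var(B)/k_B, computed from the cycle-level responses; the abstract says only that each subject's means and standard deviations are used. The sketch reports a two-sided normal p-value and requires at least two subjects, matching the restriction of Model 4 to n > 1.

```python
import math
import statistics


def subject_summary(a_values, b_values):
    """Mean difference (A - B) of one subject and its variance (assumed formula)."""
    md = statistics.fmean(a_values) - statistics.fmean(b_values)
    var = (statistics.variance(a_values) / len(a_values)
           + statistics.variance(b_values) / len(b_values))
    return md, var


def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird moment estimate of tau^2."""
    if len(effects) < 2:
        raise ValueError("meta-analysis needs at least two subjects (n > 1)")
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # between-subject variance
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    z = pooled / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return pooled, se, tau2, p
```

In use, `subject_summary` is called once per subject on that subject's A and B responses across cycles, and the resulting lists of effects and variances are passed to `dersimonian_laird`.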
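
The performance measures used to assess the models can be computed from the m replicate estimates and p-values of one model in one scenario. The helper below is a hypothetical sketch. The rejection rate is the type I error when the true effect difference is 0 and the power otherwise. PE is |ME| divided by the true effect difference; it is expressed here as a percentage, which is an assumption about scaling. PE is undefined when the true difference is 0.

```python
import math


def performance(estimates, p_values, true_diff, alpha=0.05):
    """Summarize m simulated replicates of one scenario for one model."""
    m = len(estimates)
    errors = [est - true_diff for est in estimates]
    me = sum(errors) / m
    return {
        # type I error when true_diff == 0, power otherwise
        "rejection_rate": sum(p < alpha for p in p_values) / m,
        "ME": me,  # mean error (bias)
        "PE": abs(me) / abs(true_diff) * 100.0 if true_diff != 0 else math.nan,
        "AE": sum(abs(e) for e in errors) / m,  # mean absolute error
        "MSE": sum(e * e for e in errors) / m,  # mean square error
    }
```

With m = 1000 replicates per scenario, as in the study, each call summarizes one cell of the comparison between models.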
