
现代测量理论在慢性病患者生命质量测定量表体系共性模块研制中的应用

Application of Modern Test Theory in Development of the General Module of Quality of Life Instruments System for Chronic Disease

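【Author】 潘海燕 (Pan Haiyan)

【Supervisors】 丁元林 (Ding Yuanlin); 万崇华 (Wan Chonghua)

【Author information】 Southern Medical University, Epidemiology and Health Statistics, 2011, PhD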

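【Abstract (Chinese version)】 [Background] Developing quality-of-life (QOL) scales for chronic diseases has been an active topic in health-related QOL research in recent years, and it is basic, critical groundwork for assessing QOL in patients with chronic diseases. Many chronic-disease QOL scales already exist, but scale development commonly suffers from the following problems: (1) scales are developed in isolation and lack a systematic framework; (2) scales developed by experts abroad do not fully reflect the Chinese cultural background, so chronic-disease scales suited to China are urgently needed; (3) scale evaluation and item screening mostly rest on classical test theory, and modern test theory is rarely applied to QOL measurement. For these reasons, our research group began work on a system of chronic-disease QOL scales in 2003 and obtained a National Natural Science Foundation of China grant (30360092). Building on existing chronic-disease scales, and combining a general module with disease-specific modules, the group systematically and independently developed the Quality of Life Instruments for Chronic Diseases (QLICD) system for Chinese patients. The system comprises a general module (QLICD-GM), which can be used to measure QOL in patients with any chronic disease, and specific scales for eight chronic diseases built on top of it. While scale development has attracted close attention, research on methods for screening and evaluating scales and their items has become foundational work. Evaluation and screening of chronic-disease QOL scales has mostly rested on classical test theory (CTT), which is simple and easy to understand; typical analyses compute reliability, validity, responsiveness, Cronbach's α and related indexes. CTT is a complete body of measurement theory and statistical methods and has long dominated the field of measurement. However, it has clear shortcomings: sample dependence, a parallel-test assumption that is hard to satisfy, and difficulty in guaranteeing that test results generalize validly. These limit its further development and application. In response, researchers have proposed using modern test theory to guide scale development. Item response theory (IRT) and generalizability theory (GT) are two important modern test theories. IRT has the following features: it works at the micro level, relating a respondent's trait level to the respondent's behavior on each item through a parameterized model, so that measurement error can be estimated precisely; estimates of a respondent's latent trait do not depend on the particular items administered; parameter estimates are independent of the respondent sample; and the test information function replaces the CTT concept of reliability. IRT developed fully after the 1970s and solved many problems that CTT could not. Its application to QOL research began in the late 20th century: Haley and McHorney et al. each used IRT to evaluate the unidimensionality of SF-36 physical functioning, and Cella and Chin-hung discussed the use of IRT in health status assessment, which brought IRT into QOL research. Most topics at the international QOL conference held in Hong Kong in 2004 concerned applications of IRT to QOL. Sun Yat-sen University is currently also studying the application of IRT to a QOL scale for persons with disabilities. Although IRT is developing rapidly abroad and some experts have used it to evaluate QOL-related scales, it has seldom been used in QOL research in China. GT draws on experimental design and the basic principles of analysis of variance, combining CTT with analysis of variance. It introduces relative error, absolute error, the generalizability coefficient, the reliability index and other new indexes in place of traditional CTT indexes such as reliability and validity, which gives it an advantage in studying measurement error. It focuses more directly on the relationship between measurement error and decision-making needs, and at the macro level it can estimate multiple sources of measurement error across facets and measurement situations, thereby improving test quality. GT research in China is still at an early stage; it has seen some use in interviews, assessments and similar settings, but reports of its use in chronic-disease QOL research are rare. No study combining IRT and GT to analyze and evaluate chronic-disease QOL scales has been reported. Given the many advantages of the two theories and their potential in QOL scale development, this study combines IRT and GT to analyze and evaluate QLICD-GM (V1.0) at the micro and macro levels and compares the results with CTT.

[Aims] 1. To apply two modern test theories, IRT and GT, to the analysis and evaluation of the general module of the QLICD system (QLICD-GM V1.0), evaluating the module at the micro and macro levels in order to propose revisions to its items and improvements to its structure. 2. To compare IRT, GT and CTT, identify their respective strengths and weaknesses in chronic-disease QOL scale research, and provide a sound methodological reference for developing general and specific scales for other types of disease.

[Contents] 1. Use IRT to characterize each QLICD-GM (V1.0) item in turn, estimating its difficulty parameters, discrimination parameter and information function; together with the item characteristic curves, select items with higher information and remove items with very low information. 2. Use GT in two stages, a G study and a D study. In the G study, analyze at the macro level (the scale's different domains) how variation from different error sources contributes to total variation. In the D study, compute the generalizability coefficient, reliability index and error variances for different numbers of items, evaluate the module's reliability, suggest the number of items for each domain, and provide a theoretical basis for different decisions. 3. Summarize and compare the strengths and weaknesses of IRT, GT and CTT in QOL research and propose notes on their application.

[Methods] 1. Survey methods. The main survey sites were the Affiliated Hospital of Kunming Medical College and Yunnan People's Hospital. Patients with chronic diseases were surveyed, covering eight diseases including hypertension, coronary heart disease and chronic gastritis. Patients were required to have basic reading and writing ability. Investigators, presenting themselves as doctors, briefly explained the general module scale and then gave QLICD (V1.0) to patients to complete; when a patient finished, the investigator collected the scale and checked for missing items. Each patient was surveyed twice, once on admission and once before discharge. 2. Item response theory. Each item of QLICD-GM (V1.0) was analyzed with Samejima's graded response model. The unidimensionality assumption was tested first; then, at the micro level, each item's information and information function were analyzed, its difficulty and discrimination were calculated, and its category probability curves and item characteristic curve were plotted. 3. Generalizability theory. The overall validity and dependability of the QLICD (V1.0) general module were evaluated at the macro level and analyzed across facets and domains. Given the characteristics of the data and the design, a random two-facet crossed (nested) design was chosen for both the G study and the D study, with patients as the object of measurement and the general module items as one measurement facet; the evaluation applied the basic principles of experimental design and analysis of variance. The G study partitioned the sources of variation into seven parts: p, the surveyed patients with different diseases; i, the items in the three domains; t, the measurement occasions; and the interactions p×i, p×t, i×t and p×i×t among patients, items and time. A two-factor factorial ANOVA procedure was used. In the D study, the relative error, absolute error, generalizability coefficient and reliability index were calculated from the estimated variance components separately for the physical, psychological and social function domains. 4. Notes on applying IRT and GT in chronic-disease QOL scale research, and their strengths and weaknesses relative to CTT, were set out as a methodological reference for developing new general and specific scales. 5. Statistical methods. Data were entered and managed in Excel and FoxPro and analyzed with SPSS 15.0, MULTILOG 7.03 and other statistical software.

[Results] Part I Item response theory. 1. Unidimensionality. IRT analysis was conducted separately for the physical, psychological and social function domains. Before treatment, the ratio of the first to the second eigenvalue was 2.6 for physical function, which basically meets the unidimensionality requirement; 5.7 for psychological function, which fully meets it; and 2.3 for the social impact facet and 3.0 for the social support facet of social function, which meet it. After treatment, the ratio was 2.9 for physical function (basically met), 6.0 for psychological function (fully met), and 2.9 and 3.26 for the social impact and social support facets (met). The unidimensionality results from both surveys indicate that the scale can be analyzed with IRT. 2. Difficulty and discrimination. The 30 items of the general module were analyzed in three domains (physical, psychological and social function). At time 1 the item difficulty parameters ranged from -2.88 to 2.27. At time 2 the lowest difficulty of items SO4 and SO5 was below -3.0 and the highest difficulty of item PH5 was above 3.0; for all other items the difficulty ranged from -2.93 to 2.93. This indicates that the difficulty of the QLICD general module is moderate. The discrimination of the 30 items ranged from 0.63 to 1.88, all above 0.3, and the thresholds of every item increased monotonically from level 1 to level 4 with no reversed thresholds, which indicates good discrimination for all 30 items. 3. Item information. Average item information ranged from 0.37 to 0.99; the domain means were 0.38 for physical function, 0.80 for psychological function and 0.48 for social function. Physical function had the lowest average information and psychological function was high throughout. Among the 11 social function items, SO1, SO3 and SO6 had information too low for direct selection. Based on each item's information and characteristics, 24 good items were selected from the 30. Of these, 17 had information above 0.47 and were selected directly. To keep each domain of the general module complete, PH2, PH6, PH7, PH8, SO1, SO9 and SO11 were also retained. 4. Item characteristic curves. The plots show that category probabilities for physical function items PH1-PH8 were lower than those for the psychological function items; peaks were generally low and, for a few items, nearly coincided. This indicates that the response options do not discriminate strongly, and the options of the physical function items in the first version need further study and revision. For psychological function items PS1-PS11 the peaks were clearly separated and covered a relatively wide range, indicating relatively high selection probabilities, and information was above 0.47 for every item, so these 11 items can enter the scale directly. Among social function items SO1-SO11, SO1, SO3, SO6 and SO9 discriminated somewhat poorly, while the peaks of the remaining curves were relatively high.

Part II Generalizability theory. 1. Overall universe of generalization. The G study showed that the variance component for the research subjects, σ²(p), was the largest at 4.82, accounting for 68% of total variance. The subjects thus contributed most, which matches expectations, and the fit was satisfactory. Items accounted for a small share, which indicates that the items are highly consistent. The variance component for time, σ²(t), was only 0.01 (0.14%), which indicates that the timing of the two surveys had little influence on the overall results and that patients responded consistently across the two surveys. The D study showed that, for different numbers of items (20, 25, 30, 35, 40), the subject-by-item, subject-by-time and subject-by-item-by-time interaction components, the relative error variance σ²(δ) and the absolute error variance σ²(Δ) were all less than 1. The error variance between the subjects' mean observed scores and their mean universe scores was also small, and both the generalizability coefficient Eρ² and the reliability index Φ exceeded 0.9, which indicates that QLICD-GM (V1.0) has good reliability and validity. As the number of items sampled from the universe of generalization increased, the variance components of all effects other than subjects decreased, while the subject component was unchanged, and both the generalizability coefficient and the reliability index increased. Even with 20 items the generalizability coefficient was 0.9905 (> 0.9), but increasing the number of items from 35 to 40 raised it by only 0.0001. In practice, therefore, about 35 items are suggested for the general module to achieve good reliability. 2. Physical function domain. The G study showed that the variance component for subjects was the largest, 14.61, accounting for 81% of total variance. For the eight physical function items, the relative error ranged from 0.2203 to 0.2698 and the absolute error from 0.2313 to 0.2894, all below 0.3, and both the generalizability coefficient and the reliability index exceeded 0.98. This indicates a satisfactory fit and good reliability for the physical function items, consistent with the CTT-based test-retest reliability, split-half reliability and Cronbach's α. 3. Psychological function domain. The G study showed that the subject-by-item-by-time interaction had the largest variance component, 48.96% of the total, while the subject effect accounted for only 40%, unlike the physical function domain. The D study showed that the generalizability coefficient and reliability index rose gradually as items were added. With 11 items the generalizability coefficient was 0.9886; increasing from 11 to 13 items raised it to 0.9897, and gains beyond 13 items were small. For the psychological function domain, 11 to 13 items is therefore appropriate, and items may be added to raise the scale's reliability further. The reliability index was above 0.95 throughout, indicating good reliability for the psychological function items. 4. Social function domain. The patient-by-item interaction had the largest variance component (37.14%), followed by the patient-by-item-by-time interaction (33.9%) and the patient effect (27.10%). Item fit was acceptable, but the patient-by-item interaction was too large; some social function items in the general module need further revision so that patients with different diseases respond to them more consistently.

[Conclusions] 1. Both IRT and GT analyses fit well and can be applied to the development of the chronic-disease QOL scale system. They can evaluate the general module comprehensively and have considerable development potential and good application prospects. 2. The CTT analysis indicates that the overall reliability, validity and responsiveness of QLICD-GM (V1.0) are good and that difficulty and discrimination are moderate. 3. The IRT and GT results indicate that, among the three domains of the general module, the physical function items had relatively poor information and low probability curves in the item analysis, so these items cannot enter the next version directly and need appropriate revision, although the generalizability coefficient and reliability index for this domain were both high. For the psychological function domain, item reliability, validity, information, generalizability coefficient and reliability index were all high and the relative and absolute errors were small, so its 11 items are recommended for direct inclusion in the next version. For the social function domain, item fit was acceptable, but some items had low information and need adjustment. 4. Compared with CTT, IRT and GT each have their own strengths and weaknesses; they can be combined with CTT to develop new versions of the general module and the specific scales.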
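The item-level quantities named in the abstract (category probability curves, difficulty thresholds, discrimination, item information) can be illustrated with a minimal sketch of Samejima's graded response model. This is not the MULTILOG analysis reported in the dissertation; the function names and the parameter values `a` and `b` below are hypothetical and chosen only for illustration, assuming one item with four ordered thresholds (five response categories).

```python
import numpy as np


def _cumulative(theta, a, b):
    """P(X >= k | theta) for k = 0..m, padded with 1 (k = 0) and 0 (k = m)."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    b = np.asarray(b, dtype=float)
    # Logistic boundary curves, one column per threshold b_k.
    inner = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    ones = np.ones((theta.size, 1))
    zeros = np.zeros((theta.size, 1))
    return np.hstack([ones, inner, zeros])


def grm_category_probs(theta, a, b):
    """Category probabilities: differences of adjacent boundary curves."""
    cum = _cumulative(theta, a, b)
    return cum[:, :-1] - cum[:, 1:]


def grm_item_information(theta, a, b):
    """Item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = _cumulative(theta, a, b)
    dcum = a * cum * (1.0 - cum)  # derivative of each boundary curve
    probs = cum[:, :-1] - cum[:, 1:]
    dprobs = dcum[:, :-1] - dcum[:, 1:]
    return np.sum(dprobs ** 2 / probs, axis=1)


# Hypothetical item: discrimination a and four increasing thresholds b.
a = 1.2
b = [-2.0, -0.5, 0.8, 2.1]
theta = np.linspace(-3.0, 3.0, 7)

probs = grm_category_probs(theta, a, b)
info = grm_item_information(theta, a, b)

for t, p, i in zip(theta, probs, info):
    print(f"theta={t:+.1f}  P={np.round(p, 3)}  info={i:.3f}")
```

Thresholds that increase monotonically, as reported for all 30 items, keep every category probability positive; a reversed threshold would make one of the differences negative.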

【Abstract】 [Background] Developing quality of life (QOL) instruments for chronic diseases is an active research topic and a basic, critical task for assessing the QOL of patients with chronic diseases. Many health-related QOL scales exist, but the available instruments for specific chronic diseases share several problems: (1) Instruments have been developed by different research groups, which has produced a multitude of assessment tools for the same disease. Many investigators are therefore at a loss as to which to use, and this hampers progress in chronic disease research and related areas. (2) Developing an instrument for each disease independently is inefficient and limits comparability. Such an approach focuses on individual symptoms rather than on a core with a common structure that applies across diseases. Given the number of diseases, it is also impractical to develop separate instruments for each one. (3) Scales developed abroad do not sufficiently reflect the Chinese cultural background. (4) Modern test theory has rarely been applied to QOL measurement. To address these problems, our team has worked on a system of QOL instruments for chronic diseases since 2003. With support from the National Natural Science Foundation of China, and by combining a general module with disease-specific modules, we developed the Chinese QOL instrument system QLICD (Quality of Life Instruments for Chronic Diseases). The system includes a general module (QLICD-GM), which can be used with patients with any chronic disease, and specific modules, each used only for the relevant disease. This work has been widely cited and discussed. As scale development has drawn close attention, research on methods for screening and evaluating scales and items has become essential groundwork.
Most previous research has relied on classical test theory (CTT), which is simple and easy to understand; typical analyses compute scale reliability, validity, responsiveness, Cronbach's α and related indexes. CTT is a complete body of test theory and statistical methods and holds the dominant position in measurement. However, it has clear limitations, including sample dependence, the difficulty of satisfying the parallel-test assumption, and the difficulty of guaranteeing that test results generalize validly, which restrict its further development and application. In response to these limitations, researchers have proposed guiding scale development with modern test theory. Item response theory (IRT) and generalizability theory (GT) are two important modern test theories. IRT has the following features: it works at the micro level, relating a respondent's trait level to the respondent's behavior on each item through a parameterized model, so that measurement error can be estimated precisely; estimates of a respondent's latent trait do not depend on the specific items administered; item parameter estimates do not depend on the respondent sample; and the test information function replaces the CTT concept of reliability. Since the 1970s IRT has developed fully and has solved many problems that CTT could not. IRT was first applied in QOL research in the late 20th century: Haley and McHorney et al. each used IRT to evaluate the unidimensionality of SF-36 physical functioning, and Cella and Chin-hung discussed the use of IRT in health status assessment, which brought IRT into QOL research. Many topics at the international quality of life conference held in Hong Kong in 2004 concerned applications of IRT to QOL.
At present, Sun Yat-sen University is also studying the application of IRT to a QOL scale for persons with disabilities. Although IRT is developing rapidly abroad and some experts have used it to evaluate QOL-related scales, it has seldom been applied to QOL research in China. GT draws on experimental design and the basic principles of analysis of variance, combining CTT with analysis of variance. It introduces relative error, absolute error, the generalizability coefficient, the reliability index and other new indexes in place of traditional CTT indexes such as reliability and validity, which gives it an advantage in the study of measurement error. It focuses more directly on the relationship between measurement error and decision-making needs, and at the macro level it can estimate multiple sources of measurement error across facets and measurement situations, thereby improving test quality. GT research in China is still in its infancy. It has seen some use in interviews, assessments and similar settings, but reports of its use in QOL research for chronic diseases are rare, and no study combining IRT and GT to analyze and evaluate QOL instruments for chronic diseases has been reported. Considering the many advantages of the two modern test theories and their potential in QOL scale development, this study combines IRT and GT to analyze and evaluate QLICD-GM (V1.0) at the micro and macro levels and compares the results with CTT.

[Aims] 1. To apply two modern test theories, IRT and GT, to the analysis and evaluation of the General Module of the Quality of Life Instruments for Chronic Diseases (QLICD-GM V1.0).
The general module is evaluated at the micro and macro levels in order to propose further revisions to its items and improvements to its structure. 2. To compare IRT, GT and CTT, identify their respective strengths and weaknesses in research on QOL instruments for chronic diseases, and provide a methodological reference for developing general and specific scales for other types of disease.

[Contents] 1. Analyze each QLICD-GM (V1.0) item with IRT, estimating its difficulty parameters, discrimination parameter and information function; using the item characteristic curves as well, retain items with higher information and remove items with very low information. 2. Evaluate the module with GT in two stages, a G study and a D study. In the G study, analyze at the macro level (the scale's different domains) how variation from different error sources contributes to total variation; in the D study, compute the generalizability coefficient, reliability index and error variances for different numbers of items, evaluate the module's reliability, suggest the number of items for each domain, and provide a theoretical basis for different decisions. 3. Summarize and compare the strengths and weaknesses of IRT, GT and CTT in QOL research and propose notes on their application.

[Methods] 1. Survey methods. Data were collected mainly at the Affiliated Hospital of Kunming Medical College and Yunnan People's Hospital from patients with chronic diseases, covering eight diseases including hypertension, coronary heart disease and chronic gastritis. Patients were required to have basic reading and writing ability. Investigators, presenting themselves as doctors, briefly explained the general module scale and then gave QLICD (V1.0) to patients to complete.
After each patient finished, the investigator collected the scale and checked for missing items. Each patient was surveyed twice, once on admission and once before discharge. 2. Item response theory. Each item of QLICD-GM (V1.0) was analyzed with Samejima's graded response model. The unidimensionality assumption was tested first; then, at the micro level, each item's information and information function were analyzed, its difficulty and discrimination were calculated, and its category probability curves and item characteristic curve were plotted. 3. Generalizability theory. The overall validity and dependability of the QLICD (V1.0) general module were evaluated at the macro level and analyzed across facets and domains. Given the characteristics of the data and the design, a random two-facet crossed (nested) design was chosen for both the G study and the D study, with patients as the object of measurement and the general module items as one measurement facet; the evaluation applied the basic principles of experimental design and analysis of variance. The G study partitioned the sources of variation into seven parts: p, the surveyed patients with different diseases; i, the items in the three domains; t, the measurement occasions; and the interactions p×i, p×t, i×t and p×i×t among patients, items and time. A two-factor factorial ANOVA procedure was used. In the D study, the relative error, absolute error, generalizability coefficient and reliability index were calculated from the estimated variance components separately for the physical, psychological and social function domains. 4. Notes on applying IRT and GT in research on QOL instruments for chronic diseases, and their strengths and weaknesses relative to CTT, were set out to provide a methodological reference for developing new general and specific scales. 5. Statistical methods. Data were entered and managed in Excel and FoxPro and analyzed with SPSS 15.0, MULTILOG 7.03 and other statistical software.

[Results] Part I Item response theory. 1. Unidimensionality. IRT analysis was conducted separately for three domains: physical function, psychological function and social function. Before treatment, the ratio of the first to the second eigenvalue was 2.6 for physical function, which basically meets the unidimensionality requirement; 5.7 for psychological function, which fully meets it; and 2.3 for the social impact facet and 3.0 for the social support facet of social function, which meet it. After treatment, the ratio was 2.9 for physical function (basically met), 6.0 for psychological function (fully met), and 2.9 and 3.26 for the social impact and social support facets (met). The unidimensionality results from both surveys indicate that the scale can be analyzed with IRT. 2. Difficulty and discrimination. The 30 items of the general module were analyzed in three domains (physical, psychological and social function).
At time 1 the item difficulty parameters ranged from -2.88 to 2.27. At time 2 the lowest difficulty of items SO4 and SO5 was below -3.0 and the highest difficulty of item PH5 was above 3.0; for all other items the difficulty ranged from -2.93 to 2.93. This indicates that the difficulty of the QLICD general module is moderate. The discrimination of the 30 items ranged from 0.63 to 1.88, all above 0.3, and the thresholds of every item increased monotonically from level 1 to level 4 with no reversed thresholds, which indicates good discrimination for all 30 items. 3. Item information. Average item information ranged from 0.37 to 0.99; the domain means were 0.38 for physical function, 0.80 for psychological function and 0.48 for social function. Physical function thus had the lowest average information and psychological function the highest. Among the 11 social function items, SO1, SO3 and SO6 had information too low for direct selection. Based on each item's information and characteristics, 24 good items were selected from the 30. Of these, 17 had information above 0.47 and were selected directly. To keep each domain of the general module complete, PH2, PH6, PH7, PH8, SO1, SO9 and SO11 were also retained. 4. Item characteristic curves. The plots show that category probabilities for physical function items PH1-PH8 were lower than those for the psychological function items; peaks were generally low and, for a few items, nearly coincided. This indicates that the response options do not discriminate strongly, and the options of the physical function items in the first version of the general module need further study and revision. For psychological function items PS1-PS11 the peaks were clearly separated and covered a relatively wide range, indicating relatively high selection probabilities, and information was above 0.47 for every item, so these 11 items can enter the scale directly. Among social function items SO1-SO11, SO1, SO3, SO6 and SO9 discriminated somewhat poorly, while the peaks of the remaining curves were relatively high.

Part II Generalizability theory. 1. Overall universe of generalization. In the G study for the overall universe of generalization, the variance component for the research subjects, σ²(p), was the largest at 4.82, accounting for 68% of total variance. The subjects thus contributed most, which matches expectations, and the fit was satisfactory. Items accounted for a small share, which indicates that the items are highly consistent.
The variance component for time, σ²(t), was only 0.01 (0.14%), which indicates that the timing of the two surveys had little influence on the overall results and that patients responded consistently across the two surveys. The D study for the overall universe of generalization showed that, for different numbers of items (20, 25, 30, 35, 40), the subject-by-item, subject-by-time and subject-by-item-by-time interaction components, the relative error variance σ²(δ) and the absolute error variance σ²(Δ) were all less than 1. The error variance between the subjects' mean observed scores and their mean universe scores was also small, and both the generalizability coefficient Eρ² and the reliability index Φ exceeded 0.9, which indicates that QLICD-GM (V1.0) has good reliability and validity. As the number of items sampled from the universe of generalization increased, the variance components of all effects other than subjects decreased, while the subject component was unchanged, and both the generalizability coefficient and the reliability index increased. Even with 20 items the generalizability coefficient was 0.9905 (> 0.9), but increasing the number of items from 35 to 40 raised it by only 0.0001. In practice, therefore, about 35 items are suggested for the general module to achieve good reliability. 2. Physical function domain. The G study showed that the variance component for subjects was the largest, 14.61, accounting for 81% of total variance. For the eight physical function items, the relative error ranged from 0.2203 to 0.2698 and the absolute error from 0.2313 to 0.2894, all below 0.3. Both the generalizability coefficient and the reliability index exceeded 0.98, which indicates a satisfactory fit and good reliability for the physical function items.
This result is consistent with the CTT-based test-retest reliability, split-half reliability and Cronbach's α. 3. Psychological function domain. The G study showed that the subject-by-item-by-time interaction had the largest variance component, 48.96% of the total, while the subject effect accounted for only 40%, unlike the physical function domain. The D study showed that the generalizability coefficient and reliability index rose gradually as items were added. With 11 items the generalizability coefficient was 0.9886; increasing from 11 to 13 items raised it to 0.9897, and gains beyond 13 items were small. For the psychological function domain, 11 to 13 items is therefore appropriate, and items may be added to raise the scale's reliability further. The reliability index was above 0.95 throughout, indicating good reliability for the psychological function items. 4. Social function domain. The patient-by-item interaction had the largest variance component (37.14%), followed by the patient-by-item-by-time interaction (33.9%) and the patient effect (27.10%). Item fit was acceptable, but the patient-by-item interaction was too large; some social function items in the general module need further revision so that patients with different diseases respond to them more consistently.

[Conclusions] 1. Both IRT and GT analyses fit well and can be applied to the development of the QOL instrument system for chronic diseases.
IRT and GT can evaluate the general module of the QOL instruments comprehensively and have considerable development potential and good application prospects. 2. The CTT analysis indicates that the overall reliability, validity and responsiveness of QLICD-GM (V1.0) are good and that difficulty and discrimination are moderate. 3. The IRT and GT results indicate that, among the three domains of QLICD-GM, the physical function items had relatively poor information and lower probability curves than the other two domains, so these items cannot enter the next version directly and some need appropriate revision, although both Eρ² and Φ for this domain were high. In the psychological function domain, item reliability, validity, information, Eρ² and Φ were all high and both the relative and absolute errors were small, so its 11 items can enter the next version directly. In the social function domain, item fit was acceptable, but some items had low information and need adjustment. 4. Compared with CTT, IRT and GT each have their own strengths and weaknesses. They can be combined with CTT to develop new versions of the general module and the specific scales.
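The D-study indexes discussed above can be sketched as follows, assuming a fully crossed random p × i × t design (patients × items × occasions), which is one reading of the "crossed (nested)" design named in the Methods. The formulas are the standard ones for that design and are not taken from the dissertation. Only σ²(p) = 4.82 and σ²(t) = 0.01 come from the abstract; every other variance component below is a hypothetical placeholder, so the printed values are illustrative and will not reproduce the reported coefficients.

```python
from dataclasses import dataclass


@dataclass
class VarianceComponents:
    """G-study variance components for a crossed p x i x t random design."""
    p: float    # patients (object of measurement)
    i: float    # items
    t: float    # occasions
    pi: float   # patient x item
    pt: float   # patient x occasion
    it: float   # item x occasion
    pit: float  # patient x item x occasion (confounded with residual)


def d_study(vc: VarianceComponents, n_i: int, n_t: int) -> dict:
    """Relative/absolute error variances, E(rho^2) and Phi for n_i items, n_t occasions."""
    # Relative error: only interactions that involve the object of measurement.
    rel = vc.pi / n_i + vc.pt / n_t + vc.pit / (n_i * n_t)
    # Absolute error: relative error plus the facet main effects and their interaction.
    abs_err = rel + vc.i / n_i + vc.t / n_t + vc.it / (n_i * n_t)
    return {
        "n_i": n_i,
        "rel_error": rel,
        "abs_error": abs_err,
        "e_rho2": vc.p / (vc.p + rel),
        "phi": vc.p / (vc.p + abs_err),
    }


# p and t are the values reported in the abstract; the rest are hypothetical.
vc = VarianceComponents(p=4.82, i=0.30, t=0.01, pi=0.80, pt=0.05, it=0.02, pit=1.10)

results = [d_study(vc, n_i, n_t=2) for n_i in (20, 25, 30, 35, 40)]
for r in results:
    print(f"n_i={r['n_i']:2d}  rel={r['rel_error']:.4f}  abs={r['abs_error']:.4f}  "
          f"Erho2={r['e_rho2']:.4f}  Phi={r['phi']:.4f}")
```

Because σ²(p) does not depend on the number of items while every error term shrinks as n_i grows, both coefficients rise with diminishing returns, which is the pattern behind the suggestion of about 35 items.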
