基于HSK数据对核等值法与其他等值方法的比较研究

A Comparison on the Method of Kernel Equating and Other Equating Methods Based on HSK Data

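【Author】 Luo Lian (罗莲)

【Supervisor】 Xie Xiaoqing (谢小庆)

【Author information】 Beijing Language and Culture University, Linguistics and Applied Linguistics, 2008, PhD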

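【Abstract (Chinese version)】 Equating test forms is important. Equating improves the accuracy of score reporting and interpretation and keeps the evaluation standard stable, which in turn safeguards the quality of the test. The kernel method of test equating (KE) is a new equating method. It brings the linear and equipercentile equating methods of classical test theory (CTT) into a single framework. KE transforms the observed-score distribution of a given examinee population on form X into the observed-score distribution on form Y, so it is in essence an observed-score equating method. In general, KE has five steps: presmoothing, estimating the score probabilities, continuization, equating, and computing the standard error of equating. KE has already been applied at Educational Testing Service (ETS) in the United States. Assuming that the test forms are similar in difficulty and the examinee groups are similar in ability, how do the equating results of the new methods in the KE framework differ from those of the CTT methods? Do the different methods within the KE framework give different results, and how large are the differences? Can KE be used to equate the HSK? To answer these questions, this study constructed hypothetical test forms from HSK data so as to remove error as far as possible, and compared the new KE methods with the traditional CTT methods against specified equating criteria. The CTT methods compared under the anchor-test design are: Tucker, Levine, Braun-Holland, chained linear, presmoothed and unpresmoothed chained equipercentile, and presmoothed and unpresmoothed frequency estimation equipercentile. The KE methods are: kernel chained equating with optimal h, kernel chained equating with large h (linear), kernel poststratification equating with optimal h, and kernel poststratification equating with large h. Each KE method was run both with and without presmoothing. The conclusions are as follows. When the test forms differ in difficulty and the examinee groups differ in ability, and the random-groups equipercentile method is taken as the criterion, the equipercentile methods in both frameworks perform well, but the chained methods perform poorly on small samples. KE methods correspond one to one with certain CTT methods, and the kernel linear methods reproduce the results of the corresponding traditional linear methods without presmoothing. On large samples, the differences are large between the kernel chained and kernel poststratification methods, between the kernel chained equipercentile and kernel chained linear methods, and between the kernel poststratification equipercentile and kernel poststratification linear methods. On small samples, the differences between the kernel chained methods and the corresponding poststratification methods, between the kernel chained equipercentile and linear methods, and between the kernel poststratification equipercentile and linear methods are mostly small, but they may grow after presmoothing. Because the current HSK is more difficult than the 1989 test and the examinees are more proficient, the study recommends the following. For small samples, use the presmoothed frequency estimation equipercentile method in the CTT framework or the presmoothed kernel poststratification method with optimal h in the KE framework, and avoid chained methods. For large samples, use the frequency estimation equipercentile method or the chained equipercentile method in the CTT framework, or the kernel poststratification or kernel chained method with optimal h in the KE framework. The study also discusses different equating criteria and statistical indices; comparisons based on different criteria lead to different conclusions.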

【Abstract】 Equating test scores obtained from different test forms is important. Equating allows scores to be reported and interpreted more accurately, and it keeps the evaluation standard stable so that the quality of a test can be controlled. The kernel method of test equating (KE) is a new equating method. It integrates the linear and equipercentile methods of classical test theory (CTT) into a single framework. It transforms the observed-score distribution of a given examinee population on test form X into the observed-score distribution on test form Y, so it is an observed-score equating method. KE has five steps: presmoothing, estimation of the score probabilities, continuization, equating, and calculation of the standard error of equating. KE has been in use at Educational Testing Service (ETS) for some time. Do the new methods in the KE framework differ from the traditional CTT equating methods? How far do KE methods differ from their corresponding CTT methods? Do the methods within the KE framework differ from one another? Can KE methods be used to equate the HSK?
To answer these questions, this study constructs new test forms from real HSK data in order to remove error, and compares the methods against specified equating criteria. Sixteen methods are compared. The 8 CTT methods are Tucker, Levine, Braun-Holland, chained linear, presmoothed and unpresmoothed chained equipercentile, and presmoothed and unpresmoothed frequency estimation equipercentile. The 8 KE methods are KE chained with optimal bandwidth (CE optimal), KE chained with large bandwidth h (CE linear), KE poststratification with optimal bandwidth (PSE optimal), and KE poststratification with large bandwidth (PSE linear), each applied both with and without presmoothing. The results show that KE methods approximate their corresponding CTT methods under the NEAT design. With the random-groups equipercentile method as the criterion, the equipercentile methods in both the CTT and KE frameworks perform well, but chained equating should be avoided for small samples; the kernel linear methods produce the same results as the corresponding CTT linear methods without presmoothing. For large samples, the CE and PSE methods yield different results, as do the corresponding methods with optimal and large h, and the differences are significantly different from zero. For small samples, the corresponding methods may produce similar results without presmoothing.
Presmoothing plays an important role in equating smaller samples. Because the present HSK test forms are more difficult than those of 1989 and the examinee groups are more proficient than before, this study proposes the following: the presmoothed frequency estimation equipercentile method and the presmoothed PSE method with optimal h are the better choices for small samples; the frequency estimation equipercentile method, the chained equipercentile method, and PSE and CE with optimal h work better for large samples. The study also discusses different equating criteria and statistical indices, and finds that comparisons based on different equating criteria may lead to different conclusions.
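
As an illustration of the continuization and equating steps named in the abstract, the sketch below applies the standard Gaussian-kernel continuization to two discrete score distributions and equates form X to form Y by composing the continuized CDF of X with the inverse continuized CDF of Y. It is a minimal sketch, not the thesis's implementation: it assumes a simple equivalent-groups setting rather than the anchor-test (NEAT) design, skips presmoothing and the standard error of equating, and uses hand-picked bandwidths instead of optimal ones. All function names, score points, and probabilities are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def moments(scores, probs):
    """Mean and variance of a discrete score distribution."""
    mean = sum(p * x for x, p in zip(scores, probs))
    var = sum(p * (x - mean) ** 2 for x, p in zip(scores, probs))
    return mean, var

def kernel_cdf(x, scores, probs, h):
    """Gaussian-kernel continuized CDF with bandwidth h.

    The shrinkage factor a keeps the mean and variance of the
    continuized distribution equal to those of the discrete one.
    """
    mean, var = moments(scores, probs)
    a = math.sqrt(var / (var + h * h))
    return sum(
        p * normal_cdf((x - a * xj - (1.0 - a) * mean) / (a * h))
        for xj, p in zip(scores, probs)
    )

def kernel_quantile(u, scores, probs, h, tol=1e-10):
    """Invert kernel_cdf by bisection (it is strictly increasing)."""
    mean, var = moments(scores, probs)
    sd = math.sqrt(var)
    lo, hi = mean - 10.0 * sd, mean + 10.0 * sd
    for _ in range(60):  # widen the bracket if u lies far in a tail
        if kernel_cdf(lo, scores, probs, h) <= u:
            break
        lo -= 10.0 * sd
    for _ in range(60):
        if kernel_cdf(hi, scores, probs, h) >= u:
            break
        hi += 10.0 * sd
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, scores, probs, h) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def kernel_equate(x, x_scores, r, y_scores, s, h_x, h_y):
    """Equate score x on form X to the scale of form Y."""
    u = kernel_cdf(x, x_scores, r, h_x)
    return kernel_quantile(u, y_scores, s, h_y)

def linear_equate(x, x_scores, r, y_scores, s):
    """Classical linear equating, the large-h limit of kernel_equate."""
    mx, vx = moments(x_scores, r)
    my, vy = moments(y_scores, s)
    return my + math.sqrt(vy / vx) * (x - mx)

# Hypothetical score points and score probabilities for two short forms.
x_scores = [0, 1, 2, 3, 4]
r = [0.10, 0.20, 0.40, 0.20, 0.10]
y_scores = [0, 1, 2, 3, 4]
s = [0.20, 0.30, 0.30, 0.15, 0.05]

for x in x_scores:
    small_h = kernel_equate(x, x_scores, r, y_scores, s, 0.6, 0.6)
    large_h = kernel_equate(x, x_scores, r, y_scores, s, 1e4, 1e4)
    print(x, round(small_h, 3), round(large_h, 3))
```

With a very large bandwidth the continuized distributions become nearly normal, so the kernel result approaches the linear equating function; this is the sense in which the "large h" variants in the abstract are labelled linear, while a small bandwidth gives an equipercentile-like result.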

  • 【Classification code】H19
  • 【Times cited】2
  • 【Downloads】403