
论语言测试效度的辩论方法

On Argument-Based Language Test Validation

【Author】 Deng Jie (邓杰)

【Supervisor】 Zou Shen (邹申)

【Degree Information】 Shanghai International Studies University (上海外国语大学), English Language and Literature, 2011, PhD

【Subtitle】 Argument Logic and Validity Interpretation

【摘要】 This dissertation investigates the argument-based approach to language test validation, covering two broad themes, argument logic and validity interpretation, which are discussed under five headings.

First, it examines the argument logic of several representative and influential argumentation frameworks, focusing on their logical flaws and the causes of those flaws. Each framework claims as its logical structure the Toulmin model of argument from philosophy, yet each modifies the model's basic structure before applying it. In the modified models the reasoning process can fall into an endless loop, the argument can turn into a self-contradictory self-argument, and the "Claim" is no longer a claim but in fact a hypothesis. A model without a claim is, in essence, no longer an argument model; yet despite containing a hypothesis, it is not a hypothesis-testing model either, because it lacks a conditional mechanism for accepting or rejecting the hypothesis. Further analysis shows that these logical errors stem mainly from a misunderstanding and misuse of the Toulmin "Rebuttal".

Second, the study proposes a Progressive Argument Structure and advocates interpreting test validity through progressive argumentation. The structure not only corrects the logical errors of the existing frameworks but also incorporates data analysis, a tool of scientific inquiry, into the logical reasoning of rational argument. Validation typically involves complex and varied data, and in most cases subjective reasoning alone cannot yield sound conclusions. By adding a conditional element and a data-analysis element to the model, one can judge, before reasoning, whether the warrants are sufficient: if they are, reasoning proceeds along the Toulmin structure; if not, data analysis is carried out to generate new, more persuasive evidential data. This design gives the model a recursion mechanism. Recursion produces a series of claims that form a hierarchy, each claim building on the previous one, with the final claim being the progressive result of all preceding claims. This is the true sense of "progressive".

Third, the study proposes a construct-centered, stage-based progressive view of validity. On this view, the data produced at each stage of a test should fully represent the test's target construct, and validity is the degree to which the data accurately represent that construct. Validity is by nature a matter of degree, yet a distinction between "valid" and "invalid" still exists: a degree high enough to be acceptable is valid, and one too low to be acceptable is invalid. "Valid" and "invalid" are qualitative judgments expressing a fundamental attitude towards the test; that validity is a matter of degree is no excuse for equivocating about whether a test is actually valid. Progressive validity is the stage-by-stage progression of stage validities: each stage rests on its predecessor, and if one stage fails, the whole test is invalid. Validity progression, however, differs from percentage accumulation: progressive validity is no greater than the validity of the weakest stage. Moreover, progressive validity argumentation need not be confined to score interpretation and use; validity exists from the moment of design, as expected validity before the test and actual validity after it. To ensure desirable actual validity, every pre-test stage should have desirable expected validity, and each stage should undergo its own validation, with reasonable interpretations of its validity, appropriate decisions, and anticipated consequences.

Fourth, the study proposes a view of ability as the cognitive processing of discourse information. Examining an ability construct only at the macro level of ability components or cognitive processes is insufficient; one must go down to the discourse and into the semantics, examining at a more micro level how candidates generate and comprehend discourse information. To assess language ability in terms of the accuracy and speed of semantic comprehension and the quality and quantity of semantic production, the cognitive quantification and computation of meaning must first be solved. Guided by systems theory, information theory and cybernetics, the thesis constructs a system framework and an ability model of the cognitive processing of discourse information; then, guided by object-oriented theory in computing, it analyzes the structural form and computational units of meaning by analogy with how computers represent the world, making cognitive quantification and statistical computation of meaning possible; finally, on this basis, it proposes an information-maximization item-development method that provides test-content evidence for item-development validity arguments through maximization computation, weighted sampling, categorization, and item writing.

Fifth, two examples illustrate the application of the information-maximization method and progressive validity argumentation in item-writing practice. The item-writing example develops four multiple-choice reading comprehension items from a short 150-word passage. The argumentation example develops a falsification argument against option guessability, a rival interpretation of test validity; its main purpose is to show how rational argument combined with scientific inquiry can be used to argue item-development validity, while also surveying how item writing for China's National Matriculation English Test (NMET, 高考) controls option guessability. The survey covers three NMET papers with 74 multiple-choice items and 259 options in total, and finds option guessability to be fairly serious, suggesting that more effective control measures are needed. Given the breadth of the study, it could not explore every testing stage in depth, and the information-ability construct and the information-maximization method await further testing in practice.

【Abstract】 The argument-based approach to validating language tests can be traced back at least to the 1970s and 1980s, when attention began to be drawn to the importance of both verifying positive explanations and falsifying rival hypotheses of test validity. In recent years the approach has been widely accepted and used in validation practice. As to how to go about arguing, however, there is no consensus; on the contrary, hot debates can be found in recent publications. Following the applications and debates, two aspects of validity arguments are of increasing concern: the logic of argument and the interpretation of validity.

Firstly, the present thesis analyzes the logical errors of three influential argument-based validation frameworks: the Assessment Use Argument (AUA; Bachman, 2005; Bachman & Palmer, 2010), Evidence-Centered Design (ECD; Mislevy et al., 2003) and the Interpretive Argument (IA; Kane, 1990, 1992, 2004). Because all three claim the Toulmin structure of argument as their logical structure, a comparative study between these frameworks and the Toulmin model (Toulmin, 2003) is carried out. The results show that all three modified the basic structure of the Toulmin model before applying it, and the modifications have caused serious logical problems: 1) the reasoning process is an endless loop; 2) the argument is a typical paradox; 3) the claim is in fact a hypothesis. When there is no claim, the model is no longer an argument model. Yet even with a hypothesis, the model is not a hypothesis-testing model either, because there is no conditional mechanism to decide whether to accept or reject the hypothesis. Further analysis shows that the causes of the logical errors are similar too.
Due to a misunderstanding and misuse of the Toulmin Rebuttal, all counterclaims, including counter-explanations and rival hypotheses, are treated as Toulmin rebuttals. In fact, the Toulmin Rebuttal refers to "the sorts of exceptional circumstance may in particular cases rebut the presumptions the warrant creates" (Toulmin, 2003, p. 99, emphases added), which is much like the significance level (α) in hypothesis testing. By nature, rebuttals are low-probability events that can be, and have to be, ignored; the modified versions, however, require that rebuttals be either verified or falsified before a claim is made. This is exactly what causes the logical problems.

Secondly, the thesis proposes a new argument model, the Progressive Argument, which not only possesses a logical reasoning mechanism but also incorporates scientific inquiry into rational reasoning. For rational reasoning alone to yield a plausible and easily accepted claim, the data must be simple and the warrant self-evident. But test data are often complex, and hardly any conclusion can be drawn without scientific inquiry: data analysis has to be done so that more evidential data can be generated to authorize the logical reasoning process. To that end, the Progressive Argument embeds two more elements in the basic Toulmin structure, a Conditional to direct the reasoning procedure and an Analysis to carry out data analysis. Before each reasoning step, the Conditional is invoked to decide whether there are sufficient warrants to authorize that step. If the condition is satisfied, the process is led into a Toulmin reasoning procedure; if not, it is directed into a data-analysis procedure to generate new evidence.
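The Conditional/Analysis control flow described above can be sketched in code. This is a minimal illustrative sketch only; all function names, data structures and the sufficiency rule are hypothetical stand-ins, not the thesis's actual formalism.

```python
def analyze(data, round_no):
    """Stand-in for a data-analysis procedure that yields new evidence."""
    return f"evidence-{round_no} from {len(data)} observations"

def sufficient(warrants, needed=3):
    """The Conditional element: are there enough warrants to reason?"""
    return len(warrants) >= needed

def progressive_argument(data, warrants=()):
    """Return the progression of sub-claims produced by recursive argumentation."""
    warrants = list(warrants)
    if sufficient(warrants):
        # Conditional satisfied: an ordinary Toulmin reasoning step
        return [f"claim backed by {len(warrants)} warrants"]
    # Conditional not satisfied: invoke the Analysis element
    evidence = analyze(data, len(warrants) + 1)
    sub_claim = f"sub-claim based on {evidence}"
    # Recursive use of the Progressive Argument; the final claim rests
    # on the progression of all sub-claims produced along the way.
    return [sub_claim] + progressive_argument(data, warrants + [evidence])

claims = progressive_argument(data=[0.2, 0.5, 0.9])
print(len(claims))  # three analysis-driven sub-claims plus one final claim
```

Each recursion level adds one piece of evidence to the warrant pool, so the claim list mirrors the hierarchical, stage-by-stage progression the model describes.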
By including an Analysis element, the model acquires a recursion mechanism: the justification of a claim may involve a recursive use of the Progressive Argument, and the claim is based on the progression of all the sub-claims of the recursion steps. This is why the argument is given the name Progressive Argument.

Thirdly, this thesis proposes a construct-centered, stage-based progressive view of test validity, shortened to Progressive Validity. On this view, test validity is the progression of the validity of all the test's stages; stage validity is the extent to which the data produced at a stage accurately represent the target construct of the test; and validation is the process of providing evidence to justify claims about stage validity or test validity. The progressive view stresses that the data produced at every stage should be representative of the target construct and that all stages should be centered on the same construct: when collecting data to validate a stage or the test, the evidence must be construct-centered. It also stresses that test validity rests on stage validity. For a test to be valid, every stage has to be valid in the first place; if one stage is invalid, the whole test is invalid. Validity progression, however, is not like percentage accumulation: the validity of a test is no greater than its lowest stage validity. Validity is by nature a matter of degree, but at the same time a stage or test can also be either "valid" or "invalid". If the degree is high enough to be acceptable, the stage or test is valid; if it is too low to be acceptable, the stage or test is invalid.
By saying that a stage or test is valid or invalid, we do not propose an absolute assertion but make a qualitative evaluation that expresses our fundamental attitude towards the test.

Another point of critical importance in the progressive view is that validation should not be limited to score interpretation and use. Validity begins to emerge from the onset of test design: before the test is administered there exists expected validity; after the administration, actual validity comes into being. To guarantee that actual validity is desirable, expected validity has to be justified: plausible interpretations need to be achieved, appropriate decisions made, and intended and unintended consequences anticipated.

Fourthly, this thesis advocates, from the perspective of the cognitive processing of discourse information, an information-processing view of language use ability. Macro-level classifications of language ability components or cognitive processes are not enough on their own; micro-level discourse and semantic analyses play a far more substantial role in language use. To attain a more accurate measure of language use ability, item writers need to consider to what degree candidates can process the information contained in the test with the expected accuracy and speed, and raters need to carry out in-depth analyses of the quality and quantity of the discourse generated by the candidates. Information processing requires a practical solution to quantifying and computing the semantic items in specific discourses.
Inspired by systems theory, information theory and cybernetics, a system framework and an ability model of the cognitive processing of discourse information are constructed. Under the guidance of object-oriented knowledge representation theory, a semantic structure and a semantic unit for computing semantic items are proposed, on the basis of which an algorithm for the cognitive quantification of discourse information and an item-writing method called Information-Maximization Item Development (IMID) are developed.

Fifthly, the thesis includes two examples to illustrate how to apply IMID and the Progressive Argument at the test development stage. In the first example, four multiple-choice items, each with four options, were created with the IMID method from the same 150-word passage. The second example is an empirical study designed to develop falsification arguments against option guessability, with a view to controlling multiple-choice item-writing quality. It investigates the listening and reading comprehension parts of three NMET papers, with a total of 74 items and 259 options. The findings show that more effective measures need to be taken to better control option guessability.

Due to the wide range of issues covered, the present study has had to refrain from digging deeper into the different stages of language testing. Meanwhile, the Progressive Argument model, the information ability model and the IMID method all await further research and feasibility testing.

  • 【CLC Number】 H09
  • 【Cited By】 6
  • 【Downloads】 1022