

Statistical Methods of Two-stage Cluster Sampling on Quantitative Sensitive Questions Survey and Its Application

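【Author】 Zhu Hongru (朱宏儒)

【Advisor】 Gao Ge (高歌)

【Author information】 Soochow University (苏州大学), Epidemiology and Health Statistics, 2011, Master's degree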

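【Abstract (translated from the Chinese)】 Objective: In sample surveys, when the variable or characteristic of interest is a sensitive question that touches on personal privacy or on behavior that society disapproves of, direct questioning leads some respondents, wishing to protect their privacy, to cooperate only partially, to refuse to answer, or to answer falsely, so the survey results fail to reflect the true characteristics of the population. In 1965, Warner introduced a randomizing device that made it possible to estimate the proportion of a population with a sensitive attribute without exposing any respondent's privacy, which pioneered the randomized response technique (RRT). Over the following decades, RRT has been developed and refined continuously, and a number of new survey methods have appeared. Before our project group's work, however, research in China and abroad on sampling designs for sensitive questions was largely confined to simple random sampling; practical applications were largely confined to small simple random samples of special populations within a limited area, or else data collected under a complex sampling design were wrongly analyzed with simple random sampling formulas; and the reliability and validity of sample surveys on sensitive questions had rarely been studied. This thesis takes three RRT models for quantitative sensitive questions, the additive model, the multiplicative model, and the unrelated-question model, and aims to develop statistical methods for surveying quantitative sensitive questions with RRT under two-stage cluster sampling; to estimate the relevant population characteristics of men who have sex with men (MSM), a group at high risk of AIDS in Beijing; and to evaluate the reliability of the survey methods and formulas studied here by means of an application example and computer-simulated surveys. The goal is to provide sound, reliable survey methods and formulas for surveys of quantitative sensitive questions under large-scale complex sampling, and to supply sound survey data for planning AIDS and STD prevention and control programs and measures.
Method: Based on the basic theory and methods of mathematical statistics, the law of total probability, and RRT theory, we derive formulas for the estimator of the population mean and its variance when the additive, multiplicative, and unrelated-question RRT models are used to survey quantitative sensitive questions under two-stage cluster sampling. From August to October 2010, 30 MSM venues in 6 districts of Beijing were randomly selected by two-stage cluster sampling, and the 1523 MSM at these venues were surveyed about male-male sexual behavior with the RRT additive model. The survey data were analyzed with the two-stage cluster sampling formulas derived in this thesis, and, for the first time, the reliability of the statistical methods studied here was evaluated by Monte Carlo computer simulation of the sampling process.
Results: Formulas are derived for the estimator of the population mean and its variance under the additive, multiplicative, and unrelated-question models for quantitative sensitive questions with two-stage cluster sampling. Applying these survey methods and formulas yields the following estimates for MSM in Beijing: the mean age at first male-male sexual intercourse is 20.24 years; the mean number of different male sexual partners per month is 2.09; and the mean number of male-male sexual acts per month is 4.72. The differences between the Monte Carlo simulated survey results and the actual survey results are not statistically significant (all P values greater than 0.1 in hypothesis tests).
Conclusion: This study combines sampling theory with RRT theory and, for the first time, derives formulas for the estimators of population parameters and their variances when RRT models are used to survey quantitative sensitive questions under two-stage cluster sampling, which is an innovation. The formulas were applied successfully to a survey of male-male sexual behavior in Beijing, and the Monte Carlo simulated surveys indicate that the survey methods and formulas studied here have high reliability. Applying RRT to sensitive questions under complex sampling designs has broad prospects.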
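
For orientation, the three models named above are usually written in the RRT literature along the following lines. This is a sketch in assumed notation, not the thesis's own formulas, which the abstract does not reproduce: Z is the answer a respondent reports, X the true sensitive value, Y an independent scrambling variable with known mean, and p the probability that the randomizing device selects the sensitive question. In the unrelated-question line, Y stands for the answer to the non-sensitive question, and its mean is assumed known.

```latex
% Assumed notation: Z reported answer, X true sensitive value,
% Y scrambling variable (or unrelated-question answer) with known mean \mu_Y,
% p probability that the device selects the sensitive question.
\begin{align*}
\text{Additive model:}\quad & Z = X + Y, & \mu_X &= \mu_Z - \mu_Y,\\
\text{Multiplicative model:}\quad & Z = X\,Y, & \mu_X &= \mu_Z / \mu_Y \quad (\mu_Y \neq 0),\\
\text{Unrelated-question model:}\quad & \mu_Z = p\,\mu_X + (1-p)\,\mu_Y, & \mu_X &= \frac{\mu_Z - (1-p)\,\mu_Y}{p}.
\end{align*}
```

Under a complex design such as two-stage cluster sampling, the mean of Z in these identities would be replaced by a design-based estimate rather than a plain sample average.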

【Abstract】 Objective: When a question in a sample survey is sensitive or highly personal, the traditional direct interview is likely to produce refusals or untruthful answers, because respondents are concerned about revealing private information, and this makes it difficult to learn the true characteristics of the population. Through ingenious use of a randomizing device, Warner (1965) showed that the proportion of a population with a sensitive attribute can be estimated without respondents revealing their personal status, and thereby introduced a new method for surveying sensitive questions: the randomized response technique (RRT). Over the past few decades, a number of modifications of Warner's method, as well as several new methods, have emerged in the randomized response literature. Before our research project, however, most of the randomized response procedures in the literature were developed and studied under the restriction that the sample is selected by simple random sampling. In applications of RRT to sensitive questions, the formulas for simple random sampling have been misapplied to samples selected by stratified sampling, cluster sampling, or other relatively complex designs. Moreover, studies assessing the reliability and validity of RRT surveys of sensitive questions are seldom reported. We therefore select three RRT models, the additive model, the multiplicative model, and the unrelated-question model, and aim to explore the feasibility of using them to investigate quantitative sensitive questions when the sample is selected by two-stage cluster sampling, and to estimate population characteristics of men who have sex with men (MSM) in Beijing. The reliability of the methods is assessed through the application example and through computer-simulated sampling.
Method: The law of total probability and the theory of RRT were used to derive formulas for the estimator of the population mean and its variance when the three RRT models are applied to quantitative sensitive questions with a sample selected by two-stage cluster sampling. In the subsequent survey, from August to October 2010, 30 MSM venues in 6 districts of Beijing were randomly selected by two-stage cluster sampling, and all 1523 MSM at these venues were surveyed with the additive model of RRT. A Monte Carlo simulated survey was performed to evaluate the reliability of these methods.
Results: Formulas for the estimator of the population parameter and its variance are derived for two-stage cluster sampling under the three RRT models mentioned above. The results of the three RRT models are consistent on the whole. The results of the application example are as follows: among MSM in Beijing, the mean age at first sex with a man is 20.24 years; the mean number of male sexual partners per month is 2.09; and the mean number of male-male sexual acts per month is 4.72. The difference between the Monte Carlo simulated survey and the application example is not statistically significant (P>0.1).
Conclusion: With the RRT models and the formulas we derived, we provide, for the first time, a method for calculating the estimator of the population parameter and its variance in surveys of quantitative sensitive questions under a relatively complex sampling design such as two-stage cluster sampling. The survey of MSM in Beijing using two-stage cluster sampling and the RRT additive model was carried out successfully, and the results of the Monte Carlo simulated survey show that our survey methods and formulas are reliable. RRT has broad potential for large-scale surveys of sensitive questions.
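
The design described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the thesis's estimator or its variance formula (neither appears in the abstract): the population is synthetic, districts and then venues are drawn by simple random sampling without replacement, everyone at a chosen venue answers through the additive model, and the mean is estimated with a standard inverse-probability-weighted ratio estimator. All function names, population sizes, and the uniform scrambling distribution on 0 to 9 are hypothetical choices made for illustration.

```python
import random
import statistics


def build_population(rng, n_districts=16):
    """Synthetic population (assumed): districts -> venues -> true values X."""
    population = []
    for _ in range(n_districts):
        venues = []
        for _ in range(rng.randint(8, 12)):
            level = rng.randint(2, 7)      # venue-specific typical value
            size = rng.randint(30, 80)     # number of people at the venue
            venues.append([rng.randint(0, 2 * level) for _ in range(size)])
        population.append(venues)
    return population


def additive_response(x, rng, y_low=0, y_high=9):
    """Additive RRT: report Z = X + Y, with Y drawn from a known distribution."""
    return x + rng.randint(y_low, y_high)


def two_stage_estimate(population, n, m, rng, y_low=0, y_high=9):
    """Simulate one survey and return the estimated population mean of X."""
    N = len(population)
    mu_y = (y_low + y_high) / 2            # known mean of the scrambling variable
    total_z = 0.0
    total_people = 0.0
    for venues in rng.sample(population, n):       # stage 1: SRS of districts
        M = len(venues)
        weight = (N / n) * (M / m)                 # inverse inclusion probability
        for venue in rng.sample(venues, m):        # stage 2: SRS of venues
            # Everyone at a chosen venue answers with a scrambled value.
            total_z += weight * sum(
                additive_response(x, rng, y_low, y_high) for x in venue
            )
            total_people += weight * len(venue)
    # Ratio estimate of the mean of Z, shifted back by the known mean of Y.
    return total_z / total_people - mu_y


rng = random.Random(2011)
population = build_population(rng)
true_mean = statistics.mean(
    x for venues in population for venue in venues for x in venue
)

# Monte Carlo check: repeat the simulated survey and compare with the truth.
estimates = [two_stage_estimate(population, n=6, m=5, rng=rng) for _ in range(300)]
print("true mean of X:        ", round(true_mean, 3))
print("mean of the estimates: ", round(statistics.mean(estimates), 3))
print("std. dev. of estimates:", round(statistics.stdev(estimates), 3))
```

The spread of the simulated estimates gives an empirical stand-in for the sampling variance; an analytic variance formula for this design would have to be taken from the thesis itself.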

  • 【Online publication contributor】 Soochow University (苏州大学)
  • 【Online publication year and issue】 2012, Issue 06