
候选基因多态性与慢性苯中毒的关联分析研究

Population-based Association Analysis of Candidate Polymorphism with Chronic Benzene Poisoning

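【Author】 Jin Rufeng (金如锋)

【Supervisors】 Xia Zhaolin (夏昭林); Zhu Yiliang (朱怡良); Wang Zhongmin (王中民)

【Author information】 Fudan University (复旦大学), Occupational Health and Environmental Health, 2010, PhD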

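【摘要 (Abstract)】 Benzene causes hematotoxicity and genotoxicity in exposed individuals mainly through its metabolites, and the occurrence of chronic benzene poisoning (CBP) involves genetic susceptibility. Our group has examined, in several separate studies, how polymorphisms of metabolic enzyme genes, DNA damage repair genes, and cell cycle control genes relate to genetic susceptibility to CBP. In that work we noted the following problems. ① Earlier results were obtained by analyzing benzene metabolism, DNA damage repair, and cell cycle control separately. Because all three may contribute to CBP, analyzing a single mechanism in isolation may bias the results. ② The effect of a single SNP may be very weak, so interactions between SNPs need to be considered in the statistical analysis. With a fixed sample size, however, the number of parameters to estimate grows exponentially as more SNPs and their interactions are included, and Logistic regression then yields very large estimation bias. ③ Among the candidate polymorphisms studied, it is still unclear which have a large influence on CBP and which have only a potential influence. Important variables should be selected to predict the probability of CBP and so improve prediction accuracy.

To address these problems, this study built on occupational epidemiological investigation and laboratory work to analyze comprehensively the association between candidate gene polymorphisms and CBP, including main effects of genes, gene-gene interactions, and gene-environment interactions. The analysis strategy was: ① in single-locus analysis, control the false discovery rate (FDR) under multiple testing and screen SNPs for a Logistic regression model of main effects and second-order interactions; ② analyze the association between haplotypes and CBP while controlling for confounders such as smoking; ③ apply multifactor dimensionality reduction (MDR), a data-mining method, to analyze higher-order interactions; ④ use Random Forest to assess the importance of each SNP for CBP and, from the importance scores, screen relatively important SNPs for future study.

To explore the association between candidate gene polymorphisms and CBP, we conducted a case-control study. The case group comprised 152 workers with benzene poisoning, and the control group comprised 152 workers who were engaged in benzene-exposed work at the time of the survey and showed no signs of CBP. Cumulative exposure was assessed with the exposure assessment method of Dosmeci et al., and smoking, alcohol consumption, and related factors were recorded. The occupational epidemiological survey showed no statistically significant differences between the two groups in general characteristics such as sex, ethnicity, age, length of service, and cumulative exposure score, indicating that cases and controls were balanced and comparable.

Based on the mechanisms of benzene poisoning, SNPs in three classes of genes were genotyped. Metabolic enzyme genes: CYP2E1, MPO, NQO1, GSTT1, GSTM1, GSTP1, EPHX1, EPHX2, UGT1A6, UGT1A7, SULT1A1, CYP1A1, and CYP2D6, 33 SNPs in total. DNA damage repair genes: hMTH1, hOGG1, hMYH, XPD, APE1, XRCC1, ADPRT, XRCC2, and XRCC3, 14 SNPs in total. Cell cycle control genes: p53, p21, mdm2, gadd45α, and p14ARF, 13 SNPs in total. After excluding SNPs for which no variant genotype was detected and SNPs that deviated severely from HWE, 51 SNPs were used in the association study (33, 10, and 8 SNPs in metabolic enzyme genes, DNA damage repair genes, and cell cycle control genes, respectively).

Because of the large number of SNPs, interactions were handled in Logistic regression as follows. First, SNPs with a univariate test P value below 0.05 were selected as independent variables, with environmental factors as covariates. Next, a Logistic regression model was built by the Forward method to analyze the main effects and second-order interactions of the SNPs. Finally, SNPs that earlier studies had found to have interactions but no main effect were added, together with their interaction terms, to a Logistic regression model (Backward method) to test whether the interactions persisted in the new model.

The Logistic regression results were as follows.

① No main effect was detected for the following 40 SNPs. Metabolic enzyme genes: all 3 SNPs of CYP2E1; the GSTT1 deletion polymorphism; the GSTM1 deletion polymorphism; all 4 SNPs of MPO; NQO1 rs1131341 and rs1800566; EPHX1 rs2234922, rs1051741, rs2854451, and rs3738047; EPHX2 rs751141; all 7 SNPs of UGT1A6; UGT1A7 rs11692021; SULT1A1 rs9282861; CYP1A1 rs4646421, rs4646422, and rs1048943. DNA damage repair genes: XPD rs13181; APE1 rs1130409; XRCC1 rs25487; ADPRT rs1136410; XRCC3 rs861539. Cell cycle control genes: all 3 SNPs of p53; p14ARF rs3731217 and rs3088440; gadd45α rs581000 and rs532446.

② Main effects were detected for the following 11 SNPs. Metabolic enzyme genes: GSTP1 rs947894; EPHX1 rs1051740; CYP1A1 rs4646903; CYP2D6*10 rs1065852 and rs1135840. DNA damage repair genes: hMTH1 rs4866; hOGG1 rs1052133; hMYH rs3219489; XPD rs1799793; XRCC1 rs1799782. Cell cycle control genes: p21 rs1059234.

③ The following 10 interactions were detected: CYP2E1 rs3813867 with EPHX1 rs3738047; EPHX1 rs3738047 with alcohol consumption; GSTP1 rs947894 with alcohol consumption; CYP1A1 rs4646903 with CYP2D6 rs1135840; hMTH1 rs4866 with XRCC1 rs1799782; hOGG1 rs1052133 with hMYH rs3219489; hMYH rs3219489 with smoking; XRCC1 rs1799782 with APE1 rs1130409; APE1 rs1130409 with alcohol consumption; hOGG1 rs1052133 with XPD rs1799793.

The differences from previous studies were: ① two SNPs with main effects were newly detected, GSTP1 rs947894 and hMYH rs3219489; ② four interactions were newly detected, CYP2E1 rs3813867 with EPHX1 rs3738047, XRCC1 rs1799782 with APE1 rs1130409, CYP1A1 rs4646903 with CYP2D6 rs1135840, and hMTH1 rs4866 with XRCC1 rs1799782; ③ the interactions of NQO1 rs1800566 with smoking or alcohol consumption reported previously were not confirmed in this study.

The haplotype analysis in this study took into account covariates such as smoking, alcohol consumption, and benzene exposure intensity. As in previous studies, a CYP2D6*10 haplotype was associated with CBP: individuals carrying the CC haplotype were more susceptible than those carrying the TC haplotype. Haplotype associations with benzene poisoning that were reported previously but not confirmed here involved EPHX, UGT1A6, CYP1A1, and XRCC1.

In the MDR model for metabolic enzyme gene polymorphisms, we found one third-order interaction, the three-factor combination of CYP1A1 rs4646903, CYP2D6 rs1065852, and CYP2D6 rs1135840. Three combinations were assigned to the high-risk group: ① individuals with the CC genotype of CYP2D6 rs1135840, the TC+CC genotype of CYP1A1 rs4646903, and the CT+CC genotype of CYP2D6 rs1065852; ② individuals with the CC genotype of CYP2D6 rs1135840, the TT genotype of CYP1A1 rs4646903, and the CT+CC genotype of CYP2D6 rs1065852; ③ individuals with the CG+GG genotype of CYP2D6 rs1135840, the TT genotype of CYP1A1 rs4646903, and the CT+CC genotype of CYP2D6 rs1065852. The other 5 combinations belonged to the low-risk group.

In the MDR model for DNA damage repair and cell cycle control gene polymorphisms, no interaction of third order or higher was found. The two-factor combination of hOGG1 rs1052133 and XPD rs1799793 was the second-order interaction with the best predictive performance: individuals with the GG genotype of hOGG1 rs1052133 and the GG genotype of XPD rs1799793 belonged to the high-risk group, and the other 3 genotype combinations belonged to the low-risk group. In the MDR model for all three classes of gene polymorphisms, the best model was the same as the one for metabolic enzyme gene polymorphisms.

Random Forest ranked the importance of polymorphisms in metabolic enzyme, DNA damage repair, and cell cycle control genes for CBP. The top 25 SNPs or environmental factors, in descending order of importance, were: EPHX1 rs1051740, hOGG1 rs1052133, CYP2D6 rs1065852, CYP1A1 rs4646903, CYP2D6 rs1135840, p21 rs1059234, XPD rs1799793, p53 rs17878362, hMYH rs3219489, hMTH1 rs4866, GSTP1 rs947894, EPHX1 rs1051741, CYP2E1 96-bp insertion, MPO rs7208693, EPHX1 rs3738047, XRCC1 rs25487, SULT1A1 rs9282861, smoking, UGT1A6 rs6759892, UGT1A7 rs11692021, XRCC1 rs1779782, NQO1 rs1800566, XPD rs13181, UGT1A6 rs2070959, and GSTT1. How EPHX1 rs1051740, which had the highest importance score, influences CBP through interactions requires further study. Apart from p53 rs17878362, the 9 SNPs ranked 2nd to 11th were all detected in the Logistic regression analysis, and most of them showed no main effect but did show interactions. For SNPs whose effects could not be detected in Logistic regression because of the limited sample size, the Random Forest results can further assess their importance and so indicate which SNPs are most likely to be associated with CBP, allowing these relatively important SNPs to be selected and studied with a larger sample.

The results of this comprehensive study of polymorphisms in metabolic enzyme genes, DNA damage repair genes, and cell cycle control genes were largely the same as those of our previous studies, which indicates that our earlier conclusions are fairly reliable. The haplotype association results, the higher-order interaction results, and the comparison of the relative importance of the candidate polymorphisms obtained here further supplement the existing results and help clarify the relationship between candidate gene polymorphisms and CBP.

Taken together, the results show that polymorphisms of metabolic enzyme genes, DNA damage repair genes, and cell cycle control genes that are closely related to the occurrence of CBP are associated with CBP in individuals. Gene-gene and gene-environment interactions are an important way in which individual genetic susceptibility to CBP is influenced. These results provide a theoretical basis for better health surveillance of benzene-exposed workers and for screening effective biomarkers of susceptibility. Many SNPs in this study showed neither main effects nor interactions. On the one hand, the effects of these SNPs may indeed be weak or absent; on the other hand, the sample size may have been too small. The next step for our group is therefore to screen SNPs (for example, guided by the Random Forest importance ranking and relying on biological meaning and statistical variable selection methods) and to carry out further research with a larger sample.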
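The abstract excludes SNPs that deviate severely from HWE but does not say how the deviation was tested or what cut-off was used. A minimal sketch, assuming the common one-degree-of-freedom chi-square goodness-of-fit test on genotype counts; the function name `hwe_chi_square`, the cut-off `ALPHA`, and the genotype counts are hypothetical and not taken from the thesis:

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    n_aa, n_ab, n_bb are observed genotype counts (two homozygotes and the
    heterozygote). Returns (chi-square statistic, p-value with 1 df).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip cells with zero expectation (monomorphic SNP) to avoid dividing by 0.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

ALPHA = 0.01  # hypothetical cut-off for "severe" deviation

# Hypothetical genotype counts for one SNP in a group of 152 people.
chi2_stat, p_val = hwe_chi_square(90, 52, 10)
keep_snp = p_val >= ALPHA
print(f"chi2={chi2_stat:.3f}, p={p_val:.3f}, keep={keep_snp}")
```

The test is usually run on controls only, since a true disease association can itself distort genotype frequencies among cases; whether the thesis did so is not stated.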

【Abstract】 Benzene-induced hematotoxicity and genotoxicity depend mainly on its metabolites, and chronic benzene poisoning (CBP) is associated with genetic polymorphisms. Our group has extensively studied the association of polymorphisms in toxicant-metabolizing enzyme genes, DNA repair genes, and cell cycle control genes with CBP.

Those studies raised three issues. First, our previous results were obtained separately for polymorphisms of toxicant-metabolizing enzyme genes, DNA repair genes, and cell cycle control genes. Because CBP may involve all three pathways, analyzing one pathway in isolation may bias the results. Second, the main effect of a single SNP may be small or absent, so interaction effects should be included in the model. However, the number of parameters to be estimated in a Logistic regression model increases exponentially as the number of SNPs increases linearly, which leads to inaccurate parameter estimates for interaction effects. Third, the relative importance of the polymorphisms we have studied is uncertain, although selecting important variables matters greatly for predicting the probability of CBP accurately.

To resolve these issues, this study comprehensively analyzed the association of candidate polymorphisms with CBP, including main effects, gene-gene interactions, and gene-environment interactions, on the basis of occupational epidemiological investigation and laboratory studies. The statistical strategy was as follows. In single-locus analysis, we controlled the false discovery rate (FDR) to adjust for multiple testing. After SNP selection, Logistic regression models were fitted to analyze main effects and second-order interactions. We then conducted haplotype association analysis with CBP while controlling for confounding factors such as smoking. Multifactor dimensionality reduction (MDR) was used to analyze high-order interactions.
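The abstract states that the FDR was controlled in single-locus testing but does not name the procedure. A minimal sketch, assuming the Benjamini-Hochberg step-up procedure; the function name, the level `q`, and the SNP labels and p-values are hypothetical:

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, in the input order, that is True where the
    corresponding hypothesis is rejected with the FDR controlled at level q.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected

# Hypothetical single-SNP p-values (not taken from the thesis).
snp_p = {"snp_a": 0.001, "snp_b": 0.008, "snp_c": 0.039,
         "snp_d": 0.041, "snp_e": 0.60}
flags = benjamini_hochberg(list(snp_p.values()), q=0.05)
significant = [name for name, keep in zip(snp_p, flags) if keep]
print(significant)
```

Because the procedure is step-up, a p-value above its own threshold can still be rejected when a larger p-value further down the sorted list meets its threshold.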
Random forest was used to measure the relative importance of each polymorphism that may be associated with CBP, so that SNPs could be screened for further study according to their variable importance.

To elucidate the association of candidate polymorphisms with CBP, a case-control study was designed and conducted. We investigated 152 CBP patients and 152 workers who were occupationally exposed to benzene but had no manifestations of poisoning. Cumulative exposure was estimated with the method described by Dosmeci, and lifestyle factors such as cigarette smoking and alcohol consumption were also recorded. The occupational epidemiological results showed no statistically significant difference between the case and control groups in the distribution of sex, race, age, length of service, or cumulative exposure level, indicating that the two groups were balanced and comparable.

According to the mechanism of benzene toxicity, we genotyped polymorphisms in three pathways. For toxicant-metabolizing enzyme genes, 33 SNPs in CYP2E1, MPO, NQO1, GSTT1, GSTM1, GSTP1, EPHX1, EPHX2, UGT1A6, UGT1A7, SULT1A1, CYP1A1, and CYP2D6 were examined. For DNA repair genes, 14 SNPs in hMTH1, hOGG1, hMYH, XPD, APE1, XRCC1, ADPRT, XRCC2, and XRCC3 were examined. For cell cycle control genes, 13 SNPs in p53, p21, mdm2, gadd45a, and p14ARF were examined. Finally, 51 SNPs (33 in toxicant-metabolizing enzyme genes, 10 in DNA repair genes, and 8 in cell cycle control genes) were retained for statistical analysis after excluding SNPs that had no variant allele or deviated from Hardy-Weinberg equilibrium (HWE).

Because of the large number of SNPs involved, interaction effects were introduced into the Logistic regression model as follows. First, SNPs with a P value less than 0.05 were entered into the Logistic regression model as independent variables, with environmental factors treated as covariates.
Then a forward stepwise Logistic regression model was fitted to detect main effects and second-order interactions. Finally, a backward stepwise Logistic regression model was fitted after adding the SNPs that had shown interaction effects but no main effects in our previous studies.

The results of Logistic regression were as follows. No main effect was detected for 40 SNPs. Among toxicant-metabolizing enzyme genes, these were the 3 SNPs of CYP2E1, GSTT1, GSTM1, the 4 SNPs of MPO, NQO1 rs1131341 and rs1800566, EPHX1 rs2234922, rs1051741, rs2854451, and rs3738047, EPHX2 rs751141, the 7 SNPs of UGT1A6, UGT1A7 rs11692021, SULT1A1 rs9282861, and CYP1A1 rs4646421, rs4646422, and rs1048943. Among DNA repair genes, these were XPD rs13181, APE1 rs1130409, XRCC1 rs25487, ADPRT rs1136410, and XRCC3 rs861539. Among cell cycle control genes, these were the 3 SNPs of p53, p14ARF rs3731217 and rs3088440, and gadd45a rs581000 and rs532446.

Main effects were detected for 11 SNPs in total. Among toxicant-metabolizing enzyme genes, these were GSTP1 rs947894, EPHX1 rs1051740, CYP1A1 rs4646903, and CYP2D6*10 rs1065852 and rs1135840. Among DNA repair genes, these were hMTH1 rs4866, hOGG1 rs1052133, hMYH rs3219489, XPD rs1799793, and XRCC1 rs1799782. Among cell cycle control genes, this was p21 rs1059234.

Ten interaction effects were detected: CYP2E1 rs3813867 with EPHX1 rs3738047, EPHX1 rs3738047 with alcohol consumption, GSTP1 rs947894 with alcohol consumption, CYP1A1 rs4646903 with CYP2D6 rs1135840, hMTH1 rs4866 with XRCC1 rs1799782, hOGG1 rs1052133 with hMYH rs3219489, hMYH rs3219489 with cigarette smoking, XRCC1 rs1799782 with APE1 rs1130409, APE1 rs1130409 with alcohol consumption, and hOGG1 rs1052133 with XPD rs1799793.

Compared with our previous results, there were some differences. Two SNPs showed main effects that were not present in our previous results: GSTP1 rs947894 and hMYH rs3219489.
Furthermore, we detected 4 interactions that were not present in our previous results: CYP2E1 rs3813867 with EPHX1 rs3738047, XRCC1 rs1799782 with APE1 rs1130409, CYP1A1 rs4646903 with CYP2D6 rs1135840, and hMTH1 rs4866 with XRCC1 rs1799782. The previously reported interactions between NQO1 rs1800566 and cigarette smoking or alcohol consumption were not reproduced in the present study.

After controlling for confounding factors such as smoking, haplotype association analysis detected an association between a CYP2D6*10 haplotype and CBP, as in our previous results. Individuals with the CC haplotype were more susceptible to CBP than those with the TC haplotype. In contrast to previous results, haplotypes of EPHX1, UGT1A6, CYP1A1, and XRCC1 showed no association with CBP in the present study.

For toxicant-metabolizing enzyme genes, the MDR model detected a third-order interaction: the three-factor combination of CYP1A1 rs4646903, CYP2D6 rs1065852, and CYP2D6 rs1135840. Three genotype combinations were classified as high risk: individuals with the CC genotype of CYP2D6 rs1135840, the TC or CC genotype of CYP1A1 rs4646903, and the CT or CC genotype of CYP2D6 rs1065852; individuals with the CC genotype of CYP2D6 rs1135840, the TT genotype of CYP1A1 rs4646903, and the CT or CC genotype of CYP2D6 rs1065852; and individuals with the CG or GG genotype of CYP2D6 rs1135840, the TT genotype of CYP1A1 rs4646903, and the CT or CC genotype of CYP2D6 rs1065852. The other 5 combinations were classified as low risk.

For DNA repair genes and cell cycle control genes, the MDR model detected no interaction of order higher than two. The two-factor combination of hOGG1 rs1052133 and XPD rs1799793 was the best second-order interaction, with the highest prediction accuracy. Individuals with the GG genotype of hOGG1 rs1052133 and the GG genotype of XPD rs1799793 were classified as high risk.
The other 3 genotype combinations were classified as low risk.

When all three pathways were considered together, the best n-factor combination was the same as for the toxicant-metabolizing enzyme genes, namely the three-factor combination of CYP1A1 rs4646903, CYP2D6 rs1065852, and CYP2D6 rs1135840.

According to the variable importance scores from the Random forest model with all three pathways considered, the 25 most important SNPs or environmental factors were, in descending order: EPHX1 rs1051740, hOGG1 rs1052133, CYP2D6 rs1065852, CYP1A1 rs4646903, CYP2D6 rs1135840, p21 rs1059234, XPD rs1799793, p53 rs17878362, hMYH rs3219489, hMTH1 rs4866, GSTP1 rs947894, EPHX1 rs1051741, CYP2E1 96-bp insertion, MPO rs7208693, EPHX1 rs3738047, XRCC1 rs25487, SULT1A1 rs9282861, cigarette smoking, UGT1A6 rs6759892, UGT1A7 rs11692021, XRCC1 rs1779782, NQO1 rs1800566, XPD rs13181, UGT1A6 rs2070959, and GSTT1. The role in CBP of the most important SNP, EPHX1 rs1051740, should be studied further. Except for p53 rs17878362, the SNPs ranked 2nd to 11th had been detected in Logistic regression, and most of them showed interaction effects rather than main effects. For SNPs whose effects could not be detected in Logistic regression because of the small sample size, Random forest can be used to assess relative importance and to screen SNPs for further study.

In the present study, the associations of polymorphisms in toxicant-metabolizing enzyme genes, DNA repair genes, and cell cycle control genes with CBP were consistent with our previous results and further support them. The present results on haplotype association, high-order interaction, and the relative importance of SNPs usefully supplement our previous results.

In conclusion, we found that polymorphisms of toxicant-metabolizing enzyme genes, DNA repair genes, and cell cycle control genes were associated with CBP.
Gene-environment and gene-gene interactions are important ways in which genetic susceptibility to CBP is shaped. These results provide a theoretical basis for health surveillance of benzene-exposed workers and for screening effective biomarkers of susceptibility. In the present study, main effects or interaction effects were undetectable for most SNPs. This may be because the effects of these SNPs on CBP are weak or absent, or because the limited sample size gave the tests low power. Our future research will focus on screening SNPs, using statistical variable selection methods and biological mechanism, and on expanding the sample size.
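The high-risk and low-risk groups reported for the MDR models come from pooling multi-locus genotype combinations. A minimal sketch of that labelling step, assuming the usual MDR rule that a combination is high risk when its case:control ratio exceeds the overall ratio (1 for 152 cases and 152 controls). Cross-validation, which MDR normally uses for model selection, is left out; ties are treated as low risk here by assumption; the function names, SNP names, and toy data are hypothetical:

```python
from collections import Counter
from itertools import combinations

def mdr_label_cells(genotypes, is_case, snps):
    """Label each observed genotype combination of `snps` as 'high' or 'low' risk.

    genotypes: list of dicts mapping SNP name -> genotype string.
    is_case:   list of 0/1 flags; both cases and controls must be present.
    """
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    threshold = n_case / n_ctrl          # overall case:control ratio
    cases, ctrls = Counter(), Counter()
    for g, y in zip(genotypes, is_case):
        cell = tuple(g[s] for s in snps)
        (cases if y else ctrls)[cell] += 1
    # Compare cases > threshold * controls so empty control cells need no division.
    return {cell: "high" if cases[cell] > threshold * ctrls[cell] else "low"
            for cell in set(cases) | set(ctrls)}

def mdr_accuracy(genotypes, is_case, snps):
    """Training accuracy of the high/low labelling used as a classifier."""
    labels = mdr_label_cells(genotypes, is_case, snps)
    correct = sum((labels[tuple(g[s] for s in snps)] == "high") == bool(y)
                  for g, y in zip(genotypes, is_case))
    return correct / len(is_case)

def best_model(genotypes, is_case, all_snps, order):
    """Exhaustively search SNP combinations of a given order."""
    return max(combinations(all_snps, order),
               key=lambda snps: mdr_accuracy(genotypes, is_case, snps))

# Toy data: disease status depends jointly on snp1 and snp2, not on snp3.
rows = [("AA", "BB", "CC", 1), ("AA", "BB", "Cc", 1),
        ("Aa", "Bb", "CC", 1), ("Aa", "Bb", "Cc", 1),
        ("AA", "Bb", "CC", 0), ("AA", "Bb", "Cc", 0),
        ("Aa", "BB", "CC", 0), ("Aa", "BB", "Cc", 0)]
toy_genotypes = [dict(zip(("snp1", "snp2", "snp3"), r[:3])) for r in rows]
toy_status = [r[3] for r in rows]

toy_best = best_model(toy_genotypes, toy_status, ["snp1", "snp2", "snp3"], 2)
toy_labels = mdr_label_cells(toy_genotypes, toy_status, toy_best)
print(toy_best, toy_labels)
```

In the toy data neither snp1 nor snp2 separates cases from controls on its own, which mirrors the abstract's point that a SNP can matter through interaction without a detectable main effect.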

  • 【Online publication contributor】 Fudan University (复旦大学)
  • 【Online publication year and issue】 2010, Issue 11