基因表达谱芯片校正批次效应算法的比较及网络分析在精神分裂症研究中的应用

Evaluation of Six Batch Adjustment Methods in Expression Microarray Data and Application of Gene Co-expression Network in Schizophrenia

【Author】 Chen Chao (陈超)

【Advisor】 Jin Li (金力)

【Author information】 Fudan University, Genetics, 2011, PhD

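【Abstract (translated from the Chinese)】 The gene expression microarray is a high-throughput tool for genomic research with very broad applications in biomedicine. However, the data from the thousands of microarray-based studies carried out each year are confounded by "batch effects": systematic errors that arise when arrays are processed in different experimental batches. Batch effects were rarely mentioned in earlier microarray studies. Careful experimental design can mitigate them, but they cannot be eliminated unless all samples are processed in a single batch. We first demonstrated the existence of batch effects in experimental data from multiple platforms and examined, from several angles, how severely this confounding affects biological factors. We then described the microarray experimental procedure step by step and pointed out the likely sources of batch effects. Because batch effects can seriously distort gene expression results, a series of batch-adjustment methods have been developed. We systematically compared several currently popular methods in terms of variance proportion, precision, accuracy, and overall performance, and found that ComBat, an empirical Bayes method, outperformed the other five algorithms on most measures and still performed very well when each batch contained only a small number of samples. We recommend ComBat as the best statistical algorithm for adjusting batch effects in data from different batches. We also suggest that, when comparing correlations among replicate and non-replicate samples, the data should be standardized at the probe level to reduce the inflated correlation among non-replicate samples. The other part of our work used gene expression microarray data to explore the pathogenesis of schizophrenia. Many microarray-based studies of schizophrenia have reported numerous candidate genes, but almost none of these genes survives multiple-testing correction and is replicated across experiments. This may be due to the heterogeneity of gene expression in the human brain, or because expression changes in patients are small. We hypothesized that networks or pathways based on gene-gene interactions would change more consistently in patient brains, and in this study we used gene co-expression networks to analyze five brain-tissue datasets from different sources. We first applied strict quality control to the microarray data: in addition to correcting batch effects with ComBat, we controlled probe quality with the MAS algorithm, removed the influence of single-nucleotide polymorphisms on probes with a modified RMA algorithm, and removed the effect of ethnic differences on gene expression. We then constructed gene networks by co-expression network analysis and, using the eigenvector of each gene network, applied two different statistical algorithms, adjusting for age, sex, brain pH, and other variables, to test whether the expression level of any group of genes is strongly associated with schizophrenia. In all five datasets, several genes of the metallothionein family, MT1E, MT1F, MT1G, MT1M, MT1X, and MT2A, showed significantly increased expression in patients with schizophrenia. Such consistent results show that metallothionein family genes are indeed involved in the development of schizophrenia, whether as a cause or as a symptom. Metallothioneins are rich in cysteine; their main roles in the human body are to regulate trace elements by binding heavy-metal ions and to take part in the immune response and oxidative stress after neural injury. Oxidative stress has been reported to be related to the pathogenesis of schizophrenia. The heavy metal zinc (Zn) is known to play a role in neural development, mood control, and protecting cells from damage. Another heavy metal, copper (Cu), has also been suggested to be related to schizophrenia. We speculate that dysregulation of heavy metals, oxidative stress, and tissue damage may be involved in the pathogenesis of schizophrenia. In addition, we briefly analyzed the changes in metallothionein expression from genetic and epigenetic perspectives, using an eQTL approach and DNA methylation data, respectively.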

【Abstract】 The expression microarray is a frequently used approach for studying gene expression on a genome-wide scale. However, the data produced by the thousands of microarray studies published annually are confounded by "batch effects," the systematic error introduced when samples are processed in multiple batches. Although batch effects can be reduced by careful experimental design, they cannot be eliminated unless the whole study is done in a single batch. We first showed that batch effects exist and are confounded with biological factors on various microarray platforms; then, by outlining the experimental procedure, we pointed out the potential sources of batch effects. A number of programs are now available to adjust microarray data for batch effects prior to analysis. We systematically evaluated six of these programs using multiple measures of precision, accuracy, and overall performance. ComBat, an empirical Bayes method, outperformed the other five programs by most metrics. We also showed that it is essential to standardize expression data at the probe level when testing for correlation of expression profiles, because a sizeable probe effect in microarray data can inflate the correlation among replicates and among unrelated samples. We then used gene expression microarray data to explore the pathogenesis of schizophrenia. Differential gene expression in schizophrenia brains has been pursued by multiple studies, which have yielded a long list of interesting candidate genes, but hardly any finding survives multiple-testing correction in at least one study and is replicated in other studies. This is largely due to strong heterogeneity of gene expression in the human brain and, possibly, to expression changes in patients being small. We hypothesized that coordinated gene expression networks or pathways may show stronger and more robust changes in patient brains.
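The probe-level standardization mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's own code: the data are simulated, and the function names, sample sizes, and noise levels are assumptions chosen only for illustration. Each simulated sample is unrelated to the others, yet all share the same probe-specific baseline, which is enough to produce a high raw correlation.

```python
import numpy as np

def standardize_probes(expr):
    """Center and scale each probe (row) across samples (columns)."""
    mean = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant probes stay at zero instead of dividing by zero
    return (expr - mean) / sd

def mean_pairwise_correlation(expr):
    """Average Pearson correlation over all distinct pairs of samples."""
    corr = np.corrcoef(expr, rowvar=False)      # columns are the variables
    upper = np.triu_indices_from(corr, k=1)     # each pair counted once
    return float(corr[upper].mean())

# Simulated data (assumed values): 2000 probes, 12 unrelated samples.
rng = np.random.default_rng(0)
n_probes, n_samples = 2000, 12
probe_effect = rng.normal(8.0, 2.0, size=(n_probes, 1))   # probe-specific baseline
noise = rng.normal(0.0, 0.5, size=(n_probes, n_samples))  # independent per sample
raw = probe_effect + noise

raw_corr = mean_pairwise_correlation(raw)
std_corr = mean_pairwise_correlation(standardize_probes(raw))
print(f"raw: {raw_corr:.3f}  probe-standardized: {std_corr:.3f}")
```

In this simulation the raw correlation among unrelated samples is high purely because of the shared probe effect, and it drops to roughly zero once each probe is standardized. Real data would also contain biological and batch structure, which this sketch does not model.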
In this study, we used weighted co-expression network analysis to evaluate expression data from five brain gene expression studies. We first filtered the data by strict quality control criteria: besides applying ComBat for batch correction, we filtered out probe sets that contain SNPs, that were detected in fewer than 90% of samples, or that lack sufficient annotation. We removed samples from non-Caucasian individuals and outliers identified by clustering. Where replicates existed, one was chosen at random. We found that one module containing the genes MT1E, MT1F, MT1G, MT1M, MT1X, and MT2A, which belong to the metallothionein (MT) gene family, was consistently and significantly correlated with schizophrenia in all five datasets, which differ in sample source and microarray platform. This robust change indicates a role for these genes in schizophrenia etiology or pathology. MTs are a family of cysteine-rich proteins that bind heavy metals such as zinc, copper, cadmium, and mercury, and are involved in protection against reactive oxygen species and in stress adaptation. Meanwhile, oxidative stress has been suggested to contribute to the pathophysiology of schizophrenia, and zinc plays important roles in nerve development, mood control, and protecting cells from oxidative damage; zinc supplementation was considered as a schizophrenia treatment three decades ago. This evidence all relates to the function of MT in the central nervous system, indicating a potential contribution of MT to schizophrenia pathogenesis. We also explored the mechanism of altered MT expression in schizophrenia from genetic and epigenetic perspectives, using an eQTL approach and DNA methylation data, respectively.
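The module-level test described above can be sketched as follows. This is an illustrative approximation only: it uses plain PCA and ordinary least squares in NumPy rather than the WGCNA software, the abstract does not name the statistical algorithms actually used, and all data, effect sizes, and function names here are simulated or hypothetical. The module is summarized by its first principal component (commonly called the module eigengene), which is then regressed on diagnosis with covariates included.

```python
import numpy as np

def module_eigengene(expr):
    """First principal component of a genes-by-samples module matrix."""
    z = expr - expr.mean(axis=1, keepdims=True)
    z = z / z.std(axis=1, ddof=1, keepdims=True)
    # Rows of vt span sample space; the first row is the eigengene.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    eigengene = vt[0]
    # Fix the arbitrary sign so the eigengene follows mean module expression.
    if np.corrcoef(eigengene, z.mean(axis=0))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene

def adjusted_effect(eigengene, diagnosis, covariates):
    """OLS coefficient of diagnosis on the eigengene, adjusted for covariates."""
    design = np.column_stack([np.ones_like(eigengene), diagnosis, covariates])
    coef, *_ = np.linalg.lstsq(design, eigengene, rcond=None)
    return float(coef[1])

# Simulated data (assumed values): 6 module genes, 30 controls and 30 cases.
rng = np.random.default_rng(1)
n_samples, n_genes = 60, 6
diagnosis = np.repeat([0.0, 1.0], n_samples // 2)
age = rng.normal(45.0, 10.0, n_samples)   # hypothetical covariate
ph = rng.normal(6.5, 0.2, n_samples)      # hypothetical brain pH covariate
shared = 1.0 * diagnosis + rng.normal(0.0, 0.5, n_samples)  # co-expression signal
module = 8.0 + shared + rng.normal(0.0, 0.3, size=(n_genes, n_samples))

eigengene = module_eigengene(module)
effect = adjusted_effect(eigengene, diagnosis, np.column_stack([age, ph]))
print(f"adjusted diagnosis effect on eigengene: {effect:.3f}")
```

A positive coefficient here corresponds to higher module expression in the simulated cases. A real analysis would also need a significance test and a check that the result replicates across datasets, as the abstract describes.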

  • 【Online publication contributor】 Fudan University
  • 【Online publication year and issue】 2011, Issue 12