Study on Systems Biology Methods of miRNA Target Prediction

【Author】 Liu Hui

【Advisor】 Xu Zhao

【Author information】 China University of Mining and Technology, Communication and Information Systems, 2009, PhD

【Abstract】 MicroRNAs (miRNAs) are a class of single-stranded non-coding RNAs approximately 22 nucleotides in length. They play important regulatory roles at the post-transcriptional stage by either degrading mRNA or inhibiting translation, and are thought to be involved in many biological processes and diseases. Identifying miRNA targets is key to understanding how miRNAs exert their regulatory function; however, it has become the bottleneck of miRNA research because effective experimental methods and high-accuracy prediction algorithms are lacking. This thesis therefore proposes a systems biology approach that integrates sequence-level information, gene expression information, protein interaction information, and other biological prior knowledge, aiming to contribute to miRNA target identification.

Firstly, a broad and thorough survey of existing computational miRNA target prediction algorithms is carried out. Based on an analysis of the mechanism of each algorithm and a performance evaluation of several of them, the shortcomings of existing algorithms are summarized and possible research directions are pointed out.

Secondly, a two-stage SVM-based algorithm, SVMicrO, is proposed for sequence-level target prediction. A large number of positive and negative samples are collected from the most recently published literature to build the training and evaluation datasets. Based on statistical characteristics reported in earlier studies and on our understanding of miRNA:target interactions, 113 and 30 features are defined to construct Site-SVM and UTR-SVM, respectively. mRMR and SFS are used for feature selection. Sample weights and class weights are introduced into the SVM to deal with the imbalance in the sample data. SVMicrO and several other commonly used algorithms are compared systematically on the results of high-confidence target identification experiments. The comparison shows that SVMicrO achieves relatively better performance and stronger generalization capacity.

Thirdly, considering that miRNA can degrade mRNA, this thesis studies gene expression profiles from miRNA over-expression experiments and, in combination with the SVMicrO predictions, builds a Bayesian inference model based on sequence and microarray data. In this model, logistic regression is used to map the SVMicrO prediction results to probability space, and a Gaussian mixture model, whose parameters are estimated by the VBEM algorithm, is used to model the expression profile data. The evaluation results indicate that the proposed algorithm, which integrates the two types of information, outperforms prediction based on sequence alone or on expression data alone.

Fourthly, given that the primary function of miRNA is translation inhibition, this thesis considers the effect of miRNA on protein synthesis and the regulatory relationships between proteins, and proposes SysMicrO, a target prediction algorithm based on the causal model "pathway → transcription factor → regulated gene set", which opens a new line of thinking for target prediction. As a first step, databases of transcription-factor-regulated gene sets and of the upstream gene networks of transcription factors are constructed. GSEA is then used to detect transcription factors whose gene sets show a marked change in enrichment in miRNA over-expression experiments, which indicates that these transcription factors are affected by the miRNA. Finally, the genes upstream of these transcription factors are traced and intersected with the SVMicrO predictions to give the final prediction results. The evaluation results show that SysMicrO can not only filter the SVMicrO predictions effectively and thereby improve specificity, but also provide annotation and biological interpretation for the predictions, which offers useful guidance for subsequent analysis and target identification.
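The fusion of sequence-based and expression-based evidence described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the logistic coefficients `a` and `b`, and the Gaussian means and standard deviations are all hypothetical values chosen for illustration, and the mixture parameters are fixed by hand here rather than estimated with VBEM. The sketch maps a raw classifier score to a prior probability with a logistic curve, models the log fold change of a gene under miRNA over-expression with one Gaussian for targets (down-regulated) and one for non-targets, and combines the two with Bayes' rule.

```python
import math


def logistic(score, a=1.0, b=0.0):
    """Map a raw classifier score to (0, 1); a and b are hypothetical coefficients."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior_target(score, log_fold_change,
                     target=(-1.0, 0.5), background=(0.0, 0.5)):
    """Posterior probability that a gene is a target.

    score            raw sequence-based classifier score (assumed input)
    log_fold_change  expression change after miRNA over-expression
    target           (mean, std) of the target component, assumed values
    background       (mean, std) of the non-target component, assumed values
    """
    prior = logistic(score)  # sequence evidence as a prior
    like_t = gaussian_pdf(log_fold_change, *target)
    like_b = gaussian_pdf(log_fold_change, *background)
    evidence = prior * like_t + (1.0 - prior) * like_b
    return prior * like_t / evidence  # Bayes' rule


if __name__ == "__main__":
    # A neutral sequence score combined with strong down-regulation
    print(posterior_target(0.0, -1.5))
    # A confident sequence score combined with no expression change
    print(posterior_target(2.0, 0.0))
```

With these assumed parameters, a gene whose expression drops sharply receives a posterior above its sequence-only prior, while a gene with unchanged expression is pulled below it, which is the qualitative behavior the abstract attributes to combining the two information sources.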
