Research on Technologies of Mining Disease MicroRNAs Based on Biological Network (基于生物网络的疾病microRNA挖掘技术研究)

【Author】 Jiang Qinghua (蒋庆华)

【Advisor】 Wang Yadong (王亚东)

【Author information】 Harbin Institute of Technology, Computer Application Technology, 2010, PhD

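【Summary】 Non-coding ribonucleic acid (RNA) is a current research hotspot in bioinformatics. Since the start of the 21st century, research on non-coding RNA has repeatedly been named among the year's top ten scientific breakthroughs selected by Science, and it was recognized with the Nobel Prize in Physiology or Medicine in 2006. MicroRNAs are an important class of non-coding RNA whose abnormalities can lead to the onset and progression of human disease. Disease microRNAs can be identified through biological experiments, but experimental methods are costly and slow. From a bioinformatics perspective, this dissertation proposes several methods for mining disease microRNAs, that is, for identifying microRNAs that potentially cause a given disease. The aim is to help biological and medical researchers design targeted microRNA experiments and, in turn, to provide a basis for drug development and for clinical diagnosis and treatment. The main contents are as follows.

(1) Mining and analyzing known microRNA-disease relationships. Since 2002, a growing number of studies have shown that microRNA deregulation contributes to the onset and progression of disease. These known associations, however, are scattered across the published literature. No research institution had built an online, shared database to collect, store, and manage them, so researchers could not easily obtain this information. We therefore first mined known disease microRNAs from the literature, built the first database of microRNA-disease relationships (miR2Disease), and put the data under management. Analysis of the miR2Disease data shows that different diseases often share some causal microRNAs and have partly similar pathogenesis. We also summarize three mechanisms of disease microRNA deregulation. First, disease microRNAs are often located in disease-related loci, for example minimal regions of loss of heterozygosity, minimal amplicons, or fragile sites such as breakpoints. Second, deregulation can result from abnormal epigenetic changes, such as aberrant DNA methylation or abnormal histone modification. Third, deregulation can result from malfunction of the enzymes involved in microRNA biogenesis.

(2) A disease microRNA mining technique based on a Boolean network. Biological networks have played an important role in mining protein-coding disease genes. In the field of disease microRNA mining, however, no method based on biological networks had been proposed. This dissertation therefore proposes an algorithm for constructing a Boolean network of functionally related microRNAs, so that microRNAs can be studied in network form. Analysis shows that, like other biological networks, the Boolean microRNA network has a degree distribution that follows a power law and exhibits hierarchical modularity. We further constructed a phenome-microRNAome network and analyzed the known disease microRNAs on it, finding that deregulation of functionally related microRNAs tends to cause phenotypically identical or similar diseases. On this basis we propose a disease microRNA mining algorithm based on the Boolean biological network and verify its effectiveness.

(3) A disease microRNA mining technique based on a weighted network. In the Boolean approach, an association between two microRNAs is determined solely by the significance of the overlap between their target genes. When the strength with which a microRNA represses its target genes is known, that information can be used to build a weighted network. We therefore propose a disease microRNA mining method based on a weighted network, which achieves good performance.

(4) A disease microRNA mining method based on support vector machines. To mine disease microRNAs directly from data, we recast the problem as a classification problem and propose a method based on support vector machines. This brings ideas from data mining and machine learning into disease microRNA mining, and cross-validation confirms the effectiveness of the method.

(5) A disease microRNA mining technique based on genomic data fusion. Statistics indicate that the biomedical data obtained in the last three years exceed the total from the previous forty thousand years, and the volume of data is growing explosively. Turning this mass of biological data into meaningful information for medical diagnosis and treatment, to the benefit of human health, is a serious challenge for biomedical informatics in the 21st century. This work integrates multiple biological data resources to construct a functional gene network covering the whole human genome. On this network, we propose an algorithm that mines new potential disease microRNAs from the functional relationships between a microRNA's target genes and the known causal genes of the disease of interest. Applying the algorithm to colon cancer verifies its effectiveness.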
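Item (3) above notes that the Boolean network links two microRNAs according to the significance of the overlap between their target genes. The record does not say which statistical test the dissertation uses, so the following is only a minimal sketch that assumes a one-sided hypergeometric test and a fixed threshold. The function names, the toy microRNA and gene identifiers, and the cutoff `alpha=0.05` are all hypothetical.

```python
from itertools import combinations
from math import comb


def overlap_p_value(n_genes, targets_a, targets_b):
    """Probability of seeing at least the observed target overlap by chance.

    Assumption: target sets are random draws from a background of n_genes
    genes, so the overlap follows a hypergeometric distribution.
    """
    k = len(targets_a & targets_b)
    a, b = len(targets_a), len(targets_b)
    tail = sum(
        comb(a, i) * comb(n_genes - a, b - i)
        for i in range(k, min(a, b) + 1)
    )
    return tail / comb(n_genes, b)


def boolean_network(targets, n_genes, alpha=0.05):
    """Return undirected edges between microRNAs with significant overlap."""
    edges = set()
    for m1, m2 in combinations(sorted(targets), 2):
        if overlap_p_value(n_genes, targets[m1], targets[m2]) < alpha:
            edges.add((m1, m2))
    return edges


# Toy data (hypothetical): three microRNAs over a background of 20 genes.
toy_targets = {
    "miR-a": {"g1", "g2", "g3", "g4", "g5"},
    "miR-b": {"g1", "g2", "g3", "g4", "g6"},
    "miR-c": {"g10", "g11", "g12", "g13", "g14"},
}
toy_edges = boolean_network(toy_targets, n_genes=20)
```

In this toy input only miR-a and miR-b share targets, so they are the only pair that can be linked. A real analysis over many microRNA pairs would also need a multiple-testing correction, which this sketch omits.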

【Abstract】 Non-protein-coding RNAs (ncRNAs) are a research hotspot in bioinformatics. Since the start of the 21st century, research on non-coding RNA has repeatedly been named among the year's top ten scientific breakthroughs, and it was recognized with the Nobel Prize in Physiology or Medicine in 2006. MicroRNAs are an important class of non-coding RNA and are closely associated with the development of human diseases. Disease microRNAs can be identified by biological experiments, but such experiments are often expensive and time-consuming. In this dissertation, we propose several bioinformatics techniques for mining disease microRNAs, which aim to identify the microRNAs most likely to contribute to disease development. The proposed methods yield testable hypotheses for experimental efforts to establish the true roles of microRNAs in human diseases, and they provide a basis for drug development and for clinical diagnosis and treatment. The main contents are as follows.

(1) Mining and analyzing known disease microRNAs. Since 2002, accumulating studies have shown that microRNA deregulation contributes to the development of disease. Detailed information on these known microRNA-disease relationships is scattered across the literature, and no online repository existed for it, so the associations were difficult for researchers to obtain. We therefore developed a manually curated database named miR2Disease, which provides a comprehensive resource on microRNA deregulation in various human diseases and manages these data. By analyzing the data, we found that different diseases often share disease microRNAs and have partly similar pathogenesis. In addition, we identified three types of mechanism that can explain the deregulation of disease microRNAs. First, disease microRNAs are often located in disease-related loci, for example minimal regions of loss of heterozygosity, minimal amplicons, or breakpoint and fragile regions. Second, microRNA deregulation can be caused by abnormal epigenetic modifications, such as aberrant DNA methylation and abnormal histone modification. Third, microRNA deregulation can be caused by abnormalities of the enzymes involved in microRNA biogenesis.

(2) An algorithm for identifying disease microRNAs based on a Boolean network. Biological networks have played an important role in mining protein-coding disease genes. In the field of disease microRNA identification, however, no biological network-based approach had been proposed. We therefore constructed, for the first time, a Boolean network of functionally related microRNAs. By analyzing the network, we found that, like other biological networks, it has a degree distribution that follows a power law and a hierarchical modular organization. We further constructed a phenome-microRNAome network. On this network, we analyzed the known microRNA-disease associations and found that the deregulation of functionally related microRNAs tends to cause phenotypically similar diseases. On this basis, we proposed the first algorithm for mining disease microRNAs based on a Boolean biological network, and verified its validity.

(3) An algorithm for identifying disease microRNAs based on a weighted network. To take full advantage of the silencing score between a microRNA and its target gene and of the phenotypic similarity score, we proposed an algorithm for identifying disease microRNAs based on a weighted network. Experimental results showed that this algorithm outperformed the approach based on the Boolean network.

(4) An algorithm for identifying disease microRNAs based on a support vector machine. To predict disease microRNAs directly from data, we recast the identification of disease microRNAs as a classification problem and proposed a prediction method based on a support vector machine. This is the first work to introduce data mining and machine learning into the identification of disease microRNAs. Cross-validation results showed that the method is effective.

(5) Identifying disease microRNAs based on data fusion. Statistics show that the biomedical data obtained in the last three years exceed the total obtained in the previous forty thousand years, and the data are growing explosively. Faced with this vast ocean of biological data, translating it into meaningful information for medical diagnosis and treatment, and thereby benefiting human health, is a great challenge for biomedical informatics in the 21st century. In this dissertation, we integrated a variety of biological data resources to construct a genome-wide network of functionally related genes. Based on this network, we proposed an approach that predicts disease microRNAs from the functional associations between a microRNA's target genes and the known causal genes of the disease of interest. The approach was applied to colon cancer and shown to be effective.
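Item (2) rests on the observation that functionally related microRNAs tend to cause phenotypically similar diseases. The abstract does not give the scoring rule, so the sketch below is not the dissertation's algorithm. It illustrates one common way to use such an observation, a neighbor-vote ("guilt by association") score, in which each candidate is ranked by the fraction of its network neighbors already known to be associated with the disease. The function name and the toy identifiers `m1` to `m5` are hypothetical.

```python
def rank_candidates(edges, known_disease_mirnas):
    """Rank unlabeled microRNAs by the share of neighbors known for the disease.

    edges: iterable of (u, v) pairs from an undirected microRNA network.
    known_disease_mirnas: microRNAs already associated with the disease.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    known = set(known_disease_mirnas)
    scores = {
        node: len(nbrs & known) / len(nbrs)
        for node, nbrs in neighbors.items()
        if node not in known
    }
    # Highest score first; ties broken by name so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy network (hypothetical): m1 and m2 are known disease microRNAs.
toy_network = [("m1", "m2"), ("m1", "m3"), ("m2", "m3"), ("m3", "m4"), ("m4", "m5")]
ranking = rank_candidates(toy_network, known_disease_mirnas={"m1", "m2"})
```

Here `m3` is adjacent to both known disease microRNAs and is ranked first, while `m4` and `m5` have no known neighbors. A weighted variant in the spirit of item (3) could replace the neighbor count with a sum of edge weights, again as an assumption rather than the method described in the dissertation.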
