Computational Methods for Drug Target Identification Based on Data Mining

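【Author】 Gong Jiayu

【Supervisors】 Gao Daqi; Li Honglin

【Author Information】 East China University of Science and Technology, Computer Application Technology, 2013, Ph.D.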

【Abstract】 New drug discovery and development is time-consuming, costly and risky: developing a new drug takes 15 years on average and costs more than 800 million US dollars. Moreover, problems with efficacy and side effects often cause drug candidates to fail in clinical trials, resulting in huge losses. As the starting point of drug development, drug target discovery and identification plays a decisive role in the success rate of drug development. With increasingly rich bioinformatics techniques and rapidly growing proteomics and chemogenomics data, combining computational chemical biology with traditional experimental techniques provides information support for discovering new drug targets and offers new approaches to target prediction. Aiming at drug target discovery, this thesis uses computational methods to mine and integrate existing drug target data, constructs a drug target database and a pesticide target database, and develops several methods and tools for drug target discovery. The thesis consists of the following five parts:
1. Based on the indications of existing drugs and of drugs under development, we constructed a drug target database called TargetBank by text mining followed by manual validation. The database contains 4,357 target protein records covering 23 therapeutic areas and more than 600 clinical indications, and it provides detailed functional annotation for each protein record as a reference for drug development researchers.
2. We constructed PTID, the first database of the interaction network between pesticides and their targets. Through data integration, the database contains 1,342 pesticide records, divided into 22 categories by function and annotated with environmental fate and ecotoxicology information. By text mining, we then collected 4,245 records of protein targets that interact with pesticide active ingredients and built a pesticide-target interaction network with detailed annotation of protein sequence and function. The database also provides computational tools such as similarity search and sequence alignment, offering data support for pesticide research.
3. We designed and implemented a program that generates molecular solvent-accessible surfaces based on iso-surface theory. The method uses a fast one-dimensional recursive Gaussian filter to construct an iso-surface in discrete three-dimensional space, then applies the iso-surface extraction technique Marching Cubes to obtain a triangle-mesh representation of the solvent-accessible surface. Normal vectors of the triangle mesh are computed by the central difference method for better rendering. Test results showed that the method is both fast and accurate, and it can provide molecular surface representations of drug targets for structure-based target identification. The program has been integrated into the D3Pharma drug design software package.
4. We developed a random-walk-based polypharmacology method that combines molecular similarity methods with network pharmacology concepts. The method uses molecular similarity to introduce a compound of interest into an existing drug-target network, then applies a random-walk network inference algorithm on the bipartite graph to uncover its polypharmacological effects and predict new targets or side effects. Predictions of recently reported off-target events showed that the method can effectively predict the polypharmacology of drugs.
5. Based on SHAFTS, a molecular similarity program developed by our research group, we implemented ChemMapper, a molecular-similarity-based platform for target discovery and virtual screening. The platform collects structural information for 3.5 million compounds, 400,000 of which carry target annotations. Using three-dimensional molecular similarity, it can predict targets and side effects for a query structure; it can also perform virtual screening and scaffold hopping on commercial compound libraries. The accurate prediction of the toxic side effects of the existing drug astemizole, together with the discovery of EGFR lead compounds and the validation of scaffold hopping, shows that ChemMapper can support research in chemogenomics, target discovery, polypharmacology and virtual screening.
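Parts 1 and 2 state that TargetBank and PTID records were gathered by text mining followed by manual validation, but the pipeline itself is not described. As a minimal sketch of one common first step, assuming simple sentence-level co-occurrence of dictionary terms, candidate pesticide-protein pairs for manual review could be collected as below. The lexicons, the splitter, and all function names are hypothetical illustrations, not the thesis's implementation.

```python
import re
from collections import Counter

# Hypothetical lexicons for illustration; the thesis does not publish its term lists.
PESTICIDES = {"imidacloprid", "chlorpyrifos", "glyphosate"}
PROTEINS = {"acetylcholinesterase", "nicotinic acetylcholine receptor", "EPSP synthase"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_terms(sentence, lexicon):
    """Return the lexicon entries that occur in the sentence as whole words (case-insensitive)."""
    lowered = sentence.lower()
    return {
        term for term in lexicon
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
    }


def cooccurrence_candidates(abstracts):
    """Count sentence-level (pesticide, protein) co-mentions; each pair is a candidate for manual validation."""
    counts = Counter()
    for text in abstracts:
        for sentence in split_sentences(text):
            for pesticide in find_terms(sentence, PESTICIDES):
                for protein in find_terms(sentence, PROTEINS):
                    counts[(pesticide, protein)] += 1
    return counts


if __name__ == "__main__":
    docs = [
        "Chlorpyrifos oxon inhibits acetylcholinesterase in fish. Imidacloprid was not tested.",
        "Imidacloprid acts on the nicotinic acetylcholine receptor of insects.",
        "Glyphosate inhibits EPSP synthase. Glyphosate residues were measured in soil.",
    ]
    for pair, n in cooccurrence_candidates(docs).most_common():
        print(pair, n)
```

Sentence-level co-mention is deliberately permissive: it finds many pairs cheaply, and the manual validation step the abstract mentions is what filters out mentions that do not describe an actual interaction.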
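Part 3 describes smoothing a molecular grid with a one-dimensional recursive Gaussian filter, extracting the iso-surface with Marching Cubes, and computing normals by central differences. The abstract does not name the recursive scheme, so the sketch below assumes the third-order recursion published by Young and van Vliet (1995), applied separably along each axis, and places each atom as a unit impulse on the grid (a real solvent-accessible surface would also account for atomic radii plus the probe radius). All function names are hypothetical. A recursive filter costs a fixed number of operations per sample regardless of `sigma`, which is the usual reason to prefer it over direct convolution with a wide kernel.

```python
import math


def yvv_coefficients(sigma):
    """Recursive-Gaussian coefficients as published by Young & van Vliet (1995); assumed scheme.

    Returns (B, a1, a2, a3) for y[n] = B*x[n] + a1*y[n-1] + a2*y[n-2] + a3*y[n-3].
    The published q(sigma) fit is intended for sigma >= 0.5.
    """
    if sigma < 0.5:
        raise ValueError("sigma must be >= 0.5")
    if sigma >= 2.5:
        q = 0.98711 * sigma - 0.96330
    else:
        q = 3.97156 - 4.14554 * math.sqrt(1.0 - 0.26891 * sigma)
    b0 = 1.57825 + 2.44413 * q + 1.4281 * q**2 + 0.422205 * q**3
    b1 = 2.44413 * q + 2.85619 * q**2 + 1.26661 * q**3
    b2 = -(1.4281 * q**2 + 1.26661 * q**3)
    b3 = 0.422205 * q**3
    B = 1.0 - (b1 + b2 + b3) / b0  # makes the DC gain of each pass exactly 1
    return B, b1 / b0, b2 / b0, b3 / b0


def recursive_gaussian_1d(x, sigma):
    """Causal pass followed by an anti-causal pass; cost per sample does not depend on sigma."""
    B, a1, a2, a3 = yvv_coefficients(sigma)
    coeffs = ((1, a1), (2, a2), (3, a3))
    n = len(x)
    w = [0.0] * n
    for i in range(n):  # forward pass, zero initial conditions
        w[i] = B * x[i] + sum(a * w[i - k] for k, a in coeffs if i - k >= 0)
    y = [0.0] * n
    for i in range(n - 1, -1, -1):  # backward pass, zero initial conditions
        y[i] = B * w[i] + sum(a * y[i + k] for k, a in coeffs if i + k < n)
    return y


def deposit_atoms(coords, shape, spacing=1.0):
    """Hypothetical simplification: each atom becomes a unit impulse at its nearest grid point."""
    grid = [[[0.0] * shape[2] for _ in range(shape[1])] for _ in range(shape[0])]
    for x, y, z in coords:
        i, j, k = (round(c / spacing) for c in (x, y, z))
        if 0 <= i < shape[0] and 0 <= j < shape[1] and 0 <= k < shape[2]:
            grid[i][j][k] += 1.0
    return grid


def blur_3d(grid, sigma):
    """Separable smoothing: apply the 1-D recursive filter along z, then y, then x."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    g = [[row[:] for row in plane] for plane in grid]
    for i in range(nx):
        for j in range(ny):
            g[i][j] = recursive_gaussian_1d(g[i][j], sigma)
    for i in range(nx):
        for k in range(nz):
            line = recursive_gaussian_1d([g[i][j][k] for j in range(ny)], sigma)
            for j in range(ny):
                g[i][j][k] = line[j]
    for j in range(ny):
        for k in range(nz):
            line = recursive_gaussian_1d([g[i][j][k] for i in range(nx)], sigma)
            for i in range(nx):
                g[i][j][k] = line[i]
    return g


def central_difference_normal(g, i, j, k, spacing=1.0):
    """Unit normal at an interior grid point from the central-difference gradient.

    The density falls off away from the atoms, so the outward normal is -grad(g).
    """
    gx = (g[i + 1][j][k] - g[i - 1][j][k]) / (2.0 * spacing)
    gy = (g[i][j + 1][k] - g[i][j - 1][k]) / (2.0 * spacing)
    gz = (g[i][j][k + 1] - g[i][j][k - 1]) / (2.0 * spacing)
    norm = math.sqrt(gx * gx + gy * gy + gz * gz) or 1.0
    return (-gx / norm, -gy / norm, -gz / norm)
```

The triangulation step is omitted here because a full Marching Cubes lookup table would dominate the sketch; one option is to convert the smoothed grid to an array and pass it to an existing implementation such as scikit-image's `skimage.measure.marching_cubes` at the chosen iso-level, then evaluate `central_difference_normal` (interpolated) at the resulting vertices.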
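Part 4 links a compound into an existing drug-target network through molecular similarity and then runs a random walk on the bipartite graph. The abstract does not give the transition scheme or its parameters, so the sketch below assumes a random walk with restart (restart probability 0.3, chosen arbitrarily) in which the query compound connects to known drugs with edge weights equal to its similarity scores. The similarity values are taken as given inputs, standing in for whatever 3D similarity method is used; the node layout and function names are hypothetical.

```python
from collections import defaultdict

QUERY = ("query", "q")  # hypothetical node id for the compound being profiled


def build_graph(drug_targets, query_similarity):
    """Weighted undirected graph: drug-target edges weight 1, query-drug edges weight = similarity."""
    adj = defaultdict(dict)
    for drug, targets in drug_targets.items():
        for target in targets:
            adj[("drug", drug)][("target", target)] = 1.0
            adj[("target", target)][("drug", drug)] = 1.0
    for drug, sim in query_similarity.items():
        if sim > 0:
            adj[QUERY][("drug", drug)] = sim
            adj[("drug", drug)][QUERY] = sim
    return adj


def random_walk_with_restart(adj, source, restart=0.3, tol=1e-12, max_iter=10_000):
    """Iterate p <- (1 - restart) * W p + restart * e_source, with W normalised by each node's total edge weight."""
    if source not in adj:
        raise ValueError("query compound is not linked to any drug")
    nodes = list(adj)
    p = {n: 0.0 for n in nodes}
    p[source] = 1.0
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            total = sum(adj[u].values())
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / total
        nxt[source] += restart
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p


def predict_targets_rwr(drug_targets, query_similarity, restart=0.3):
    """Rank every target in the network by its steady-state visiting probability."""
    adj = build_graph(drug_targets, query_similarity)
    p = random_walk_with_restart(adj, QUERY, restart)
    scores = [(name, s) for (kind, name), s in p.items() if kind == "target"]
    return sorted(scores, key=lambda item: (-item[1], item[0]))
```

For a toy network `{"D1": {"T1", "T2"}, "D2": {"T3"}, "D3": {"T2", "T4"}}` with query similarities `{"D1": 0.9, "D2": 0.1, "D3": 0.0}`, the ranking is T2, T1, T3, T4: T2 edges out T1 because, besides the highly similar drug D1, it is also connected to D3, so probability flows back to it along a second path.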
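Part 5 predicts targets for a query structure by comparing it with target-annotated compounds. ChemMapper relies on the three-dimensional SHAFTS similarity; the sketch below swaps in the 2D Tanimoto coefficient on fingerprint bit sets purely for illustration, and scores each target by the best similarity between the query and any of its annotated ligands at or above a cutoff. The threshold, the scoring rule, and the function names are assumptions, not ChemMapper's documented algorithm.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_targets_by_similarity(query_fp, library, threshold=0.5, top_n=5):
    """Score each target by its best-matching annotated ligand; keep hits at or above threshold.

    library: iterable of (compound_id, fingerprint_set, set_of_target_names)
    Returns [(target, (similarity, compound_id)), ...] sorted by descending similarity.
    """
    best = {}
    for compound_id, fp, targets in library:
        sim = tanimoto(query_fp, fp)
        if sim < threshold:
            continue
        for target in targets:
            if target not in best or sim > best[target][0]:
                best[target] = (sim, compound_id)
    ranked = sorted(best.items(), key=lambda kv: (-kv[1][0], kv[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    library = [
        ("c1", {1, 2, 3, 4, 5}, {"EGFR"}),
        ("c2", {1, 2, 9}, {"EGFR", "HER2"}),
        ("c3", {1, 2, 3, 7}, {"hERG"}),
    ]
    print(rank_targets_by_similarity({1, 2, 3, 4}, library))
```

With the toy library above, the query `{1, 2, 3, 4}` returns EGFR (similarity 0.8 via `c1`) and hERG (0.6 via `c3`); `c2` scores 0.4 and falls below the cutoff, so HER2 is not reported. Reporting the best-matching ligand next to each target keeps the prediction traceable to a specific annotated compound.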
