Study on Predictions of Drug-target Interactions and Drug Combinations

[Author] Zhao Mingzhu (赵明珠)

[Advisor] Wei Dongqing (魏冬青)

[Author information] Shanghai Jiao Tong University, Biomedical Engineering, 2013, PhD

[Abstract] Drug discovery and development is a worldwide concern. Over the past few decades, target-oriented drug therapy has achieved considerable success: identifying therapeutic targets and seeking drugs specific to those targets has been the focus of both pharmaceutical companies and research laboratories, and has contributed greatly to human health. In recent years, however, the rate of new drug development has fallen while its cost has risen, for two main reasons. First, screening large numbers of drug candidates in the early stage still relies mainly on time-consuming and labor-intensive experiments, and unsatisfactory efficacy or side effects discovered at later stages cause development to fail. Second, most human diseases are complex diseases with multiple causes, and biological systems have a degree of redundancy and robustness, so perturbing a single target with a single drug cannot change the system phenotype.

Advances in omics technologies have produced large amounts of biological data and a growing number of biological databases. Bioinformatics and computational biology offer an effective means of addressing the difficulties of drug development. In the early stages in particular, virtual screening provides an efficient, high-throughput technique that helps narrow down targets and save cost. The time is right to use computational methods to integrate multiple data sources, mine the associations hidden in the data, and screen for reliable drug-target interactions and effective drug combinations. Starting from public database resources, this thesis designs computational models for two problems of long-standing interest in the pharmaceutical field, drug-target interaction prediction and drug combination prediction, and verifies their effectiveness. The work consists of three parts:

1. A chemical similarity ensemble model is established to explore protein-ligand interactions across public databases on a large scale. The study covers 53092 ligands and 14732 human proteins. The proteins are not restricted to the small number of known drug targets but include human proteins with rich ligand information (more than 5 ligands each); the ligands are not restricted to a few marketed drugs but include drugs, small-molecule compounds, ions, and other small molecules that can act as protein ligands. This greatly broadens the application scope of the chemical similarity ensemble method. Ligands are represented with two fingerprints of different encoding types, GpiDAPH3 and MACCS keys, giving areas under the ROC curve (AUC) of 0.6608 and 0.8344, respectively. The model built on MACCS keys thus still maintains good predictive performance, which shows that the method extends well. A subsequent study seeking protein targets for components of traditional Chinese medicine further indicates that the model has some power to predict new drug-target interactions.

2. A support vector machine (SVM) model based on chemical preference information is developed from the chemical-protein bindings in STITCH. The feature vectors discard protein structure information and rely entirely on the chemical information of each protein's known ligands: 332 features are drawn from ligand fingerprints and from known protein-ligand interactions, and are called chemical preference feature vectors. The model predicts protein-ligand interactions well and outperforms the ligand-similarity-based chemical similarity ensemble model. The AUC reaches 0.9914 in 5-fold cross-validation and 0.9878 in an independent test. Feature selection then gives a first indication of the underlying relationships that determine ligand-protein (drug-target) interactions; to simplify the model, 182 distinct features in pairs are chosen to rebuild a model that performs similarly to the one built on all 332 features. The refined model is used to screen the STITCH database for potential inhibitors of D-amino acid oxidase (DAO), a schizophrenia target, and the predictions are tested in wet-lab experiments. Of the 10 predicted candidates, 7 are supported by the literature or by experiment, 4 of which are newly identified DAO inhibitors, and one may have a new application in the treatment of psychiatric disorders in addition to its use as an antineoplastic agent. These results further support the model's effectiveness for predicting drug candidates and targets.

3. A new computational method is proposed that integrates gene chip (microarray) data measured under drug treatment, drug-induced sub-networks, and existing signaling pathway information to build a machine learning model for predicting drug combinations. First, gene expression data from single-drug treatments are used to predict the gene expression change ratios under the drug combination. These ratios, which compare expression before and after treatment, define the weights of an existing protein-protein interaction (PPI) network, and jActiveModules is used to identify the optimal sub-network under drug treatment. Genes in the optimal sub-network are taken to be those that respond to the drug perturbation. Feature vectors are built from the frequencies with which genes in the optimal sub-networks, for the combination and for each single drug, appear in signaling pathways; the features are then optimized and the model is built. Leave-one-out cross-validation gives a mean AUC of 0.7941, indicating that the model separates drug combinations from negative samples reasonably well. In a case study on cancer, of the top 10 predicted cancer drug combinations, 3 already exist in the database and 2 others are supported by the literature, further indicating the effectiveness of the model.
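
As an illustration of the fingerprint-based scoring and AUC evaluation mentioned in part 1 of the abstract, the following is a minimal sketch, not the thesis's implementation. It assumes fingerprints are given as sets of on-bit indices, scores a query molecule by its maximum Tanimoto similarity to a protein's known ligands (a simplified stand-in for the ensemble statistic, which the abstract does not specify), and computes the ROC AUC by pairwise ranking. All function names and the toy data are hypothetical.

```python
from typing import FrozenSet, List

# Assumption: a fingerprint is the set of indices of its "on" bits
# (for example, bits of a MACCS key); the thesis's encoding may differ.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared on-bits divided by on-bits in either set."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def ensemble_score(query: Fingerprint, ligands: List[Fingerprint]) -> float:
    """Score a query against a protein's known ligand set.

    Assumption: maximum similarity is used here for simplicity; it is
    not claimed to be the statistic used in the thesis.
    """
    return max((tanimoto(query, lig) for lig in ligands), default=0.0)


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two known ligands of one protein, four queries.
known_ligands = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 4, 5})]
queries = [
    frozenset({1, 2, 3, 4}),  # true ligand
    frozenset({2, 3, 5}),     # true ligand
    frozenset({7, 8, 9}),     # non-ligand
    frozenset({1, 9}),        # non-ligand
]
labels = [1, 1, 0, 0]

scores = [ensemble_score(q, known_ligands) for q in queries]
auc = roc_auc(scores, labels)
print(scores, auc)
```

The pairwise definition of AUC is quadratic in the number of samples, which is fine for a toy example; at the scale reported in the abstract a rank-based computation or a library routine would be the more practical choice.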
