Study on Some Key Problems in siRNA Design

(Original title: siRNA设计中若干关键问题的研究)

【Author】 Chang Yaping (常亚萍)

【Advisor】 Liu Yuanning (刘元宁)

【Author information】 Jilin University, Bioinformatics, 2013, PhD

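【Summary】 RNA interference is a gene-silencing phenomenon triggered by double-stranded RNA. It is widely used in gene function studies, drug target screening, and disease treatment. siRNA design is an effective route to RNA interference, and the quality of the design directly affects the interference outcome.

Current siRNA design rules are based on sequence features and do not account for the influence of target structure on siRNA interference efficiency, so the designed siRNA sequences have low interference efficiency. Current methods for predicting the interference efficiency of candidate siRNAs mainly consider features of the siRNA itself. Prediction accuracy is therefore low, with a correlation coefficient usually around 0.63, which leaves too many candidate siRNAs and makes biological experiments very difficult. Improving the accuracy of siRNA interference efficiency prediction is an urgent problem.

Because siRNA silencing efficiency is related to the structure of the target mRNA, siRNA design that includes structural features of the target mRNA may greatly improve design accuracy. This thesis proposes an siRNA design algorithm that fuses sequence features and structural features, and applies it to siRNA design for the 2009 H1N1 influenza virus and the 2008 seasonal H1N1 influenza virus. The multi-feature fusion design process for influenza-targeting siRNA considers both sequence features and structural features of the target sequence: a structure coefficient measures the quality of the target structure, the better candidate target sequences are selected according to the value of the structure coefficient, and the corresponding siRNA sequences are then designed from those targets.

Prediction accuracy can only be improved by finding features closely related to siRNA interference efficiency. Through qualitative and quantitative analysis, this thesis finds that siRNA interference efficiency in mammals is strongly correlated with the GC content of the mRNA, the GC content near the target site, the stem ratio of the mRNA, and the stem ratio near the target site. Because both global mRNA features and local features near the target site correlate strongly with siRNA interference efficiency, this thesis proposes a random-forest-based prediction model for siRNA interference efficiency that considers global mRNA features and local features near the target site in addition to features of the siRNA itself. The correlation coefficient under 10-fold cross-validation rose from 0.63 to 0.7, confirming that considering global mRNA features and local features near the target site significantly improves prediction accuracy.

In summary, this thesis makes two main contributions:
1. It proposes a multi-feature fusion siRNA design algorithm. According to the theory and practice of pattern recognition, multi-feature fusion is an effective way to improve recognition accuracy. Using a fusion model of multiple features (sequence features and structural features) to design siRNA targeting influenza virus genes is one way to improve design accuracy.
2. It proposes a random-forest-based prediction model for siRNA interference efficiency that considers global mRNA features and local features near the target site in addition to features of the siRNA itself, raising the 10-fold cross-validated correlation coefficient from 0.63 to 0.7.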
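The summary reports accuracy as a correlation coefficient under 10-fold cross-validation. The sketch below shows one common way such a figure can be computed: pool the out-of-fold predictions and correlate them with the measured values. It is an illustration under stated assumptions, not the thesis's code. A one-feature least-squares line stands in for the random forest so the example runs on the standard library alone, and all function names are hypothetical.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and split the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Least-squares slope and intercept.

    Stand-in model only: the thesis uses a random forest, not a line.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def cross_validated_r(xs, ys, k=10, seed=0):
    """Correlation between out-of-fold predictions and observed values."""
    preds = [0.0] * len(xs)
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        slope, intercept = fit_line(train_x, train_y)
        for i in fold:
            # Each sample is predicted by a model that never saw it.
            preds[i] = slope * xs[i] + intercept
    return pearson(preds, ys)
```

Here `xs` would hold one feature value per siRNA (for example a GC content) and `ys` the measured interference efficiencies. Replacing `fit_line` with a multi-feature regressor changes the model but not the evaluation loop.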

【Abstract】 RNA interference degrades homologous mRNA inside the cell after short double-stranded RNA is introduced, thereby inhibiting expression of the target mRNA. Small interfering RNA (siRNA) design is an effective approach to RNA interference, and the quality of the siRNA directly influences the interference effect, so an effective siRNA design method is crucial. Designing siRNA through biological experiments alone requires a great deal of manpower and resources, is costly and slow, and has low efficiency. Bioinformatics and computer-aided design have therefore become an effective means of achieving RNA interference.

Current siRNA design rules have a shortcoming: they are based on sequence features and do not consider the secondary structure of the target, so the efficiency of the designed siRNA is low.

Prediction of candidate siRNA efficiency has a shortcoming as well: current predictions are based on siRNA sequence features and their accuracy is low, with a correlation coefficient of around 0.63. This leads to an excessive number of candidate siRNA sequences and complicates biological experiments. Improving the accuracy of siRNA efficiency prediction is an urgent problem.

The H1N1 influenza virus is an RNA virus that is highly infectious and spreads quickly, posing a serious threat to human health. At present, influenza is mainly prevented and treated by vaccination and medication. A vaccine can only be used for prevention and only against matched strains; when a new influenza strain breaks out, a corresponding vaccine cannot be obtained in time, and its safety cannot be guaranteed.
Anti-influenza drugs are mainly M2 ion channel blockers and neuraminidase inhibitors. The former rapidly give rise to drug-resistant strains, so their clinical application is limited. The latter are expensive and beyond the means of ordinary people, and production capacity is limited, so supply would fall short in a large-scale epidemic. Moreover, as these drugs are used more widely, drug resistance is spreading steadily, and the drugs have some side effects on the central nervous system and the digestive system. Influenza A virus poses a serious threat to human health, and traditional methods cannot control new influenza viruses in a timely and effective way, so researchers should consider various aspects of the infection mechanism and look for effective methods to prevent and treat influenza.

Analyzing the influenza A H1N1 virus with bioinformatics methods and using RNA interference to inhibit expression of viral genes can control the spread of the virus. Compared with studying the H1N1 influenza virus by traditional experimental methods, this reduces cost and shortens the research cycle. RNA interference has become an effective instrument for inhibiting influenza A virus. Researchers have used traditional siRNA design methods to design siRNAs that target the H1N1 influenza virus and inhibit expression of its genes, with some success. However, current siRNA design methods are mainly based on sequence features and do not consider the influence of target structure on siRNA interference efficacy, so the interference efficacy of the designed siRNA is low.

The secondary structure of the target mRNA is related to siRNA inhibitory efficacy, so considering structural features of the target mRNA when designing siRNA may improve accuracy.
This study proposes an siRNA design algorithm that combines sequence features and structure features, and applies it to siRNA design for the 2009 H1N1 influenza virus and the 2008 seasonal H1N1 influenza virus.

Every H1N1 influenza virus strain contains 8 gene segments, namely PB2, PB1, PA, HA, NP, NA, MP, and NS. The HA and NA genes are prone to mutation, while the NP, MP, PA, and PB1 genes are relatively conserved, so the target genes for RNA interference are mainly NP, MP, PA, and PB1. The PA segment has polymerase activity, is involved in the entire process of viral transcription and replication, and plays the role of a kinase or helicase. It is therefore a good target for the prevention and treatment of H1N1 influenza: designing efficient siRNA to inhibit expression of the PA gene can control the spread of the H1N1 influenza virus. In this study, the sequences and structures of the PA segments of the 2009 H1N1 influenza virus and the 2008 seasonal influenza virus were compared and analyzed. Significant differences were found between them, not only in sequence features but also in RNA secondary structure, which lead to different biological properties.

When designing siRNA for the H1N1 influenza virus, the proposed algorithm considers structure features as well as sequence features: a structure coefficient is used to evaluate the secondary structure of the target, the better candidate targets are selected, and the corresponding siRNA sequences are then designed from those targets. Using this improved algorithm, siRNAs were designed for the 2009 H1N1 influenza virus and the 2008 seasonal H1N1 influenza virus respectively. A target was identified that differs by only one base between the two viruses, which lays the foundation for finding a common target.

If features closely related to siRNA interference efficacy can be found, the accuracy of prediction can be improved.
In addition to siRNA features, this study proposes considering global mRNA features and local features near the siRNA binding site when predicting siRNA efficacy. The local region comprises 20 nucleotides on each side of the binding sequence together with the 21 nt of the siRNA binding region, 61 nt in all, and is termed the neighboring nucleotides. Qualitative analysis shows that higher siRNA interference efficacy is associated with lower mRNA GC content, mRNA stem ratio, neighboring GC content, and neighboring stem ratio. Qualitative analysis shows only the tendency and cannot quantify it, so linear regression analysis was performed. It found strong correlations between siRNA inhibitory efficacy and the averages of the mRNA GC content, mRNA stem ratio, neighboring GC content, and neighboring stem ratio, with highly significant P-values.

The qualitative and quantitative analyses together show that mRNA GC content and mRNA secondary structure features are strongly correlated with RNA interference efficacy, at both the global mRNA level and the neighboring location. Feature selection shows that some mRNA features and neighboring features are important, and that the number of important mRNA features is much larger than the number of important siRNA features. Global mRNA features and neighboring local features should therefore be considered when predicting siRNA interference efficacy.

Based on the above analysis, this study proposes an siRNA efficacy prediction model based on random forest that uses siRNA features, mRNA features, and features near the siRNA binding site. The correlation coefficient under 10-fold cross-validation increased from 0.63 to 0.7, which confirms that considering global mRNA features and neighboring local features improves accuracy. siRNA design should therefore take into account the influence of global mRNA features and local features near the siRNA binding site on siRNA interference efficacy, in addition to siRNA features.
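The 61-nt neighboring region and the GC-content and stem-ratio features can be made concrete with a small sketch. Several points here are assumptions rather than definitions taken from the abstract: the secondary structure is given as a dot-bracket string (as produced by common RNA folding tools), the stem ratio is taken to be the fraction of base-paired positions, and the window is simply clipped at the ends of the mRNA. All function and key names are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C nucleotides in a sequence."""
    seq = seq.upper()
    return sum(1 for base in seq if base in "GC") / len(seq)

def stem_ratio(dot_bracket):
    """Fraction of paired positions in a dot-bracket structure string.

    Assumed definition: '(' and ')' mark stem positions, '.' marks unpaired ones.
    """
    return sum(1 for c in dot_bracket if c in "()") / len(dot_bracket)

def neighborhood(start, total_len, site_len=21, flank=20):
    """Slice bounds of the neighboring region around a binding site.

    start is the 0-based index of the first nucleotide of the binding site.
    With room on both sides the window spans 20 + 21 + 20 = 61 nt;
    clipping at the mRNA ends is an assumption of this sketch.
    """
    lo = max(0, start - flank)
    hi = min(total_len, start + site_len + flank)
    return lo, hi

def site_features(mrna, structure, start):
    """Global and neighboring GC content and stem ratio for one binding site."""
    if len(mrna) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    lo, hi = neighborhood(start, len(mrna))
    return {
        "mrna_gc": gc_content(mrna),
        "mrna_stem_ratio": stem_ratio(structure),
        "neighbor_gc": gc_content(mrna[lo:hi]),
        "neighbor_stem_ratio": stem_ratio(structure[lo:hi]),
    }
```

Calling `site_features(mrna, structure, start)` for each candidate site yields the four structure-aware features that the abstract describes adding to the siRNA's own features.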
The study suggests that, when designing effective siRNA for mammalian targets, mRNAs with lower GC content and fewer stem structures (in other words, more loop structures), both globally and in the local flanking regions of the siRNA binding site, are preferred. Specifically, mRNA GC content and neighboring GC content below 50% are preferred, and mRNA stem ratio and neighboring stem ratio below 0.6 are preferred. The study provides a new idea for siRNA design and guidance for designing effective siRNA. In addition, its results may help in understanding the binding efficacy between microRNA and mRNA, because siRNA binding to mRNA and microRNA binding to mRNA have some similarities.

In summary, this study makes two contributions:
1. It proposes a multi-feature fusion siRNA design algorithm targeting the influenza virus. According to the theory and practice of pattern recognition, multi-feature fusion is an effective means of improving recognition accuracy. Designing siRNA that targets the influenza virus by fusing multiple features (sequence features and secondary structure) is one way to improve accuracy.
2. It proposes an siRNA efficacy prediction model based on random forest that considers the influence of global mRNA features and local features near the siRNA binding site on siRNA interference efficacy, in addition to siRNA features. The correlation coefficient under 10-fold cross-validation increased from 0.63 to 0.7, which confirms that considering global mRNA features and neighboring local features improves accuracy.

Future research will consider other features related to siRNA inhibitory efficacy, mainly protein features. Protein binding can influence siRNA inhibitory efficacy: if proteins are already bound to the target, it is difficult for the siRNA to bind to it, which lowers inhibitory efficacy.
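The preferences stated in the abstract (GC content below 50% and stem ratio below 0.6, both for the whole mRNA and for the neighboring region) can be expressed as a simple screening check. This is only a sketch: the dictionary keys and function name are hypothetical, and the thesis may apply these preferences differently, for example as soft scores rather than hard cut-offs.

```python
GC_MAX = 0.50    # preferred: GC content below 50%
STEM_MAX = 0.60  # preferred: stem ratio below 0.6

# Hypothetical feature names mapped to the threshold each must stay under.
THRESHOLDS = {
    "mrna_gc": GC_MAX,
    "neighbor_gc": GC_MAX,
    "mrna_stem_ratio": STEM_MAX,
    "neighbor_stem_ratio": STEM_MAX,
}

def unmet_preferences(features):
    """Return the names of features that do not fall below their threshold."""
    return [name for name, limit in THRESHOLDS.items() if not features[name] < limit]

def meets_preferences(features):
    """True when all four features fall below their preferred thresholds."""
    return not unmet_preferences(features)
```

Reporting which criteria a candidate site fails, rather than only a yes or no, makes it easier to see whether global or local structure is the limiting factor.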

  • 【Online publication contributor】 Jilin University
  • 【Online publication issue】 2014, Issue 04