
Research on Fuzzy Classification Algorithms for Behavior-Analysis-Based Anti-Trojan Detection (用于行为分析反木马的模糊分类算法研究)

Research on Classification Algorithms of Trojan Horse Detection Based on Behavior Analysis

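【Author】 Gu Yujie (顾雨捷)

【Advisor】 Chen Qingzhang (陈庆章)

【Author information】 Zhejiang University of Technology, Computer Application Technology, 2008, Master's thesis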

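【Chinese abstract, translated】 Information security receives growing attention in a modern society where networks are developing rapidly and are widely used. The Trojan horse is a major threat to information security. It is tied to the great majority of network-related political and economic cases, accounts for a large share of them in both number and severity, and is on the rise; no effective way has yet been found to fully contain the continuing growth of the damage Trojans cause. Anti-Trojan research has therefore long been a focus of the information security field.
Most current Trojan detection methods belong to the signature-based family. Because behavior analysis can detect Trojans, viruses, and other illegitimate programs whose signatures are unknown, and thus offers proactive defense, it has become the most important technique in anti-Trojan and antivirus research. After analyzing existing behavior-analysis-based anti-Trojan strategies, this thesis finds that most of them suffer from excessive false-positive and false-negative rates, low efficiency in use, and interactive designs that do not suit how users work.
In view of these shortcomings, this thesis studies the theoretical framework of anti-Trojan algorithms and analyzes the behavioral characteristics of Trojans, and from this establishes a set of design standards for anti-Trojan algorithms that give a solid basis for constructing and applying such algorithms. On that basis it proposes and designs a fuzzy-model classification algorithm for behavior-analysis-based anti-Trojan detection, and finally demonstrates the algorithm's effectiveness by experiment. The main work is as follows:
1. It identifies the core problem facing behavior-analysis-based anti-Trojan technology. The thesis analyzes in depth how Trojan damage arises, describes the structure of the Trojan industry chain, discusses mainstream Trojan detection and removal techniques and their evaluation criteria, and dissects existing anti-Trojan algorithms. It concludes that the immaturity of the classification algorithms used for behavior-based judgment is the core problem.
2. It establishes design standards for anti-Trojan algorithms and states the theoretical upper bound on detection precision. Building on Dr. Frederick Cohen's theory that malicious code cannot be precisely decided, the thesis points out that under the von Neumann architecture malicious code cannot be identified with 100% precision in polynomial time, so in theory anti-Trojan algorithms can reach 100% precision only locally. After summarizing the shortcomings of current anti-Trojan classification algorithms and taking into account the characteristics of behavior analysis and anti-Trojan work, it gives three principles as the design standard for behavior-analysis-based anti-Trojan algorithms: the algorithm should effectively mitigate the loss of efficiency caused by a growing number of feature attributes; the algorithm should be able to converge adaptively and locally to a useful precision in polynomial time; and the algorithm should extract feature attributes automatically.
3. It proposes a new anti-Trojan algorithm based on a multi-layer fuzzy classification system. Fuzzy classification is a recognition method for patterns that carry fuzziness; it offers strong probabilistic reasoning, clear semantics, and ease of understanding. The behavioral features of Trojans and legitimate programs exhibit exactly this kind of fuzziness. Based on this fuzziness and the three design standards, the thesis gives a new anti-Trojan algorithm built on a multi-layer fuzzy classification system. It trains its rules by adaptively adjusting each rule's confidence strength according to whether the fuzzy rule's initial classification was correct, and in the end achieves locally high-precision Trojan classification.
4. It validates the proposed algorithm experimentally and, by using cross-validation, addresses for the first time the long-standing problems of small Trojan data sets and of the authority of test sample libraries, which makes the test results more meaningful. For the experiment, 200 legitimate programs were collected, the technical details of 200 Trojans were consulted and studied, and 7 common behavioral features were extracted. Analysis of the results shows that the fuzzy classification algorithm reaches 100% local classification accuracy within a practical time in the training phase and achieves high-precision detection in the testing phase. A comparison with a Bayesian anti-Trojan classification algorithm under the same experimental conditions shows that the proposed algorithm is higher in both average and best accuracy.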
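
The confidence-tuning idea in item 3 can be illustrated with a small sketch. The abstract does not give the thesis's multi-layer design, its 7 features, or its update formulas, so everything below is an assumption made for illustration: a single-layer, single-winner fuzzy rule classifier, two hypothetical features, hand-written rules, and a simple reward and penalty update on the winning rule's confidence.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


def low(x: float) -> float:
    """Membership of x in the fuzzy set 'low' on [0, 1]."""
    return min(1.0, max(0.0, 1.0 - x))


def high(x: float) -> float:
    """Membership of x in the fuzzy set 'high' on [0, 1]."""
    return min(1.0, max(0.0, x))


@dataclass
class FuzzyRule:
    antecedent: Sequence[Callable[[float], float]]  # one fuzzy set per feature
    label: str                                      # "trojan" or "legit"
    confidence: float                               # rule weight in [0, 1]

    def compatibility(self, x: Sequence[float]) -> float:
        # Product of the per-feature membership degrees.
        degree = 1.0
        for membership, value in zip(self.antecedent, x):
            degree *= membership(value)
        return degree


def winner(rules, x):
    """Single-winner inference: the rule with the largest weighted match."""
    return max(rules, key=lambda r: r.compatibility(x) * r.confidence)


def accuracy(rules, samples):
    return sum(winner(rules, x).label == y for x, y in samples) / len(samples)


def train(rules, samples, epochs=20, reward=0.1, penalty=0.3):
    """Raise the winning rule's confidence when it is right, lower it when wrong."""
    for _ in range(epochs):
        errors = 0
        for x, y in samples:
            rule = winner(rules, x)
            if rule.label == y:
                rule.confidence += reward * (1.0 - rule.confidence)
            else:
                rule.confidence -= penalty * rule.confidence
                errors += 1
        if errors == 0:  # every training sample is classified correctly
            break
    return rules


def demo():
    # Hypothetical features (not from the thesis), each scaled to [0, 1]:
    # (hides_its_process, opens_listening_port).
    samples = [((0.9, 0.8), "trojan"), ((0.1, 0.2), "legit"),
               ((0.8, 0.3), "trojan"), ((0.2, 0.1), "legit")]
    rules = [FuzzyRule((high, high), "trojan", 0.5),
             FuzzyRule((low, low), "legit", 0.5),
             FuzzyRule((high, low), "legit", 0.9),   # deliberately misleading rule
             FuzzyRule((high, low), "trojan", 0.2)]
    before = accuracy(rules, samples)
    train(rules, samples)
    return before, accuracy(rules, samples)


if __name__ == "__main__":
    print(demo())
```

In this toy setup the misleading rule starts with a high confidence and is penalized each time it wins on a sample it misclassifies, until a better-matching rule takes over. That mirrors, in a very reduced form, the abstract's description of rules being trained by adjusting their confidence according to whether the classification was correct.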

【Abstract】 The rapid development and wide application of networks have raised growing concern about information security. Trojan horses are closely associated with information crime, whether economic, political, or both, and the significance of the problem shows in the frequency and severity of related cases. No practical solution has yet been found, and the problem has attracted concentrated research effort in information security.
Current anti-Trojan techniques are almost all signature-based. Behavior analysis, which can detect Trojans with unknown signatures, is a proactive defense technique, and its potential to meet future information security needs has made it a hotspot in anti-Trojan studies. Current behavior-analysis-based anti-Trojan strategies have the following problems: high false-positive and false-negative rates, poor efficiency, and poor user interface design. We conclude that the core problem lies in the immature classification algorithm model used to analyze and judge behaviors. Most previous studies have used existing classification algorithms that were not specifically designed for anti-Trojan work, which may cause problems. This thesis designs an anti-Trojan-oriented classification algorithm based on behavior analysis. Our work is as follows.
First, we identify the core problem facing behavior-analysis-based anti-Trojan techniques. We analyze how Trojan damage arises, the classes and behavioral characteristics of Trojans, and the industry chain built around them. We then discuss the mainstream anti-Trojan techniques and their evaluation criteria, introduce behavior analysis, and point out its advantages over those techniques. We examine existing behavior-analysis-based anti-Trojan systems in order to find the core problem, and conclude that the key is the immature classification algorithm model used to analyze and judge behaviors.
Second, we establish a design standard for anti-Trojan algorithms and point out the upper limit of detection precision. We begin with the theory that malicious code in a von Neumann system cannot be precisely identified within polynomial computation time, so in theory detection precision has an upper limit. We then state three principles of algorithm design: first, the algorithm should extract features automatically; second, the algorithm should keep its efficiency from degrading as the number of features grows; third, the algorithm should self-adaptively converge to a given precision within polynomial computation time.
Third, we propose a Trojan horse detection algorithm based on behavior analysis. Fuzzy classification is a method for handling fuzzy patterns, that is, patterns that are recognizable but whose domains have fuzzy boundaries. The behavioral features of both Trojans and legitimate programs are fuzzy patterns of this kind. Based on this fuzziness and the three principles, we propose a fuzzy classification algorithm for Trojan horse detection. The method trains its rules by adaptively tuning each rule's confidence value according to whether the initial classification was right or wrong, and finally yields a powerful classifier for anti-Trojan use.
Fourth, we validate the algorithm experimentally. To ensure the authority of the experiment and to overcome the limited number of samples, we use cross-validation to test the algorithm. We analyzed 200 legitimate programs and 200 Trojan descriptions from Symantec Corp., from which 7 behavioral features were extracted for the experiment. Our results show 100% local accuracy after a practical amount of training and high-precision classification in the testing phase. We also compared our algorithm with a Bayesian classification algorithm; under equal conditions, our algorithm yielded better results in both average and best accuracy.
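
The cross-validation setup mentioned above can be sketched as follows. The abstract does not state the number of folds or how samples were assigned to them, so the 10 stratified folds, the function names, and the toy threshold classifier with synthetic one-dimensional scores are all assumptions used only to show the mechanics on a 200 + 200 sample set.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)  # deal indices round-robin
    return folds


def cross_validate(features, labels, fit, predict, k=10, seed=0):
    """Return one accuracy per fold; each fold is held out exactly once."""
    accuracies = []
    for held_out in stratified_folds(labels, k, seed):
        test = set(held_out)
        train = [i for i in range(len(labels)) if i not in test]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


# Placeholder classifier (an assumption, not the thesis's algorithm):
# a threshold halfway between the two class means of a single score.
def fit_threshold(xs, ys):
    trojan = [x for x, y in zip(xs, ys) if y == "trojan"]
    legit = [x for x, y in zip(xs, ys) if y == "legit"]
    return (sum(trojan) / len(trojan) + sum(legit) / len(legit)) / 2


def predict_threshold(threshold, x):
    return "trojan" if x > threshold else "legit"


def demo():
    # Synthetic stand-in data: 200 legitimate and 200 Trojan samples.
    labels = ["legit"] * 200 + ["trojan"] * 200
    features = [0.0] * 200 + [1.0] * 200
    scores = cross_validate(features, labels, fit_threshold, predict_threshold)
    # Average and best per-fold accuracy, the two figures the abstract compares.
    return sum(scores) / len(scores), max(scores)


if __name__ == "__main__":
    print(demo())
```

Holding out each fold once lets every one of the 400 samples serve as test data while the model is always trained on the rest, which is why the technique suits small sample sets. The per-fold scores give both an average and a best accuracy for comparing two classifiers under equal conditions.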

  • 【Classification No.】TP309.5
  • 【Times cited】18
  • 【Downloads】527