
基于DOM建模的网页木马检测的分类器设计

Classification for Webpage Trojan Detection Based on DOM Modeling

【Author】 范宇 (Fan Yu)

【Advisor】 季丽萍 (Ji Liping)

【Author information】 Harbin Institute of Technology, Computer Science and Technology, 2010, Master's thesis

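【Abstract (translated from the Chinese)】 With the spread of Internet applications, web pages have become one of the main ways people obtain and publish information. While providing information, many websites also expose users to security risks. Statistics show that Trojans have replaced viruses as the leading threat, and that more than 90% of Trojan infections spread through web pages. How to effectively prevent the spread of webpage Trojans, so that users of web-based applications are not infected by malicious code, has therefore become a pressing problem. Unlike traditional Trojans, webpage Trojans spread faster and more widely and pose a greater threat; moreover, because they are written as scripts, they are easier to encode and encrypt and have more variants. Detection models built for traditional Trojans are therefore not suitable for webpage Trojans. Existing webpage Trojan detection nevertheless still relies mainly on static signature matching, the technique widely used for traditional Trojans. It lags in handling unknown samples, and its matching efficiency keeps falling as the Trojan signature database grows. An efficient detection technique aimed at the web page itself is needed, one that stops the threat before the Trojan reaches the host through the page.

To address these problems, this work first analyzes the attack principles of webpage Trojans in depth and summarizes the characteristics of their attacks and the difficulties of detecting them. It then studies the DOM structure of web pages and the way browsers parse pages, and on that basis proposes WIM-DOM, a webpage code inspection model based on DOM structure. Finally, it builds decision-tree-based webpage Trojan classifiers on top of WIM-DOM modeling. The main work and contributions are as follows:

(1) The attack principles of webpage Trojans and the DOM structure and parsing of web pages are studied, and the characteristics of webpage Trojan attacks are summarized together with the ways they show up in DOM element attributes and structure.

(2) A webpage code inspection model based on DOM structure, WIM-DOM, is proposed. Targeting the concealment and locality of webpage Trojan attacks, the model uses the DOM structure to map a page's source file into a sequence of DOM elements. It strengthens the attribute features of DOM elements while preserving the hierarchy among elements, which lets local features play a role in webpage Trojan detection and lays the groundwork for classifier design.

(3) Two decision-tree-based webpage Trojan classifiers are designed on top of WIM-DOM modeling. WIM-DOM(I) is the first to use the attribute information of DOM elements as classification features. WIM-DOM(II) is the first to extract sequence features of DOM elements from the sequential patterns of webpage Trojan attacks, in order to raise the detection rate for webpage Trojans with multi-step attack behavior, and it uses statistical information to reduce the effect that differences among pages themselves have on classification.

(4) Classification experiments are designed that verify the advantages of the WIM-DOM classifiers in terms of both accuracy and efficiency.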

【Abstract】 With the rapid development of Internet applications, websites have gradually become one of the most important ways to access and publish information. At the same time, websites have brought users many new security risks. According to statistics, Trojan horses have replaced viruses as the main threat, and more than 90% of them are propagated through webpages. Research on how to protect users of web-based applications from webpage Trojans is therefore attracting more and more attention.

Unlike traditional Trojans, webpage Trojans spread faster and more widely and pose a more serious threat. They are also written in script code, which is easier to encode and encrypt. As a result, traditional detection models cannot simply be adapted to webpage Trojan detection. Nevertheless, webpage Trojan detection today still relies mainly on static feature matching borrowed from traditional Trojan detection. It responds poorly to unknown samples, and its efficiency declines seriously as the feature databases grow. Other research has proposed detecting Trojans by monitoring the dynamic behavior of the host; unfortunately, such detection takes place only after infection. Novel webpage Trojan detection methods that target the interpretation of the webpage itself are therefore urgently needed. Such methods could detect threats before the Trojans reach the local host through the webpages.

To overcome the above problems, we first made a detailed survey of the principles of webpage Trojan attacks and of the DOM structure of webpages, and on that basis proposed a novel webpage inspection model based on DOM structure (WIM-DOM). We then designed a decision-tree-based classifier that takes the WIM-DOM model as its input. Compared with previous work, we make the following contributions:

First, we propose a novel webpage inspection model based on the DOM structure, called WIM-DOM. The model uses the inherent DOM structure to map the source document into a sequence of DOM elements, which reflects two characteristics of webpage Trojan attacks: concealment and locality. The model enhances the attributes of DOM elements and also preserves the hierarchy among neighboring nodes. As a result, local features can contribute more to the classification than in other methods.

Second, we design a classifier based on WIM-DOM that can be used for webpage Trojan detection. In the classifier, we propose using the attributes of DOM elements as the main classification features, together with some statistics that reduce the influence of the diversity of webpages. In addition, we are the first to use sequential patterns of DOM elements for webpage Trojan detection, which proves effective in improving detection of malicious samples with multi-step attack behavior.

Finally, we designed several comparative experiments that evaluate WIM-DOM classification from two aspects: accuracy and efficiency.
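The mapping from a source document to a sequence of DOM elements can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the class `DomSequenceBuilder`, the function `element_features`, and the three attribute cues (`zero_size`, `hidden_style`, `external_src`) are hypothetical names and features chosen for illustration only. The sketch uses only the standard-library `html.parser` module and keeps each element's depth so that the hierarchy is not lost when the tree is flattened.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not deepen the tree.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DomSequenceBuilder(HTMLParser):
    """Flatten an HTML document into a document-order list of elements.

    Hypothetical helper for illustration; not the WIM-DOM implementation.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.sequence = []  # one record per element: tag, depth, attributes
        self._stack = []    # tags that are currently open

    def handle_starttag(self, tag, attrs):
        self.sequence.append(
            {"tag": tag, "depth": len(self._stack), "attrs": dict(attrs)}
        )
        if tag not in VOID_TAGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        # Tolerate malformed markup: close everything up to the matching tag,
        # and ignore end tags that were never opened.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass


def element_features(element):
    """Turn one element record into attribute features (assumed cues)."""
    attrs = element["attrs"]
    style = (attrs.get("style") or "").replace(" ", "").lower()
    src = attrs.get("src") or ""
    return {
        "tag": element["tag"],
        "depth": element["depth"],
        "zero_size": attrs.get("width") == "0" or attrs.get("height") == "0",
        "hidden_style": "display:none" in style or "visibility:hidden" in style,
        "external_src": src.startswith(("http://", "https://", "//")),
    }


# Made-up page: a hidden container wrapping a zero-size external iframe.
page = (
    "<html><body><p>news</p>"
    '<div style="display: none">'
    '<iframe src="http://example.invalid/x" width="0" height="0"></iframe>'
    "</div></body></html>"
)

builder = DomSequenceBuilder()
builder.feed(page)
builder.close()
rows = [element_features(e) for e in builder.sequence]
for row in rows:
    print(row)
```

Rows of this kind could then be fed to a decision-tree learner as per-element feature vectors; how the thesis actually encodes attributes, sequences, and page-level statistics is not specified in the abstract.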
