Research and Application of Data Mining for Intelligent Website

【Author】 水俊峰 (Shui Junfeng)

【Supervisor】 夏红霞 (Xia Hongxia)

【Author Information】 Wuhan University of Technology, Computer Application Technology, 2003, Master's thesis

【Abstract】 The growth of the Internet and e-commerce has driven the development of Web-oriented data mining. In e-commerce, data mining can be applied to Web data such as server log files to extract information about customer visits. Analyzing customers' access behavior, access frequency and access time yields general knowledge about the behavior and patterns of customer groups. This knowledge can be used to adjust the page structure dynamically, improve services and marketing strategies, and offer customers personalized interfaces, making e-commerce activity more targeted.

Web mining helps us understand the relationships among Web pages and the connection between how a Web site is organized and how users access it. Web log mining, which works on Web server logs, has attracted particular attention from researchers. With Web log mining we can learn how users browse a site, discover groups of users with similar browsing behavior, and group pages that share common characteristics according to how users access them.

Building on this, the thesis proposes to improve the quality of Web services by constructing an Intelligent Web Site (IWS) with Web log mining as its core technology. The IWS applies data mining to the data available to it, including Web logs, documents, databases and the site structure, to obtain user access patterns. Based on what a user is currently visiting, it recommends content the user is likely to be interested in, in real time. In addition, the Web server uses site usage data to find flaws in the site design and prompts the administrator to correct them.

The thesis first proposes the architecture and component modules of the IWS, then studies key data mining techniques and algorithms for these modules, and finally implements a prototype system on this basis. It is organized as follows. Chapter 2 presents the design criteria and architecture of an intelligent site based on Web log mining, which serve as a guide to the rest of the thesis. Chapters 3 to 5, the core of the thesis, discuss the data mining techniques needed to build an intelligent Web site: Chapter 3 introduces Frame page filtering, a method that improves the results of Web log preprocessing; Chapter 4 presents SLIC (Slope-Item Clustering), a fast and efficient algorithm for mining clustering patterns from Web log files; and Chapter 5 proposes an enhanced algorithm for mining frequently accessed page groups from Web logs. Chapter 6 outlines the real-time recommendation module and the administrator module of the intelligent site. Drawing on this work, Chapter 7 presents IWS, an experimental prototype system, and the final chapter summarizes the research and outlines directions for future work.
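The abstract names the IWS processing stages (log preprocessing, access-pattern mining, real-time recommendation) without giving their algorithms. The Python sketch below shows one way such stages could fit together, under explicit assumptions: the `LogRecord` fields, the 30-minute session timeout, the `frame_children` mapping and every function name are hypothetical. `filter_frames` is a simple stand-in for Frame page filtering, and `frequent_page_pairs` is plain support counting over page pairs (the second level of Apriori). It is not the thesis's enhanced algorithm, and SLIC clustering is not shown.

```python
"""Illustrative Web-log mining pipeline (hypothetical sketch, not the IWS code)."""
from collections import Counter, defaultdict
from dataclasses import dataclass
from itertools import combinations

# Assumed heuristic: a gap longer than 30 minutes starts a new session.
SESSION_TIMEOUT = 30 * 60


@dataclass(frozen=True)
class LogRecord:
    client: str     # client identifier, e.g. an IP address (assumed field)
    timestamp: int  # request time in seconds (assumed field)
    url: str        # requested page


def filter_frames(records, frame_children):
    """Drop requests for child frames that a browser loads automatically
    together with a frameset page (hypothetical stand-in for Frame page
    filtering). frame_children maps frameset URL -> set of child URLs."""
    children = set().union(*frame_children.values())
    return [r for r in records if r.url not in children]


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Split each client's time-ordered requests into sessions of URLs."""
    by_client = defaultdict(list)
    for rec in sorted(records, key=lambda rec: (rec.client, rec.timestamp)):
        by_client[rec.client].append(rec)
    sessions = []
    for recs in by_client.values():
        current = [recs[0].url]
        for prev, cur in zip(recs, recs[1:]):
            if cur.timestamp - prev.timestamp > timeout:
                sessions.append(current)  # idle gap too long: close session
                current = []
            current.append(cur.url)
        sessions.append(current)
    return sessions


def frequent_page_pairs(sessions, min_support):
    """Count the sessions containing each unordered page pair and keep
    pairs whose support (session count) is at least min_support."""
    counts = Counter()
    for session in sessions:
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


def recommend(current_page, pairs, k=3):
    """Rank up to k pages that frequently co-occur with the current page."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a == current_page:
            scores[b] += n
        elif b == current_page:
            scores[a] += n
    return [page for page, _ in scores.most_common(k)]


if __name__ == "__main__":
    logs = [
        LogRecord("10.0.0.1", 0, "/index.html"),
        LogRecord("10.0.0.1", 1, "/menu.html"),     # child frame of /index.html
        LogRecord("10.0.0.1", 60, "/products.html"),
        LogRecord("10.0.0.1", 120, "/cart.html"),
        LogRecord("10.0.0.2", 0, "/index.html"),
        LogRecord("10.0.0.2", 30, "/products.html"),
        LogRecord("10.0.0.2", 5000, "/about.html"),  # long gap: new session
    ]
    frames = {"/index.html": {"/menu.html"}}
    sessions = sessionize(filter_frames(logs, frames))
    pairs = frequent_page_pairs(sessions, min_support=2)
    print(recommend("/index.html", pairs))  # ['/products.html']
```

Counting only pairs keeps the sketch short; mining larger frequently accessed page groups would need a level-wise search such as full Apriori or a comparable method.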

  • 【CLC Number】TP311.13
  • 【Times Cited】11
  • 【Downloads】346