基于Web的用户访问信息挖掘研究

Research on Web-Based User Access Information Mining

【Author】 Zhao Peng (赵朋)

【Supervisor】 Yang Bao'an (杨保安)

【Author information】 Donghua University, Management Science and Engineering, 2006, Master's thesis

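【Chinese Abstract】 Data mining, as a means of knowledge discovery, has been widely applied and is one of the most active areas of database research. Web mining applies traditional data mining techniques to the Web environment to extract information or knowledge from the Web. Within Web mining, mining of Web-based user access information is the most widely applied branch. Its applications include e-commerce, online advertising, intelligent recommendation systems, online marketing, and intelligent decision-making. A good mining model, together with an appropriate data representation and database design, is the key to successful Web access information mining, and this thesis studies these topics.

This thesis first reviews the theory and latest results in Web user access information mining. On that basis, it studies several problems and methods in the data preprocessing and pattern discovery stages and proposes some improved methods and algorithm implementations. It designs data representations and database systems for specific problems. Finally, it proposes a database-based Web user access information mining system and gives a preliminary implementation of several of its functional modules.

The data preprocessing stage prepares the data for Web mining. This thesis implements database-based data cleaning in SQL Server 2000 and proposes a method that removes Web-spider (crawler) records by character-pattern matching. For user identification, it proposes an algorithm based on three attributes: Cookie, IP, and agent. It also gives concrete algorithms for session identification and transaction identification, with transaction identification based on maximal forward references.

The pattern discovery stage is the key to Web mining. First, this thesis creates a data representation of user access interest and uses concept hierarchies to generalize page data. From this it derives a data set suited to a BP neural network, which it applies to user classification to build a classifier. Second, building on a study of association rule and sequential pattern algorithms, it proposes and implements an algorithm for frequent access paths. Finally, it uses Matlab to implement an algorithm that computes a page-category association matrix and performs statistical analysis. This enables statistical analysis and association rule mining at higher concept levels, with good extensibility and ease of use.

Finally, building on the preceding work, this thesis proposes a prototype of a database-based Web user access information mining system and analyzes each of its modules. In this prototype all operations are based on the database, and the resulting patterns and rules are also stored in the database …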
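The preprocessing steps summarized above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the author's SQL Server implementation. The crawler substrings, the 30-minute session timeout, the cookie-first user key, and all function names are hypothetical. Transactions are split with the maximal forward reference rule: a backward move to an already-visited page closes the current forward path.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical crawler signatures; the thesis's actual match patterns are not given.
CRAWLER_PATTERNS = ("bot", "spider", "crawler", "slurp")

# Assumed idle timeout between clicks of one session (a common heuristic).
SESSION_TIMEOUT = timedelta(minutes=30)


@dataclass
class LogEntry:
    cookie: str  # empty string when the request carried no cookie
    ip: str
    agent: str
    time: datetime
    url: str


def is_crawler(entry: LogEntry) -> bool:
    """Data cleaning: flag a record by substring matching on its user agent."""
    agent = entry.agent.lower()
    return any(p in agent for p in CRAWLER_PATTERNS)


def user_key(entry: LogEntry) -> Tuple[str, ...]:
    """User identification (assumed rule): cookie first, else the (IP, agent) pair."""
    if entry.cookie:
        return ("cookie", entry.cookie)
    return ("ip+agent", entry.ip, entry.agent)


def sessionize(entries: List[LogEntry]) -> List[List[str]]:
    """Drop crawler records, group by user, split each user's clicks on idle gaps."""
    by_user: Dict[Tuple[str, ...], List[LogEntry]] = {}
    for e in entries:
        if not is_crawler(e):
            by_user.setdefault(user_key(e), []).append(e)

    sessions: List[List[str]] = []
    for clicks in by_user.values():
        clicks.sort(key=lambda e: e.time)
        current = [clicks[0].url]
        for prev, cur in zip(clicks, clicks[1:]):
            if cur.time - prev.time > SESSION_TIMEOUT:
                sessions.append(current)  # gap too long: close the session
                current = []
            current.append(cur.url)
        sessions.append(current)
    return sessions


def maximal_forward_references(session: List[str]) -> List[List[str]]:
    """Transaction identification: split a session into maximal forward paths."""
    transactions: List[List[str]] = []
    path: List[str] = []
    moving_forward = True
    for page in session:
        if page in path:
            # Backward move: emit the path only if we were moving forward,
            # then back up to the revisited page.
            if moving_forward:
                transactions.append(list(path))
            path = path[: path.index(page) + 1]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        transactions.append(list(path))
    return transactions
```

For example, the click sequence A B C D C B E G H G W A O U O V yields the forward paths ABCD, ABEGH, ABEGW, AOU and AOV.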

【Abstract】 As a means of knowledge discovery, data mining has been widely applied and is one of the most active areas of database research. Web mining applies traditional data mining techniques to extract information and knowledge from the Web environment. Web usage mining is its most widely used branch, with applications in e-commerce, online advertising, intelligent recommendation systems, online marketing, and intelligent decision support. A good mining model, together with a suitable data representation and database design, is the key to successful Web usage mining, and this dissertation studies these issues.

This dissertation builds on a study of the theory and recent results in Web user access information mining. It improves and implements several methods and algorithms and designs databases to represent the corresponding data. It then constructs a database-based model of a Web user access information mining system and implements several of its functional modules.

Data preprocessing prepares the data for Web mining. This dissertation implements data cleaning in SQL Server 2000 and introduces a data cleaning method that removes crawler requests by character matching. User identification uses the Cookie, IP, and agent attributes. The dissertation gives concrete algorithms for session identification and for transaction identification; the latter uses maximal forward paths.

Pattern discovery is the key stage of Web mining. The dissertation first constructs a data representation of user access interest and uses a concept hierarchy to generalize the page data. It then derives a data set suitable for a BP neural network and uses the BP network to construct a classifier. Next, it proposes and implements an algorithm for frequent access paths based on association rules and sequential patterns. Finally, it implements an extensible and practical Matlab algorithm that calculates the page-category association matrix and performs statistical analysis.

On the basis of this work, the dissertation presents a database-based Web mining system model and describes and analyzes each module. In this model, all operations are based on the database. All discovered patterns are stored in the database so that they can be managed and applied easily. The dissertation applies Web user access information mining to Shanghai agricultural information and finds several useful patterns. The experimental data show that the Web user access information mining system is practical and effective. SQL Server 2000 serves as the database system, SQL statements implement the data preprocessing, and all functions are developed in C++ and Matlab.

Web user access information mining is a widely used Web mining technique. It can reveal users' interests, improve site structure, provide customized services, support better marketing policies, and recommend content to users and predict their behavior. The model given in this dissertation is applicable. The research has theoretical importance and practical value for Web user access information mining.
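The two pattern-discovery steps summarized above can be sketched as follows. This is a minimal Python illustration, not the author's C++/Matlab code, and it rests on two assumptions. First, a frequent access path is taken to be a contiguous page sequence whose support meets a threshold, where support is the number of transactions containing the sequence. Such paths are found with an Apriori-style level-wise search. Second, the page-category association matrix is assumed to count sessions in which two categories co-occur. The function names and the page-to-category mapping are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Path = Tuple[str, ...]


def contiguous_subpaths(path: Sequence[str], k: int) -> Set[Path]:
    """All distinct length-k contiguous subpaths of one transaction."""
    return {tuple(path[i:i + k]) for i in range(len(path) - k + 1)}


def frequent_paths(transactions: List[List[str]], min_support: int) -> Dict[Path, int]:
    """Level-wise search for frequent contiguous access paths.

    A length-k candidate is counted only if its length-(k-1) prefix and
    suffix were frequent (Apriori pruning). Each path counts at most once
    per transaction.
    """
    result: Dict[Path, int] = {}
    frequent_prev: Set[Path] = set()
    k = 1
    while True:
        counts: Counter = Counter()
        for t in transactions:
            for p in contiguous_subpaths(t, k):
                if k == 1 or (p[:-1] in frequent_prev and p[1:] in frequent_prev):
                    counts[p] += 1
        level = {p: c for p, c in counts.items() if c >= min_support}
        if not level:
            return result
        result.update(level)
        frequent_prev = set(level)
        k += 1


def category_association_matrix(
    sessions: List[List[str]], category_of: Dict[str, str]
) -> Tuple[List[str], List[List[int]]]:
    """M[i][j] = sessions visiting both categories i and j; M[i][i] = sessions visiting i."""
    categories = sorted(set(category_of.values()))
    index = {c: i for i, c in enumerate(categories)}
    n = len(categories)
    matrix = [[0] * n for _ in range(n)]
    for s in sessions:
        # Generalize pages to categories (one concept-hierarchy level up).
        present = sorted({index[category_of[p]] for p in s if p in category_of})
        for i in present:
            matrix[i][i] += 1
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return categories, matrix
```

Counting each path at most once per transaction makes support a transaction count, as in the usual Apriori definition. The matrix diagonal doubles as the per-category session count, so a confidence such as M[i][j] / M[i][i] can be read off directly.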

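  • 【Online publication submitted by】 Donghua University
  • 【Online publication year/issue】 2006, issue 07
  • 【Classification number】 TP311.13
  • 【Times cited】 4
  • 【Downloads】 314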