Research on User Clustering Algorithms Based on Path Selection and Page Browsing

【Author】 Huang Xiang (黄翔)

【Advisor】 Fei Hongxiao (费洪晓)

【Author information】 Central South University (中南大学), Software Engineering, 2010, Master's thesis

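【Summary】 Humanity has entered the Internet era, and advances in network technology have opened up new ground for online teaching. Existing online teaching systems hold a very large amount of information, but teachers know little about how their students are actually learning, so the systems cannot meet students' individual learning needs. Applying Web log mining to students' online learning behavior, in order to discover groups of similar users and the paths that reflect their browsing interests, can help teachers adjust teaching plans and update the site structure in good time. This thesis studies a Web log mining system. Following the steps of Web log mining, it first examines Web log preprocessing, which is divided into six steps: data collection, data cleaning, user identification, session identification, path completion, and transaction identification. It reviews the relevant theory and algorithms and, on that basis, proposes an improved transaction identification algorithm that omits the path completion step and derives transactions directly from sessions. Second, it studies user clustering algorithms. Existing clustering algorithms based on Hamming distance consider only how many times a user visits a URL; they ignore how long the user stays on that URL and which operations the user performs on the page during that time. To address this shortcoming, the thesis proposes a user interest degree that combines path selection interest with page browsing interest, builds a corresponding clustering algorithm on it, and applies that algorithm to user clustering and to the extraction of browsing interest paths. Building on this work, a Web log mining system based on the users' comprehensive interest degree was designed and implemented. The system is implemented in JSP and helps administrators and teachers understand how students access the site and improve the site structure.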
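
The session identification step named in the preprocessing pipeline can be illustrated with a minimal sketch. The record does not say which rule the thesis uses, so the sketch assumes a common timeout heuristic: a gap of more than 30 minutes between two requests from the same user starts a new session. The function name `identify_sessions`, the record layout `(user_id, timestamp, url)`, and the sample URLs are hypothetical.

```python
from collections import defaultdict

# Assumed threshold (seconds); not taken from the thesis.
SESSION_TIMEOUT = 30 * 60


def identify_sessions(records, timeout=SESSION_TIMEOUT):
    """Group (user_id, timestamp_seconds, url) records into per-user sessions.

    Returns {user_id: [session, ...]}, where each session is a
    time-ordered list of (timestamp_seconds, url) pairs.
    """
    hits_by_user = defaultdict(list)
    for user_id, timestamp, url in records:
        hits_by_user[user_id].append((timestamp, url))

    sessions = {}
    for user_id, hits in hits_by_user.items():
        hits.sort()  # order each user's requests by time
        user_sessions = [[hits[0]]]
        for previous, current in zip(hits, hits[1:]):
            if current[0] - previous[0] > timeout:
                user_sessions.append([current])    # gap too long: new session
            else:
                user_sessions[-1].append(current)  # same session continues
        sessions[user_id] = user_sessions
    return sessions


# Hypothetical, already-cleaned log records.
records = [
    ("u1", 0, "/index.jsp"),
    ("u1", 120, "/course/1.jsp"),
    ("u1", 4000, "/index.jsp"),
    ("u2", 50, "/index.jsp"),
]
sessions = identify_sessions(records)
```

In this sample the third request from `u1` arrives more than 30 minutes after the second, so `u1` ends up with two sessions and `u2` with one.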

【Abstract】 Mankind has entered the Internet era, and the development of network technology has opened up new ground for online education. Web-based teaching systems hold a vast amount of information, but teachers know little about how their students are learning, so the systems do not meet the need for individualized learning. Using Web log mining, we can discover from students' online learning behavior which students resemble one another and which browsing paths interest them. This can help teachers adjust teaching plans and update the site structure. This thesis studies a Web log mining system. Following the steps of Web log mining, we first study Web log preprocessing, which is divided into six steps: data collection, data cleaning, user identification, session identification, path completion, and transaction identification. We review the relevant theory and algorithms and, on this basis, improve the transaction identification algorithm so that it omits the path completion step. Second, we study user clustering algorithms. We focus on the clustering algorithm based on Hamming distance, which takes into account only the number of times a user accesses a URL and ignores both the user's behavior on that URL and the time spent there. We propose a user interest degree that combines path selection interest with page browsing interest. On this basis, we propose a clustering algorithm and apply it to user clustering and to the extraction of browsing interest paths. Building on these studies, we designed a Web log mining system based on the user interest degree. The system is implemented in JSP and helps administrators and teachers understand how students behave when they visit the site. It also helps improve the structure of the site.
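
The limitation of the Hamming-distance approach described above can be made concrete with a small sketch. The record does not give the thesis's interest-degree formula, so the one below is only an assumption: each URL's interest is a weighted mix of its share of the user's visits (standing in for path selection interest) and its share of the user's dwell time (standing in for page browsing interest). The weight `alpha`, the L1 distance between interest vectors, and all function names are hypothetical, and on-page operations are left out.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def interest_vector(visits, dwell_seconds, alpha=0.5):
    """Assumed interest degree per URL (not the thesis's formula).

    alpha weights the visit share; (1 - alpha) weights the dwell-time share.
    """
    total_visits = sum(visits) or 1        # avoid division by zero
    total_dwell = sum(dwell_seconds) or 1
    return [
        alpha * v / total_visits + (1 - alpha) * d / total_dwell
        for v, d in zip(visits, dwell_seconds)
    ]


def interest_distance(p, q):
    """L1 distance between two interest vectors (an assumed choice)."""
    return sum(abs(x - y) for x, y in zip(p, q))


# Two hypothetical users over the same three URLs.
u1_visits, u1_dwell = [1, 1, 0], [300, 10, 0]
u2_visits, u2_dwell = [1, 1, 0], [10, 300, 0]

# Identical visit counts: the Hamming distance cannot tell the users apart.
h = hamming_distance(u1_visits, u2_visits)

# Dwell times differ sharply, so the interest-based distance separates them.
v1 = interest_vector(u1_visits, u1_dwell)
v2 = interest_vector(u2_visits, u2_dwell)
d = interest_distance(v1, v2)
```

Both users visit the same two pages once each, so their Hamming distance is zero, yet one spends almost all of the time on the first page and the other on the second. A measure that includes dwell time keeps them apart, which is the kind of distinction the proposed interest degree is meant to capture.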

【Keywords】 Web usage mining; personalization; clustering
【Key words】 Web log mining; personalization; clustering
  • 【Online publication submitted by】 Central South University (中南大学)
  • 【Online publication year and issue】 2011, Issue 02