Study on Web Data Mining (基于Web的数据挖掘研究)

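【Author】 Zhang Chengming (张承明)

【Advisor】 Sun Zhonglin (孙忠林)

【Author information】 Shandong University of Science and Technology (山东科技大学), Computer Application Technology, 2003, Master's thesis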

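【Abstract (translated from the Chinese)】 Data mining is a new information technology that has emerged in recent years with the development of database and artificial intelligence technologies. It combines knowledge from databases, artificial intelligence, statistics, and other disciplines, and tries to extract previously unknown, valid, and practical knowledge from data. Data mining is closely related to statistics, database technology, and knowledge discovery in databases (KDD), yet it also differs clearly from them. Its main research topics include generalized knowledge, association knowledge, classification knowledge, clustering knowledge, predictive knowledge, and deviation knowledge. Mining is carried out with techniques such as association analysis, classification and clustering analysis, neural networks, decision trees, and rule-based reasoning.

Because information on the Web is vast, highly disordered, and highly redundant, people still cannot quickly and conveniently obtain the information they need from it. Web mining is the application of traditional data mining techniques to the Web environment. It tries to discover implicit, previously unknown, potentially useful, non-trivial patterns in large collections of Web documents and in data about how users browse the Web. Web mining is divided into Web content mining, Web structure mining, and Web usage pattern mining. Web usage pattern mining extracts interesting patterns from data on how users browse a site, in order to understand their browsing interests and behavior and thereby improve the site structure or provide personalized services.

This thesis studies two aspects of Web usage pattern mining: data collection, and the measurement and representation of user browsing interest. The main work is as follows:

1. It analyzes existing data collection methods for Web usage pattern mining and points out their shortcomings, for example that the stateless nature of HTTP connections makes it difficult to obtain accurate user browsing information from Web logs. It proposes a method that combines server log files with client-side data to obtain user browsing information.

2. Interest is a person's selective attitude toward objective things, and measuring user browsing interest accurately is the foundation of Web usage pattern mining. Focusing on Web usage pattern mining, the thesis first analyzes the shortcomings of existing ways of measuring user browsing interest: the measures are too simple to separate well the categories a user is interested in from those the user is not, and they ignore the effect of a page's information volume on how long the user spends on it. On this basis, it proposes a method for measuring user browsing interest based on user browsing behavior.

3. How to represent user browsing interest effectively is one of the research directions of Web usage pattern mining. After analyzing existing ways of representing user browsing interest, the thesis proposes a representation based on a tree structure.

The proposed methods for measuring and representing user browsing interest based on browsing behavior address shortcomings of existing approaches in data collection, interest measurement, and interest representation, so as to better prepare the data for further mining.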
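
As a rough illustration of the second contribution above (measuring browsing interest from browsing behavior while accounting for a page's information volume), the following Python sketch scores a page view by dwell time per kilobyte of page content. The abstract does not give the thesis's actual formula, so everything here is an assumption made for illustration: the `PageView` fields, the use of page size in bytes as a proxy for information volume, the 30-minute dwell cap, and the seconds-per-kilobyte score itself.

```python
from dataclasses import dataclass


@dataclass
class PageView:
    """One observed page view (hypothetical record layout)."""
    url: str
    dwell_seconds: float  # time the user spent on the page
    page_bytes: int       # page size, used here as a proxy for information volume


def interest_score(view: PageView, max_dwell_seconds: float = 1800.0) -> float:
    """Return dwell time per kilobyte of page content.

    Normalizing by page size keeps a long page from looking more
    interesting merely because it takes longer to read. Dwell time is
    capped (an assumed 30 minutes) so an abandoned browser tab does not
    dominate the score.
    """
    if view.page_bytes <= 0:
        raise ValueError("page_bytes must be positive")
    dwell = min(view.dwell_seconds, max_dwell_seconds)
    return dwell / (view.page_bytes / 1024.0)


def rank_pages(views: list[PageView]) -> list[str]:
    """Return URLs ordered from highest to lowest interest score."""
    ranked = sorted(views, key=interest_score, reverse=True)
    return [view.url for view in ranked]


views = [
    PageView("/news/long-article.html", dwell_seconds=120.0, page_bytes=8192),
    PageView("/news/short-note.html", dwell_seconds=60.0, page_bytes=2048),
]
print(rank_pages(views))
```

In this sketch the short note ranks above the long article even though the user spent less absolute time on it, which is the kind of distinction a raw dwell-time measure would miss.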

【Abstract】 Data mining is a fairly new information technology that has developed alongside database and artificial intelligence technology. It tries to extract previously unknown, valid, and useful knowledge from data. Data mining is closely related to database technology, statistics, and KDD, yet it also differs clearly from them. It mainly studies generalization knowledge, association knowledge, classification knowledge, clustering knowledge, prediction knowledge, and deviation knowledge, using techniques such as association analysis, classification, and clustering.

Because Web information is vast, highly disordered, and highly redundant, people cannot get the information they need from the Web quickly and conveniently. Web mining is the application of traditional data mining technology to the Web. It attempts to find implicit, unknown, non-trivial patterns with potential application value in large collections of Web documents and in the data recorded when users browse the Web. Web usage pattern mining extracts interesting patterns from users' browsing data in order to understand their browsing interests and behavior, so as to improve a website's structure or provide personalized services.

This thesis addresses data acquisition for Web usage pattern mining and the measurement and representation of user browsing interest. The main work is as follows:

1. It analyzes current data acquisition methods for Web usage pattern mining and points out their shortcomings. For example, because HTTP is stateless, it is difficult to obtain accurate user browsing information from Web logs. It proposes a method that combines server log files with client-side data to obtain user browsing information.

2. Interest is a person's selective attitude toward objective things, and measuring user browsing interest accurately is the basis of Web usage pattern mining. Within the field of Web usage pattern mining, the thesis analyzes the shortcomings of existing ways of measuring user browsing interest. For instance, overly simple measures make it hard to distinguish what the user is interested in from what the user is not, and they do not consider the influence of a page's information volume on the user's browsing time. On this basis, it proposes a method for measuring user browsing interest based on user browsing behavior.

3. How to represent user browsing interest effectively is one of the research directions of Web usage pattern mining. The thesis proposes a way of representing user browsing interest based on a tree structure.

The proposed methods for measuring and representing user browsing interest based on browsing behavior address the shortcomings of existing approaches in data collection, interest measurement, and interest representation, and better prepare the data for further mining.
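
To make the idea of a tree-structured representation of browsing interest more concrete, here is a minimal Python sketch. The abstract does not describe how the thesis builds its tree, so this is only one plausible reading: the `InterestNode` class, the choice of URL path segments as tree levels, and the rule that each node accumulates the scores of all pages beneath it are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class InterestNode:
    """A node in a hypothetical interest tree keyed by URL path segment."""
    name: str
    score: float = 0.0
    children: dict = field(default_factory=dict)


def add_interest(root: InterestNode, url_path: str, score: float) -> None:
    """Add a page's interest score to every node on its path from the root."""
    root.score += score
    node = root
    for segment in url_path.strip("/").split("/"):
        if not segment:
            continue  # skip empty segments such as those from "//"
        node = node.children.setdefault(segment, InterestNode(segment))
        node.score += score


def render(node: InterestNode, depth: int = 0) -> list[str]:
    """Return indented lines, with higher-scoring children listed first."""
    lines = [f"{'  ' * depth}{node.name}: {node.score:.1f}"]
    for child in sorted(node.children.values(), key=lambda n: -n.score):
        lines.extend(render(child, depth + 1))
    return lines


root = InterestNode("site")
add_interest(root, "/news/sports/a.html", 30.0)
add_interest(root, "/news/tech/b.html", 10.0)
add_interest(root, "/about.html", 5.0)
print("\n".join(render(root)))
```

Accumulating scores upward lets interest be read at any level of granularity: a single page, a topic directory, or the whole site.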

  • 【Classification number】 TP393.09
  • 【Times cited】 7
  • 【Downloads】 1609