K-means Clustering Algorithm and Its Application in the College Library Web Log Mining

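【Author】 Kang Yaolong (康耀龙)

【Supervisor】 Lu Caiwu (卢才武)

【Author information】 Xi'an University of Architecture and Technology, Systems Engineering, 2010, Master's thesis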

【Abstract】 With the Internet now in widespread use, people leave behind a large amount of valuable information that can be analyzed. Given an ever-growing body of data, how to extract knowledge that is useful but not easily discovered has become an important research topic. Web log mining, applied to user access logs, addresses this problem. Based on the characteristics of library users' access behavior, this thesis applies clustering to mine the access logs of a college library. Because the K-means clustering algorithm selects its initial cluster centers at random, which lowers clustering accuracy and efficiency, an improved K-means clustering algorithm, called IKM, is proposed that incorporates grid-based methods, among others. The algorithm shows considerable improvement in clustering accuracy, efficiency, and robustness. For the log mining stage, a visual log mining support tool is designed and implemented; it can directly generate the input vector table and compile statistics on the clustering results. Finally, the I-Weka mining tool is built on the improved K-means algorithm: it is implemented on the Java development platform, with the IKM algorithm packaged into Weka. I-Weka is then used to cluster the preprocessed college library log data. Analysis of the final results reveals users' degree of interest in different categories of books, showing which categories attract the most attention and which are held in insufficient numbers. This gives the library's acquisitions department a reference for book purchasing, helping it to use funds sensibly, improve the collection, and raise the quality of library service.
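
The abstract does not say how IKM combines K-means with a grid, so the following Python sketch is only one plausible illustration of the general idea, not the thesis's algorithm. It assumes the initial centers are taken as the centroids of the most populated grid cells, so that initialization is deterministic rather than random; the function names and the `cells_per_axis` parameter are hypothetical.

```python
from collections import defaultdict
from math import dist


def centroid(group):
    """Mean of a non-empty list of equal-length numeric tuples."""
    n = len(group)
    return tuple(sum(p[d] for p in group) / n for d in range(len(group[0])))


def grid_initial_centers(points, k, cells_per_axis=4):
    """Assumed initialization: centroids of the k most populated grid cells.

    Illustrative only; the thesis's IKM procedure is not given in the abstract.
    """
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    cells = defaultdict(list)
    for p in points:
        key = tuple(
            min(
                cells_per_axis - 1,  # keep the maximum value inside the last cell
                int((p[d] - lows[d]) / ((highs[d] - lows[d]) or 1.0) * cells_per_axis),
            )
            for d in range(dims)
        )
        cells[key].append(p)
    if len(cells) < k:
        raise ValueError("fewer occupied grid cells than requested clusters")
    densest = sorted(cells.values(), key=len, reverse=True)[:k]
    return [centroid(cell) for cell in densest]


def kmeans(points, centers, max_iter=100):
    """Standard K-means iteration starting from the given centers."""
    centers = list(centers)
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            centroid(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments have stabilized
            break
        centers = new_centers
    return centers, clusters
```

In a library log setting, each point might be a per-user vector of visit counts by book category, passed as `kmeans(vectors, grid_initial_centers(vectors, k))`; that encoding is likewise an assumption. A known weakness of this simple variant is that two dense neighboring cells can belong to the same natural cluster.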

【Key words】 Web log mining; Clustering; K-means algorithm; Library