
RFID路径数据聚类分析与频繁模式挖掘

RFID Path Data Clustering Analysis and Frequent Pattern Mining

【Author】 Lin Guosheng (林国省)

【Advisor】 Deng Huifang (邓辉舫)

【Author information】 South China University of Technology, Computer Software and Theory, 2010, Master's thesis

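【Abstract (translated from the Chinese)】 RFID path data are the path data produced by RFID-tagged items as they move. How to extract useful information and knowledge from large volumes of RFID path data has become an important research topic. Drawing on RFID applications, this thesis analyzes the characteristics of RFID path data and proposes several clustering analysis methods and frequent pattern mining methods suited to such data. In practice, these mining methods can support business decisions and help optimize or improve business arrangements and planning. For clustering analysis, computing the similarity between path objects is the foundation of any path clustering algorithm; borrowing from research on sequence alignment in bioinformatics, the thesis discusses two path similarity measures, one based on global similarity and one on local similarity. As for the clustering algorithms themselves, traditional methods cannot handle RFID path data, so the thesis proposes DBPC, a density-based path clustering algorithm that adopts a new cluster construction method tailored to path data. It also proposes PHC, a hierarchical clustering algorithm for path data that uses a cluster-member weighting scheme to compute the similarity between clusters. It then discusses methods for detecting anomalous paths and feasible approaches to clustering complex path data. For frequent pattern mining, the thesis modifies and simplifies the traditional closed frequent sequence mining algorithm CloSpan so that it can mine frequent patterns in path data and, building on the modified CloSpan, proposes a frequent pattern mining algorithm, CFPM. Tailored to the characteristics of path data, CFPM introduces a pruning method based on node counting, which improves pruning efficiency and gives better mining efficiency than the modified CloSpan. The thesis also discusses feasible approaches to frequent pattern mining on complex path data. Finally, an experimental RFID path data mining system was developed; it visualizes path data and presents the distribution of the paths and the mining results intuitively. Experiments show that the clustering and frequent pattern mining methods discussed in the thesis are applicable to RFID path data: DBPC and PHC form high-quality path clusters, PHC merges clusters efficiently, and CFPM improves mining efficiency over the traditional CloSpan algorithm.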

【Abstract】 As RFID technologies and applications develop, the data generated by RFID applications is growing rapidly. How to extract useful information and knowledge from large amounts of data has become an important research topic. RFID path data refers to the path data generated by RFID-tagged objects as they move. In this thesis, we analyze the characteristics of RFID path data based on real applications and propose several clustering analysis approaches and frequent path mining approaches that are suitable for RFID path data. In applications, these approaches can support business decision-making, help optimize or improve business arrangements, and support planning.

In cluster analysis, the similarity measure is the basis of any clustering algorithm. Drawing on alignment techniques from bioinformatics, we discuss two path similarity calculation methods, one based on global similarity and one based on local similarity, which can reveal the true similarity of paths. Traditional clustering algorithms cannot process RFID path data. This thesis presents DBPC, a density-based clustering algorithm for RFID path data. Based on the characteristics of path data, DBPC uses a new way to build clusters. We also propose PHC, a hierarchical clustering algorithm for path data, which uses a weighting scheme to calculate the similarity between clusters. Finally, we discuss outlier detection methods and methods for clustering complex path data.

In frequent path mining, we modify the closed frequent sequence mining algorithm CloSpan to make it suitable for path data. Based on the modified CloSpan, we propose CFPM, an algorithm for frequent path mining that fits the characteristics of path data, uses a node-counting scheme for pruning, and achieves better performance. Finally, we discuss frequent pattern mining for complex path data.

We developed an experimental system for RFID path data mining with path visualization functions, which can intuitively show the distribution of path data and the mining results. Experiments show that the mining approaches proposed in this thesis can be applied well to RFID path data. DBPC and PHC can build high-quality clusters and have good efficiency. CFPM improves mining efficiency compared to traditional CloSpan.
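
The abstract mentions global and local path similarity borrowed from sequence alignment but gives no scoring details. As a rough illustration only, the sketch below treats a path as a sequence of location names and scores two paths with the standard Needleman-Wunsch (global) and Smith-Waterman (local) recurrences. The function names, the match, mismatch, and gap values, and the example locations are all assumptions made for this illustration; they are not taken from the thesis and may differ from the measures it actually defines.

```python
from typing import Sequence


def global_similarity(a: Sequence[str], b: Sequence[str],
                      match: float = 1.0, mismatch: float = -1.0,
                      gap: float = -1.0) -> float:
    """Needleman-Wunsch score: aligns the two paths end to end."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] with an empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + step,   # align a[i-1] with b[j-1]
                            prev[j] + gap,        # skip a location in a
                            curr[j - 1] + gap))   # skip a location in b
        prev = curr
    return prev[-1]


def local_similarity(a: Sequence[str], b: Sequence[str],
                     match: float = 1.0, mismatch: float = -1.0,
                     gap: float = -1.0) -> float:
    """Smith-Waterman score: best-scoring pair of sub-paths."""
    best = 0.0
    prev = [0.0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0.0]
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            # Scores are floored at 0 so an alignment can restart anywhere.
            score = max(0.0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Hypothetical paths: sequences of reader locations visited by tagged items.
p1 = ["factory", "warehouse", "truck", "store"]
p2 = ["factory", "warehouse", "store"]
p3 = ["dock", "shelf"]

print(global_similarity(p1, p2), local_similarity(p1, p2))
print(global_similarity(p1, p3), local_similarity(p1, p3))
```

A global score penalizes every location that one path has and the other lacks, while a local score rewards any shared stretch regardless of what surrounds it. Either score could be normalized by path length before being used as a distance in a clustering algorithm; that choice is also left open here.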
