基于时序逻辑的关联规则挖掘方法的研究

The Research Based on Time Series Logic for Data Mining of Association Rules

【Author】 Wang Guo (王果)

【Advisor】 Xia Youming (夏幼明)

【Author information】 Yunnan Normal University, Computer Software and Theory, 2007, Master's thesis

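【Abstract (translated from the Chinese)】 With the growing ubiquity of computer information systems, the development of mass-storage technology, and the widespread use of data-acquisition technologies such as bar codes, people have accumulated large volumes of data of many types in everyday transaction processing and scientific research. The great majority of this stored data takes the form of time series. Time series data are data sets in which the observation records are arranged in chronological order. Time series are found in every area of social life: daily closing stock prices in financial and securities markets; daily sales of a given product in commercial retail; daily temperature and air-pressure readings for a region in weather-forecasting research; and, in biomedicine, the moment-by-moment heart-rate changes of a patient with a given symptom. A time series is more than a record of historical events. As time passes and time series data grow on a massive scale, processing these vast series and mining the valuable information behind them is of real practical importance for revealing the internal laws by which things develop and change, discovering how different things interact, and providing a basis for sound understanding and scientific decision-making. Research on mining association rules from time series has therefore long received wide attention and has become a prominent research topic of both theoretical and practical value. This thesis combines temporal logic with data mining and, addressing the problems above, studies association rule mining methods based on temporal logic. It proposes a trend prediction method based on time series association rules and, for two cases (the same object with the same attribute, and different objects with the same attribute), proposes corresponding temporal association rule mining algorithms: mining the maximum length of the rising and falling sub-time-series in the time series set U'; mining the number of rising and falling sub-time-series of length k in U'; giving a method for computing the confidence and support of temporal-logic-based association rules for the same object with the same attribute; and proposing an algorithm that builds a temporal association information tree by mining the attribute-state information that occurs frequently in the sub-time-series. That is, the frequent 1-attribute states on the time series set are found first, and the elements of this set are then used as roots to build the temporal association information trees. Each tree is built with an element of the frequent 1-attribute state set as the derived attribute state of the temporal pattern; its branches consist of the frequent items that frequently occur before the root element within sub-time-series of length k, together with the relative time intervals in which they occur. On this basis, the thesis gives a method for computing the confidence and support of temporal-logic-based association rules for different objects with the same attribute, and proposes a rule reduction principle for temporal association rules over different objects with the same attribute. With the continuing development of data acquisition and storage technology, the amount of data stored in databases has increased sharply, and databases have grown ever larger. People find that they no longer lack information but are instead drowning in a sea of it. Data mining (DM) can provide decision makers with important and highly valuable information or knowledge, yielding inestimable benefits. More and more large and medium-sized enterprises have therefore begun to use data mining to analyze company data in support of decision-making, and data mining is gradually becoming a key asset for staying competitive in the market. Data mining is the process of extracting implicit, previously unknown, but potentially useful information and knowledge from large amounts of incomplete, noisy, fuzzy, and random data. Notably, data mining has been application-oriented from the outset, which raises the use of data from low-level end-user queries to the level of providing decision support for managers at every level. Analyzing data and mining useful knowledge from it is time-consuming and difficult. Data mining in a specific domain usually requires some background domain knowledge, on the basis of which an effective tool is used to obtain more implicit, previously unknown, and potentially valuable knowledge from the data set. Such mining is especially important in industrial process control, medical diagnosis, stock analysis, hydrology, and meteorology, because the data in these fields share a common feature: they record time series information for the field, and the volume of information is very large. Without suitable mining methods, later decision-making and the prediction of new data will inevitably be difficult. The emergence of time series data makes it necessary to devise mining strategies for this special data type, so as to discover the patterns of change in the values of an attribute recorded continuously over some period, as well as the effect of those changes on the values of other attributes. In recent years, as research on rough set theory has deepened, it has been widely applied in knowledge discovery in databases, intelligent control, machine learning, decision analysis, expert systems, pattern recognition, and many other fields. This thesis first gives a detailed overview of data mining and introduces the techniques in common use today, such as decision trees, genetic algorithms and evolutionary theory, neural networks, Bayesian classification, fuzzy sets, rough sets, and analogical learning, and lists some successful applications of data mining. It then introduces association rule mining, the most active research direction in data mining, and describes and analyzes its main algorithms. Finally, it presents in detail the mining of temporal association rules, a class of association rules, and compares and analyzes several temporal association rule mining algorithms.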

【Abstract】 As computer information systems become increasingly widespread, large-capacity storage technology develops, and data-acquisition technologies such as bar codes come into wide use, people have accumulated large amounts of data of various types in day-to-day affairs and scientific research. Most of this stored data is time series data, that is, data sets whose records are arranged in chronological order. Time series exist in all fields of social life: for example, daily closing stock prices in the financial market; daily sales of a commodity in the retail industry; daily temperature and pressure readings for a region in weather forecasting research; and the moment-by-moment changes in the heartbeat of a patient with a given symptom in biomedicine. A time series is not just a record of historical events. As time passes and time series data grow massively, processing these series and mining the valuable information they contain has important practical significance for revealing the internal laws by which things develop and change, discovering the interactions between different things, and providing a basis for correct understanding and scientific decision-making. Research on mining association rules from time series has therefore received widespread attention and has become an important research topic with both theoretical and practical value. This thesis combines temporal logic with data mining and studies association rule mining methods based on temporal logic. It proposes a trend forecasting method based on time series association rules and, for two cases (the same object with the same attribute, and different objects with the same attribute), proposes corresponding temporal association rule mining algorithms: mining the maximum length of the rising and falling sub-time-series in the time series set U'; mining the number of rising and falling sub-time-series of length k in U'; computing the support and confidence of temporal-logic-based association rules for the same object with the same attribute; and building a temporal association information tree by mining the attribute states that appear frequently in the sub-time-series. That is, the algorithm first identifies the frequent 1-attribute states in the time series set and then uses the elements of this set as roots to build temporal association information trees. Each tree takes an element of the frequent 1-attribute state set as the derived attribute state of the temporal pattern, and its branches are formed from the frequent items that frequently appear before the root element in sub-time-series of length k, together with the relative time intervals in which they appear. On this basis, the thesis gives a method for computing the confidence and support of temporal-logic-based association rules for different objects with the same attribute, and proposes a rule reduction principle for these temporal association rules. With the development of data acquisition and storage technology, the amount of data stored in databases has increased sharply and databases have grown ever larger. People find that they no longer lack information but are flooded by an ocean of it. Data mining can offer decision makers important and extremely valuable information or knowledge, producing inestimable benefit. More and more large and medium-sized enterprises therefore use data mining to assist decision-making, and data mining is becoming a key means of holding a strong position in market competition. It extracts potentially useful information and knowledge, not known in advance, from large amounts of incomplete, noisy, fuzzy, and random data. In particular, data mining has been application-oriented from the start, raising the use of data from low-level query operations to the level of decision support for managers at all levels. Analyzing data and mining useful knowledge from it is time-consuming and difficult. Data mining in a specific field usually requires some background knowledge of that field, on the basis of which an effective tool is used to obtain more implicit, previously unknown, and potentially valuable knowledge from the data set. This kind of mining is especially important in industrial process control, medical diagnosis, stock analysis, hydrology, and meteorology, because the data in these fields share a common characteristic: they record time series information, and the amount of information is enormous. Without suitable mining methods, later decision-making and forecasting will be difficult. The emergence of time series data makes it necessary to provide corresponding strategies for mining this special data type, in order to find the patterns of change in an attribute's values recorded continuously over a period of time, and the influence of those changes on other attribute values. In recent years, as research on rough set theory has deepened, it has been widely applied in many fields, such as knowledge discovery in databases, intelligent control, machine learning, decision analysis, expert systems, and pattern recognition. This thesis first summarizes data mining in detail and introduces the data mining techniques commonly used at present, such as decision trees, genetic algorithms and evolutionary theory, neural networks, Bayesian classification, fuzzy sets, rough sets, and analogical learning, and lists some successful applications of data mining. It then introduces the most active research direction in data mining, association rule mining, and describes and analyzes its main algorithms. Finally, it introduces time series association rule mining in detail and compares and analyzes several time series association rule mining algorithms.
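
The first two mining tasks named in the abstract (the maximum length of rising and falling sub-time-series, and the number of such sub-time-series of length k) can be illustrated with a minimal Python sketch. This record does not give the thesis's definitions, so several assumptions are made here: a "sub-time-series" is taken to be a contiguous, strictly monotonic run of points; the function names `longest_run`, `count_runs_of_length`, and `trend_rule_stats` are hypothetical; and the support and confidence formulas are one plausible reading of a trend rule, not the thesis's own method.

```python
from typing import Sequence, Tuple


def _step_ok(prev: float, cur: float, rising: bool) -> bool:
    """True if the step prev -> cur moves strictly in the requested direction."""
    return cur > prev if rising else cur < prev


def longest_run(series: Sequence[float], rising: bool = True) -> int:
    """Length, in points, of the longest strictly rising (or falling) contiguous run."""
    if not series:
        return 0
    best = current = 1
    for prev, cur in zip(series, series[1:]):
        current = current + 1 if _step_ok(prev, cur, rising) else 1
        best = max(best, current)
    return best


def count_runs_of_length(series: Sequence[float], k: int, rising: bool = True) -> int:
    """Number of contiguous windows of k points (k >= 2) that are strictly monotonic."""
    if k < 2 or len(series) < k:
        return 0
    count = 0
    run = 1  # length of the monotonic run ending at the current point
    for prev, cur in zip(series, series[1:]):
        run = run + 1 if _step_ok(prev, cur, rising) else 1
        if run >= k:
            count += 1  # the last k points form one qualifying window
    return count


def trend_rule_stats(series: Sequence[float], k: int, rising: bool = True) -> Tuple[float, float]:
    """Support and confidence of the illustrative rule (an assumption, not from the thesis):
    'k points moving in one direction => the next point continues in that direction'."""
    windows = len(series) - k  # number of windows of k + 1 points
    if windows <= 0:
        return 0.0, 0.0
    both = count_runs_of_length(series, k + 1, rising)
    # Antecedent windows must have a following point, so drop the last point.
    antecedent = count_runs_of_length(series[:-1], k, rising)
    support = both / windows
    confidence = both / antecedent if antecedent else 0.0
    return support, confidence


prices = [1, 2, 3, 2, 1, 2, 3, 4]  # made-up example data
print(longest_run(prices, rising=True))    # 4  (1, 2, 3, 4)
print(longest_run(prices, rising=False))   # 3  (3, 2, 1)
print(count_runs_of_length(prices, 3))     # 3  ([1,2,3], [1,2,3], [2,3,4])
print(trend_rule_stats(prices, 3))         # (0.2, 0.5)
```

Both counting functions make a single pass over the series by tracking the length of the monotonic run ending at each point, so a window of k points qualifies exactly when that run length is at least k.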

  • 【Classification code】TP311.13
  • 【Citation count】2
  • 【Download count】211