面向产品持续质量控制的数据挖掘技术与应用研究

Research on Technologies and Application of Data Mining for Product Continual Quality Control

【Author】 Tan Jun (谭军)

【Supervisor】 Bu Yingyong (卜英勇)

【Author information】 Central South University (中南大学), Mechanical Engineering, 2013, PhD

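【Abstract (Chinese version)】 Knowledge is the most valuable asset of a manufacturing enterprise. Data mining can distill valuable knowledge from large volumes of business data of all kinds, and it has thereby greatly advanced manufacturing technology and manufacturing modes. Association rule mining is one of the most important data mining techniques: it effectively discovers associations between data items, and its rules are concise and easy to understand and interpret. Research on association rule mining algorithms therefore has theoretical significance and broad application prospects, and has long been a focus of data mining research. This thesis studies in depth the key techniques of association rule mining and their application to continual product quality improvement. The main original contributions are: (1) To build conditional FP-trees, the FP-growth algorithm must scan the database twice, which greatly restricts its application. To address this limitation, the thesis proposes a novel FP array technique that obtains the counts of frequent items directly from the FP array and thereby greatly reduces the need to traverse the FP-tree. By combining the FP-tree data structure with the FP array, the thesis proposes new algorithms for mining frequent itemsets and closed frequent itemsets respectively. Experimental evaluation shows that both algorithms perform consistently well in running time, memory consumption and scalability, especially on sparse databases. (2) Both Apriori and FP-growth process all transactions in batch mode and cannot meet the need to update association rules dynamically. Building on the FUFP algorithm, the thesis proposes an improved algorithm based on pre-frequent items that introduces two support thresholds, an upper one and a lower one. As long as the number of new transactions processed stays below a certain value (determined by the two support thresholds and the size of the database), the algorithm does not need to rescan the original database, which makes updating association rules more efficient. Experimental evaluation shows that the larger the database, the more pronounced the performance advantage of the algorithm. (3) Traditional association rule mining algorithms cannot handle several data types at once and therefore cannot support the mining of diverse customer requirement data. To address this limitation, the thesis first defines the data types and the rule patterns to be mined and proposes counting the support of items by similarity; it then proposes a new method based on fuzzy sets that handles all data types in a unified way; finally it proposes an Apriori-based fuzzy association rule mining algorithm and applies it to the association analysis of questionnaire survey data on electric bicycles. (4) Building on the work above, the thesis develops an information system for continual product quality improvement (ARMS), whose goal is to produce high-quality products at low cost and with low resource consumption and to raise customer satisfaction. ARMS consists of three modules: a process data integration module, an association rule mining module and an association rule optimization module. ARMS uses an XML-based process quality language to integrate the process data of the relevant departments into a central data warehouse. On this basis it applies the new algorithms proposed in the thesis to discover association rules between combinations of process parameters from different departments and product quality characteristics, and then optimizes these rules with a genetic algorithm, helping process engineers adjust process parameter settings to keep improving product quality. 92 figures, 19 tables, 202 references.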
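The abstract does not describe the layout of the FP array, so the following Python sketch only illustrates the general idea it names: keeping a table of pairwise co-occurrence counts so that the item counts needed for a conditional structure can be read off directly instead of being collected by another traversal. The function names, the item ordering and the sample transactions are all assumptions made for illustration, not the thesis's implementation.

```python
from collections import Counter
from itertools import combinations


def frequent_items(transactions, min_count):
    """Count each item once per transaction and keep those meeting min_count."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c for item, c in counts.items() if c >= min_count}


def build_pair_counts(transactions, frequent):
    """Build a pairwise co-occurrence table over the frequent items.

    Items are ordered by descending count (ties broken by name), and each
    pair is stored as (earlier item, later item) in that order.
    """
    ranked = sorted(frequent, key=lambda i: (-frequent[i], i))
    order = {item: rank for rank, item in enumerate(ranked)}
    pair_counts = Counter()
    for t in transactions:
        kept = sorted((i for i in set(t) if i in frequent), key=order.__getitem__)
        for a, b in combinations(kept, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def conditional_counts(item, pair_counts):
    """Read the counts of items preceding `item` straight from the table."""
    return {a: c for (a, b), c in pair_counts.items() if b == item}


# Hypothetical sample data, used only to exercise the functions.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c", "d"],
]
frequent = frequent_items(transactions, min_count=2)
pair_counts = build_pair_counts(transactions, frequent)
counts_for_c = conditional_counts("c", pair_counts)
```

In this sketch, `counts_for_c` gives the counts of the items that co-occur with `c` without revisiting the transactions, which is the kind of saving the abstract attributes to the FP array.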

【Abstract】 Knowledge is the most valuable asset of a manufacturing enterprise. Data mining can extract valuable knowledge from all kinds of manufacturing data, and it has greatly promoted the development of manufacturing technology and manufacturing modes. Association rule mining is one of the most important data mining techniques: it effectively finds relationships between data items, and the rules it produces are concise and easy to understand and explain. Research on association rule mining algorithms therefore has theoretical significance and broad application prospects, and it has long been an active area of data mining research. This thesis studies in depth the key technologies of association rule mining and their application to continual product quality improvement. The main contributions are as follows. (1) To generate conditional FP-trees, the FP-growth algorithm needs to scan the database twice, so it cannot adapt to the characteristics of data in dynamic real-time databases. To address this limitation, the thesis presents a novel FP array technique. The counts of frequent items are obtained directly from the FP array, so the first scan is omitted. An improved frequent itemset mining algorithm and a closed frequent itemset mining algorithm are presented, both of which combine the FP-tree data structure with the FP array technique. Experimental evaluation shows that the two algorithms perform consistently well in running time, memory consumption and scalability, especially on sparse databases. (2) Apriori and FP-growth process all transactions in batch mode and cannot meet the need to update association rules dynamically. The thesis introduces the concept of pre-frequent itemsets, which are defined by an upper minimum support threshold and a lower minimum support threshold. Building on the fast updated FP-tree (FUFP) algorithm, it presents an improved algorithm based on pre-frequent itemsets that does not need to rescan the original database until the number of new transactions reaches a certain amount, which makes updates more efficient. Experimental evaluation shows that the larger the database, the more pronounced the performance advantage of the algorithm. (3) Customer demand is the driving force behind the development of enterprises, and the diversity of customer demands leads to a diversity of data types in questionnaires. Traditional association rule mining algorithms, however, cannot handle several types of data together. To address this limitation, the thesis first defines the data types and the rule patterns to be mined, and proposes counting the support of items by similarity degree. It then presents a novel method based on fuzzy set theory that handles all data types in a unified way. Finally, it presents an Apriori-based fuzzy association rule mining algorithm and applies it to the analysis of survey data on electric bicycles. (4) Based on the work above, the thesis designs an information system for continual product quality improvement (ARMS), whose goal is to produce high-quality products at low cost and with low resource consumption and to improve customer satisfaction. The system integrates the process and quality data of different departments through an XML-based process quality language. On this basis, a mining algorithm proposed in the thesis is used to find relationships between combinations of distributed process parameters and product quality problems, and a genetic algorithm is then used to optimize these rules, helping quality managers adjust process parameter settings to facilitate continual quality improvement.
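The two-threshold idea in contribution (2) can be sketched as follows. The abstract says only that the number of new transactions tolerated before a rescan depends on the two thresholds and the database size; it does not give the bound. The formula used here is an assumption, derived from a worst-case argument: an itemset below the lower threshold in a database of d transactions has count under S_l·d, and even if all t new transactions contain it, it stays below the upper threshold S_u whenever t ≤ (S_u − S_l)·d / (1 − S_u). All names are hypothetical.

```python
from fractions import Fraction
from math import floor


def classify(count, db_size, lower, upper):
    """Label an itemset by comparing its support with the two thresholds."""
    support = Fraction(count, db_size)
    if support >= upper:
        return "frequent"
    if support >= lower:
        return "pre-frequent"
    return "infrequent"


def safe_insertions(db_size, lower, upper):
    """Assumed bound on new transactions that cannot make an infrequent
    itemset frequent: floor((upper - lower) * db_size / (1 - upper))."""
    return floor((upper - lower) * db_size / (1 - upper))


# Hypothetical thresholds and database size; exact fractions avoid
# floating-point rounding in the bound.
LOWER = Fraction(2, 5)   # lower support threshold, 0.4
UPPER = Fraction(1, 2)   # upper support threshold, 0.5
DB_SIZE = 100

labels = [classify(c, DB_SIZE, LOWER, UPPER) for c in (55, 45, 30)]
bound = safe_insertions(DB_SIZE, LOWER, UPPER)
```

With these illustrative values, an itemset that occurs in 39 of the 100 original transactions reaches at most 59 of 120 after `bound` insertions, which is still under the upper threshold, so under this assumption no rescan of the original database would be needed for it.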

  • 【Online publication contributor】 Central South University (中南大学)
  • 【Online publication issue】 2014, Issue 12