扩展粗糙集模型及其在烟叶质量预测与评价中的应用
Extended Rough Set Models and Their Applications in Quality Prediction and Evaluation for Tobacco Leaves
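【Author】 Tan Xu (谭旭);
【Advisor】 Chen Yingwu (陈英武);
【Author information】 National University of Defense Technology (国防科学技术大学), Management Science and Engineering, 2009, Ph.D.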
【摘要】 粗糙集是处理不精确、不完备、不一致数据的有效机器学习方法。Pawlak经典粗糙集在应对实际应用中所存在的各类不确定性因素时,有着较大的局限性。为了使粗糙集方法能更好地应用于实际,本文对经典粗糙集的相关理论与模型进行了改进和扩展,并通过充分的理论证明与大量的实验分析,表明了相应改进算法与扩展模型的合理性和有效性。经进一步地将理论研究成果应用于烟叶质量预测与评价中,充分说明了粗糙集方法解决同类问题的优越性。全文总体分为四部分内容进行论述,前两部分内容围绕粗糙集相关理论和模型进行探析,后两部分内容为基于粗糙集的烤烟烟叶质量评测的应用性研究。理论研究方面,首先针对粗糙集理论中的三个核心关键问题,即离散化问题、知识约简问题和规则提取与推理问题,进行了相关的分析研究。在离散化问题的研究中,考虑粗糙集方法的特点,提出了三个新的启发式离散化准则,并以此构建了半全局离散化算法。在知识约简问题的研究中,为了克服不一致数据给属性约简带来的困扰,提出了基于条件熵的改进分辨矩阵的条件属性约简方法,并以此给出了增量意义下的条件属性约简算法。在粗糙集规则知识提取和推理决策问题的研究中,分析了粗糙集作为一种不确定性归纳推理机器学习方法,其不确定性解决思路所在,归纳了利用粗糙集进行规则知识提取和推理决策的基本步骤以及相应的建模过程。基于对Pawlak经典粗糙集的研究,进一步考虑实际应用中所存在的各种数据不确定性和知识不确定性,在本文的第三章提出了三种扩展粗糙集模型,即区间型数据粗糙集模型、杂合数据粗糙集模型以及广义相似不完备数据粗糙集模型。不同的扩展粗糙集模型均考虑了不同类型的不确定性,并基于不同类型的不确定性给出了相应的粗糙集建模流程和算法,对应的解析证明和仿真实验验证了这些扩展粗糙集模型。此外,对于扩展粗糙集模型中普遍存在的两类阈值选取问题,从粗糙集理论的相关定义出发分别对两类阈值的合理选取作出分析,并给出了阈值寻优算法。应用研究方面,首先对烤烟烟叶质量评价的两大方面,即烤烟烟叶的外在质量和烤烟烟叶的内在质量,依据烟叶的常规化学成分,结合相应的扩展粗糙集模型,对其中依赖人工经验取值的评测指标进行辅助预测,以获取更为合理的烤烟烟叶内、外在质量评测指标值,并实现指标评测的自动化和智能化,同时为后续的烤烟烟叶质量的综合评价奠定基础。此外,通过粗糙集方法所挖掘的规则知识,较好地揭示了各评价指标与相应化学成分的映射关系。通过与同类智能算法基于烤烟烟叶历史数据集的预测对比实验,充分说明了粗糙集方法以及相应扩展粗糙集模型的合理性与优越性。对于烤烟烟叶质量评价问题,首先研究了基于烤烟烟叶等级划分的常规质量评价方法,构建了相应的粗糙集“两级推理”评价模型,使得等级质量评价达到“分级”级别上的精细结果,也以此为粗糙集模型应用于大规模数据集的处理提供了解决思路。之后,通过分析现有烤烟烟叶综合质量评价的研究成果,构建了一个较为完备的烤烟烟叶综合质量评价指标体系。最后,通过结合主客观赋权的多属性决策方法,首次给出了粗糙集方法下的烤烟烟叶综合质量评价决策模型。文末的实例阐释并验证了评价模型。
【Abstract】 Rough set theory is an effective machine learning method for handling imprecise, incomplete, and inconsistent data. However, Pawlak’s classical rough set model has considerable limitations when confronted with the various kinds of uncertainty that arise in practice. To make rough set methods more applicable in practice, this thesis improves and extends the theory and models of classical rough sets, and demonstrates the rationality and effectiveness of the improved algorithms and extended models through theoretical proofs and extensive experiments. The theoretical results are then applied to quality prediction and evaluation for tobacco leaves, which demonstrates the advantages of rough set methods for problems of this kind.

The thesis is organized in four parts. The first two parts examine rough set theory and related models; the latter two are applied studies of rough-set-based quality assessment of flue-cured tobacco leaves.

On the theoretical side, three core problems in rough set theory are studied first: discretization, knowledge reduction, and rule extraction and reasoning. For discretization, three new heuristic criteria that take the characteristics of rough set methods into account are proposed, and a new algorithm named “half-global discretization algorithm” is built on them. For knowledge reduction, to overcome the difficulties that inconsistent data cause for attribute reduction, a conditional attribute reduction method based on an improved discernibility matrix that incorporates conditional entropy is proposed, and an incremental conditional attribute reduction algorithm is derived from it.
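As a rough illustration of why conditional entropy is a useful criterion when a decision table is inconsistent, the sketch below computes H(D | C) for a toy table and greedily drops condition attributes whose removal leaves that entropy unchanged. This is a generic entropy-preserving reduction, not the thesis’s improved discernibility matrix method; the table, the attribute names `a`, `b`, `c`, `d`, and both function names are assumptions made for the example.

```python
from collections import Counter, defaultdict
from math import log2

def conditional_entropy(rows, attrs, decision):
    """Return H(decision | attrs) in bits for a list of dict rows."""
    blocks = defaultdict(Counter)
    for row in rows:
        # Rows that agree on every attribute in attrs fall into the same block.
        blocks[tuple(row[a] for a in attrs)][row[decision]] += 1
    n = len(rows)
    h = 0.0
    for counts in blocks.values():
        size = sum(counts.values())
        for c in counts.values():
            p = c / size
            h -= (size / n) * p * log2(p)
    return h

def greedy_reduct(rows, attrs, decision, tol=1e-12):
    """Drop attributes one by one while H(decision | remaining) stays the same."""
    target = conditional_entropy(rows, attrs, decision)
    kept = list(attrs)
    for a in attrs:
        trial = [x for x in kept if x != a]
        if abs(conditional_entropy(rows, trial, decision) - target) <= tol:
            kept = trial
    return kept

# Hypothetical toy table: rows 1 and 2 are inconsistent (same conditions,
# different decisions), and attribute "c" duplicates attribute "a".
TABLE = [
    {"a": 0, "b": 0, "c": 0, "d": "low"},
    {"a": 0, "b": 0, "c": 0, "d": "high"},
    {"a": 0, "b": 1, "c": 0, "d": "high"},
    {"a": 1, "b": 0, "c": 1, "d": "low"},
    {"a": 1, "b": 1, "c": 1, "d": "low"},
]

if __name__ == "__main__":
    print(conditional_entropy(TABLE, ["a", "b", "c"], "d"))
    print(greedy_reduct(TABLE, ["a", "b", "c"], "d"))
```

Because the inconsistent pair contributes a nonzero entropy that no attribute can remove, the criterion asks only that a reduced attribute set not increase that entropy, rather than that it classify every row exactly. The result of a greedy pass depends on the order in which attributes are tried.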
For rule extraction and decision reasoning, the thesis analyzes how rough set theory, as a machine learning method based on inductive reasoning under uncertainty, handles uncertainty, and summarizes the basic steps and the corresponding modeling process of rule extraction and reasoning with rough sets.

In Chapter 3, building on Pawlak’s classical rough sets and considering the data uncertainty and knowledge uncertainty found in practice, three extended rough set models are proposed: an interval data rough set model, a hybrid data rough set model, and an incomplete data rough set model based on generalized similarity. Each extended model addresses a different type of uncertainty and comes with a corresponding modeling procedure and algorithms. The models are supported by analytical proofs and validated by simulation experiments. In addition, for the two kinds of threshold selection problem common to extended rough set models, the reasonable choice of each threshold is analyzed starting from the relevant definitions in rough set theory, and threshold optimization algorithms are given.

On the application side, the thesis first addresses the two major aspects of quality assessment for flue-cured tobacco leaves: appearance quality and intrinsic quality. Based on the routine chemical components of the leaves and the appropriate extended rough set models, it predicts the assessment indexes whose values usually depend on manual experience. The aims are to obtain more reasonable index values for appearance and intrinsic quality, to make index scoring automatic and intelligent, and to lay a foundation for the subsequent comprehensive quality evaluation. Furthermore, the rules mined by the rough set method reveal fairly well the mapping between each evaluation index and the corresponding chemical components.
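To make the idea of a rough set model for incomplete data concrete, the following minimal sketch uses the classical tolerance relation, in which a missing value is compatible with any value, and derives lower and upper approximations of a target set from it. It does not reproduce the generalized similarity relation proposed in the thesis, which the abstract does not define. The attribute names `nicotine` and `sugar`, the sample objects, and the function names are hypothetical.

```python
MISSING = None  # marker for an unknown attribute value (an assumption of this sketch)

def tolerant(x, y, attrs):
    """True if x and y agree on every attribute where both values are known."""
    return all(
        x[a] is MISSING or y[a] is MISSING or x[a] == y[a]
        for a in attrs
    )

def approximations(objects, attrs, target):
    """Return (lower, upper) approximations of target.

    objects maps an object name to its attribute dict; target is a set of names.
    """
    lower, upper = set(), set()
    for name, row in objects.items():
        # Tolerance class: everything that cannot be told apart from this object.
        cls = {other for other, r in objects.items() if tolerant(row, r, attrs)}
        if cls <= target:
            lower.add(name)   # certainly in the target
        if cls & target:
            upper.add(name)   # possibly in the target
    return lower, upper

# Hypothetical samples described by two discretized chemical indicators.
SAMPLES = {
    "x1": {"nicotine": "high", "sugar": "low"},
    "x2": {"nicotine": "high", "sugar": MISSING},
    "x3": {"nicotine": "low", "sugar": "high"},
    "x4": {"nicotine": "low", "sugar": "high"},
}
GOOD = {"x1", "x3", "x4"}  # assumed target concept, e.g. samples rated as good

if __name__ == "__main__":
    lower, upper = approximations(SAMPLES, ["nicotine", "sugar"], GOOD)
    print(sorted(lower), sorted(upper))
```

Here `x2` has an unknown `sugar` value, so it cannot be distinguished from `x1`; as a result `x1` belongs only to the upper approximation even though it is in the target set.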
Comparative prediction experiments against similar intelligent algorithms on historical flue-cured tobacco leaf data demonstrate the rationality and advantages of the rough set method and the proposed extended models.

For quality evaluation of flue-cured tobacco leaves, the conventional evaluation method based on leaf grading is studied first. A two-level reasoning evaluation model based on rough sets is built, which yields refined evaluation results at the level of individual grades and also suggests a way to apply rough set models to large-scale data sets. Then, by analyzing existing research on comprehensive quality evaluation of flue-cured tobacco leaves, a relatively complete index system for comprehensive quality evaluation is constructed. Finally, by incorporating a multi-attribute decision-making method with combined subjective and objective weighting, a rough-set-based decision model for comprehensive quality evaluation of flue-cured tobacco leaves is presented for the first time. A worked example at the end of the thesis illustrates and validates the evaluation model.
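The abstract does not say how subjective and objective weights are combined. As one common possibility, the sketch below derives objective weights with the entropy weight method, blends them linearly with expert weights, and ranks samples by a weighted sum. The entropy weight method, the blending factor `alpha`, the max-normalization, and all data are assumptions for illustration, not the model proposed in the thesis.

```python
from math import log

def entropy_weights(matrix):
    """Objective weights via the entropy weight method.

    matrix rows are samples, columns are benefit-type indexes with positive values.
    """
    m, n = len(matrix), len(matrix[0])
    raw = []
    for j in range(n):
        col = [row[j] for row in matrix]
        total = sum(col)
        probs = [v / total for v in col]
        e = -sum(p * log(p) for p in probs if p > 0) / log(m)
        raw.append(max(0.0, 1.0 - e))  # indexes that vary more get more weight
    s = sum(raw)
    if s == 0:
        return [1.0 / n] * n  # no index discriminates: fall back to equal weights
    return [w / s for w in raw]

def combine_weights(subjective, objective, alpha=0.5):
    """Linear blend; alpha is the share given to the subjective weights."""
    return [alpha * s + (1 - alpha) * o for s, o in zip(subjective, objective)]

def scores(matrix, weights):
    """Weighted sum of max-normalized index values for each sample."""
    col_max = [max(row[j] for row in matrix) for j in range(len(weights))]
    return [
        sum(w * row[j] / col_max[j] for j, w in enumerate(weights))
        for row in matrix
    ]

# Hypothetical index scores for three samples on three quality indexes.
INDEX_SCORES = [
    [7.0, 6.0, 8.0],
    [7.0, 9.0, 5.0],
    [7.0, 3.0, 6.0],
]
EXPERT_WEIGHTS = [0.5, 0.3, 0.2]  # assumed subjective weights

if __name__ == "__main__":
    objective = entropy_weights(INDEX_SCORES)
    combined = combine_weights(EXPERT_WEIGHTS, objective)
    print(scores(INDEX_SCORES, combined))
```

In this toy data the first index is identical for every sample, so the entropy method gives it essentially no objective weight; the blend keeps some of the experts’ emphasis on it while letting the data raise the weight of the indexes that actually separate the samples.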
【Key words】 Discretization; Knowledge reduction; Production rule; Uncertainty; Rough set theory; Flue-cured tobacco leaf; Prediction; Multi-attribute decision making;