多视点商品本体学习研究

Research on Multi-view Commodity Ontology Learning

【Author】 Zhang Bo (张博)

【Advisor】 Nie Guihua (聂规划)

【Author information】 Wuhan University of Technology, Industrial Economics, 2010, doctoral degree

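【Abstract (translated from the Chinese)】 The development of the Internet and e-commerce places higher demands on the sharing of commodity information and knowledge. A growing number of applied studies attempt to use ontologies to resolve semantic differences in information exchange. Most of them, however, are built on a "hypothetical" commodity ontology, and real commodity ontologies that describe commodity data and commodity knowledge remain scarce. Existing Chinese commodity ontologies in particular are too small for practical use, and their designs all ignore the multi-view nature of commodity cognition. As a result, they cannot describe commodity knowledge comprehensively and struggle to meet the requirements of many application scenarios. To solve these problems, it is necessary to study how to design a multi-view commodity ontology that describes the multi-view nature of commodity cognition, and to study ontology learning methods for obtaining the desired multi-view commodity ontology. With these two main research objectives, this thesis draws on existing work on commodity ontology design and ontology learning, and uses methods and techniques from natural language processing to study Chinese multi-view commodity ontology modeling and commodity ontology learning methods in depth. The main work is as follows. Modeling of the multi-view commodity ontology and determination of the ontology learning tasks. To address the multi-view nature of commodity cognition, the thesis introduces the concept of commodity subjective knowledge, establishes indicators for analyzing it, and proposes a commodity knowledge structure model. On the basis of this model, a multi-view commodity ontology metamodel is designed to give a normative description and formal specification of the commodity knowledge structure, and the knowledge content prescribed by the metamodel determines the task framework for learning the multi-view commodity ontology. Extraction of classification relations between commodities from a large-scale electronic commodity catalog. The thesis proposes a UNSPSC-based method for extracting classification relations between commodity concepts. Relying on the commodity and service names recorded in UNSPSC and on its hierarchy, the method builds the skeleton of the multi-view commodity ontology, with commodity concepts as the basic units and classification relations between concepts as the basic semantic relations. An extended concept set for the commodity ontology is given, and a concept relation pruning algorithm based on word-formation features is proposed to tidy the result. Web-based acquisition of commodity attribute concepts. The thesis proposes a Web-based strategy for acquiring commodity attribute concepts that depends on how structured a Web page is. For explicit page blocks, it studies how to obtain candidate attribute terms and phrases using attribute term recognition templates and filter templates. For ordinary text blocks, it proposes internal and external features for classifying attribute terms in plain text and studies an SVM-based attribute term recognition method; to safeguard recognition accuracy, a rule-based heuristic recognition method is also established. Learning of non-classification relations between commodities based on attribute matching. The thesis proposes a strategy for learning non-classification relations between commodities based on attribute matching. It uses attribute subset matching methods based on word form and on concept similarity, and proposes rules that determine the type of an anonymous relation from the subset matching results. It also proposes an automatic classification method for commodity attributes based on a decision tree classifier, which assigns commodity attributes to the target subsets. Mining of commodity subjective knowledge oriented to attribute distribution. The thesis proposes a strategy for mining the view membership degree and the association degree of attributes from text of known view type. For Web documents whose content and publisher type are unknown, it studies content-based Web text classification and style-based classification of commodity description documents in order to identify the content and view type of a text. It proposes co-occurrence-rate-based models for computing the view membership degree and the association degree of attributes. Application case study of the multi-view commodity ontology. The thesis presents an application system that uses the multi-view commodity ontology, explains the role of the commodity ontology in the system, and describes how that commodity ontology was constructed.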

【Abstract】 The development of the Internet and e-commerce places higher demands on the sharing of commodity information and commodity knowledge, and a growing number of applied studies attempt to use ontologies to resolve the semantic problems that arise during information exchange. However, most of this research is based on a "hypothetical" commodity ontology; real commodity ontologies are scarce, especially Chinese ones. Current Chinese commodity ontologies not only fall short of the scale that practical applications require, but were also designed without regard to the multi-view nature of commodity cognition. As a result, they cannot describe commodity knowledge completely or support most application environments. To solve these problems, a multi-view commodity ontology should be designed to describe the multi-view cognitive property of commodities, and ontology learning methods should be studied to obtain the desired multi-view commodity ontology. Based on these two objectives and on current research results in natural language processing, this thesis studies multi-view commodity ontology modeling and learning. The work is described as follows. Extraction of hyponymy relations between commodity concepts based on an e-catalog. The thesis proposes a hyponymy relation extraction method based on the commodity and service catalog in UNSPSC. The extraction result is the skeleton of the multi-view commodity ontology, composed of commodity concepts and the category relations between them. In addition, a relation pruning algorithm based on phrase construction features is given to tidy the extraction result set. Web-based acquisition of commodity attributes. The thesis puts forward a commodity attribute acquisition strategy that takes the Web as its data source and applies different methods according to how structured a Web page is. For semi-structured pages, the strategy acquires attributes with attribute term recognition templates and filter templates. For text pages, it uses an acquisition method based on a Support Vector Machine (SVM) that relies on the internal and external features of commodity attribute terms in the text. A rule-based heuristic recognition method is also proposed to help ensure the accuracy of the SVM acquisition method. Learning of non-category relations between commodity concepts based on commodity attribute matching. A learning strategy based on attribute subset matching is proposed; the key techniques of the matching method are morphological matching and concept similarity calculation. The matching result triggers decision rules that determine the type of the anonymous relation. The attribute subsets are created by an automatic classification method based on a decision tree. Attribute-distribution-oriented mining of commodity subjective knowledge. The thesis proposes a strategy for mining an attribute's membership degree with respect to a given view type and its association degree with respect to another attribute. The core of this strategy is calculating the distribution of attribute terms in text blocks that belong to a definite view type. Therefore, for Web text whose content and view type are unknown, text categorization based on content and on writing style is required. The key step in calculating the attribute distribution is building the models for the attribute's membership degree and association degree. Finally, a semantic integration system based on the multi-view commodity ontology is introduced as an application case.
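The abstract does not give the formulas behind the attribute membership degree and association degree. Purely as an illustration, the following Python sketch shows one plausible co-occurrence-based reading. The function names, the document-level counting, the substring matching, and the Jaccard-style ratio are all assumptions made for this sketch and are not taken from the thesis.

```python
from collections import Counter, defaultdict


def membership_degrees(docs, attributes):
    """Estimate how strongly each attribute belongs to each view type.

    docs: list of (view_type, text) pairs whose view type is already known.
    Assumed definition: the share of an attribute's document-level
    occurrences that fall in documents of a given view type.
    """
    counts = defaultdict(Counter)
    for view, text in docs:
        for attr in attributes:
            if attr in text:  # naive substring match, for illustration only
                counts[attr][view] += 1
    degrees = {}
    for attr, by_view in counts.items():
        total = sum(by_view.values())
        degrees[attr] = {view: n / total for view, n in by_view.items()}
    return degrees


def association_degree(docs, attr_a, attr_b):
    """Assumed definition: Jaccard-style co-occurrence rate of two attributes,
    i.e. documents containing both divided by documents containing either."""
    both = sum(1 for _, text in docs if attr_a in text and attr_b in text)
    either = sum(1 for _, text in docs if attr_a in text or attr_b in text)
    return both / either if either else 0.0


# Hypothetical toy corpus: two seller-view and two consumer-view texts.
docs = [
    ("seller", "battery life and screen size"),
    ("consumer", "battery life is poor"),
    ("consumer", "price is fine"),
    ("seller", "screen size 6 inch, price on request"),
]
attributes = ["battery life", "screen size", "price"]
degrees = membership_degrees(docs, attributes)
assoc = association_degree(docs, "battery life", "screen size")
```

In this toy corpus, "screen size" appears only in seller-view texts, so its membership degree for that view is the highest possible under the assumed definition, while "battery life" is split evenly between the two views.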
