Research on Construction of Multidimensional Semantic Linked Data of Agricultural Science and Technology

【Author】 Xian Guojian (鲜国建)

【Advisor】 Zhao Ruixue (赵瑞雪)

【Author information】 Chinese Academy of Agricultural Sciences, Agricultural Information Management, 2013, PhD

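【Summary】 Human society is generating massive amounts of data at an astonishing pace, and the total volume of information doubles every 18 months. The era of “big data” has arrived, and the “fourth paradigm” of scientific research, characterized by data-intensive computing, is in the ascendant. Against a background of “information overload and knowledge scarcity”, researchers facing increasingly complex research problems have a more urgent need for personalized, linked, and integrated high-quality information resources, and for in-depth knowledge services embedded in the research process. By introducing the concepts and methods of linked data, this thesis studies construction methods and key technologies for multidimensional semantic linked data in agricultural science and technology, based on the data resources of the National Agricultural Library and the Agricultural Scientific Data Sharing Platform. It takes the “rice” domain as an empirical case and designs and implements a prototype domain knowledge service system driven by linked data. The main work and results are as follows:
(1) Recent research progress in China and abroad was surveyed. The differences and connections between linked data and the Web of Data, the Semantic Web, and knowledge organization systems were analyzed, linked data was classified, and the construction workflow, semantic link description models, construction tools, and methods for building link relationships were examined in depth.
(2) Using the Simple Knowledge Organization System (SKOS), the Chinese Agricultural Thesaurus (CAT) was given a standardized semantic description, and mappings were established to several major agriculture-related knowledge organization systems such as AGROVOC and NALT. A batch conversion tool was developed in-house to convert CAT into CAT/SKOS linked data.
(3) Based on an in-depth analysis of the characteristics of the National Agricultural Library's sci-tech literature resources, ontologies such as DCMI and BIBO were applied to describe these resources in a standardized way and to build a semantic linking model. An automatic literature indexing tool was developed in-house to embed standardized CAT/SKOS concepts into the various literature records, and semantic linked data for agricultural sci-tech literature was built with the open-source tool D2R.
(4) Reusing ontologies such as SWRC, VIVO, and FOAF, more than 700 database sets of the Agricultural Scientific Data Sharing Center, together with thematic databases on agricultural research institutions, researchers, and research projects, were given standardized semantic descriptions. A multidimensional semantic linking model covering scientific data, literature, and the thesaurus was built, and a lightweight multidimensional semantic linked data network for agricultural science and technology was constructed, with more than 3 million RDF triples created.
(5) The architecture and functional modules of a linked-data-driven domain knowledge service system were designed. By integrating key technologies such as SPARQL and Virtuoso, a prototype knowledge service system for the rice domain was developed. It provides integrated browsing and linked discovery of domain knowledge, dynamic faceted navigation and search, a SPARQL query endpoint, HTTP URI dereferencing, and RDF content download.
The research shows that introducing the concepts and technical methods of linked data is a best practice for the fine-grained disclosure, standardized description, semantic organization, and deep integration of massive agricultural sci-tech information resources, and that it will play an important role in improving the knowability, visibility, and accessibility of agricultural information resources. Designing and developing new knowledge service functions on the basis of linked data can further broaden the channels for knowledge services, and has research value and practical significance for promoting innovation in agricultural research. The thesis is innovative to some degree in both the multidimensional semantic linking model for agricultural science and technology and the construction of a linked-data-driven domain knowledge service system.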

【Abstract】 The era of “Big Data” is arriving: people create huge amounts of data every day, and the total volume of data doubles every 18 months. The “fourth paradigm” of discovery, based on data-intensive science, is emerging rapidly. However, existing means of information organization and information service struggle to adapt to changes and development trends in the agricultural research environment, and researchers are often drowning in information but starved for knowledge. They are looking for high-quality domain information resources that are specially organized, linked, and merged, and for in-depth knowledge services embedded in their process of agricultural research and innovation.

This thesis adopts the ideas and methodologies of Linked Data. It studies the key technologies and approaches for constructing agricultural sci-tech semantic linked data and designs a multidimensional semantic linking model. Semantic linked open data were constructed and published from part of the literature resources of the China National Agricultural Library and from some scientific data of the National Agricultural Scientific Data Sharing Platform. Finally, a prototype knowledge service system driven entirely by the semantic linked data was designed and implemented for the selected “rice” domain.

The main research work and results are as follows:
(1) The latest research and practical progress on Linked Data in China and abroad was investigated. The differences and relations between Linked Data and the Web of Data, the Semantic Web, and Knowledge Organization Systems were analyzed, and the classification of linked data, the construction workflow, semantic link description models, and automatic tools were also studied.
(2) Concepts and semantic relationships within the Chinese Agricultural Thesaurus (CAT) were formally described using the elements of the Simple Knowledge Organization System (SKOS). A conversion tool was developed and used to convert the whole content of CAT, together with its mappings to other agriculture-related knowledge organization systems such as AGROVOC and NALT, into a linked data version named CAT/SKOS.
(3) The main classes and properties were abstracted from the literature resources of the China National Agricultural Library. Widely used vocabularies and ontologies such as DCMI and BIBO were reused to formally describe and model these classes, properties, and semantic relationships. An automatic indexing tool was also developed to embed the concepts and links of CAT/SKOS into the literature records. The semantic linked data of agricultural literature were constructed and published with the open-source software D2R Server.
(4) The core metadata of over 700 datasets, and of several thematic databases holding data about agricultural institutes, researchers, and projects, were modeled by reusing well-known vocabularies and ontologies (e.g. VIVO, SWRC, FOAF). A multidimensional semantic linking model covering scientific data, literature, and the thesaurus was designed. Based on this model, the intended multidimensional semantic linked data of agricultural science and technology were created and published.
(5) The architecture and functional modules of a domain knowledge service system based on linked data were designed. A prototype system was implemented using key technologies such as SPARQL and Virtuoso. The system provides several important service functions, including integrated browsing and discovery of domain knowledge, dynamic faceted navigation and searching, a SPARQL query endpoint, HTTP URI dereferencing, and downloading of RDF triples.

In conclusion, the study shows that applying the ideas, principles, and methodologies of linked data is a best practice for describing, organizing, and merging the huge volume of agricultural information resources in a more finely described, formally structured, and semantically linked way. Linked data can play a major role in increasing the popularity, visibility, accessibility, and value of agricultural information resources. This research helps explore new ways and patterns to organize, integrate, and make good use of these resources, and to promote innovation in agricultural science and technology. The study is innovative in its construction of the multidimensional semantic linking model and in its prototype system based entirely on linked data.

  • 【Classification number】 S-0; TP391.1
  • 【Citation count】 6
  • 【Download count】 750