面向地质勘查的多源异构数据集成关键技术研究

Research on the Key Technology of Multi-source Heterogeneous Data Integration for Geological Exploration

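【Author】 Li Peng (李鹏)

【Advisor】 Xie Zhong (谢忠)

【Author information】 China University of Geosciences, Cartography and Geographic Information Engineering, 2013, PhD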

【摘要】 随着国家《关于进一步加强地质勘查行业服务与管理的若干意见》、《关于促进国有地勘单位改革发展的指导意见》、《找矿突破战略行动纲要(2011-2020年)》等相关政策文件的相继出台,我国地质勘查单位(企业)在市场浪潮中正经历着一次深刻的转型变革,彼此之间的竞争环境已由封闭环境下的竞争,转变为开放环境下的竞争,是一个在国内外开放环境下由外企、民企、国企等不同性质、不同规模的企业参与的大竞争,竞争比以往更为激烈。这对地质勘查单位来说,既是巨大挑战,也是发展的战略机遇。地质勘查单位要想在激烈的竞争中保持优势,信息化手段的辅助必不可少。通过信息系统的建设,地质勘查单位可以加快信息的流转、提高生产力、辅助管理决策;在日常工作中,有信息系统的支持,可以有效减少人为的失误,提高业务流程的正确性和数据的精确性;在市场竞争中,借助信息系统可以分析历史数据、收集现有信息、挖掘商机,提高企业的市场反应能力和竞争力。目前,各个地质勘查单位在长期的地质勘查工作中积累了大量宝贵的地质勘查数据,现在进入转型时期,每年接受地质勘查项目的数据更多,例如:项目数据、地质资料数据、槽井坑钻数据、物化遥数据、过程监管数据等等。这些数据大多零散地存放于各个地方,多年下来,积累的数据量十分巨大,给管理和利用带来不少麻烦。同时,随着地质勘查单位信息化的不断深入发展,企业所使用的管理系统越来越多。由于部分企业中的相关人员对信息系统的搭建和实施缺少系统全面的认识和把握,使信息化建设缺乏整体一致性和系统协调性,系统之间相对独立,在信息需要共享时无法进行高效的互操作。在地质勘查数据集成过程中,一般还面临如下问题:(1)数据有较复杂的表达方式存在。地质勘查数据来源丰富,获取手段多样,各类数据有明显的异构性,特别是存在多语义性;由于地质勘查项目实施周期长,数据也具有明显的多时空性、多尺度性。(2)数据交换方式复杂。随着地质勘查单位的深入改革、市场竞争的激烈,各个单位之间的合作关系越来越紧密,数据交换的范围也在逐步扩大,涉及到的交换方式也越来越多样化,比如离线交互手段、广域网互联互通等方式。(3)数据升级更新同步问题。由于地质勘查单位存在的系统繁多,每个系统软件在一定时期内都存在着更新升级的问题,如果在升级过程中,对数据结构进行了修改,那么必然会导致基于原数据结构开发的交换系统发生故障。(4)非结构化数据的有效利用问题。由于在勘探工作中,会产生大量的图片、音视频、文档、图件数据,这类数据自成一体,很难用结构化的信息系统对其进行有效管理,然而,这些数据又包含了丰富的信息,对矿产评价、项目管理、成果管理等都具有重要意义。从理论角度看,目前针对地质勘查行业数据集成方面的研究并不多见,至今没有形成一个统一的系统建设理论架构,特别是针对我国特定的经济环境、管理制度,如何利用数据更好的为地质勘查行业服务,理论与现实还存在较严重的脱节。当前地质勘查行业专业软件应用偏多,缺少覆盖地质勘查项目主要业务工作的数据集成系统的研究。面对以上问题,本文通过开展对地质勘查行业多源异构数据集成技术的研究,在全面覆盖地质勘查主流程业务的基础上,构建了基于数据中心的面向服务架构的数据集成理论,为地质勘查行业数据集成研究做了有益探索,对企事业单位多源异构地质勘查数据集成管理、挖掘分析应用等方面具有重要的指导意义。通过本研究,可以整合地质勘查单位的各类数据,实行一体化存储、管理与利用,提供各个系统之间数据的共享与交互,为地质勘查业务管理、辅助决策分析提供数据支撑,为地质勘查业单位制定发展战略、提高管理效率、防范项目风险都具有深远的现实意义。具体的研究工作如下:(1)回顾并分析了多源异构数据集成技术以及地质勘查行业数据集成技术的研究现状。从地质勘查业务活动模型和结构模型的角度,分析了业务与数据的关联关系,阐明了地质勘查数据的多种来源性。按照数据业务类型和结构类型的划分,对地质勘查数据资源进行分析,阐述了地质勘查数据的系统异构、结构异构、语法异构、语义异构等方面的区别。针对多源异构地质勘查数据,分析并提出了对其进行数据集成所应满足的需求,以便指导设计出符合地质勘查单位工作所需的数据集成系统。(2)研究了多源异构地质勘查数据本体建模方式。为了解决数据集成中语义异构的问题,需要在数据表达阶段定义好数据模型。本文将本体建模方式引入地质勘查数据集成领域,构建了地质勘查领域本体,对地质勘查数据进行形式化概念分析,定义了地质勘查本体GeoExploration-ontology,并对其进行语义分析与形式化描述,建立起了地质勘查信息本体分类框架,并采用OWL语言进行形式化表达。最后将本体与数据源进行关联,研究了两者的映射关系,为后文数据交换中的语义匹配奠定了基础。(3)研究了多源异构地质勘查数据集成模型。在分析现有的数据集成模式的基础之上,结合面向服务架构理论,提出基于数据中心模式的、适合多源异构地质勘查数据的集成模型,并在此基础上展开阐述基于中间件技术的数据仓库服务和可灵活订制的功能仓库服务;深入分析集成系统的层次结构,对地质勘查数据进行三级信息分类与处理;为增强系统的扩展性与互操作性,制定地质勘查数据共享与互操作机制,提高系统的通用性,为数据集成应用阶段数据的显示、查询、分析等提供底层支持。(4)研究了多源异构地质勘查数据交换技术。首先分析地质勘查数据交换体系,对各个系统中的数据交换需求进行分析,提出基于XML的数据交换模型,并制定适用于地质勘查行业的数据交换中间文件,配合数据检查、映射、清洗、加载等流程,实现数据的交换;然后研究并制定了数据检查规则,提出基于本体模式的数据映射方法,采用语义相似度匹配技术对本体映射关系进行计算,并结合XML Schema和XSLT技术,实现了语义标注与数据转换生成;最后对数据清洗与加载进行了研究,提出并制定了相应的规则。(5)研究了多源异构地质勘查数据集成可视化表达的技术。在结合Flex技术在多媒体、在线地图、在线文档等方面的集成展现优势,将超媒体数据模型引入地质勘查领域进行研究;针对行业应用中地质勘查数据丰富,难于表现的问题,结合地理超媒体数据模型,提出地质勘查超媒体数据模型,并在此模型的基础上,综合Flex富表现技术,对地质勘查数据的集成展现进行了研究。(6)原型实现与在有色地调中心的应用。在基于已有研究成果的基础上,搭建地质勘查多源异构数据集成原型系统,扩展并增强系统的数据集成应用功能,融入了门户系统、办公系统等,并以建设“有色地调中心地勘业务工作平台”为应用案例,实现了对前文研究成果的综合应用,验证了理论与技术的可行性,体现了面向地质勘查的多源异构数据集成技术的有效性、可扩展性和实用性。本文的创新点在于:(1)将本体建模方式引入地质勘查数据集成领域,构建了地质勘查领域本体,定义了地质勘查本体GeoExploration-ontology,并对其进行了形式化概念分析和语义分析,建立了本体与数据源的映射关系。(2)提出了基于数据中心模式的多源异构地质勘查数据集成模型,分析了该模型的组成和特点,并对其进行层次化数据分析,提出了数据共享和互操作机制,增强了系统的适用性与可扩展性。(3)在多源异构地质勘查数据交换技术研究中,提出了地质勘查数据交换中间文件格式,制定了数据检查规则,构建了基于本体的数据映射技术。(4)提出了地质勘查超媒体数据模型,并结合Flex富表现技术,实现了地质勘查数据可视化的综合表达。

【Abstract】 With the successive release of national policy documents such as "Several Opinions on Further Strengthening Service and Management in the Geological Exploration Industry", "Guidance on Promoting the Reform and Development of State-owned Geological Exploration Units", and the "Outline of the Strategic Action for Prospecting Breakthroughs (2011-2020)", China's geological exploration units (enterprises) are undergoing a profound transformation under market pressure. Competition among them has shifted from a closed environment to an open one, in which foreign-invested, private, and state-owned enterprises of different natures and sizes all take part, both at home and abroad, and it is fiercer than ever. For geological exploration units this is a major challenge and also a strategic opportunity for development. To keep an advantage in this competition, support from information technology is indispensable. By building information systems, geological exploration units can speed up the flow of information, raise productivity, and support management decisions. In daily work, information systems reduce human error and improve the correctness of business processes and the accuracy of data. In market competition, they make it possible to analyze historical data, collect current information, and identify business opportunities, which improves an enterprise's market responsiveness and competitiveness. Over many years of exploration work, geological exploration units have accumulated large amounts of valuable geological exploration data.
Now that the industry has entered a period of transition, the volume of data received from exploration projects each year is growing further, for example project data, geological archive data, trenching, shaft, adit, and drilling data, geophysical, geochemical, and remote sensing data, and process supervision data. Most of these data are scattered across different locations, and after many years the accumulated volume is very large, which makes the data hard to manage and use. At the same time, as informatization in geological exploration units deepens, enterprises use more and more management systems. Because some of the staff involved lack a systematic and comprehensive understanding of how information systems are built and deployed, informatization efforts lack overall consistency and coordination; the systems are relatively independent and cannot interoperate efficiently when information needs to be shared. Geological exploration data integration generally faces the following further problems: (1) Complex data representation. Geological exploration data come from many sources and are acquired by many means, so the data are clearly heterogeneous and, in particular, carry multiple semantics; because exploration projects run for a long time, the data are also multi-temporal, multi-spatial, and multi-scale. (2) Complex data exchange. As the reform of geological exploration units deepens and market competition intensifies, cooperation among units is becoming closer, the scope of data exchange is gradually widening, and the means of exchange are becoming more varied, for example offline transfer and interconnection over wide area networks. (3) Synchronization of data during upgrades and updates. Geological exploration units run many systems, and each system has to be updated or upgraded from time to time.
If the data structure is modified during an upgrade, an exchange system developed against the original data structure will inevitably fail. (4) Effective use of unstructured data. Exploration work produces large numbers of pictures, audio and video files, documents, and map sheets. Such data are self-contained and hard to manage effectively with structured information systems, yet they carry rich information that matters for mineral resource evaluation, project management, and results management. From a theoretical point of view, research on data integration for the geological exploration industry is still scarce, and no unified theoretical framework for system construction has emerged. In particular, on the question of how to use data to better serve the geological exploration industry under China's specific economic environment and management system, theory and practice remain seriously disconnected. The industry uses many specialized software applications, but there is little research on data integration systems that cover the main business activities of geological exploration projects. To address these problems, this dissertation studies multi-source heterogeneous data integration technology for the geological exploration industry. Covering the main business workflow of geological exploration, it builds a data integration theory based on a data center and a service-oriented architecture. This is a useful exploration of data integration research for the industry, and it offers guidance for the integrated management, mining, and analysis of multi-source heterogeneous geological exploration data in enterprises and public institutions. With the results of this study, the various data of a geological exploration unit can be integrated and then stored, managed, and used in a unified way; data can be shared and exchanged among systems; and data support is provided for exploration business management and decision analysis.
This has far-reaching practical significance for geological exploration units in formulating development strategies, improving management efficiency, and guarding against project risks. The specific research work is as follows: (1) The research status of multi-source heterogeneous data integration technology, and of data integration technology in the geological exploration industry, is reviewed and analyzed. From the perspective of the business activity model and structure model of geological exploration, the relationship between business and data is analyzed and the multiple sources of geological exploration data are clarified. Geological exploration data resources are analyzed by business type and by structural type, and the differences among system, structural, syntactic, and semantic heterogeneity in these data are described. The requirements that integration of multi-source heterogeneous geological exploration data should meet are then analyzed and proposed, to guide the design of a data integration system that fits the work of geological exploration units. (2) Ontology modeling of multi-source heterogeneous geological exploration data is studied. To solve the problem of semantic heterogeneity in data integration, the data model must be well defined at the data representation stage. This dissertation introduces ontology modeling into geological exploration data integration and constructs a geological exploration domain ontology. Formal concept analysis is applied to geological exploration data; the geological exploration ontology GeoExploration-ontology is defined, analyzed semantically, and described formally; and a classification framework for geological exploration information ontology is established and expressed formally in the OWL language.
Finally, the ontology is associated with the data sources and the mapping between the two is studied, which lays the foundation for semantic matching in the later data exchange. (3) An integration model for multi-source heterogeneous geological exploration data is studied. Based on an analysis of existing data integration patterns and on service-oriented architecture theory, an integration model that follows the data center pattern and suits multi-source heterogeneous geological exploration data is proposed. On this basis, middleware-based data warehouse services and flexibly customizable function warehouse services are described. The layered structure of the integrated system is analyzed in depth, and geological exploration data are classified and processed at three information levels. To make the system more extensible, interoperable, and general, mechanisms for sharing and interoperating geological exploration data are defined, which provide underlying support for displaying, querying, and analyzing data in the application stage of data integration. (4) Exchange technology for multi-source heterogeneous geological exploration data is studied. First, the data exchange architecture of geological exploration and the exchange requirements of each system are analyzed, an XML-based data exchange model is proposed, and an intermediate data exchange file suited to the geological exploration industry is defined; together with data checking, mapping, cleaning, and loading, this realizes data exchange. Then data checking rules are studied and defined, and a data mapping method based on ontology patterns is proposed: semantic similarity matching is used to compute the ontology mappings, and XML Schema and XSLT are used to implement semantic annotation and the generation of converted data. Finally, data cleaning and loading are studied, and corresponding rules are proposed and defined. (5) Visualization technology for integrated multi-source heterogeneous geological exploration data is studied.
After an analysis of widely used, high-performing RIA technologies, the work focuses on Flex, which has advantages in the integrated display of multimedia, online maps, and online documents, and introduces the hypermedia data model into the geological exploration domain. To address the problem that geological exploration data in industry applications are rich and hard to present, a geological exploration hypermedia data model is proposed, drawing on the geographic hypermedia data model. On the basis of this model, and using the rich presentation capabilities of Flex, the integrated display of geological exploration data is studied. (6) A prototype is implemented and applied at the China Non-ferrous Metal Resource Geological Survey. Building on the preceding results, a prototype system for multi-source heterogeneous geological exploration data integration is set up, its data integration application functions are extended and enhanced, and a portal system, an office system, and other components are incorporated. The construction of the "CNMRGS geological survey operations platform" serves as the application case, in which the preceding research results are applied together. This verifies the feasibility of the theory and technology and demonstrates the effectiveness, extensibility, and practicality of multi-source heterogeneous data integration technology for geological exploration. The innovations of this dissertation are as follows: (1) Ontology modeling is introduced into geological exploration data integration and a geological exploration domain ontology is constructed. The geological exploration ontology GeoExploration-ontology is defined, formal concept analysis and semantic analysis are carried out on it, and the mapping between the ontology and the data sources is established. (2) A data-center-based integration model for multi-source heterogeneous geological exploration data is proposed, and its composition and characteristics are analyzed. Through hierarchical data analysis, mechanisms for data sharing and interoperability are proposed.
These enhance the applicability and extensibility of the system. (3) In the study of exchange technology for multi-source heterogeneous geological exploration data, an intermediate file format for geological exploration data exchange is proposed, data checking rules are defined, and an ontology-based data mapping technique is built. (4) A geological exploration hypermedia data model is proposed and, combined with Flex rich Internet application technology, used to achieve comprehensive visual presentation of geological exploration data.
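
The exchange flow outlined in item (4), with checking, mapping, cleaning, and loading around an XML intermediate file, can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the concept list `ONTOLOGY_CONCEPTS`, the checking and cleaning rules, the `exchange` and `record` element names, and the 0.6 threshold do not come from the dissertation. Lexical string similarity from `difflib` stands in for the semantic similarity measure, which the abstract does not specify, and the XML Schema and XSLT steps are omitted.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher

# Hypothetical concept names; not taken from GeoExploration-ontology.
ONTOLOGY_CONCEPTS = ["borehole_id", "depth_m", "lithology", "project_name"]


def best_concept(field, threshold=0.6):
    """Return the concept most similar to a source field name, or None.

    Lexical similarity is only a stand-in for semantic similarity here.
    """
    def score(concept):
        return SequenceMatcher(None, field.lower(), concept).ratio()

    concept = max(ONTOLOGY_CONCEPTS, key=score)
    return concept if score(concept) >= threshold else None


def check(record):
    """Assumed checking rule: reject empty records and missing values."""
    return bool(record) and all(v is not None for v in record.values())


def map_fields(record):
    """Rename source fields to concept names; drop unmatched fields."""
    mapped = {}
    for field, value in record.items():
        concept = best_concept(field)
        if concept is not None:
            mapped[concept] = value
    return mapped


def clean(record):
    """Assumed cleaning rule: convert values to text and trim whitespace."""
    return {key: str(value).strip() for key, value in record.items()}


def load(records):
    """Serialize records into a simple XML intermediate document."""
    root = ET.Element("exchange")
    for record in records:
        item = ET.SubElement(root, "record")
        for key, value in record.items():
            ET.SubElement(item, key).text = value
    return ET.tostring(root, encoding="unicode")


def exchange(source_records):
    """Run check -> map -> clean -> load over the source records."""
    staged = [clean(map_fields(r)) for r in source_records if check(r)]
    return load(staged)
```

Calling `exchange` on a list of source records returns the intermediate XML as a string. A record with a missing value is dropped at the checking stage, and source fields that match no concept are left out of the output.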
