Semantic-Based TCM Data Acquisition Engineering and Application Platform

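【Author】 陶金火 (Tao Jinhuo)

【Advisors】 陈华钧 (Chen Huajun); 姜晓红 (Jiang Xiaohong)

【Author Information】 Zhejiang University, Computer Application Technology, 2011, Master's thesis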

【Abstract】 The traditional Chinese medicine (TCM) literature accumulated over more than two thousand years is an invaluable store of knowledge. Recording TCM data in information systems in structured form is essential for analyzing, processing, and using that data. For more than a decade, the grid group of the CCNT Laboratory has worked with the China Academy of Chinese Medical Sciences on structured modeling and data acquisition for TCM literature, building a TCM semantic ontology and several topic-specific TCM data acquisition systems. Even so, much remains to be improved: there are too many acquisition systems, they are isolated from one another and cannot interconnect, component reuse is low, maintainability is poor, and the degree of automation in data acquisition is low. To address these problems, this thesis proposes a method that uses semantic-ontology configuration metadata to configure the TCM data model and storage logic. The method gives the data model a semantic description and specifies how data are written into the storage logic. The thesis also proposes a semantic relation graph annotation algorithm to assist data acquisition. Built on a semantic ontology knowledge base, the algorithm extracts keywords from TCM literature, computes high-frequency keywords, and identifies and predicts the semantic relations between keywords, producing a semantic relation graph that makes data acquisition semi-automatic. Finally, the thesis designs and implements an integrated TCM data acquisition platform that uses the semantic configuration information as system configuration metadata and brings data from different topics into a single platform for acquisition. Because the platform is built on the ontology's description of TCM data models, data acquisition is highly configurable, which addresses the large number of distinct TCM literature data models. The platform supports semi-automatic literature processing based on semantic relation graph annotation. It is a semantic data grid system oriented toward practical application. The platform is now in production use, where it has improved data acquisition efficiency and substantially reduced operation and maintenance costs.
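The annotation steps named in the abstract (keyword extraction, high-frequency keyword computation, relation identification) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the toy ontology, the substring matching, the `min_count` threshold, and all function names are assumptions made for illustration, and the relation prediction step is omitted because the abstract does not describe how it works.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy ontology: (subject, object) -> relation name.
# These entries are illustrative only and do not come from the thesis.
ONTOLOGY_RELATIONS = {
    ("ginseng", "fatigue"): "treats",
    ("ginseng", "qi deficiency"): "treats",
    ("qi deficiency", "fatigue"): "manifests_as",
}
ONTOLOGY_TERMS = {term for pair in ONTOLOGY_RELATIONS for term in pair}


def extract_keywords(text):
    """Count how often each ontology term occurs in the text (substring match)."""
    lowered = text.lower()
    counts = Counter()
    for term in ONTOLOGY_TERMS:
        occurrences = lowered.count(term)
        if occurrences:
            counts[term] = occurrences
    return counts


def high_frequency(counts, min_count=2):
    """Keep only terms that occur at least min_count times (assumed threshold)."""
    return {term for term, n in counts.items() if n >= min_count}


def annotate(text, min_count=2):
    """Return sorted (subject, relation, object) edges among frequent keywords."""
    frequent = high_frequency(extract_keywords(text), min_count)
    edges = []
    for a, b in combinations(sorted(frequent), 2):
        # The ontology is directed, so check both orderings of each pair.
        for subject, obj in ((a, b), (b, a)):
            relation = ONTOLOGY_RELATIONS.get((subject, obj))
            if relation is not None:
                edges.append((subject, relation, obj))
    return sorted(edges)
```

In a semi-automatic workflow of the kind the abstract describes, edges produced this way would be suggestions for a human annotator to confirm or correct rather than final records.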

  • 【Online Publication Contributor】 Zhejiang University
  • 【Online Publication Year/Issue】 2011, Issue 07