Research on Chinese Natural Language Query Processing for Databases

The Methodology and Implementation of Chinese Natural Language Query in Databases

【Author】 Meng Xiaofeng (孟小峰)

【Advisors】 Lin Zongkai (林宗楷); Wang Shan (王珊)

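【Author information】 Graduate School of the Chinese Academy of Sciences (Institute of Computing Technology), Computer Application Technology, 1999, Ph.D.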

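【Abstract, Chinese version】 A natural language interface to databases (NLIDB) is a means of letting users access a database in natural language. It is a product of several intersecting disciplines, drawing on research in natural language processing, database systems, artificial intelligence, and human-computer interfaces. Over more than thirty years, NLIDB research has made great progress, but the systems have not been widely deployed, and many technical problems still need further study. This thesis attempts to solve some key NLIDB problems with a complete language-processing logic based on database semantics. In approach, the author emphasizes combining the relevant knowledge of each discipline in order to overcome the shortcomings of the earlier artificial-intelligence and database schools of research. The thesis first gives a formal definition and classification of NLIDBs, uses a general abstract model to delimit the scope of NLIDB research, reinterprets the key problems of NLIDBs from a theoretical standpoint, and establishes two lines of system research on Chinese natural language interfaces to databases: Chiql, a template-based Chinese query language, and NChiql, a Chinese natural language query system based on restricted Chinese. The thesis proposes the system architecture of NChiql, whose design emphasizes portability, usability, adaptability, robustness, and intelligence. For the composition and representation of knowledge, it proposes integrating linguistic knowledge, domain knowledge, and database knowledge, and builds a semantic conceptual model (SCM). For knowledge acquisition, it gives a dual mechanism of static and dynamic acquisition. For Chinese natural language query processing, it proposes a word segmentation method based on database semantics which, through a backtracking mechanism, determination by related semantics, and general disambiguation rules, can effectively resolve ambiguous segmentation, ambiguous words, and unknown words. After analyzing and evaluating syntactic parsing methods, it presents, for Chinese natural language query ...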

【Abstract】 Natural language interfaces to databases (NLIDBs) let users access information stored in databases directly in natural language. NLIDBs draw on many disciplines, such as AI, NLP, DB, and HCI. Although there have been significant advances in the area over the past thirty years, NLIDB systems have not gained rapid and wide commercial acceptance because of problems of portability and usability. This thesis attempts to develop a new methodology based on database semantics to solve the key problems in NLIDBs. I argue that previous approaches to NLIDBs are problematic mainly because they pay too little attention to drawing on the different disciplines together. This thesis first presents a formal definition and classification of NLIDBs, and then gives a general abstract model of NLIDBs to outline the research scope and highlight the key and difficult points. Based on this discussion, two kinds of Chinese natural language interfaces are described in our project, namely Chiql, a template-based system, and NChiql, a system based on restricted natural language. This thesis provides the architecture of NChiql, which emphasizes portability, usability, adaptability, robustness, and intelligence. To achieve these goals, this thesis presents a semantic conceptual model (SCM) that attempts to integrate linguistic knowledge, domain knowledge, and database knowledge. Static and dynamic knowledge acquisition mechanisms are adopted to construct the SCM. Based on the domain concepts and database semantics in the SCM, this thesis describes a word segmentation algorithm, which can handle lexical ambiguity and unknown words by applying backtracking, related semantic
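
The abstract mentions word segmentation driven by database semantics with backtracking. As a rough illustration only, the sketch below segments a Chinese query against a small lexicon and backtracks to a shorter match when the longest match leaves a remainder that cannot be segmented. It is not the thesis's algorithm: the lexicon contents, the `segment` function, and the `max_len` parameter are assumptions made for this example, and the related-semantics step, the disambiguation rules, and unknown-word handling are left out.

```python
from typing import List, Optional, Set

# Hypothetical lexicon. In the setting the abstract describes, entries would
# be drawn from database semantics (table names, column names, stored values).
LEXICON: Set[str] = {"学生", "学生会", "会计", "会计系", "系"}


def segment(text: str, lexicon: Set[str], max_len: int = 4) -> Optional[List[str]]:
    """Return one complete segmentation of `text`, or None if none exists.

    At each position the longest lexicon match is tried first. If the rest
    of the string cannot be segmented, the function backtracks and tries
    the next shorter match.
    """
    if not text:
        return []
    for size in range(min(max_len, len(text)), 0, -1):
        word = text[:size]
        if word in lexicon:
            rest = segment(text[size:], lexicon, max_len)
            if rest is not None:
                return [word] + rest
    # No lexicon word starts a valid segmentation from this position.
    return None


if __name__ == "__main__":
    # Taking "学生会" first strands "计系", so the search falls back to "学生".
    print(segment("学生会计系", LEXICON))
```

In this sketch a string containing words outside the lexicon simply yields `None`; a fuller system would need a separate strategy for unknown words, which the abstract says the thesis addresses.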
