
汉语口语对话系统中口语语言分析

Spoken Language Analysis in Chinese Spoken Dialogue System

【Author】 Zhang Lin

【Advisor】 Lu Ru-zhan

【Author information】 Shanghai Jiao Tong University, Computer Software and Theory, 2008, PhD

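【Abstract (translated from the Chinese)】 In today's information age there is broad demand for human-machine spoken dialogue systems. Spoken language contains many ellipses, pauses, repetitions, self-corrections, and ungrammatical constructions, so in Chinese human-machine spoken dialogue the analysis and understanding of spoken language is both the key to building a dialogue system and its main difficulty. Most current spoken dialogue systems analyze language by template matching, but the flexibility of spoken language makes the number of templates excessively large and lowers system accuracy. This thesis focuses on spoken language analysis in Chinese spoken dialogue systems and attempts to address the problem with the idea of intensional concept analysis, raising the analysis of language to the conceptual level. The work grew out of a Shanghai Science and Technology Commission project on restricted-domain spoken dialogue systems: the transportation-domain spoken dialogue system SHJTQ, which answers queries about routes between any two locations in Shanghai under different modes of transport (walking, bicycle, taxi, bus).

Current methods for analyzing and understanding restricted-domain spoken language fall roughly into two classes: probabilistic-statistical methods and rule-based analysis methods. Statistical techniques rely mainly on the statistical properties of language structure and lack intelligence and reliability. Rule-based methods divide further into logical analysis and concept analysis. Logical analysis is represented by Montague semantics, which uses model theory to express the semantics of a fragment of English; when it comes to handling real text and giving a comprehensive account of Chinese semantics, however, its limitations are still felt. Concept analysis was proposed by later philosophers such as Wittgenstein, Austin, and Searle. Philosophers of language and of mind concern themselves with the concept analysis of words for mental states, sensations, and emotions, but pay little attention to words that refer to entities and to the concepts behind them. Current spoken dialogue systems all analyze language at the application level, and most proposed solutions use string matching, or string matching plus some additional processing. The greatest weakness of this approach is that analysis fails as soon as the string or the order of strings changes, so it cannot account for flexible, variable spoken language. This thesis proposes intensional concept analysis, which raises language analysis to the conceptual level: spoken language is flexible and variable, but the concept expressed stays the same, and this solves problems that template matching cannot.

From an implementation standpoint, leaving tone aside, the speech for one string (such as a phrase or sentence) needs 1K of data storage, so processing the speech of ordinary dialogue would take a very large amount of storage. If instead one template is used per Chinese character, the speech information for two thousand common characters totals 2K*1K of data. Exploiting the fact that in Chinese, combinations of characters map directly onto concepts, character-level speech templates are taken as the unit, and a combination of characters is a combination of speech templates. This greatly reduces the speech data and opens a feasible prospect for spoken dialogue. Users need not follow a prescribed expression format or template and can express themselves freely. This in turn makes language processing more complex and more important. Making full use of the Chinese concept intension model, this thesis implements concept analysis of dialogue words in a specialized domain, with success.

The thesis studies the intensional features of words in SHJTQ (mainly words for means of transport) and proposes that nouns have two kinds of feature: "defining features" and "situation-distinguishing features". In different contexts, the salient features of a word (its situation-distinguishing features) differ. An "E-A-V" (entity-attribute-value) method is proposed to represent the concept of a noun. The thesis studies user questions in SHJTQ; since most are interrogative sentences, it draws on speech act theory to classify SHJTQ user queries by speech act. It analyzes the intensional concepts of SHJTQ spoken sentences and, following the classification of user queries, performs concept analysis on example queries of each class one by one, handling the variation phenomena in spoken language that string matching cannot and offering a new line of thought for research on understanding spoken Chinese. The thesis describes the concrete application of concept analysis for restricted-domain dialogue systems in SHJTQ, with emphasis on the design of the SHJTQ language analysis module, and gives the system's test results and their analysis.

The main innovations of this thesis are as follows.

1. Spoken Chinese is analyzed by concept analysis, in contrast to traditional application-level string matching. Spoken language in SHJTQ is analyzed and explained at the conceptual level, which explains how spoken language varies in form yet expresses the same concept. In addition, under concept analysis the differences between Chinese and other languages (such as English) in grammatical form (morphology, tense, and so on) fade, which helps in realizing multilingual spoken dialogue. Third, in terms of implementation, with concept analysis of spoken language the number of templates needed for speech recognition can be greatly reduced, which can advance the development of spoken dialogue systems.

2. An "E-A-V" (entity-attribute-value) conceptual semantic model is adopted to represent the polysemy of nouns. The thesis adopts the basic idea of Prof. Lu Ru-zhan's intensional logic analysis, that is, concept analysis: the concept expressed by a word is decomposed to obtain its superordinate concept, its subordinate distinguishing concepts, its defining attribute features, and its extended features. This explains the relationship among word, referent entity, and concept. Nouns are regarded as lexical items that refer to entities, and entities comprise physical entities and abstract entities. Nouns have intensional properties, that is, semantic features, and the two notions of "defining features" and "situation-distinguishing features" are proposed. The intensional features of words in SHJTQ (mainly nouns for means of transport) are analyzed. Introducing intensional feature theory into the study of Chinese, and using intensional features to explain the complexities of Chinese semantics, is a new attempt.

3. SHJTQ spoken sentences are studied with intensional concept analysis. Following the speech-act classification of user queries, example queries of each class are analyzed one by one, covering simple complete utterances, utterances with variation, and incomplete utterances. Users' non-standard expressions are converted into query expressions that are standard at the conceptual level, which solves the problem of flexible spoken expression that string matching cannot handle. A restricted-domain spoken dialogue system guided by concept analysis has been implemented. In testing, the system showed relatively high accuracy.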
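To make the E-A-V (entity-attribute-value) idea concrete, the following is a minimal sketch of how a noun concept with defining features and situation-distinguishing features might be stored and queried. It is an illustration only, not the representation used in SHJTQ: the class name `NounConcept`, the field names, the situations, and every attribute and value shown are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class NounConcept:
    """A noun concept as entity-attribute-value data (hypothetical layout)."""
    entity: str                      # the word / referent, e.g. "taxi"
    hypernym: str                    # superordinate concept
    # Features that hold in every context ("defining features").
    defining: Dict[str, str] = field(default_factory=dict)
    # Features that become salient only in a given situation
    # ("situation-distinguishing features"), keyed by situation name.
    situational: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def triples(self, situation: Optional[str] = None) -> List[Tuple[str, str, str]]:
        """Return (entity, attribute, value) triples for one situation."""
        attrs = dict(self.defining)
        if situation is not None:
            attrs.update(self.situational.get(situation, {}))
        return [(self.entity, a, v) for a, v in attrs.items()]


# Invented example data, for illustration only.
taxi = NounConcept(
    entity="taxi",
    hypernym="vehicle",
    defining={"powered_by": "engine", "hired_by": "passenger"},
    situational={
        "route_query": {"route": "flexible"},
        "fare_query": {"pricing": "metered"},
    },
)

route_view = taxi.triples("route_query")
fare_view = taxi.triples("fare_query")
```

The same entity yields different triples depending on the situation, which is one simple way to model a word whose salient features shift with context.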

【Abstract】 Man-machine spoken dialogue systems (SDS) are an active research field with wide application demand. Spoken language contains many phenomena absent from written language, such as ellipsis, pauses, parentheticals, repetition, and restarts, and most spoken sentences are ungrammatical or ill-formed. In a Chinese SDS, therefore, the key difficulty is understanding spoken language. Template matching is a popular approach in current Chinese SDSs, but the flexibility of spoken language makes the number of templates very large, which hurts system accuracy. This thesis focuses on understanding spoken language in Chinese SDS and attempts to analyze spoken language phenomena at the conceptual level. The background of this research is a Chinese SDS, SHJTQ (Shanghai Jiaotong Query System), which provides information about the best route between any two sites in Shanghai.

There are two kinds of methods for handling spoken language: statistical methods and rule-based analytical methods. Statistical analysis relies mainly on the statistical characteristics of language structure, regardless of semantic features, so it lacks intelligence and reliability. Rule-based methods can be divided into two types: logic analysis and concept analysis. Montague semantics is the representative method of logic analysis. Using model theory, it successfully gives the meaning of a fragment of English, but it falls short in handling real text, especially in explaining Chinese. Concept analysis was put forward by philosophers such as Wittgenstein, Austin, and Searle. Philosophers of language and of mind are concerned with analyzing the concepts of words for mental states, feelings, and emotions, but they seldom pay attention to the concepts of words that refer to entities. Most current SDSs analyze spoken language at the application level, using string matching, sometimes combined with other processing methods. The severe disadvantage of these methods is that when the string or the order of strings changes, the analysis fails, so they cannot deal with the flexibility of spoken language. In this thesis we put forward the idea of intensional concept analysis and analyze spoken language at the higher, conceptual level. This explains why such different character strings (expression types) express the same concept.

From the implementation side, storing the pronunciation of a string (for example a phrase or sentence) requires 1K of data storage (ignoring tone information), so processing the pronunciation of ordinary dialogue requires a very large amount of storage. If instead each Chinese character corresponds to one template, the pronunciation information of 2000 common characters takes only 2K*1K of space. This works because in Chinese, combinations of characters express concepts. The method therefore greatly reduces the pronunciation data. It brings a new problem, however: language processing becomes more complex and more important. In this thesis, using the Chinese intensional concept model, we implemented conceptual analysis of words in a domain-specific SDS, with success.

The intensional characteristics of words in the SHJTQ domain (mainly words for vehicles) are analyzed in this thesis. We propose that a noun has two kinds of concept characteristic: the "definition characteristic" and the "situation distinction characteristic". The salient characteristic of a word (its situation distinction characteristic) varies with the situation. We propose an E-A-V (entity-attribute-value) method to represent a noun's concept. In our domain-specific SDS, SHJTQ, most user utterances are interrogative sentences. Drawing on speech act theory, we reclassified the users' query sentences in SHJTQ, which directly helped our spoken language analysis. We analyzed the intensional concepts of spoken sentences in SHJTQ. Following the classification of user queries, we analyzed real user query sentences at the conceptual level one by one, and we also analyzed several variant phenomena in spoken language. This offers a new way of thinking about understanding spoken Chinese.

Concept analysis of spoken language for domain-specific spoken dialogue systems is applied in SHJTQ. We present the overall design of the system. The sub-modules of the language-understanding module, such as POS tagging, robust parsing, and concept analysis, are described in particular. System performance is tested and analyzed.

The novelties of this thesis lie in the following aspects.

1. We propose a method called conceptual analysis for analyzing spoken Chinese, which differs from traditional string matching at the application level. We analyze spoken language in SHJTQ from the conceptual perspective and explain how spoken language is flexible in form yet expresses the same meaning. A second advantage of concept analysis is that it helps in building multilingual SDSs. For example, Chinese has no tense or singular/plural inflection, whereas other languages (such as English) vary in morphology, tense, and so on; when analysis is done at the conceptual level, these surface differences become weak. Third, in implementing an SDS, the number of templates required for speech recognition can be greatly reduced with this method, which may give impetus to the development of spoken dialogue systems.

2. We propose using the E-A-V (Entity-Attribute-Value) model to represent the polysemy of nouns. We adopt the idea of intensional logic analysis (put forward by Prof. Lu Ru-zhan) and decompose the concept expressed by a word into upper concept, lower concept, defining attributes, and extended features. This lets us explain the relation among word, denoted entity, and concept. Our work also indicates that a noun is a lexical item for a denoted entity and has two kinds of concept characteristic: the "definition characteristic" and the "situation distinction characteristic". Next, we analyzed the intensional concepts of spoken sentences in SHJTQ, which offers a new way of understanding spoken Chinese.

3. Using the intensional concept method, we analyzed real user query sentences at the conceptual level, and we also analyzed several variant phenomena in spoken language. Our solutions for general language phenomena such as ellipsis, reference, and negation are also described. The system performance is tested and analyzed.
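The contrast the abstract draws between string matching and concept-level analysis can be illustrated with a toy example. The sketch below is not the thesis's intensional concept analysis; it is a much simpler keyword-to-concept mapping, used only to show how two utterances that differ as strings can reduce to the same query frame. The lexicon, the place list, the frame fields, and the English example utterances are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical lexicon: surface word -> (concept kind, concept value).
LEXICON = {
    "bus": ("MODE", "bus"),
    "taxi": ("MODE", "taxi"),
    "cab": ("MODE", "taxi"),        # synonym maps to the same concept
    "walk": ("MODE", "walk"),
    "from": ("ORIGIN_MARK", None),  # the next place name is the origin
    "to": ("DEST_MARK", None),      # the next place name is the destination
}

# Hypothetical gazetteer of known place names.
PLACES = {"Xujiahui", "Bund"}


def analyze(tokens: List[str]) -> Dict[str, Optional[str]]:
    """Map a token sequence to a route-query frame, ignoring word order,
    fillers, and repetitions that carry no concept."""
    frame: Dict[str, Optional[str]] = {
        "act": "ask_route", "mode": None, "origin": None, "dest": None,
    }
    pending = None  # which slot the next place name should fill
    for tok in tokens:
        if tok in PLACES:
            if pending is not None:
                frame[pending] = tok
                pending = None
        elif tok in LEXICON:
            kind, value = LEXICON[tok]
            if kind == "MODE":
                frame["mode"] = value
            elif kind == "ORIGIN_MARK":
                pending = "origin"
            else:
                pending = "dest"
        # Any other token (filler, politeness, repetition) is skipped.
    return frame


plain = "how do I get from Xujiahui to Bund by taxi".split()
varied = "um a cab to Bund from from Xujiahui please".split()

frame_plain = analyze(plain)
frame_varied = analyze(varied)
```

The two token sequences are different strings in a different order, with a repeated word and a synonym, yet both produce the same frame; an exact string or template match on either utterance would not accept the other.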

  • 【Classification code】TN912.3
  • 【Downloads】433