
语言节奏提取及其在文本分析中的应用

Extracting Language Rhythm and Applying It in Text Analysis

【Author】 Chen Fan (陈钒)

【Supervisor】 Feng Zhiyong (冯志勇)

【Author information】 Tianjin University, Computer Application Technology, 2011, Ph.D.

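【Abstract (Chinese version)】 Language rhythm is an important and pervasive feature of language, and it has been widely studied and applied in fields such as speech recognition and literary aesthetics. It is a complex, composite concept: each research field has its own focus and therefore its own definition and method of analysis, but no existing definition or analysis quantifies the rhythmic information in language concretely. Drawing on results from several disciplines and analyzing language from multiple angles, this dissertation proposes a method for defining and constructing language rhythm, extracts and analyzes features of the resulting rhythm, and applies them successfully to text analysis. The main work is as follows:

(1) A multi-level definition of language rhythm, based on a study of the nature and origins of language. Language is complex: expression involves physiological features, grammatical phenomena, emotional content, logical relations, and more. Yet language is governed by rules, and its rich content exhibits rhythmic characteristics, so the rhythmic features of a text can be used to analyze its deeper content. Rhythm is therefore an important feature of language. This dissertation divides language rhythm into natural rhythm, grammatical rhythm, logical rhythm, and emotional rhythm. It gives a concrete definition of each type according to its rhythmic characteristics, identifies and defines the rhythm markers of each type as they appear in language, and analyzes and argues in detail for the properties of each.

(2) The analysis and design of methods for extracting language rhythm. Building on the analysis of what language rhythm means and what properties it has, the dissertation designs extraction methods for natural, grammatical, logical, and emotional rhythm according to their respective characteristics. Using the rhythm markers found in text, the distances between characters, and similar information, it defines the concepts of rhythm unit and rhythm sequence and describes each extraction method. The extraction steps are set out in detail, and the properties of the constructed rhythm are analyzed.

(3) The study and design of methods for extracting features from language rhythm. Two methods are chosen according to the rhythmic characteristics of language rhythm. The first builds a state transition matrix of language rhythm: as a text unfolds, its rhythm keeps changing, so the rhythm can be viewed as continual movement between states, and the transition matrix captures the features of these state changes. The second exploits the adjacency relations between rhythm units and extracts features from a language rhythm network; the dissertation defines this network and describes how to construct it.

(4) The application of language rhythm features to text analysis tasks, with experiments verifying their effectiveness. For text classification, author attribution, identification of a work's style, author identity verification (deciding whether texts share the same author), and topic identification, a Bayesian classifier and K-means clustering are applied to the rhythm features of the experimental documents, with good results. The experiments show that analyzing rhythm features can solve a range of text analysis tasks well.

(5) An examination of some fundamental phenomena of language through the properties of language rhythm networks. Analysis of the rhythm networks of the experimental corpora shows that they have the "small-world" property. The rhythm networks of literary masterpieces show markedly short average distances and high clustering coefficients, together with a large product of average distance and clustering coefficient; complex-network analysis thus identifies features that distinguish the masterpieces. The rhythm networks of works by renowned authors show the same pattern (short average distance, high clustering coefficient, and a high product of the two), so an author's command of language can be analyzed from the perspective of complex networks.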
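The feature-extraction step in (3) and the network statistics in (5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's method: the abstract does not say how rhythm units are encoded, so here a rhythm unit is assumed to be the length in characters of a segment between punctuation marks (capped at a hypothetical `max_len`), and the punctuation set `MARKERS` stands in for the dissertation's rhythm markers. All helper names (`rhythm_units`, `transition_matrix`, `rhythm_network`, `clustering_coefficient`, `average_shortest_path`) are hypothetical.

```python
import re
from collections import defaultdict, deque

# ASSUMPTION: common Chinese and ASCII punctuation stand in for rhythm markers.
MARKERS = r"[，。！？；：、,.!?;:]"


def rhythm_units(text, max_len=8):
    """Split text at the markers; each non-empty segment becomes one unit whose
    state is its length in characters, capped at max_len (assumed encoding)."""
    segments = (s.strip() for s in re.split(MARKERS, text))
    return [min(len(s), max_len) for s in segments if s]


def transition_matrix(units):
    """Row-normalised first-order state transition matrix over observed states."""
    states = sorted(set(units))
    index = {s: i for i, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    for a, b in zip(units, units[1:]):
        counts[index[a]][index[b]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return states, matrix


def rhythm_network(units):
    """Undirected graph as adjacency sets: distinct unit states are nodes, and an
    edge joins the states of every two adjacent units. Self-loops are dropped."""
    graph = defaultdict(set)
    for a, b in zip(units, units[1:]):
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def clustering_coefficient(graph):
    """Average local clustering coefficient; nodes of degree < 2 count as 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in graph[u])
        total += 2 * links / (k * (k - 1))
    return total / len(graph) if graph else 0.0


def average_shortest_path(graph):
    """Mean breadth-first-search distance over all reachable ordered node pairs."""
    dist_sum, pairs = 0, 0
    for source in graph:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist_sum += sum(seen.values())
        pairs += len(seen) - 1
    return dist_sum / pairs if pairs else 0.0


text = "语言节奏是语言中广泛存在的一个重要特征，在语音识别、文学审美等诸多领域有广泛的研究与应用。"
units = rhythm_units(text)
states, matrix = transition_matrix(units)
graph = rhythm_network(units)
L, C = average_shortest_path(graph), clustering_coefficient(graph)
print(units, matrix, L * C)  # L * C: the distance-clustering product from (5)
```

Self-loops (a unit followed by a unit in the same state) are kept in the transition matrix but dropped from the network, a simplifying choice so the standard clustering coefficient applies. For larger corpora, NetworkX offers `average_clustering` and `average_shortest_path_length`; note that the latter raises an error on disconnected graphs, whereas this sketch averages over reachable pairs only.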

【Abstract】 Language rhythm is an important, widespread feature of language, studied and applied in speech recognition, literary aesthetics, and other fields. It is a complex, composite concept with no unified, widely accepted definition: each research field views it differently, which makes language rhythm difficult to study quantitatively. Drawing on results from these fields, this dissertation shows how to define and construct language rhythm. Features extracted from the rhythm perform well in analysis, demonstrating that rhythm features are suitable for text analysis. The main contents of the dissertation are as follows:

(1) A hierarchical definition of language rhythm, based on research into the essence and nature of language. Language is very complex, involving natural, grammatical, logical, emotional, and other dimensions, but it follows a rule: rhythm. Rhythm is an outstanding characteristic of language. Language rhythm is therefore divided into four types: natural rhythm, grammatical rhythm, logical rhythm, and emotional rhythm. Each type is defined in this dissertation, and its properties are discussed in full.

(2) The analysis and design of efficient methods for extracting language rhythm. Research into the intention of language reveals the rhythm within it. Based on the characteristics of each type, methods for extracting natural, grammatical, logical, and emotional rhythm are described, and the language rhythm unit, rhythm sequence, and related concepts are defined.

(3) Methods for extracting the features of language rhythm. Two methods are proposed: building a state transition matrix of language rhythm, and creating a language rhythm network. Because language unfolds in sequence, it can be described as moving from one state to another, so a state transition matrix can represent language rhythm. Because each rhythm unit is adjacent to others, neighboring units are related; the language rhythm network takes rhythm units as nodes and connects every two neighboring nodes with an edge.

(4) Applying language rhythm features to text analysis. Experiments were designed to test the features. Tasks such as text classification, author attribution, style identification, and topic discovery are accomplished successfully using language rhythm features, which are analyzed with a Bayesian classifier and k-means clustering. The results show that language rhythm features are well suited to text analysis.

(5) Analyzing the properties of language rhythm networks to discuss some fundamental characteristics of language. The analysis shows that the language rhythm network is a scale-free complex network with a short average shortest path length and a high clustering coefficient. Studying the language rhythm of masterpieces shows that their networks have the salient features of a "small-world" network, with a markedly high product of average shortest path length and clustering coefficient. Tests on the works of renowned authors show the same phenomenon. It is concluded that an author's command of language can be analyzed through the language rhythm network.
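The classification experiments in (4) can be illustrated with a small multinomial naive Bayes classifier whose features are counts of adjacent rhythm-unit pairs, that is, the unnormalised entries of a transition matrix. This is a sketch under assumptions: the abstract names a Bayesian classifier but not its variant or exact features, so the multinomial model, the add-alpha smoothing constant `alpha`, the class and function names, and the toy sequences for two hypothetical authors "A" and "B" are all illustrative, not data or settings from the dissertation. The k-means clustering also mentioned is not shown.

```python
import math
from collections import Counter


def transition_features(units):
    """Count adjacent (state, next_state) pairs: an unnormalised transition matrix."""
    return Counter(zip(units, units[1:]))


class NaiveBayes:
    """Multinomial naive Bayes over transition counts, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # ASSUMPTION: Laplace smoothing constant

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior, self.counts, self.totals = {}, {}, {}
        self.vocab = set()
        for c in self.classes:
            merged = Counter()
            members = [d for d, lab in zip(docs, labels) if lab == c]
            for units in members:
                merged.update(transition_features(units))
            self.log_prior[c] = math.log(len(members) / len(docs))
            self.counts[c] = merged
            self.totals[c] = sum(merged.values())
            self.vocab |= set(merged)
        return self

    def log_score(self, units, c):
        """Log prior plus smoothed log likelihood of the document's pair counts."""
        v = len(self.vocab)
        score = self.log_prior[c]
        for pair, n in transition_features(units).items():
            # Unseen pairs get alpha / (total + alpha * v) instead of zero.
            p = (self.counts[c][pair] + self.alpha) / (self.totals[c] + self.alpha * v)
            score += n * math.log(p)
        return score

    def predict(self, units):
        return max(self.classes, key=lambda c: self.log_score(units, c))


# Toy rhythm-unit sequences for two hypothetical authors (illustrative only).
docs = [[4, 4, 4, 4, 4], [4, 4, 5, 4, 4], [8, 2, 8, 2, 8], [8, 2, 7, 2, 8]]
labels = ["A", "A", "B", "B"]
model = NaiveBayes().fit(docs, labels)
print(model.predict([4, 4, 4, 5]), model.predict([8, 2, 8, 2]))  # prints: A B
```

Using pair counts rather than the row-normalised matrix keeps the multinomial likelihood well defined; a library implementation such as scikit-learn's `MultinomialNB` could replace the hand-written class once the pairs are vectorised.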

  • 【Online publication submitter】 Tianjin University
  • 【Online publication year and issue】 2014, issue 06