基于音节网格的汉语语音文档检索方法研究

Research on Syllable Lattice Based Chinese Spoken Document Retrieval Method

【Author】 Zheng Tieran (郑铁然)

【Advisor】 Han Jiqing (韩纪庆)

【Author information】 Harbin Institute of Technology, Computer Application Technology, 2008, Ph.D.

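【Abstract (Chinese)】 With the development of computer and multimedia technology, more and more speech data are being recorded and stored on computers. Accessing, managing, and using these speech resources efficiently requires spoken document retrieval based on semantic content. Spoken document retrieval is the process of searching a speech collection for the speech segments or spoken documents relevant to a user's query and returning them. It is closely tied to speech recognition, which it always uses to build semantic-level indices for the collection. However, the high error rates common in recognition output and the misrecognition of out-of-vocabulary (OOV) words directly degrade retrieval performance, so researchers have turned to recognition output in the form of subword lattices: subwords sidestep the OOV problem, and the multiple candidates in a lattice provide more accurate index content. In Chinese spoken document retrieval, syllable-lattice-based retrieval has become the consensus approach.

Spoken document retrieval is an immature but highly promising field with many open problems. A central one is that a lattice is not easy to index: its directed-graph structure and its mixture of correct and incorrect information not only make traditional retrieval methods perform poorly but also demand large storage and long search times. Chinese spoken document retrieval methods that suit syllable lattices and balance retrieval accuracy, index size, and retrieval speed therefore have considerable theoretical and practical value. Targeting these characteristics of syllable lattices, this thesis first studies three Chinese spoken document retrieval methods that work by different mechanisms and favor different performance metrics. Then, because the lower bound on the lattice error rate limits further gains in retrieval accuracy, it studies two effective methods for lowering that bound. The specific research contents are as follows:

1) A spoken document retrieval method based on word spotting is proposed. Syllable lattices are stored directly as the index, and retrieval is carried out with word spotting. A relevance measure combining confidence measures with occurrence frequency is proposed, together with a scheme that splits conventional word spotting into an offline stage and an online stage, which speeds up retrieval in the online stage. The method achieves good retrieval accuracy, close to that obtained on the best candidates in the lattice, but because the lattice index must be stored and searched, its index size and retrieval speed still need improvement. Because lattice indices are large and highly redundant, a method for analyzing the useful components of a lattice based on syllable posterior probability histograms is proposed, together with an index de-redundancy method that keeps the useful components and removes the redundant ones. Experiments show that this method removes much of the redundant information in the index at the cost of a small drop in retrieval accuracy.

2) A spoken document retrieval method based on a syllable inverted index is proposed. It exploits the structure of an inverted index to reduce index size effectively while retaining the main content of the syllable lattices. Matching mechanisms that raise retrieval accuracy by relaxing the path constraints during matching are studied, and two effective ones are proposed: time matching and position matching. In the position matching method, a syllable lattice is interpreted as a concatenation of competition sets, each carrying a specific position label; a corresponding search and matching procedure is given, along with a way to compute the posterior probability that a matching path lies at a specific position. A weighting method that adjusts document relevance according to the rank of a syllable candidate within its competition set is also studied. Experiments show that both matching mechanisms improve retrieval accuracy slightly, position matching more noticeably, and that rank weighting improves accuracy further. A pruning method based on a posterior probability threshold is proposed to control retrieval speed flexibly.

3) A spoken document retrieval method based on neighbor syllable posterior probability matrices is proposed. By building document-level indices, it aims to greatly improve index size and retrieval speed and so make retrieval over large-scale speech collections feasible. The concept of K-step neighbor syllable pairs is introduced to capture long-distance correlations between syllables in the index. The content of each lattice is represented by its neighbor posterior probability matrix; the matrices of all lattices in a document are then combined to compute the posterior probability of each neighbor syllable pair in the document, and the document's neighbor syllable posterior probability matrix is stored as its document-level index. Experiments show that although retrieval accuracy drops by about 5%, index size and retrieval speed essentially reach the level of text retrieval. Methods that use prosodic information in speech to adjust document relevance are also studied, with three prosodic weighting methods tried as a first attempt; energy weighting is the most effective, improving retrieval accuracy by about 2.7%.

4) The root cause limiting retrieval accuracy is analyzed, and two methods that improve retrieval accuracy by lowering the lattice error rate bound are proposed: one based on extended lattices, the other based on a word fragment language model. The former works outside the speech recognition framework: it builds a statistical model of the relationship between recognition results and recognition errors, estimates the probability that a given syllable was missed by the recognizer using Dempster-Shafer evidence theory, and uses this to generate extended lattices. Experiments show that, compared with the original lattices, extended lattices lower the error rate bound by 1.7% and improve retrieval accuracy by about 4%. The latter works inside the speech recognition framework, improving the accuracy of recognition results by introducing word fragments as recognition units. It discusses the concept of a word fragment, studies an automatic word fragment selection algorithm based on the maximum mutual information criterion, and shows experimentally that introducing word fragments improves both the recognition rate of the speech recognizer and the retrieval accuracy of the retrieval system.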
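The extended lattice method estimates miss probabilities with Dempster-Shafer evidence theory. The abstract does not say what the evidence sources are, so this sketch only shows Dempster's rule of combination itself on a hypothetical two-element frame {missed, present}; the two mass functions stand for two independent cues about whether the recognizer missed a syllable, and the frame, cues, and numbers are illustrative assumptions rather than the thesis's model.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Dempster's rule of combination for mass functions keyed by frozenset focal sets."""
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # mass the two sources assign to contradictory sets
    if conflict >= 1.0:
        raise ValueError("total conflict: the evidence cannot be combined")
    # Renormalize so the non-conflicting mass sums to 1.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


MISSED, PRESENT = frozenset({"missed"}), frozenset({"present"})
EITHER = MISSED | PRESENT  # ignorance: mass not committed to either hypothesis

cue1 = {MISSED: 0.6, EITHER: 0.4}                # hypothetical cue
cue2 = {MISSED: 0.5, PRESENT: 0.3, EITHER: 0.2}  # hypothetical cue
m = dempster_combine(cue1, cue2)
belief_missed = m.get(MISSED, 0.0)
# On this two-element frame, plausibility(missed) = m(missed) + m(either).
plausibility_missed = belief_missed + m.get(EITHER, 0.0)
print(belief_missed, plausibility_missed)
```

The belief and plausibility bracket the probability that the syllable was missed; a lattice-extension step could, for instance, add an arc for the syllable when that estimate is high enough, though the abstract does not specify the thesis's decision rule.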
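The abstract names a maximum mutual information criterion for selecting word fragments but gives no details. One common way to realize such a criterion, assumed here, is greedy merging: repeatedly fuse the adjacent pair of units with the highest pointwise mutual information in a training corpus into a new fragment unit. Using pointwise mutual information as the criterion, the `min_count` filter, and the `_` joining convention are all hypothetical choices.

```python
import math
from collections import Counter


def best_pair_by_pmi(corpus, min_count=2):
    """Adjacent unit pair with the highest pointwise mutual information, or None."""
    unigrams = Counter(u for sent in corpus for u in sent)
    bigrams = Counter(p for sent in corpus for p in zip(sent, sent[1:]))
    # PMI overrates rare pairs, so only pairs seen at least min_count times compete.
    candidates = [p for p, c in bigrams.items() if c >= min_count]
    if not candidates:
        return None
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    def pmi(pair):
        a, b = pair
        p_ab = bigrams[pair] / n_bi
        return math.log(p_ab / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))

    return max(candidates, key=pmi)


def merge_pair(corpus, pair):
    """Replace every occurrence of the pair with one fused fragment unit."""
    a, b = pair
    merged = []
    for sent in corpus:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and sent[i] == a and sent[i + 1] == b:
                out.append(a + "_" + b)
                i += 2
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged


def select_fragments(corpus, n_merges, min_count=2):
    """Greedily pick up to n_merges fragments; returns them and the re-segmented corpus."""
    fragments = []
    for _ in range(n_merges):
        pair = best_pair_by_pmi(corpus, min_count)
        if pair is None:
            break
        corpus = merge_pair(corpus, pair)
        fragments.append(pair[0] + "_" + pair[1])
    return fragments, corpus


corpus = [["zhong1", "guo2"], ["zhong1", "guo2", "ren2"], ["zhong1", "guo2"], ["ren2", "ren2"]]
print(select_fragments(corpus, n_merges=3))
```

The selected fragments would then be added to the recognizer's unit inventory and language model; how the thesis trades fragment count against model size is not stated in the abstract.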

【Abstract】 With the development of information technology and multimedia technology, more and more speech data are available worldwide via the internet. To meet the rapidly growing need to organize and analyze these data efficiently, content-based spoken document retrieval technology is a key issue. The task of spoken document retrieval (SDR) can be described as follows: given a user's query, all files or segments containing relevant speech content are found in a large collection of multimedia documents and listed. In spoken document retrieval, speech recognition is always used to index documents; however, the high error rate and the missing out-of-vocabulary (OOV) words in recognition results limit retrieval performance. Subword-lattice-based retrieval methods are therefore investigated to avoid the OOV problem and to compensate for the retrieval performance lost to recognition errors. For Chinese, syllable-lattice-based retrieval is widely used by researchers. A key problem of the syllable lattice approach is that lattices are difficult to index. Their directed graph structure and their mix of correct and incorrect candidates not only lead to very low retrieval accuracy for traditional retrieval methods but also require much more index space and search time. Retrieval methods that suit syllable lattices and balance retrieval accuracy, index size, and retrieval speed are therefore valuable and important research work. This thesis first proposes three Chinese spoken document retrieval methods with different indexing and searching techniques, each favoring different performance metrics. Then, considering that retrieval performance is also limited by the lower bound of the lattice error rate, it studies two accuracy improvement methods based on a lower error rate bound.
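Syllable posterior probabilities on a lattice underlie several of the techniques summarized here (posterior histograms, position-specific posteriors, posterior pruning thresholds, and neighbor matrices). As a minimal sketch of how such posteriors are usually computed, the code runs the standard forward-backward pass over a lattice DAG. It assumes nodes are numbered in topological order, every arc goes from a lower to a higher node number, and each arc carries a combined acoustic and language-model likelihood in the linear domain; the `Arc` type and the function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Arc:
    src: int         # start node; nodes are numbered in topological order
    dst: int         # end node, always greater than src
    syllable: str    # syllable label, e.g. "zhong1"
    weight: float    # combined acoustic x language-model likelihood (linear domain)


def arc_posteriors(arcs, start, end):
    """Forward-backward over a lattice DAG; returns one posterior per arc."""
    alpha = defaultdict(float)
    beta = defaultdict(float)
    alpha[start] = 1.0
    beta[end] = 1.0
    # Forward pass: every arc entering a node has a smaller source, so sorting
    # by source guarantees alpha[src] is complete before it is used.
    for arc in sorted(arcs, key=lambda a: a.src):
        alpha[arc.dst] += alpha[arc.src] * arc.weight
    # Backward pass: mirror image, sorting by destination in descending order.
    for arc in sorted(arcs, key=lambda a: a.dst, reverse=True):
        beta[arc.src] += arc.weight * beta[arc.dst]
    total = alpha[end]  # likelihood summed over all start-to-end paths
    return [alpha[a.src] * a.weight * beta[a.dst] / total for a in arcs]


def syllable_posteriors(arcs, posts):
    """Sum arc posteriors per syllable label (expected occurrence counts)."""
    totals = defaultdict(float)
    for arc, p in zip(arcs, posts):
        totals[arc.syllable] += p
    return dict(totals)


# Toy lattice: two competing first syllables followed by a shared second one.
arcs = [Arc(0, 1, "zhong1", 0.6), Arc(0, 1, "zong1", 0.4), Arc(1, 2, "guo2", 1.0)]
posts = arc_posteriors(arcs, start=0, end=2)
print(syllable_posteriors(arcs, posts))
```

Binning these posteriors gives the kind of histogram the redundancy analysis works from. A production system would compute in the log domain to avoid underflow on long utterances.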
Concretely, this thesis is organized as follows:

1) A word-spotting-based retrieval method is proposed, in which syllable lattices are stored directly as indices, the word spotting algorithm is split into an online part and an offline part to carry out retrieval tasks, and word frequency and word confidence score are combined in the similarity measure. Although it achieves high accuracy, close even to the retrieval accuracy on the best alternatives in the lattice, its index size and retrieval speed are not good enough for retrieval over large collections. A redundancy removal method is also proposed, which distinguishes useful information from redundant information using a syllable posterior probability histogram and then removes the redundancy from lattice indices. Experiments show that this method yields smaller indices and faster search.

2) Syllable-inverted-index-based retrieval methods are proposed, in which index size can be efficiently reduced. To improve accuracy, two matching methods that relax the path constraint in the search stage are investigated: time-based matching and position-based matching. In position-based matching, a syllable lattice is interpreted as a sequence of competition sets, and a position-specific posterior probability is calculated for every candidate. A similarity weighting method based on the candidates' ranks in the competition sets is also studied. Experiments show that both matching methods improve accuracy slightly, with position-based matching performing better and rank weighting improving accuracy further. A posterior-probability-based pruning method is also presented to speed up retrieval.

3) To build indices at the document level, a retrieval method based on neighbor syllable posterior probability matrices is proposed, which substantially improves index size and retrieval speed to meet the needs of SDR tasks on large-scale corpora. K-step neighbor syllable pairs are introduced to represent long-distance correlations, and neighbor posterior probability matrices are adopted to represent the contents of lattices. The posterior probability of neighbor syllable pairs in each document is calculated, and the resulting document-level neighbor syllable posterior probability matrix is used as the document index. Experiments show that although accuracy falls by about 5%, index size and retrieval speed are comparable to text retrieval approaches. Prosody is also used to weight the similarity measure; of the three prosodic weighting methods investigated, energy-based weighting gives the best result, improving accuracy by 2.7%.

4) The limits on accuracy improvement are explored, and two accuracy improvement methods based on a lower lattice error rate bound are proposed: one based on extended lattices, the other based on a word fragment language model. The extended lattice approach lowers the lattice error rate by estimating the probability of syllables missed by the recognizer, reducing the lattice error rate by 1.7% and improving accuracy by 4%. The word fragment approach lowers the lattice error rate by introducing higher-level semantic units into the speech recognizer.
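Time-based matching relaxes the path constraint: consecutive query syllables need not share a lattice node, only lie close together in time. This is a minimal sketch, assuming each posting records (document, start time, end time, posterior) for one syllable arc and using a hypothetical tolerance `tol` in seconds; the thesis's actual index layout and tolerance are not given in the abstract.

```python
from collections import defaultdict


def build_index(docs):
    """docs: {doc_id: [(syllable, start, end, posterior), ...]} -> inverted index."""
    index = defaultdict(list)
    for doc_id, arcs in docs.items():
        for syl, start, end, post in arcs:
            index[syl].append((doc_id, start, end, post))
    return index


def time_match(index, query, tol=0.05):
    """Score documents by chaining postings whose times are adjacent within tol seconds."""
    # partial[(doc_id, end_time)] = summed posterior product of chains ending there
    partial = defaultdict(float)
    for doc_id, _, end, post in index.get(query[0], []):
        partial[(doc_id, end)] += post
    for syl in query[1:]:
        nxt = defaultdict(float)
        for doc_id, start, end, post in index.get(syl, []):
            for (prev_doc, prev_end), acc in partial.items():
                # Relaxed constraint: no shared lattice node, only time adjacency.
                if prev_doc == doc_id and abs(start - prev_end) <= tol:
                    nxt[(doc_id, end)] += acc * post
        partial = nxt
    scores = defaultdict(float)
    for (doc_id, _), acc in partial.items():
        scores[doc_id] += acc
    return dict(scores)


docs = {
    "d1": [("zhong1", 0.00, 0.20, 0.7), ("guo2", 0.20, 0.45, 0.9)],
    "d2": [("guo2", 0.00, 0.20, 0.8), ("zhong1", 0.50, 0.70, 0.6)],
}
print(time_match(build_index(docs), ["zhong1", "guo2"]))
```

Document `d2` contains both syllables but in the wrong order and far apart in time, so it receives no score. The pairwise loop is quadratic and kept only for readability; a real index would sort postings by time.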
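The position-based matching, rank weighting, and posterior pruning can be sketched together, assuming each lattice has already been converted into a list of competition sets, one dict per position label mapping syllable candidates to their position-specific posteriors. A query scores the expected number of times its syllables appear at consecutive positions. The geometric rank weight and the threshold rule in `prune` are hypothetical stand-ins, since the abstract does not give the thesis's formulas.

```python
def prune(comp_sets, threshold):
    """Drop candidates whose posterior is below the threshold (smaller, faster index)."""
    return [{s: p for s, p in cs.items() if p >= threshold} for cs in comp_sets]


def rank_weight(rank, alpha=0.5):
    """Hypothetical rank weight: 1.0 for the top candidate, scaled by alpha per lower rank."""
    return alpha ** rank


def position_match_score(comp_sets, query, use_rank=False):
    """Expected count of the query at consecutive positions of the competition sets."""
    # Rank of each candidate inside its own competition set (0 = highest posterior).
    ranks = [
        {s: r for r, (s, _) in enumerate(sorted(cs.items(), key=lambda kv: -kv[1]))}
        for cs in comp_sets
    ]
    score = 0.0
    for i in range(len(comp_sets) - len(query) + 1):
        prob = 1.0
        for j, syl in enumerate(query):
            p = comp_sets[i + j].get(syl, 0.0)
            if p == 0.0:  # syllable absent at this position: no match here
                prob = 0.0
                break
            if use_rank:
                p *= rank_weight(ranks[i + j][syl])
            prob *= p
        score += prob
    return score


# Toy lattice as three competition sets (position -> {syllable: posterior}).
sets = [{"zhong1": 0.7, "zong1": 0.3}, {"guo2": 0.9, "guo1": 0.1}, {"ren2": 1.0}]
print(position_match_score(sets, ["zhong1", "guo2"]))
print(position_match_score(prune(sets, 0.5), ["zong1", "guo2"]))
```

Raising the threshold shrinks the competition sets and speeds up matching at the cost of recall, which is the kind of speed control the posterior pruning method is meant to provide.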
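The K-step neighbor syllable pair index can be sketched under a simplifying assumption: the lattice is approximated by competition sets, and positions are treated as independent, so the posterior of syllable a at position i followed k positions later by syllable b is the product of the two position posteriors. The thesis instead derives pair posteriors from each lattice's neighbor posterior probability matrix, so this is only an approximation; the sparse `(k, a, b)` keying and the function names are hypothetical.

```python
from collections import defaultdict


def lattice_pair_matrix(comp_sets, K):
    """Expected counts of k-step syllable pairs (a, b), k = 1..K, for one lattice."""
    m = defaultdict(float)
    n = len(comp_sets)
    for k in range(1, K + 1):
        for i in range(n - k):
            for a, pa in comp_sets[i].items():
                for b, pb in comp_sets[i + k].items():
                    m[(k, a, b)] += pa * pb  # assumes positions are independent
    return m


def document_index(lattices, K):
    """Combine the pair matrices of all lattices in a document into one index."""
    doc = defaultdict(float)
    for comp_sets in lattices:
        for key, v in lattice_pair_matrix(comp_sets, K).items():
            doc[key] += v
    return doc


def score(doc, query, K):
    """Sum the document's pair posteriors for every k-step pair in the query."""
    total = 0.0
    for k in range(1, K + 1):
        for i in range(len(query) - k):
            total += doc.get((k, query[i], query[i + k]), 0.0)
    return total


doc = document_index([[{"zhong1": 0.7, "zong1": 0.3}, {"guo2": 1.0}, {"ren2": 1.0}]], K=2)
print(score(doc, ["zhong1", "guo2", "ren2"], K=2))
```

Because the document index holds at most K × V² entries for a syllable inventory of size V, however long the audio is, its size and lookup cost stay bounded. That property is what lets a document-level index reach text-retrieval efficiency, at the price of discarding exact positions.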
