
Research and Implementation of Optical Music Recognition Technology

Research and Implement of Optical Music Recognition

【Author】 Liu Xiaoxiang (刘晓翔)

【Supervisor】 Zhang Shusheng (张树生)

【Author Information】 Northwestern Polytechnical University, Aeronautical and Astronautical Manufacturing Engineering, 2006, PhD

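【Abstract (Chinese)】 Converting paper scores into digital scores is an indispensable step in the exchange of information between human musical activity and computer music processing. Optical music recognition is the process of scanning a paper score into a computer, then processing, recognizing, and analyzing the score image to obtain a digital representation of the score. Optical music recognition removes the bottleneck of purely manual score digitization and offers an intelligent, efficient, and fast new route to it, so it has significant theoretical and practical value. Taking printed multi-voice staff notation as its object, this dissertation studies the key techniques of score recognition systematically and in depth from four aspects: staff-line location and removal, note recognition, recognition of score-specific symbols, and score reconstruction with semantic interpretation. It proposes effective new ideas and methods at several stages.

Staff-line location and removal is the first stage of score recognition. For staff-line location, the dissertation proposes a cross-correlation-based algorithm for score-image distortion correction and staff-line location. In essence, it improves the horizontal-projection approach to staff-line location with the idea of "breaking the whole into parts and computing correlations". The algorithm keeps the projection method's advantages of simple computation and strong noise resistance while overcoming its sensitivity to distortion, and so resolves the conflict between distortion resistance and noise resistance found in both existing families of staff-line location methods, statistical and structural. For staff-line removal, the dissertation targets the "over-removal" problem with an algorithm based on the topological relationships among image segments. Compared with existing removal methods, the algorithm represents removal units at a higher level and emphasizes the analysis of features surrounding each removal unit, so it distinguishes staff-line pixels from non-staff-line pixels more fully and clearly. This markedly reduces over-removal and keeps the graphical symbols intact after the staff lines are removed.

Note recognition is the core and key of score recognition. Given the diversity and polymorphism of notes, a structure-based recognition scheme is adopted that divides note recognition into two stages: primitive extraction and structure analysis. For primitive extraction, the dissertation proposes a stem extraction method that finds candidates coarsely with vertical run-length encoding and verifies them precisely with horizontal run-length encoding, overcoming the poor adaptability of existing methods to complex notes and their incomplete results. It designs a "segment first, then test features" method for extracting filled note heads, which uses prior knowledge of notes together with the already recognized staff lines and stems to cut note heads apart, solving the problem of separating touching note heads. It also proposes a beam extraction method based on blob segmentation and feature testing, which avoids the touching-beam problem that traditional line-extraction methods cannot handle. For note structure analysis, a method based on action fields is proposed. It introduces the physics concept of an action field to express the relationships among note primitives, unifying knowledge, robustness, and precision. On this basis, six note substructures are defined and a note-structure analysis model that locates key structures first is built, reconstructing note objects from primitive data. The model mirrors the way a person reads music, attending to salient features first and moving from the whole to the details; it reduces analysis complexity and is good at rejecting erroneous redundant primitives.

Score symbols other than notes are called score-specific symbols. For these, a neural-network recognition method based on three groups of features (geometric, central-moment, and slicing features) is proposed. These features combine the noise robustness of statistical features with the ability of structural features to separate fine differences, and they reflect the actual characteristics of the various score-specific symbols. A BP neural network, which has strong nonlinear classification ability, serves as the symbol classifier and achieves good recognition results.

Finally, by building a "score structure tree", the dissertation organizes and reconstructs scattered graphical symbol data into score data. It discusses how to generate music event sequences and implements semantic interpretation of the score, writing its semantic content to a Standard MIDI File. As one of the main results, a complete intelligent score recognition prototype system, IOMRS, was developed. Using the dissertation's evaluation scheme, which combines graphical and semantic measures of recognition, IOMRS and commercial score recognition systems were tested and compared. The results show that the overall recognition performance of IOMRS reaches the level of today's best commercial systems, with clear advantages in note recognition, adaptability to different data conditions, and execution speed.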
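The staff-line location idea summarized in the abstract (improving horizontal projection by "breaking the whole into parts and computing correlations") can be illustrated with a small sketch. This is not the IOMRS algorithm; it is a minimal illustration under our own assumptions: the binary image is a list of rows with 1 for black, the image is cut into vertical strips of fixed width, each strip's horizontal projection is aligned with its left neighbour by maximizing an unnormalized cross-correlation, and the aligned profiles are summed so that staff lines reinforce each other. The names `locate_staff_lines`, `best_shift`, `strip_width`, `max_shift`, and `threshold` are hypothetical.

```python
def horizontal_projection(image, x0, x1):
    """Black-pixel count of every row, restricted to columns x0..x1-1."""
    return [sum(row[x0:x1]) for row in image]


def best_shift(ref, prof, max_shift):
    """Vertical shift s in [-max_shift, max_shift] maximizing the
    (unnormalized) cross-correlation sum_y ref[y] * prof[y + s]."""
    n = len(ref)
    best_s, best_score = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        score = sum(ref[y] * prof[y + s] for y in range(n) if 0 <= y + s < n)
        if score > best_score:
            best_s, best_score = s, score
    return best_s


def locate_staff_lines(image, strip_width=8, max_shift=3, threshold=0.5):
    """Return (rows, offsets): staff-line rows in the coordinates of the
    leftmost strip, and the vertical offset of every strip."""
    height, width = len(image), len(image[0])
    strips = [(x0, min(x0 + strip_width, width)) for x0 in range(0, width, strip_width)]
    profiles = [horizontal_projection(image, x0, x1) for x0, x1 in strips]

    # Align each strip with its left neighbour and accumulate the shifts,
    # so a gradual skew or curvature is followed strip by strip.
    offsets = [0]
    for prev, cur in zip(profiles, profiles[1:]):
        offsets.append(offsets[-1] + best_shift(prev, cur, max_shift))

    # Sum the aligned profiles: staff lines add up across strips, while
    # scattered symbol pixels and noise mostly do not.
    combined = [
        sum(p[y + off] for p, off in zip(profiles, offsets) if 0 <= y + off < height)
        for y in range(height)
    ]

    # Rows above threshold * peak form runs; report the middle row of each run.
    peak = max(combined)
    rows, run = [], []
    for y, v in enumerate(combined):
        if peak > 0 and v >= threshold * peak:
            run.append(y)
        elif run:
            rows.append(run[len(run) // 2])
            run = []
    if run:
        rows.append(run[len(run) // 2])
    return rows, offsets
```

For a synthetic five-line staff at rows 5, 9, 13, 17, 21 that drops one row every 10 columns, `locate_staff_lines(img, strip_width=10, max_shift=3)` returns rows `[5, 9, 13, 17, 21]` and offsets `[0, 1, 2, 3]`; the line in strip `i` lies at `row + offsets[i]`. A plain projection over the whole width would smear those lines across several rows. In this sketch `max_shift` has to stay below the staff-line spacing, otherwise a strip can lock onto the neighbouring line.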

【Abstract】 Converting music scores on paper into digital form is an essential step for exchanging information between human musical activity and computer music processing. Optical music recognition (OMR) is the process of analyzing the scanned image of a music score, recognizing the music objects in it, and finally producing a versatile machine-readable format. OMR breaks through the bottleneck of purely manual work in traditional music-notation digitization and provides an intelligent, effective, and efficient method for music digitization. OMR is therefore of great value for both theoretical research and applications.

This paper takes multi-voice scores as its research object and studies OMR thoroughly from four aspects: location and removal of staff lines, recognition of music notes, recognition of specific music symbols, and reconstruction and semantic interpretation of the music score. New practical ideas and effective approaches are presented for the following key points.

The location and removal of staff lines is the first step in the OMR process. The paper investigates an integrated method for image distortion correction and staff-line location. The essence of this method is to improve the existing horizontal projection method with the idea of "breaking the whole into parts and computing correlations". The new method keeps the merits of the projection method, such as simplicity and stability, while being insensitive to image distortion. It resolves the conflict between distortion resistance and noise resistance that exists in both the statistical and the structural strategies for locating staff lines.

For staff-line removal, to resolve the problem of "over-removal", this paper proposes a new removal algorithm based on analyzing the topological relationships among image segments. Compared with current staff-line removal methods, the new algorithm places more emphasis on the environmental features surrounding the segments to be removed, and thus distinguishes staff-line segments from non-staff-line segments more clearly and completely. As a result, staff lines can be removed without damaging the symbols around them.

Recognition of music notes is the core and key of OMR. Considering the variety and polymorphism of music notes, a structure-based recognition strategy is chosen, consisting of two procedures: primitive extraction and structure analysis.

In primitive extraction, three methods are proposed for three kinds of note primitives. First, stems are extracted by roughly searching for stem candidates among vertical run-length units and then checking the candidates strictly against horizontal run-length properties; this overcomes weaknesses of current methods such as fragmentary results and poor adaptability to complex notes. Second, a "segmentation with feature testing" method is designed for extracting black note heads; it uses prior knowledge of notes and the recognized staff lines and stems to separate touching note heads. Third, a similar method is proposed for extracting beams, which avoids the troublesome touching problem of earlier line-detection methods for beams.

In structure analysis, an innovative approach based on action fields is discussed. By introducing the physics concept of an action field to describe the relationships among note primitives, it unifies knowledge, robustness, and precision. Furthermore, six substructures of notes are defined, and a model that identifies key structures first is set up. This model imitates the way humans read a music score, focusing on salient features and observing objects from the whole to the details, so it greatly reduces computational complexity and successfully eliminates incorrect redundant primitives.

Besides music notes, a score contains many text-like music symbols. This paper proposes a method for these specific symbols based on three groups of features: geometric, normalized central moment, and slicing features. These features combine the noise resistance of statistical features with the ability of structural features to distinguish subtle differences. In addition, a three-layer BP neural network with strong nonlinear classification ability is employed as the classifier. Experimental results on real-life scores show that the method recognizes these symbols effectively.

Finally, by building a "tree-like structure of the music score", this paper organizes and reconstructs the music score from the scattered recognized data. Then, by generating music event sequences, it carries out semantic interpretation of the score and writes the sequences to a file in Standard MIDI File format.

As one of the main research achievements of this paper, a complete OMR prototype system, IOMRS, is developed. This paper also proposes a performance evaluation criterion based on music symbols and semantics. Using this criterion, IOMRS is tested and compared with commercial OMR systems. The test results show that the overall performance of IOMRS reaches the level of the current best commercial OMR systems, with clear advantages in music note recognition, adaptability to varied raw music scores, and speed.
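Normalized central moments, one of the three feature groups named for symbol classification, have a standard textbook definition: with the black pixels of a symbol image and their centroid (x̄, ȳ), the central moment is μ_pq = Σ (x − x̄)^p (y − ȳ)^q, and the normalized moment is η_pq = μ_pq / μ_00^(1 + (p+q)/2). The abstract does not state which orders the dissertation uses or how they are combined with the geometric and slicing features before the BP network, so the sketch below is only an illustration: `max_order=3`, the pixel-count form of μ_00, and the names `normalized_central_moments` and `feature_vector` are our assumptions.

```python
def normalized_central_moments(image, max_order=3):
    """Normalized central moments eta_pq of a binary symbol image
    (1 = black), for every order 2 <= p + q <= max_order."""
    pts = [(x, y) for y, row in enumerate(image) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("symbol image contains no black pixels")
    m00 = len(pts)                      # area = zeroth-order moment mu_00
    xc = sum(x for x, _ in pts) / m00   # centroid: gives translation invariance
    yc = sum(y for _, y in pts) / m00
    eta = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            if p + q < 2:
                continue  # mu_10 = mu_01 = 0 by construction, mu_00 is the area
            mu = sum((x - xc) ** p * (y - yc) ** q for x, y in pts)
            # Dividing by mu_00^(1 + (p+q)/2) (approximately) removes scale.
            eta[(p, q)] = mu / m00 ** (1 + (p + q) / 2)
    return eta


def feature_vector(image, max_order=3):
    """Flatten eta into a fixed key order, e.g. as one part of a classifier input."""
    eta = normalized_central_moments(image, max_order)
    return [eta[k] for k in sorted(eta)]
```

For a filled 5 × 3 rectangle the sketch gives η_11 = 0 (the shape is symmetric), η_20 = 30/225, and η_02 = 10/225, and moving the rectangle elsewhere in the image leaves every η_pq unchanged. Translation invariance is exact here, whereas scale invariance only holds approximately on a pixel grid, which is one reason such statistical features are typically paired with finer structural ones.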
