

Research on Effective Management and New Service Model for Chinese Calligraphy

【Author】 Lu Weiming (鲁伟明)

【Advisor】 Zhuang Yueting (庄越挺)

【Author information】 Zhejiang University, Computer Science and Technology, 2009, PhD

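【Abstract (translated from the Chinese)】 Chinese calligraphy is an important and distinctive resource in digital libraries. As the number of digitized calligraphy works grows, managing and using the calligraphy resources in digital libraries effectively has become a major challenge. This thesis examines key technical problems in organizing, managing and serving calligraphy resources in digital libraries, from the perspectives of calligraphic character feature extraction, computer-aided calligraphy tablet generation, calligraphy style modeling and relationship mining, and massive data retrieval and processing. The main work is as follows:
1. Calligraphic character feature representation. OCR technology for calligraphic characters is lacking, manual annotation is time-consuming and laborious, and both the evolution of Chinese characters (for example, seal script) and the deformation of calligraphic characters (for example, cursive script) make recognition difficult, so retrieval methods that do not depend on character recognition are urgently needed. Shape feature representation is the key problem in shape-based calligraphic character retrieval. The approach fuses shape context features with histogram of oriented gradients features to represent character shape, and the fusion parameter is obtained by gradient descent. In addition, character style is the basis of tablet generation, style modeling and relationship mining. Because calligraphic style can be expressed as the local and global statistical similarity of strokes, this thesis uses kernel functions to fuse Gabor, Contourlet and pHoG features to express the style of a character.
2. Computer-aided calligraphy tablet generation. Digital libraries hold a large number of calligraphic characters, and composing them into the style-consistent tablets that users need is a worthwhile task. The main idea is as follows: the user submits the desired tablet text, the system selects characters from the calligraphic character database that match the text and share a consistent style to produce a set of candidate tablets, and the candidates are then ranked by style consistency. To address the imprecision of low-level features in describing style and the subjectivity of style appreciation, this thesis introduces character context features and relevance feedback to make the style similarity computation more accurate and improve the quality of the generated tablets.
3. Calligraphy style modeling. Style modeling is the key to style-based organization of calligraphy works. This thesis adopts a generative probabilistic model that expresses the style composition of a work quantitatively. The approach uses a clustering algorithm to compute style keywords, proposes a visual representation of style, builds a latent style model to model the style of each work so that works are represented by their styles, and finally measures the style similarity of works with the Kullback-Leibler distance.
4. Calligraphy style relationship mining. To mine the style relationships among calligraphers, calligraphy works and calligraphic characters, this thesis proposes two graph-based mining models: a random walk model with supervised weight learning and a co-random walk model. The core idea is to build an entity-relationship graph from low-level character style features, contextual features of the calligraphy resources and user interaction information, and then to mine style relationships with a random walk model. The emphasis is on constructing the entity-relationship graph, learning the weights of the different edge types, using user interaction information, and reducing the influence of user interaction on edge weights. Finally, several applications are built on the models, such as style-based browsing of calligraphy works, retrieval of works with similar style, computer-aided authentication of calligraphy works, and computer-aided analysis of calligraphers' styles.
5. Massive data retrieval and processing. To speed up calligraphic character retrieval, this thesis proposes two fast retrieval algorithms, one based on keypoint matching and one based on LSH, and proposes a re-ranking mechanism that takes both retrieval speed and retrieval quality into account. For massive calligraphy resources, the style modeling and relationship mining algorithms are brought into the MapReduce framework.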
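The style-similarity step in contribution 3 can be illustrated with a minimal sketch. It assumes each work is summarized as a probability distribution over a small set of latent styles. The three example distributions, the smoothing constant `eps`, and the choice to symmetrize the divergence are illustrative assumptions; the record does not say which variant of the Kullback-Leibler distance the thesis uses.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same latent styles.

    eps is an assumed smoothing constant that avoids division by zero.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p); symmetrizing is an assumption."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

# Hypothetical style mixtures of three works over three latent styles.
work_a = [0.70, 0.20, 0.10]
work_b = [0.65, 0.25, 0.10]
work_c = [0.10, 0.20, 0.70]

# A smaller value means the two works have more similar style compositions.
print("a vs b:", symmetric_kl(work_a, work_b))
print("a vs c:", symmetric_kl(work_a, work_c))
```

Under these made-up mixtures, `work_a` is closer to `work_b` than to `work_c`, which is the kind of ordering a style-similarity retrieval would rely on.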

【Abstract】 Chinese calligraphy has become an important and distinctive resource in digital libraries. As the number of digitized calligraphy works grows, using the calligraphy resources in digital libraries effectively is becoming a challenge. This thesis focuses on calligraphic character feature extraction, computer-aided calligraphy tablet design, calligraphy style modeling and relationship mining, and massive data retrieval and processing, and discusses key technical problems of calligraphy resource management and services in digital libraries. The main contributions of this thesis are as follows:
1. Calligraphic character feature representation. Because OCR technology for calligraphic characters is lacking and manual annotation is time-consuming and laborious, a text-based character retrieval system is difficult to implement. In addition, the evolution of Chinese characters and the deformation of calligraphic characters both make character recognition difficult, so a character retrieval method that does not rely on character recognition is needed. Shape representation is the key problem in shape-based calligraphic character retrieval. In Chapter 3, shape context and HoG features are combined to represent the shape of a character, and the combining parameter is learned by gradient descent. Moreover, the character style feature is the basis of calligraphy tablet design, style modeling and style relationship discovery. At the character level, calligraphy style is represented as the statistical similarity of strokes, so Gabor, Contourlet and pHoG features are fused by kernel functions to represent the style of a calligraphic character.
2. Computer-aided calligraphy tablet design. Generating style-consistent calligraphy tablets for users from the calligraphic characters in a digital library is a meaningful task. The user submits a query to the system, the system selects the corresponding calligraphic characters from the database according to the query to form candidate tablets, and it finally ranks these tablets according to a style consistency model. To address the inaccuracy of low-level features and the subjectivity of appreciation, the thesis introduces context features and feedback techniques to adjust the style similarity measurement and improve the quality of the generated tablets.
3. Calligraphy style modeling. Style modeling is the key to style-based management of calligraphy works. The thesis introduces a generative probabilistic model that automatically extracts a representation of calligraphic style for calligraphy works. First, style words are generated by a clustering algorithm, and then a Latent Style Model is built to discover the calligraphic styles expressed by the collection of works. Finally, the Kullback-Leibler distance is used to measure the style similarity between two calligraphy works.
4. Calligraphy style relationship discovery. To discover the style relationships among artists, works and characters, two graph-based models are proposed: the Supervised Learning Weighted Random Walk Model and the Co-Random Walk Model. The main idea is first to construct an entity-relationship graph from low-level features, context information and user interaction information, and then to use a random walk model to measure the relationships between entities. The thesis focuses on entity-relationship graph construction, edge weight learning, the use of user interaction information, and how to reduce the impact of user interaction on edge weights. Finally, several applications, such as Calligraphy Style-Guided Works Browser, Style-Similar Calligraphy Works Retrieval, Author Identification for Works and Author Writing Style Analysis, are built on the different types of style relationships.
5. Massive data retrieval and processing. To speed up character retrieval, two efficient methods are proposed, a keypoint-based method and an LSH-based method, and a ReRank strategy is then proposed to balance the tradeoff between retrieval speed and quality. To handle the massive calligraphy resources in digital libraries, the style modeling and relationship discovery algorithms mentioned above are implemented in the MapReduce framework.
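The LSH-based retrieval and re-ranking described in contribution 5 can be sketched roughly as follows. This is not the thesis's algorithm: it assumes random-hyperplane (sign) hashing as the LSH family, Euclidean distance as the exact measure used for re-ranking, and short made-up feature vectors in place of real character shape features. All function names are hypothetical.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes; each contributes one bit of the hash."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(vec, planes):
    """Hash a vector to a bit tuple: which side of each hyperplane it lies on."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

def build_index(features, planes):
    """Group character ids into buckets keyed by their hash signature."""
    index = {}
    for char_id, vec in features.items():
        index.setdefault(signature(vec, planes), []).append(char_id)
    return index

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def search(query, features, index, planes, top_k=3):
    """Fetch the query's bucket quickly, then re-rank it by exact distance."""
    candidates = index.get(signature(query, planes), [])
    ranked = sorted(candidates, key=lambda cid: euclidean(query, features[cid]))
    return ranked[:top_k]

# Hypothetical 4-dimensional shape features for a few calligraphic characters.
features = {
    "char_001": [0.9, 0.1, 0.3, 0.2],
    "char_002": [0.8, 0.2, 0.4, 0.1],
    "char_003": [-0.7, 0.9, -0.2, 0.5],
    "char_004": [-0.6, 0.8, -0.1, 0.6],
}
planes = make_hyperplanes(dim=4, n_bits=3)
index = build_index(features, planes)
print(search(features["char_001"], features, index, planes))
```

Fewer hash bits give larger buckets, so more candidates are re-ranked: quality improves and speed drops. More bits do the opposite. This is the speed and quality tradeoff that a re-ranking step is meant to balance.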

  • 【Online publication contributor】 Zhejiang University
  • 【Online publication year and issue】 2011, Issue 03