笔画码汉字输入法软件设计与实现

Design and Implementation of a Chinese Character Keyboard Input Method Named "Stroke Code"

【Author】 Wu Haihui (吴海辉)

【Advisor】 Wu Jianguo (吴建国)

【Author information】 Anhui University, Computer Application Technology, 2004, Master's thesis

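【Abstract (translated from Chinese)】 Chinese characters are the core of traditional Chinese culture and the main tool of information exchange. Ancient, complex and varied, they are two-dimensional square characters. Unlike English and other Western scripts, which are one-dimensional linear text and can be typed into a computer directly, they require dedicated input method software. Entering Chinese characters is the first step of Chinese information processing, and input technology directly affects the development of the field. This thesis focuses on the design and development of Chinese character input method software within the operating system and proposes a simple, convenient keyboard input method for Chinese characters.

The thesis first compiles statistics on the stroke information of the characters in the national standard (GB2312-80) level-2 character set. These statistics include: the average stroke count of the characters and the average weighted by usage frequency; the average and weighted average number of leading strokes needed to distinguish a character from all others; the number of characters that begin with each stroke type; the number of occurrences of each stroke type in the character set; the characters in the set whose strokes are identical; and the frequency of adjacent strokes. Based on these statistics, we adopt the stroke order used when writing a character as its input code, and design the Stroke Code input method together with a keyboard that implements it.

To display the strokes of Chinese characters in the input method, the thesis introduces TrueType fonts, which describe glyphs by curve outlines, analyzes the principle of TrueType glyph description and the structure of TrueType files, and uses font creation software to build a TrueType font file for the strokes.

Under Windows, an input method file is in fact a dynamic link library. The thesis therefore analyzes the internal mechanism by which the Windows operating system supports input methods, clarifies the relationship between an input method and the system, and, following the principles of input methods, describes how the input method interface functions work and how applications support input methods.

For the implementation, the thesis first analyzes the run-time flow chart of the Stroke Code input method, divides the program into modules, and concentrates on the module that converts external codes into internal codes. To store the stroke codes of the characters, it proposes an efficient optimized index tree based on an ordered binary tree: the traditional trie is optimized by merging nodes and by storing tree nodes in a specific variable-length structure, which greatly reduces storage space. Because the average stroke count of Chinese characters is large, entering every stroke of a character would make the input code too long. To allow incomplete input codes, that is, to solve the fuzzy matching problem between a string pattern containing fuzzy input symbols and a set of strings, Chapter 5 proposes a fuzzy matching algorithm for string sets and gives its implementation and complexity analysis.

Finally, the thesis proposes a method for secure software registration based on computer hardware information and designs the flow chart of the registration process.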
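The stroke statistics listed in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not taken from the thesis: the five-class stroke encoding (`1` horizontal, `2` vertical, `3` left-falling, `4` dot or right-falling, `5` turning), the six-character toy table `STROKES`, the made-up usage weights in `FREQ`, and all function names.

```python
from collections import Counter

# Hypothetical toy data. Stroke classes (an assumed encoding):
# 1 = horizontal, 2 = vertical, 3 = left-falling, 4 = dot/right-falling, 5 = turning.
STROKES = {"十": "12", "人": "34", "大": "134", "土": "121", "王": "1121", "干": "112"}
FREQ = {"十": 5, "人": 10, "大": 8, "土": 2, "王": 3, "干": 2}  # made-up usage weights


def average_strokes(strokes):
    """Plain average number of strokes per character."""
    return sum(len(code) for code in strokes.values()) / len(strokes)


def weighted_average_strokes(strokes, freq):
    """Average number of strokes, weighted by usage frequency."""
    total = sum(freq[ch] for ch in strokes)
    return sum(len(code) * freq[ch] for ch, code in strokes.items()) / total


def distinguishing_prefix_length(ch, strokes):
    """Smallest number of leading strokes that no other character shares.

    Falls back to the full code length when the code is a prefix of
    (or identical to) another character's code.
    """
    code = strokes[ch]
    others = [c for other, c in strokes.items() if other != ch]
    for k in range(1, len(code) + 1):
        if not any(o.startswith(code[:k]) for o in others):
            return k
    return len(code)


def first_stroke_counts(strokes):
    """Number of characters beginning with each stroke class."""
    return Counter(code[0] for code in strokes.values())


def adjacent_stroke_counts(strokes):
    """Frequency of each pair of adjacent strokes across the table."""
    return Counter(
        code[i:i + 2] for code in strokes.values() for i in range(len(code) - 1)
    )


if __name__ == "__main__":
    print(average_strokes(STROKES))
    print(weighted_average_strokes(STROKES, FREQ))
    print({ch: distinguishing_prefix_length(ch, STROKES) for ch in STROKES})
```

In this toy table the frequency-weighted average is lower than the plain average because the most frequent characters happen to have few strokes. That gap, and the number of leading strokes needed to tell a character apart, are the kinds of figures the abstract says motivated the design.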

【Abstract】 Chinese characters are the core of traditional Chinese culture and the main tool of information exchange. Western scripts such as English are one-dimensional linear text and can be entered into a computer directly, whereas the ancient and complex Chinese characters are two-dimensional square characters and need special input method software. Entering Chinese characters is the first step of Chinese information processing, and input method technology directly affects the development of Chinese information processing. With a view to the design and development process of Chinese character input method software, this thesis proposes a simple and convenient Chinese character keyboard input method.

We first compiled statistics on the stroke information of the characters in the Chinese National Standard Code for Information Interchange (GB2312-80), including the average number of strokes per character, both unweighted and weighted by usage frequency; the average number of leading strokes, weighted and unweighted, needed to distinguish a character from all others; the number of characters that begin with each stroke; the number of occurrences of each stroke in the character set; the characters in the set that share the same strokes; and the frequency of adjacent strokes. Based on these statistics, we devised a new Chinese character keyboard input method named "Stroke Code", which uses the stroke order of handwritten Chinese characters as its input code, and designed a key arrangement for the method.

To display the strokes of Chinese characters in the input method, we introduce the TrueType font, which uses curve-outline description, and analyze the principle of that description technique and the structure of TrueType font files. We then created a TrueType font file for the strokes of Chinese characters using a font creation program.

Under Windows, an input method editor (IME) file is in fact a dynamic link library. By analyzing the internal mechanism through which the Windows operating system (OS) supports IMEs, we clarify the relationship between the IME and the OS. Following the principles of IMEs, we describe the working procedure of the IME interface functions and how applications support IMEs.

To develop the "Stroke Code" IME software, we first analyzed the flow chart of the IME and divided the program into several modules, then focused on the module that converts external codes into the internal codes of Chinese characters. We propose an efficient optimized trie based on an ordered binary tree to store the stroke strings of Chinese characters. By merging some trie nodes and storing each node in a special variable-length structure, we obtain an optimized trie with low memory usage. So that a character can be entered without typing its complete stroke string, some elements of the input code must be allowed to be omitted. We therefore propose a fast pattern matching algorithm on a large set of strings to solve the problem of fuzzy matching between a string pattern and a string set. The algorithm is described in detail, and its space and running-time costs are analyzed, in Chapter 5 of the thesis.

Finally, we propose a secure registration method for shareware that uses the PC's hardware information.
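The idea of storing stroke strings in a trie and matching an incomplete input code against the whole set can be sketched as follows. This is a plain dictionary-based trie, not the merged-node, variable-length structure the thesis proposes, and it is not the thesis's Chapter 5 algorithm. The wildcard symbols (`?` for exactly one omitted stroke, `*` for any run of omitted strokes), the class and method names, and the digit stroke codes are all assumptions made for illustration.

```python
class TrieNode:
    """One trie node: child nodes keyed by stroke, plus characters ending here."""
    __slots__ = ("children", "chars")

    def __init__(self):
        self.children = {}
        self.chars = []


class StrokeTrie:
    """Plain (unoptimized) trie over stroke-code strings."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, code, ch):
        node = self.root
        for stroke in code:
            node = node.children.setdefault(stroke, TrieNode())
        node.chars.append(ch)

    def match(self, pattern):
        """Return all characters whose full code matches the pattern.

        Assumed wildcards: '?' stands for exactly one stroke,
        '*' stands for zero or more strokes.
        """
        found = []
        seen = set()  # (node, pattern index) states already explored

        def walk(node, i):
            state = (id(node), i)
            if state in seen:
                return
            seen.add(state)
            if i == len(pattern):
                found.extend(node.chars)
                return
            symbol = pattern[i]
            if symbol == "*":
                walk(node, i + 1)              # '*' matches no stroke
                for child in node.children.values():
                    walk(child, i)             # '*' absorbs one more stroke
            elif symbol == "?":
                for child in node.children.values():
                    walk(child, i + 1)
            else:
                child = node.children.get(symbol)
                if child is not None:
                    walk(child, i + 1)

        walk(self.root, 0)
        return sorted(found)


if __name__ == "__main__":
    trie = StrokeTrie()
    # Hypothetical toy codes: 1 = horizontal, 2 = vertical, 3 = left-falling, 4 = right-falling.
    for ch, code in [("十", "12"), ("人", "34"), ("大", "134"),
                     ("土", "121"), ("士", "121"), ("王", "1121"), ("干", "112")]:
        trie.insert(code, ch)
    print(trie.match("1*1"))   # first stroke 1, last stroke 1, anything between
    print(trie.match("11?"))   # two horizontals followed by one omitted stroke
```

The `seen` set keeps each (node, pattern position) pair from being explored twice, so a pattern with several `*` symbols cannot blow up the search or report a character more than once. Characters with identical stroke strings, such as 土 and 士 in the toy data, end up on the same node and are returned together, which is why an input method built on stroke codes still needs a candidate list.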

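  • 【Online publication contributor】 Anhui University
  • 【Online publication year and issue】 2004, Issue 03
  • 【Classification number】 TP311.52
  • 【Times cited】 10
  • 【Downloads】 373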