节点文献

公开密钥算法RSA的分析及其IP核的实现与验证

Analysis of Public-key Cryptography RSA and IP Implementation and Verification

【作者】 李佳璐

【导师】 周玉洁;

【作者基本信息】 上海交通大学 , 电路与系统, 2009, 硕士

【摘要】 随着电子商务的发展,出现了智能卡、电子钥匙USB KEY等,广泛应用于交通、身份认证等领域,极大的方便了人们的工作和生活。RSA是目前应用最广泛的公开密钥算法,在智能卡等小型移动设备中实现RSA密钥算法,具有十分重大的意义。本设计的设计目标定义为面向低端,兼顾小面积和高性能,设计内容是包括RSA IP核设计和RSA密钥生成在内的一整套RSA算法解决方案。为保证安全性,可支持密钥长度要达到2048比特。本文首先对RSA加解密算法进行了深入分析,确定使用RL方式的二进制扫描算法实现模幂,使用Montgomery CIOS算法实现模乘,详细分析算法参数的选择,并对模平方等算法进行优化。通过软件建模,明确了算法实现的层次,为硬件实现奠定了良好的基础。在IP核设计中,根据设计目标选择32位高基模乘器作为核心硬件结构。之后合理划分层次模块,根据算法优化控制逻辑。在模乘器数据通路中,使用两级流水线,4-2压缩器等技术缩短关键路径,并使用两倍于模乘器字长的T SRAM存储运算中间结果,进一步提高了硬件利用率。存储系统使用5个SRAM存储操作数、中间结果和最终结果,有效缩小IP面积;用反相时钟技术和高效的存储策略,将其他模块与存储系统交互的开销降到最低。最终使IP核实现小面积、高性能,在100Mhz时钟下,2048位模幂(操作数长度均为2048位)的速度约为3.7次/秒,1024位模幂速度约为33次/秒,性能优良。IP核支持大于32位,小于2048位的模幂和模乘运算,采用通用接口设计,利于SOC集成,且具备密钥保护功能,操作简便。为了达到商用标准,本设计对RSA IP核进行了严格的测试验证,包括功能测试、性能测试和压力测试。遵循经典ASIC设计流程,搭建符合C*Bus总线时序的仿真环境,做综合,静态时序分析,形式验证,以及版图设计。本设计在smic 0.18um工艺下最终版图面积小于1mm2,在100Mhz时钟下满足前端定义的时序约束。由于RSA密钥更换频率较低,为了节省硬件资源,采用嵌入式软件的方式实现RSA密钥生成算法。本设计使用的算法用于生成1024位和2048位RSA密钥,基于国产32位CPU核(C*CORE C340),使用硬件真随机数发生器HRNG和RSA协处理器进行硬件加速。通过对算法原理和步骤进行深入分析,提出生成素数的三个步骤:随机产生奇数候选数,预筛选,素性测试。通过研究和优化预筛选算法,大大提高了产生素数的效率。分别使用Euclid算法和扩展Euclid算法求解最大公约数和模逆。最后根据嵌入式系统的特点对整个算法进行了优化,测试结果与同类产品相比有一定优势。

【Abstract】 With the development of electronic business, smart card, USB Key and other small mobile devices are widely used in areas like traffic and identity authentication, and are very helpful in people’s work and life. Nowadays, RSA is the most widely-spread public-key cryptography. It is highly significant to implement RSA in small mobile devices like smart card.This design will be used in low-end market. The design goal is to keep balance between area and time, and to deliver a whole RSA cryptography implementation scheme including RSA IP design and RSA key generation. In order to ensure the security, the length of RSA key should be as long as 2048 bits.Firstly, RSA encryption and decryption algorithms are deeply analyzed in the paper. RL binary method is used for modular exponentiation, and Montgomery CIOS algorithm used for modular multiplication. I describe in detail the selection of parameters and the optimization of the algorithms. Through software modeling, the algorithm hierarchy is defined, which provides a basis for hardware implementation.During the design of IP core, according to the design goal, I select 32-bit high radix modular multiplier as the core hardware structure. After that, I divide the whole design into proper hierarchy and modules, and optimize the control logic based on algorithms. For data path in modular multiplier, I use two-level pipeline, 4-2 compressor, and some other methods to shorten the critical path, then adopt T SRAM in which the word length is twice the word length of the multiplier, in order to increase the hardware utilization ratio. For memory system, to decrease the IP area, I use 5 SRAMs to store operators, intermediate results and final results, and I use reverse-phase clock and efficient storage strategy to make sure that the cost of data transmission between memory and other modules is the lowest. Finally I achieve the goal of small area and high performance. At a clock frequency of 100 MHz, the speed of 2048-bit modular exponentiation (all operators are 2048-bit long) is about 3.7 times per second, and 1024-bit modular exponentiation is about 33 times per second. The IP core supports modular exponentiation and modular multiplication of 32-bit to 2048-bit operators, has a universal interface which can be easily integrated in SOC, provides key protection function and quite easy to operate.To come up to business standards, I have done strict tests and verification on RSA IP core, including function tests, performance tests and pressure tests. According to the general ASIC design flow, I establish the simulation environment based on C*Bus protocol. Then I do synthesis, static timing analysis, formal verification, and layout design. The IP layout area is less than 1 mm2 in smic 0.18um technology, and meets the frontend timing requirements at 100 MHz clock frequency.Since the frequency of RSA key changes is quite low, in order to save hardware resources, RSA key generation algorithm is implemented with software in embedded system. The algorithm in the design is used to generate 1024-bit or 2048-bit RSA key, accelerated by true random data generator HRNG and RSA coprocessor in the system, and implemented based on 32-bit processor C*CORE C340. Through detailed analysis of algorithm principle, I come up with three steps to generate a big prime number: random generation of odd candidates, trial division with small primes, and primality tests. Efficiency of prime generation is greatly increased by careful research and optimization of trial division. Greatest common divisor and modular inversion are resolved with Euclid algorithm and extended Euclid algorithm, respectively. At the final stage the whole design is optimized based on the features of embedded system, and the test result is better compared with similar products.

【关键词】 RSA模幂模乘MontgomeryIP密钥生成
【Key words】 RSAmodular exponentiationmodular multiplicationMontgomeryIPkey generation
节点文献中: 

本文链接的文献网络图示:

本文的引文网络