
Design and Implementation of an HTTP Cache System

【Author】 Zhang Quanming (张全明)

【Advisor】 Zhang Xinyou (张新有)

【Author information】 Southwest Jiaotong University, Computer Application Technology, 2013, Master's thesis

【Abstract】 As the number of Internet users and the variety of Internet applications grow rapidly, heavy data traffic consumes network bandwidth, causing more frequent congestion and lower transmission speeds. Web caching has therefore become an active research area in network applications. This thesis classifies and analyzes the mainstream caching technologies and finds that well-known WWW caching proxy servers such as Squid and Apache, which monitor traffic by intercepting and forwarding it, add extra access delay for user requests that miss the cache.

To address this problem, the thesis proposes a caching technique based on bypass mirrored monitoring. The technique monitors users' Internet traffic through a bypass mirror port and, following users' access tendencies, caches frequently requested Web resources locally. When the cache system observes a request for a resource that is already cached, it uses session hijacking to direct the user to the intranet cache server, so the user no longer needs to connect to the remote Web server. The technique therefore reduces traffic at the network egress, saves bandwidth costs, and speeds up access and transfer, while avoiding the added access delay of interception-and-forwarding caches.

The thesis designs and implements a bypass mirrored HTTP cache system on the Windows platform. The system uses WinPcap to capture raw packets from the mirrored traffic, then parses and filters them by network protocol to extract users' resource requests, which realizes the mirrored monitoring function. For Web resources that users request frequently, the system uses socket programming to download them from the external network and cache them on disk. An intranet HTTP server built on IIS publishes and manages the cached resources. To direct users to the cache, the system crafts a response packet containing the address of the cached copy and impersonates the Web server, so that the user fetches the resource from the intranet. Microsoft SQL Server stores the logs used to display intranet users' resource requests. To make disk cache lookups efficient, the thesis implements a hash table that stores and organizes information about requested resources and uses hash lookups to shorten processing delay. Cache replacement and expiration detection improve the cache hit rate and the consistency of cached resources.

Finally, functional and performance tests show that the bypass mirrored cache system achieves its goals: monitoring users' Internet traffic through the mirror port, hijacking and redirecting user requests, accelerating access through the intranet cache, reducing user access delay, and recording and displaying intranet users' resource requests in the SQL Server database. These results verify the practicality and feasibility of an HTTP cache system based on bypass mirrored monitoring.
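The request-handling path described in the abstract (parse a captured request, look it up in a hash table, and answer with a response that points to the cached copy) can be sketched as follows. This is a minimal illustration and not the thesis implementation: the host name `cache.intranet.example`, the hot-request threshold, the LRU replacement policy, the fixed time-to-live, and the use of an HTTP 302 `Location` header are all assumptions, since the abstract does not specify them. Packet capture and packet injection are left out.

```python
import hashlib
import time
from collections import OrderedDict

CACHE_HOST = "cache.intranet.example"  # hypothetical intranet cache server
HOT_THRESHOLD = 3                      # assumed: requests seen before a URL is cached
MAX_ENTRIES = 2                        # tiny capacity, for demonstration only
TTL_SECONDS = 3600                     # assumed freshness lifetime


def parse_request_url(payload: bytes):
    """Return the URL of a captured HTTP GET request, or None if not usable."""
    try:
        head = payload.split(b"\r\n\r\n", 1)[0].decode("ascii")
    except UnicodeDecodeError:
        return None
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or parts[0] != "GET":
        return None
    host = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip()
    return f"http://{host}{parts[1]}" if host else None


class CacheIndex:
    """Hash-table index of cached resources with LRU replacement and expiry."""

    def __init__(self):
        self.hits = {}                # URL -> request count
        self.entries = OrderedDict()  # URL -> (cache file name, time stored)

    def record_request(self, url):
        """Count a request; True once the URL is hot and not yet cached."""
        self.hits[url] = self.hits.get(url, 0) + 1
        return self.hits[url] >= HOT_THRESHOLD and url not in self.entries

    def store(self, url, now=None):
        """Register a downloaded resource, evicting the LRU entry if full."""
        now = time.time() if now is None else now
        if len(self.entries) >= MAX_ENTRIES and url not in self.entries:
            self.entries.popitem(last=False)
        name = hashlib.sha256(url.encode()).hexdigest()
        self.entries[url] = (name, now)
        self.entries.move_to_end(url)
        return name

    def lookup(self, url, now=None):
        """Return the cache file name for a fresh entry, else None."""
        now = time.time() if now is None else now
        entry = self.entries.get(url)
        if entry is None:
            return None
        name, stored = entry
        if now - stored > TTL_SECONDS:
            del self.entries[url]     # expired: drop so it can be re-fetched
            return None
        self.entries.move_to_end(url)
        return name


def build_redirect(name):
    """Build the HTTP response that points the client at the cached copy."""
    return (
        "HTTP/1.1 302 Found\r\n"
        f"Location: http://{CACHE_HOST}/{name}\r\n"
        "Content-Length: 0\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")
```

In a bypass deployment the redirect would have to reach the client before the origin server's reply, with TCP sequence numbers matching the observed session. That part depends on the capture and injection layer and is not shown here.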
