
农业复杂自适应搜索模型研究及实现 (Research and Implementation of a Complex Adaptive Agricultural Search Model)

Complex Adaptive Agriculture Vertical Search Model and Its Implementation

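【Author】 黄河 (Huang He)

【Advisor】 王儒敬 (Wang Rujing)

【Author information】 University of Science and Technology of China, Pattern Recognition and Intelligent Systems, 2010, PhD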

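【Abstract (from the Chinese)】 By the end of 2009, there were more than 30,000 agriculture-related websites on the Internet, which had accumulated rich information resources covering agricultural technology, market information, policies and regulations, and agricultural news. However, because Internet information resources lack a unified formal representation, the information is heterogeneous in content and structure, scattered, and heavily duplicated. The resulting "information islands" make it hard to exploit agricultural information resources in an integrated way. At the same time, limited by farmers' education level and computer skills, rural users find it hard to use traditional search tools to interact with, capture, and filter personalized information directly. Faced with massive agricultural information resources, these users are left helpless, and the problem of being "drowned in information" is serious. Building specialized, personalized, and intelligent agricultural search models and corresponding search engine systems is therefore of great significance.

Addressing the essential characteristics of the Internet (openness, dispersion, hierarchy, evolution, and massive scale), this dissertation proposes a complex adaptive agricultural search model. The model establishes an alliance of agents for agricultural information resource discovery, information acquisition, information processing, and user service. Through learning and adaptation mechanisms between agents and network resources, between agents and web page content and presentation, and between agents and users' personalized needs, the model adapts to the complex, dynamic Internet environment. This improves the recall and precision of the agricultural search engine and addresses a core problem facing the new generation of search engines.

Because agricultural Internet resources are dynamic and highly scattered, this dissertation proposes AADWED (Adaptive Agriculture Deep Web Entry Discovery), an adaptive algorithm for discovering Deep Web resources in the agricultural domain. The algorithm continually learns suitable query expressions from samples and submits them to a general-purpose search engine in order to obtain the entry pages of domain Deep Web resources efficiently. Experiments show that the algorithm substantially raises the yield rate of Deep Web resource discovery in the agricultural domain.

Because the presentation of pages on web sites is diverse and dynamic, this dissertation proposes an adaptive algorithm for extracting structured data from the web. Building on the MDR algorithm, it introduces a page denoising algorithm based on relative entropy, which improves the accuracy of structured web data extraction.

Because a large amount of agricultural data on the Internet is described inconsistently, incompletely, and redundantly, this dissertation focuses on the automatic annotation of spatial attributes and on semantics-based handling of redundant data for information such as agricultural product prices and supply and demand. This improves the quality and usability of the data and provides a basis for precise retrieval and visual analysis services.

To meet the personalized needs of different web users, this dissertation proposes an algorithm based on formal concept analysis (FCA) that automatically mines users' topics of interest. The mined interest topic patterns are described as a set of formal concepts, and the relationships between them are expressed explicitly in a concept lattice, which makes them easy for users to understand. The dissertation also proposes a method for computing the relevance between a document and the topics a user is interested in. Comparative experiments show that the method is effective.

Finally, based on the proposed complex adaptive agricultural search model, the agricultural vertical search engine system "中国搜农" was designed and implemented. The system has begun serving the public on a large scale and has been promoted and applied in a number of provinces and cities.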
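
The abstract names relative entropy as the basis of the page denoising step but does not say how it is applied. As a purely illustrative sketch, and not the dissertation's algorithm, one could compare the term distribution of each page block with a background distribution built from text that repeats across a site's pages, and treat blocks that diverge little from that background as template noise. The function names, the additive smoothing, and the idea of a "background" token list are all assumptions made for this example.

```python
import math
from collections import Counter


def term_distribution(tokens, vocabulary, smoothing=1.0):
    """Additively smoothed probability distribution over `vocabulary`.

    Assumes every token in `tokens` is also in `vocabulary`.
    """
    counts = Counter(tokens)
    total = len(tokens) + smoothing * len(vocabulary)
    return {term: (counts[term] + smoothing) / total for term in vocabulary}


def relative_entropy(p, q):
    """Relative entropy (KL divergence) D(p || q) in bits.

    `p` and `q` must be defined over the same terms, and `q` must be
    strictly positive wherever `p` is (smoothing guarantees this here).
    """
    return sum(p[t] * math.log2(p[t] / q[t]) for t in p if p[t] > 0)


def block_divergences(blocks, background_tokens):
    """Score each block by its divergence from the background distribution.

    `blocks` maps a hypothetical block name to its list of tokens.
    """
    vocabulary = set(background_tokens)
    for tokens in blocks.values():
        vocabulary.update(tokens)
    background = term_distribution(background_tokens, vocabulary)
    return {
        name: relative_entropy(term_distribution(tokens, vocabulary), background)
        for name, tokens in blocks.items()
    }


if __name__ == "__main__":
    # Hypothetical data: boilerplate seen on many pages vs. two blocks of one page.
    background = ["home", "about", "contact", "login"] * 5
    page_blocks = {
        "nav": ["home", "about", "contact", "login"],
        "data": ["wheat", "price", "beijing", "wheat", "price", "hebei"],
    }
    for name, score in block_divergences(page_blocks, background).items():
        print(name, round(score, 3))
```

Under these assumptions the navigation block scores lower than the data block, so a threshold on the score would separate the two. How a threshold is chosen, and how blocks are segmented in the first place, is left open here.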

【Abstract】 By the end of 2009, there were more than 30,000 agricultural websites on the Internet, covering almost all kinds of agricultural information, such as agricultural technology, market information, agricultural news, and policies. However, agricultural information on the web has no uniform representation and is heterogeneous, distributed, and redundant, which creates isolated information islands. Because farmers' computer skills are limited, it is hard for them to use traditional search tools to acquire and filter personalized information on the web. Faced with a huge amount of information, farmers are often frustrated, and "information overload" is a serious problem. It is therefore important to develop personalized, intelligent, and professional web search models and tools.

Given the openness, dispersion, hierarchy, evolution, and sheer scale of the Internet, this dissertation proposes an agricultural search model based on complex adaptive systems. The model forms an alliance of agents: an agricultural information discovery agent, an information acquisition agent, an information processing agent, and a service agent. It adapts to the complex and dynamic Internet environment through learning mechanisms between the agents and web content, page presentation, and user needs. The proposed method improves the precision and recall of the agricultural search engine and addresses the core problem facing next-generation search engines.

Because web resources are dynamic and highly scattered, the AADWED (Adaptive Agriculture Deep Web Entry Discovery) algorithm is proposed to acquire domain-specific deep web resources effectively and efficiently. The algorithm repeatedly constructs queries from sample pages and submits them to a search engine in order to find the entry pages of hidden web resources. Experiments confirm that this method significantly improves the efficiency of finding hidden web resources.

To handle the dynamics and diversity of pages on web sites, the dissertation presents an adaptive algorithm for extracting structured data from the web. The algorithm builds on the traditional MDR algorithm and uses relative entropy for noise removal, which improves the precision of structured data extraction.

To handle the large amount of heterogeneous, incomplete, and redundant agricultural information on the web, the dissertation studies automatic annotation of spatial properties and semantics-based processing of redundant data for agricultural product prices and buy/sell information. The proposed method improves data quality and lays a foundation for precise retrieval and visualization.

To address the personalized information needs of different web users, a new approach is proposed that automatically mines web user profiles based on FCA. The interest models of web users are represented as formal concepts, and the relationships between these models are described in a concept lattice. A method for assessing the relevance of a document to these topics is also proposed. Experiments show that the approach is effective.

Finally, based on the complex adaptive agricultural search model proposed in this dissertation, the agricultural vertical search engine "Sounong" has been designed and implemented. The search engine is publicly available and serves users in many provinces.
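
The abstract states that user interests are represented as formal concepts arranged in a concept lattice, without giving the construction. The following is a minimal sketch of what a formal concept is in FCA, assuming (hypothetically) that the objects are documents a user has viewed and the attributes are keywords found in them. It enumerates all concepts of a small context by closing the set of document keyword sets under intersection; it is a textbook-style illustration rather than the mining algorithm of the dissertation, and all names and data are invented for the example.

```python
def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a formal context.

    `context` maps each object (here: a document id) to its set of
    attributes (here: keywords). Every concept intent is either the full
    attribute set or an intersection of object attribute sets, so the
    intents are generated by closing the rows under intersection.
    """
    rows = {obj: frozenset(attrs) for obj, attrs in context.items()}
    all_attributes = frozenset().union(*rows.values())

    intents = {all_attributes}
    for row in rows.values():
        intents |= {row & intent for intent in intents}

    concepts = []
    for intent in intents:
        # The extent is every object that has all attributes of the intent.
        extent = frozenset(obj for obj, row in rows.items() if intent <= row)
        concepts.append((extent, intent))
    # Larger extents (more general topics) first.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))


def is_subconcept(lower, upper):
    """Lattice order: `lower` is below `upper` if its extent is a subset."""
    return lower[0] <= upper[0]


if __name__ == "__main__":
    # Hypothetical browsing history of one user.
    viewed = {
        "d1": {"wheat", "price"},
        "d2": {"wheat", "pest"},
        "d3": {"wheat", "price", "market"},
    }
    for extent, intent in formal_concepts(viewed):
        print(sorted(extent), sorted(intent))
```

In this toy context the most general concept groups all three documents under the keyword "wheat", and a more specific one groups d1 and d3 under "wheat" and "price". Reading the lattice from general to specific in this way is one plausible sense in which such a representation is easy for users to interpret.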
