
基于用户个性挖掘的Web社区营销研究

Web Community Marketing Research Based on User Characteristic and Interest Mining

【Author】 Yu Wei (余伟)

【Advisor】 Li Shijun (李石君)

【Author information】 Wuhan University, Computer Application Technology, 2011, Ph.D.

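【Abstract (translated from the Chinese)】 With the rapid growth of Web communities, enterprises are paying increasing attention to online marketing based on them. According to survey statistics, by the end of 2010 the number of community users nationwide had reached 294 million, or 70.3% of all Internet users in the country, and the national Internet advertising market reached 32.12 billion yuan in 2010. Consumer purchasing behavior, however, is changing as society develops: traditional Internet advertising can no longer win people's trust, and users often decide by searching the Internet for relevant information and reviews. Community marketing is a product of current online marketing and promotion, and word-of-mouth communication through Web communities plays an extremely important role in consumer decisions, because consumers trust interpersonal information far more than advertising. Marketing through Web communities has therefore become a low-cost, high-efficiency way for brands to promote information.
Web community marketing is still young and has not yet formed an effective theory or unified methods. Its core is interaction and precision marketing. This thesis studies four questions: how to choose suitable communities for community marketing; how to let users retrieve suitable topics within a community; how to mine the real characteristic attributes and interests of virtual users; and how to discover outdated topics in a community. It solves some basic technical problems of Web community marketing and forms a basic theory. The main research contents are as follows:
(1) For selecting Web communities, the thesis proposes a Web community ranking theory based on data quality assessment and sampling. Establishing data quality measures gives quantitative criteria for judging how good a community data source is, so that the evaluation criteria are measurable and extensible; this addresses the problem that the criteria of traditional ranking algorithms cannot fully reflect real evaluations. A suitable sampling method draws random samples from the very large set of community topics so that the samples reflect the properties of the whole, which addresses the difficulty of measuring a huge number of topics.
(2) For fuzzy search of community resources, the thesis proposes a new fuzzy search algorithm based on a Trie. When a user remembers only part of a word, entering that part is enough for the system to find the desired results. The system is also interactive: each time the user types a letter, it suggests possible target words in real time. The Trie-based algorithm is designed to be efficient enough that user satisfaction is not affected, and experiments show that it implements the system efficiently.
(3) For mining users' characteristic attributes and interests, the thesis proposes a method based on ontology semantic analysis. It builds behavior and characteristic models of users, an attribute set and an inference rule set for the characteristic attributes, and an uncertain inference method, and uses them to infer users' characteristic attributes and interests from their behavior and posts. Experimental results show that the method is extensible and accurate and addresses the problem of precise targeting in Web community marketing.
(4) To make this mining more efficient, the thesis proposes a user characteristic mining method based on interaction relationships. Through statistics and analysis of a large amount of community user data, it studies interaction behavior and interest similarity among users of Web communities and establishes a theoretical evaluation method based on hypothesis testing, showing that the sociological view that "close friends share more similar interests" also applies in virtual Web communities. On this basis it constructs an algorithm for quickly mining sets of users with similar interests, and confidence measures and algorithm tests show that the algorithm is effective.
(5) For outdated topics in communities, the thesis proposes methods for modeling, measuring, reasoning about, and discovering the temporal consistency of topic pages. A Web page is temporally consistent when the time it states agrees with the actual time; this is an important indicator of the quality of online information and bears on the timeliness and accuracy of page content. Many highly time-sensitive pages contain temporal inconsistencies, which seriously affect users' understanding of the content and their decisions. The thesis first models the time dimension of topic pages, including time-sensitivity analysis of page information, page classification based on time series, and extraction of the time dimension of pages. It then measures and reasons about temporal consistency, including classification of the temporal inconsistencies of page events, modeling of those inconsistencies, and discovery of inconsistencies in topic pages. The method makes it possible to filter temporally inconsistent topics out of Web communities automatically, improving the user experience.
The research provides theoretical and technical support for Web community marketing: it addresses how to screen and rank the many Web communities, implements a fuzzy query method for community topics, addresses how to mine user characteristics and attributes precisely, and implements methods for modeling and discovering outdated topic information in online communities.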
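The abstract does not give the details of the Trie-based algorithm in contribution (2), so the following is only a minimal sketch of one way a Trie can support "type any remembered fragment, get suggestions after every letter". It assumes that a "part of a word" means a contiguous substring and indexes every suffix of every word; the names `FragmentTrie`, `suggest`, and `build_index` are hypothetical and are not taken from the thesis.

```python
class FragmentTrie:
    """Trie over every suffix of every indexed word (an assumed design).

    Because each suffix is inserted, any substring of a word spells a
    path starting at the root.
    """

    def __init__(self):
        self.children = {}  # character -> FragmentTrie
        self.words = set()  # words containing the fragment spelled so far

    def insert(self, word):
        # Insert word[start:] for every start position.
        for start in range(len(word)):
            node = self
            for ch in word[start:]:
                node = node.children.setdefault(ch, FragmentTrie())
                node.words.add(word)

    def step(self, ch):
        """Advance by one typed character; returns the child node or None."""
        return self.children.get(ch)


def suggest(root, fragment, limit=5):
    """Return up to `limit` indexed words that contain `fragment`."""
    node = root
    for ch in fragment:
        node = node.step(ch)
        if node is None:
            return []
    # The root stores no words, so an empty fragment yields no suggestions.
    return sorted(node.words)[:limit]


def build_index(words):
    root = FragmentTrie()
    for word in words:
        root.insert(word)
    return root


if __name__ == "__main__":
    index = build_index(["market", "marketing", "community", "mining"])
    typed = ""
    for ch in "ark":  # simulate the user typing one letter at a time
        typed += ch
        print(typed, "->", suggest(index, typed))
```

Each keystroke costs one dictionary lookup, which is what makes per-letter suggestions cheap. The price is memory: storing all suffixes grows quadratically with word length, so a real system would likely compress the structure or cap the fragment length.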

【Abstract】 With the rapid development of Web communities, network marketing based on Web communities is receiving more and more attention from businesses. Survey data show that by the end of 2010 community users numbered 294 million, accounting for 70.3% of all Internet users, and that the national Internet advertising market reached 32.12 billion yuan in 2010. However, as purchasing behavior changes in a developing society, people tend to search the Internet for related information when making decisions instead of relying on traditional Internet advertising. As a product of network marketing and promotion, community marketing plays a very important role in consumer decision-making through word-of-mouth advertising: consumers trust information from other people more than they trust advertisers. Network marketing based on Web communities has therefore become a low-cost, high-efficiency means of information promotion.
Because it has developed for only a short period, Web community marketing has not yet built an effective theory or a unified approach. Since the core of Web community marketing is interaction and precision marketing, this thesis studies four aspects: how to choose an appropriate community for community marketing; how to let users reach appropriate topics in a community; how to mine the true characteristics and interests of virtual users; and how to find outdated topics in a community. On this basis, the thesis solves some basic technical problems of Web community marketing and builds a basic theory. The main contents are as follows:
(1) On how to choose a Web community, the thesis proposes a Web community ranking theory based on data quality assessment and sampling methods. Establishing data quality measures gives a quantitative criterion for evaluating Web community data sources, so that the evaluation criterion can be measured and extended. This approach solves the problem that the criteria of traditional sorting algorithms cannot completely reflect the real evaluation. An appropriate sampling method draws random samples from the large set of community topics so that the samples reflect the overall characteristics of the community, which solves the problem that a huge number of topics is hard to measure.
(2) For fuzzy search of community resources, the thesis proposes a new fuzzy search algorithm based on a Trie. When a user remembers only part of a word, the user just needs to enter the remembered part and the system can still find the desired results. The system is also interactive: each time the user enters a letter, the system promptly suggests possible target words. Experiments show that the algorithm implements the system efficiently.
(3) For mining users' characteristics and interests, the thesis presents a method based on ontology semantic analysis. It builds a behavior model and a characteristic model of users, establishes an attribute set and an inference rule set for the characteristic attributes, and creates an uncertain inference method, and then infers a user's characteristics and interests from the user's behavior and posts. Experimental results show that the method has good scalability and accuracy and solves the problem of precisely locating targets in Web community marketing.
(4) To improve the efficiency of mining user characteristics and interests, the thesis puts forward a mining method based on interaction relationships. From statistics and analysis of a large amount of data, it presents an evaluation method based on hypothesis testing, showing that the sociologists' view that "intimate friends have more similar interests" also applies in the virtual Web community. Building on these statistical regularities, the thesis then constructs a user group discovery algorithm. The final results show that it is a fast and effective method for mining groups of users with similar interests.
(5) For the problem of outdated topics in Web communities, the thesis presents methods for modeling, measuring, reasoning about, and discovering the time consistency of topic pages. A Web page is time-consistent when the time it refers to matches the actual time; time consistency relates to the timeliness and accuracy of content and is an important indicator for evaluating the quality of network information. Many time-sensitive pages contain time inconsistencies, which seriously affect users' understanding of the content and their decision-making. The thesis first constructs a model of the time dimension of topic pages, including time-sensitivity analysis of Web information, classification of Web pages based on time series, and extraction of the time dimension of Web pages. It then measures and reasons about Web time consistency, including classification of the time inconsistencies of Web events, modeling of those inconsistencies, and discovery of time inconsistencies in topic pages. The method can automatically filter time-inconsistent topics in Web communities to improve the user experience.
This study provides theoretical and technical support for Web community marketing: it solves the problem of how to identify and rank the many Web communities, realizes a fuzzy query method for community topics, addresses how to mine users' characteristics and attributes precisely, and achieves methods for modeling and discovering outdated topic information in Web communities.
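Contribution (4) rests on a hypothesis test of whether users who interact have more similar interests, but the abstract does not name the test statistic or the procedure. As a hedged illustration only, the sketch below swaps in a one-sided permutation test on the gap in mean Jaccard similarity between interacting and non-interacting user pairs. The function names, the toy interest sets, and the choice of Jaccard similarity are all assumptions, not the thesis's method.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two interest sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def permutation_test(interests, interacting_pairs, rounds=2000, seed=0):
    """One-sided permutation test (an assumed stand-in for the thesis's test).

    H0: interacting pairs are no more similar than other pairs.
    Returns (observed gap in mean similarity, estimated p-value).
    """
    users = sorted(interests)
    all_pairs = list(combinations(users, 2))
    sims = {p: jaccard(interests[p[0]], interests[p[1]]) for p in all_pairs}
    linked = {tuple(sorted(p)) for p in interacting_pairs}
    k = len(linked)
    if not 0 < k < len(all_pairs):
        raise ValueError("need both interacting and non-interacting pairs")

    def gap(chosen):
        # Mean similarity of the chosen pairs minus that of all other pairs.
        rest = [sims[p] for p in all_pairs if p not in chosen]
        return sum(sims[p] for p in chosen) / len(chosen) - sum(rest) / len(rest)

    observed = gap(linked)
    rng = random.Random(seed)
    # Relabel k random pairs as "interacting" and count gaps at least as large.
    hits = sum(gap(set(rng.sample(all_pairs, k))) >= observed
               for _ in range(rounds))
    return observed, (hits + 1) / (rounds + 1)


if __name__ == "__main__":
    # Toy data: two groups of three users who interact within their group.
    toy_interests = {
        "a": {"phones", "cameras"},
        "b": {"phones", "cameras", "laptops"},
        "c": {"phones", "cameras"},
        "d": {"cooking", "travel"},
        "e": {"cooking", "travel", "hiking"},
        "f": {"cooking"},
    }
    toy_links = [("a", "b"), ("a", "c"), ("b", "c"),
                 ("d", "e"), ("d", "f"), ("e", "f")]
    gap_value, p_value = permutation_test(toy_interests, toy_links)
    print(f"gap={gap_value:.3f}, p={p_value:.4f}")
```

A small p-value here would support using interaction links as a shortcut for finding users with similar interests, which is the role the hypothesis test plays before the user group discovery algorithm.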

  • 【Online publication contributor】 Wuhan University
  • 【Online publication year and issue】 2012, Issue 04