含地理位置信息的社交媒体挖掘及应用
Geo-referenced Social Media Mining and Its Application
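【Author】 Jiang Kai (蒋锴);
【Advisor】 Yu Nenghai (俞能海);
【Author information】 University of Science and Technology of China, Signal and Information Processing, 2014, PhD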
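【Summary】 In recent years, with the development of mobile Internet technology and the spread of smart mobile devices, people have grown used to obtaining and sharing information anytime and anywhere through applications on those devices. Among mobile applications for obtaining and sharing information, location-based services have become mainstream. In using such applications, people generate massive amounts of social media data containing geographic location information, and the volume of this data is growing explosively. The emergence of this new kind of massive media data brings new opportunities and challenges to many research fields and has attracted wide attention from researchers. Compared with traditional media data, geo-referenced social media data has distinctive properties in three respects: heterogeneous ways of expressing and composing geographic information; an emphasis on mobility, timeliness, and interactivity; and rich contextual information (spatial, temporal, social) together with multi-modal media content. Addressing the scientific problems raised by these three properties, this dissertation studies mining and recommendation algorithms for such media: a fusion mining algorithm for social media with heterogeneous geographic information; an efficient online recommendation algorithm for mobile applications; and a personalized recommendation algorithm that fuses multiple kinds of contextual information. The main research work and contributions are as follows:

1. A fusion mining algorithm for social media with heterogeneous geographic information, and a re-ranking algorithm based on a bipartite graph structure, are proposed. In location-based services such as review websites, structured data lacks semantic information and therefore struggles to meet specific information needs. To address this, the dissertation proposes an algorithm that jointly mines structured and unstructured social media data in order to supplement the semantic information of the structured data. On this basis, it further proposes a bipartite-graph-based ranking algorithm that re-ranks the merchants in location-based services. Experimental results show that, compared with a mining algorithm that uses only the structured information in the review website, the proposed fusion mining algorithm achieves a 73% relative improvement in mean average precision (MAP). Compared with a ranking algorithm that uses only the structural information in the review website, and with one that uses only the merchants' original ratings on the review website, the proposed algorithm, which re-ranks merchants using the bipartite graph structure and fuses multiple factors, better satisfies users' specific information needs.

2. An efficient online recommendation algorithm for mobile applications is proposed. For the online location recommendation problem in mobile application scenarios, the dissertation proposes a variable-memory Markov model based on a prefix tree structure. The proposed algorithm mines frequent sequential patterns from users' historical location sequences and uses them to build a prefix tree, so that it can efficiently recommend the next location given the user's current location and dynamically adjust the model according to user feedback. For a location sequence of length l, the proposed algorithm reduces the time complexity from O(Dl) to O(l) compared with a traditional variable-memory Markov model, the probabilistic suffix tree (PST) algorithm, and can therefore meet the real-time requirements of recommendation. Besides improving efficiency, experimental results show that the proposed variable-memory Markov model with a smoothing model achieves higher recommendation accuracy. Compared with the best result obtainable by fixed-order Markov models, the proposed algorithm achieves a 69% relative improvement in MAP; compared with the PST algorithm, it achieves a 36% relative improvement in MAP. In addition, the proposed online location recommendation algorithm relies only on the user's current location information, so it can be embedded at little cost into existing location-based mobile applications. Furthermore, if the concept of "location" is generalized, the algorithm can be applied to problems other than location recommendation, such as predicting user clicks on web pages or recommending query terms in search engines.

3. A personalized recommendation algorithm that fuses multiple kinds of contextual information is proposed. The dissertation studies personalized location recommendation using the massive social media data on photo-sharing websites and the rich contextual information it contains. The proposed algorithm first mines the various kinds of contextual information on photo-sharing websites, including GPS location, photo capture time, user information, textual information, and the visual information of photos. On this basis, the algorithm computes how well a location matches the user's interests from several aspects, and models the fusion of multiple kinds of contextual information for personalized recommendation as a learning-to-rank problem, thereby combining multiple location-to-user interest scores into a personalized location recommendation. Experimental results show that the proposed algorithm effectively improves recommendation accuracy, with a marked gain when the user's historical information is sparse. For example, 42.7% of the users in the experimental data set have only 4 locations in their historical location sequences; in this case the proposed algorithm achieves a 27.5% relative improvement in MAP over an existing typical algorithm. Moreover, the proposed approach of recommending within a learning-to-rank framework is not limited to personalized location recommendation and can be applied to other problems that require fusing multiple kinds of contextual information for recommendation.

The dissertation closes by summarizing the research work and giving an outlook on future research directions.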
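The gains quoted in this record are relative improvements in mean average precision (MAP). As a reading aid, the following minimal sketch shows how such a figure is conventionally computed. The function names and the toy data are illustrative assumptions, not taken from the dissertation, and the evaluation protocol used by the author may differ in detail.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant items."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranked_list, relevant_set) pairs."""
    runs = list(runs)
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def relative_improvement(new_score, baseline_score):
    """Relative gain of new_score over baseline_score (0.5 means +50%)."""
    return (new_score - baseline_score) / baseline_score


# Toy data, purely hypothetical: two queries with known relevant items.
runs = [
    (["a", "b", "c"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y", "z"], {"y"}),       # AP = 1/2
]
map_score = mean_average_precision(runs)
```

Under this convention, a "73% relative improvement in MAP" means the new method's MAP is 1.73 times the baseline's MAP, not that MAP rose by 73 percentage points.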
【Abstract】 In recent years the Internet has entered a mobile era, and people are now used to browsing and sharing information through applications installed on their mobile devices. Most of these mobile applications are location-based services, and users generate a huge amount of geo-referenced social media while using them. The ever-growing volume of geo-referenced social media affects many research fields, bringing challenges and opportunities to researchers.

In contrast to conventional multimedia, geo-referenced social media has unique characteristics in three respects. First, the geo-information, and the way it is organized with the media content, is heterogeneous. Second, it emphasizes mobility, timeliness, and user interaction. Third, it carries rich contextual information, including spatial-temporal information, social information, and multi-modal media content. This dissertation focuses on the research problems raised by these three characteristics and pursues three directions: a mining algorithm for heterogeneous structured data, an efficient online location recommendation algorithm for mobile applications, and a personalized location recommendation algorithm that exploits multiple types of contextual information.

The content and contributions of this dissertation are as follows:

1. A heterogeneous structured social media mining algorithm and a bipartite-graph-based ranking algorithm are proposed. To enrich location semantics on local review websites and meet the specific information needs of travelers, the proposed algorithm combines structured and unstructured geo-referenced social media to mine local semantics. The method then applies a bipartite-graph-based ranking algorithm to re-rank the POIs on the local review website. Experiments show that the algorithm achieves a 73% improvement in MAP over a method that uses only structured data. Experiments also show that the bipartite-graph-based ranking algorithm improves on the original ranking of POIs on the local review website, so that it better fits travelers' information needs.

2. An efficient online recommendation algorithm for mobile applications is proposed. Online mobile recommendation requires time efficiency and dynamic model adjustment. To meet these requirements, the proposed algorithm extracts frequent sequential patterns from users' travel histories and uses these patterns to build a prefix tree. Compared with an existing variable-memory Markov (VMM) algorithm, the probabilistic suffix tree (PST), the proposed algorithm reduces the time complexity from O(Dl) to O(l) for a location sequence of length l. Besides being efficient, the proposed algorithm also achieves better recommendation precision when combined with a suitable smoothing model: a 69% improvement over a fixed-order Markov model and a 36% improvement over PST. Because the proposed algorithm relies only on the user's current location, it can be easily embedded into existing commercial location-based service applications. Moreover, if the concept of "location" is generalized, the proposed algorithm can be applied to many other problems, such as user click prediction on web pages and query term recommendation in search engines.

3. A personalized recommendation algorithm that exploits various types of contextual information is proposed. The algorithm first extracts contextual information from photo-sharing websites, including GPS coordinates, the time each photo was taken, user information, textual tags, and the photos' visual information. It then estimates the user's preference for a given location from several aspects. Finally, the algorithm formulates recommendation as a learning-to-rank problem, so that it combines the preference predictions from the different aspects to generate the final recommendation. Experiments show that the proposed algorithm improves recommendation precision, especially when the user's travel history contains few locations. In the experimental data set, 42.7% of users have only 4 locations in their travel history; in this situation the proposed algorithm achieves a 27.5% improvement over an existing method, which helps alleviate the cold-start problem. In addition, the learning-to-rank recommendation approach can be applied to other recommendation problems that involve various kinds of contextual information.

The dissertation concludes with a summary of the research and an outlook on future research opportunities and directions.
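The prefix-tree idea in contribution 2 can be illustrated with a minimal sketch. It is not the dissertation's algorithm: the class names, the `max_order` parameter, and the longest-match fallback are assumptions made for illustration, and the frequent-pattern mining and smoothing model mentioned in the abstract are omitted. Each trie node stores counts of the location that followed a given recent-history context, so a recommendation only requires walking at most `max_order` nodes.

```python
from collections import Counter


class ContextNode:
    """Trie node: children keyed by location, plus counts of what came next."""

    def __init__(self):
        self.children = {}
        self.next_counts = Counter()


class PrefixTreeRecommender:
    """Hypothetical variable-order next-location model over a context trie."""

    def __init__(self, max_order=3):
        self.max_order = max_order
        self.root = ContextNode()

    def update(self, sequence):
        """Add one visited-location sequence; can be called online."""
        for i in range(1, len(sequence)):
            nxt = sequence[i]
            node = self.root
            node.next_counts[nxt] += 1  # order-0 fallback statistics
            # Walk backwards from the most recent location, up to max_order steps.
            stop = max(i - 1 - self.max_order, -1)
            for j in range(i - 1, stop, -1):
                node = node.children.setdefault(sequence[j], ContextNode())
                node.next_counts[nxt] += 1

    def recommend(self, history, k=1):
        """Return the k most frequent next locations for the longest known context."""
        node = self.root
        for loc in reversed(history[-self.max_order:]):
            child = node.children.get(loc)
            if child is None:
                break  # fall back to the longest context seen in training
            node = child
        return [loc for loc, _ in node.next_counts.most_common(k)]


# Toy check-in sequences, purely illustrative.
model = PrefixTreeRecommender(max_order=2)
model.update(["home", "cafe", "office"])
model.update(["home", "cafe", "office"])
model.update(["gym", "cafe", "park"])
suggestion = model.recommend(["gym", "cafe"])
```

In this sketch the lookup cost depends only on the context length, which mirrors the O(l) behavior the abstract describes; falling back to a shorter context when a longer one is unseen is a crude stand-in for a real smoothing model.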
【Key words】 Location based service; social media mining; heterogeneous data mining; online recommendation; personalized recommendation;