Spatial-Temporal Data Mining Based on GPS Trajectory and Geo-Tagged Photo Trajectory

【Author】 Wang Guannan

【Supervisor】 Wang Zhizhong

【Author information】 Central South University, Probability Theory and Mathematical Statistics, 2013, PhD

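【Abstract (Chinese version)】 With the rapid development of modern information and location-aware technologies, trajectory data of all kinds of moving objects can now be obtained easily. Analyzing trajectory data yields much valuable information and allows new knowledge to be inferred. Many location-based recommendation portals have attracted the attention of both the public and researchers, and research on trajectory data mining has become increasingly popular. Trajectory data are an important branch of spatio-temporal data. This thesis studies spatio-temporal data mining methods based on GPS trajectories and photo trajectories, using a range of statistical methods to mine valuable information such as the similarity between trajectories and the movement patterns of tourists. Experiments on real data demonstrate the efficiency of the methods in practice. The research consists of the following four parts:

1) Geometric similarity algorithms (GSAs) for measuring the similarity of GPS trajectories. Trajectory similarity algorithms extract the similarity information between moving trajectories, which plays an important role in urban road networks, traffic, and geographic information systems. First, the Length-Angle Ratio (LAR) is defined and used to simplify GPS trajectories and to detect significant regions (sig-regions) in them. Next, each trajectory is split into segments at its sig-regions, and the differences between segments are computed with a vector method and an area method. Finally, these differences are combined to obtain the similarity of the GPS trajectories. The algorithms have three advantages. First, they remain effective when GPS points are missing. Second, GSAs use real distances and reflect the geometric features of trajectories; building on earlier work, they fully account for the uniqueness of each user and each trajectory as well as the characteristics of the traffic network. Third, experiments show that GSAs outperform existing algorithms in both accuracy and time complexity.

2) A route restoration method that converts discontinuous geo-tagged photo trajectories (photo trajectories carrying geographic location information) into continuous GPS trajectories. GPS trajectories take up a large amount of storage and are hard to process, whereas raw geo-tagged photo trajectories are easy to store but cannot provide information as rich as GPS trajectories; restoring geo-tagged photo trajectories into GPS trajectories addresses both problems at once. The thesis first proposes a region interest ratio to rank scenic spots. A hidden semi-Markov model (HSMM) is then applied to explain the migration patterns of tourists and to obtain sequences of significant regions, from which a mean value algorithm restores complete GPS sequences. Finally, an experimental procedure based on leave-one-out cross-validation (LOOCV) is proposed to test how well the restored routes match the GPS routes, showing that the resulting continuous GPS trajectories conform to the basic laws of human movement.

3) Statistical features and spatio-temporal regularities of photo trajectories. The basic laws of human movement play a key role in route planning, destination prediction, and recommender systems. First, mining the common statistical features of movement shows that several photo-trajectory variables follow log-normal distributions and are therefore heavy-tailed; these variables include the LAR, the distance between significant regions, and the time users stay in significant regions. Second, photo trajectories exhibit a high degree of spatio-temporal regularity, which the thesis examines from three angles: ① why the distances between significant regions follow a log-normal distribution; ② which factors influence users' choice of destination; ③ the behavior of the mean squared displacement in photo trajectories. Results on real data show that these common statistical features and spatio-temporal regularities are the same across different regions.

4) Survival analysis for censored data in trajectory problems. Trajectory data mining is known to involve censored data, to which traditional models do not apply, so dedicated censored-data models are needed. The time intervals between photos in a photo trajectory are right-censored, and the thesis uses them as an example. First, a nonparametric model is built with the Kaplan-Meier estimator, and the corresponding survival and hazard models are studied. Next, a Buckley-James model of the time interval on the photo-taking stay time is built, and confidence intervals of the parameters are estimated by empirical likelihood. Finally, a semiparametric model of the time interval on the photo-taking stay time and the users' personal information is built, and an empirical likelihood method for censored data is proposed; the resulting empirical likelihood ratio is shown to have a standard chi-squared limiting distribution, which simplifies the construction of confidence intervals.

Trajectory mining techniques have made significant contributions to fast-developing information technology, and information technology in turn gives trajectory data mining ever wider room to grow; the two reinforce each other. This thesis provides efficient and practical methods and techniques whose results are not limited to trajectory data mining: the related algorithms can also be applied in other research fields. The analyses in this thesis enrich the techniques of trajectory data mining and offer guidance for many practical problems.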
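
Item 3 mentions the mean squared displacement (MSD) of photo trajectories without giving a formula. The sketch below uses one common definition, the ensemble average over trajectories of the squared displacement from each trajectory's first point after k steps. The step-indexed lag (rather than a time lag), the planar coordinates in metres, and the function name `ensemble_msd` are assumptions for illustration, not the thesis's exact procedure.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # planar (x, y) in metres; assumes GPS data were projected first


def ensemble_msd(trajectories: List[List[Point]]) -> Dict[int, float]:
    """Return MSD(k) = mean over trajectories of |r_k - r_0|^2 for each step lag k.

    Only trajectories with more than k points contribute to lag k.
    """
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for traj in trajectories:
        if not traj:
            continue
        x0, y0 = traj[0]
        for k, (x, y) in enumerate(traj[1:], start=1):
            sq = (x - x0) ** 2 + (y - y0) ** 2  # squared displacement after k steps
            sums[k] = sums.get(k, 0.0) + sq
            counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


if __name__ == "__main__":
    # Two toy trajectories (hypothetical data, not from the thesis).
    trajs = [
        [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
        [(0.0, 0.0), (0.0, 3.0)],
    ]
    print(ensemble_msd(trajs))  # {1: 5.0, 2: 4.0}
```

Plotting MSD(k) against k on log-log axes is the usual way to inspect how displacement grows with the number of steps.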

【Abstract】 Recent advances in information and location-aware technologies have enhanced our ability to collect trajectory data of people, vehicles, and other moving objects. The analysis of trajectory data, which enables us to discover valuable information and infer new knowledge, has become an active research topic at the intersection of computer science and geographic information science. Furthermore, a class of geographic applications based on user-generated trajectory data has appeared on the Web and received considerable attention. Consequently, research on spatio-temporal data mining has become a hot topic. Building on recent research, this PhD thesis studies data mining methods based on GPS trajectories and geo-tagged photo trajectories. Such methods can support applications in traffic, travel, and personalized service recommendation. Experiments on real trajectory data show good performance. This thesis focuses on the following four key points:

1) First, we propose a series of Geometric Similarity Algorithms (GSAs) to analyze real GPS trajectories geometrically. Trajectory similarity is important to road networks, traffic, and geographic systems because it enables the effective retrieval of highly relevant information. In our approach, we first propose a Length-Angle Ratio to detect the significant regions in a trajectory, and then measure trajectory similarity by considering the differences between the geometric features of two trajectories. By fully analyzing these geometric features, we also take into account both the individuality of each traveler and the uniqueness of each trajectory. We evaluate the proposed method on collected real-world geographic location data. The results show good performance, and the proposed method outperforms existing methods in both accuracy and computational efficiency.
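
The abstract does not give the formula for the Length-Angle Ratio. Purely as an illustration, the sketch below assumes a hypothetical definition: at each interior GPS point, the combined great-circle length of the two adjacent legs divided by the absolute turning angle there, so that short, sharply turning stretches get a small value and are flagged as candidate sig-regions below a threshold. The definition, the function names, the `eps` guard, and the threshold are assumptions, not the thesis's method; haversine distance is used to reflect the "real distance" the abstract mentions.

```python
import math
from typing import List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(p: LatLon, q: LatLon) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_rad(p: LatLon, q: LatLon) -> float:
    """Initial bearing from p to q in radians (0 = north, pi/2 = east)."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlon = lon2 - lon1
    x = math.sin(dlon) * math.cos(lat2)
    y = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(dlon)
    return math.atan2(x, y)


def turning_angle(p: LatLon, q: LatLon, r: LatLon) -> float:
    """Absolute change of heading at q, wrapped to [0, pi]."""
    d = bearing_rad(q, r) - bearing_rad(p, q)
    d = (d + math.pi) % (2 * math.pi) - math.pi
    return abs(d)


def lar_values(points: List[LatLon], eps: float = 1e-9) -> List[float]:
    """Hypothetical LAR for each interior point: adjacent leg length / turning angle."""
    out = []
    for i in range(1, len(points) - 1):
        length = haversine_m(points[i - 1], points[i]) + haversine_m(points[i], points[i + 1])
        angle = turning_angle(points[i - 1], points[i], points[i + 1])
        out.append(length / (angle + eps))  # eps avoids division by zero on straight runs
    return out


def sig_region_indices(points: List[LatLon], threshold: float) -> List[int]:
    """Indices of interior points whose hypothetical LAR falls below threshold."""
    return [i + 1 for i, v in enumerate(lar_values(points)) if v < threshold]


if __name__ == "__main__":
    # Toy track: north along a meridian, then a right-angle turn east.
    track = [(0.0, 0.0), (0.001, 0.0), (0.002, 0.0), (0.002, 0.001)]
    print(sig_region_indices(track, threshold=1000.0))  # [2]
```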
2) Second, we propose a novel travel route restoration method for analyzing geo-tagged photo trajectories. Sharing geo-tagged photos has become a popular social activity in daily life because these photos not only contain geographic information but also reveal people's hobbies, intentions, and mobility patterns. However, raw geo-tagged photo routes are discontinuous and cannot provide as much information as complete GPS trajectories. In our approach, we first propose an Interest Measure ratio to rank hot spots obtained with a density-based spatial clustering algorithm. We then apply a Hidden Semi-Markov Model (HSMM) and a Mean Value method to describe the migration patterns among hot spots and to restore the significant region sequence into a complete GPS trajectory. Finally, a novel experimental method is designed, which shows that the approach restores routes feasibly and performs well.

3) Third, we study the travel patterns hidden in geo-tagged photo trajectories. Understanding the basic laws of travel activity is important for travel planning, forecasting, and recommendation. Although there has been much related research, our understanding remains limited owing to the lack of tools for monitoring the time-resolved locations of individuals. Here we study geo-tagged photo trajectories crawled from Flickr. We find that many parameters of travel walks follow heavy-tailed (log-normal) distributions, including the Length-Angle Ratio, which helps us find significant regions (sig-regions), the stay time of travelers in sig-regions, and the distance between sig-regions. Besides these common statistical features, travel trajectories also show a high degree of temporal and spatial regularity.
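
Item 3 reports that stay times, inter-region distances, and LAR values are log-normal. One minimal way to check such a claim on one's own data is to fit the log-normal by maximum likelihood (the MLE is simply the mean and population standard deviation of log x) and measure the Kolmogorov-Smirnov distance to the fitted CDF. This is a generic diagnostic sketch with illustrative names and made-up sample values, not the procedure used in the thesis.

```python
import math
from typing import List, Tuple


def fit_lognormal(xs: List[float]) -> Tuple[float, float]:
    """Maximum-likelihood (mu, sigma): mean and population std of log(x)."""
    if len(xs) < 2 or any(x <= 0 for x in xs):
        raise ValueError("need at least two strictly positive observations")
    logs = [math.log(x) for x in xs]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    if sigma == 0.0:
        raise ValueError("all observations are equal; sigma would be zero")
    return mu, sigma


def lognormal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ LogNormal(mu, sigma), via the standard normal CDF of log(x)."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ks_distance(xs: List[float], mu: float, sigma: float) -> float:
    """Kolmogorov-Smirnov distance between the empirical CDF of xs and the fitted CDF."""
    data = sorted(xs)
    n = len(data)
    d = 0.0
    for i, x in enumerate(data):
        f = lognormal_cdf(x, mu, sigma)
        # The empirical CDF jumps from i/n to (i+1)/n at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


if __name__ == "__main__":
    # Hypothetical stay times in minutes (illustrative only, not thesis data).
    stay = [3.0, 5.5, 8.0, 12.0, 20.0, 35.0, 60.0, 140.0]
    mu, sigma = fit_lognormal(stay)
    print(round(mu, 3), round(sigma, 3), round(ks_distance(stay, mu, sigma), 3))
```

Because mu and sigma are estimated from the same sample, standard KS critical values are conservative here; a parametric bootstrap would be needed for a formal goodness-of-fit test.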
To study this temporal and spatial regularity further, we explain why travel flight distances follow a log-normal distribution and investigate how travelers decide on their next destination. In addition, our work shows that travel walks differ from regular human walks because their underlying properties differ substantially. These common statistical features and properties are important for the study of human travel activity and can also support further recommendation and forecasting applications.

4) Finally, we analyze censored data in trajectories with survival analysis. For various reasons, time intervals can be observed and measured only partially; this is the censoring problem. Ordinary regression models cannot handle censored data, so censored-data models are needed. The interval time T between two photos is right-censored, and we use T as an example for survival analysis. We first build a nonparametric model of T with the Kaplan-Meier estimator and study the corresponding survival and hazard functions. We then build a Cox model and a Buckley-James (B-J) model to describe the relationship between T and the distance between photos, and estimate confidence intervals with the empirical likelihood method. Finally, we build a semiparametric model relating T to the basic information of users. Existing studies mainly apply the empirical likelihood (EL) method to synthetic dependent data, and their results cannot be applied directly because of the weights involved. In this thesis, a censored empirical log-likelihood ratio is introduced to address this problem. In particular, we show that its limiting distribution is a standard chi-squared distribution. This result is used to calculate p-values and construct confidence intervals.
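
The Kaplan-Meier step in item 4 can be written out directly with the standard product-limit formula S(t) = product over event times t_j <= t of (1 - d_j / n_j), where d_j is the number of observed events and n_j the number still at risk at t_j; observations censored at a tied time are still counted as at risk. The sketch assumes each photo interval is recorded as a duration plus a flag saying whether the next photo was actually observed (event) or the record ended first (right-censored); the data and function name are hypothetical.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], observed: List[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of S(t) for right-censored data.

    times[i] is the recorded interval; observed[i] is True if the event happened
    at times[i] and False if the interval was right-censored there.
    Returns (t, S(t)) at each distinct event time.
    """
    if len(times) != len(observed):
        raise ValueError("times and observed must have equal length")
    data = sorted(zip(times, observed))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = censored = 0
        # Group all records tied at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            surv *= 1.0 - events / n_at_risk  # product-limit update
            curve.append((t, surv))
        n_at_risk -= events + censored  # both leave the risk set after t
    return curve


if __name__ == "__main__":
    # Hypothetical intervals (minutes); False marks a right-censored interval.
    curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
    print([(t, round(s, 3)) for t, s in curve])  # [(1, 0.8), (2, 0.6), (3, 0.3)]
```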
Simulation studies are conducted to assess the performance of the proposed EL method, and the results show that it performs well.

In this thesis, we provide technical support for the fast-developing fields of information technology and spatio-temporal data by proposing convenient and precise mining algorithms. The proposed algorithms are not confined to the field of computing; their construction and techniques can be used in other research areas and applications. The work in this thesis enriches the theory and methods of spatio-temporal data mining and has broad applicability and practical significance.
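
The chi-squared calibration used in item 4 can be illustrated with the simplest case: Owen's empirical likelihood ratio for a mean with complete (uncensored) data. With weights w_i = 1 / (n (1 + lambda (x_i - mu))), where lambda solves sum (x_i - mu) / (1 + lambda (x_i - mu)) = 0, the statistic -2 log R(mu) = 2 sum log(1 + lambda (x_i - mu)) is asymptotically chi-squared with one degree of freedom. This is a textbook sketch with illustrative names that solves for lambda by bisection; the thesis's censored-data ratio, Buckley-James, and semiparametric models are not reproduced here.

```python
import math
from typing import List

CHI2_1_95 = 3.841458820694124  # 95% quantile of the chi-squared distribution with 1 df


def el_log_ratio(xs: List[float], mu: float, tol: float = 1e-12) -> float:
    """Owen's empirical likelihood statistic -2 log R(mu) for the mean of xs.

    Returns math.inf when mu lies outside the open convex hull of the data.
    """
    z = [x - mu for x in xs]
    zmin, zmax = min(z), max(z)
    if zmin >= 0 or zmax <= 0:
        return 0.0 if zmin == zmax == 0 else math.inf
    # lambda must keep every weight positive: 1 + lam * z_i > 0.
    lo, hi = -1.0 / zmax, -1.0 / zmin
    # g(lam) = sum z_i / (1 + lam z_i) is strictly decreasing on (lo, hi).
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        g = sum(zi / (1.0 + lam * zi) for zi in z)
        if g > 0:
            lo = lam
        else:
            hi = lam
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * zi) for zi in z)


def in_el_confidence_interval(xs: List[float], mu: float) -> bool:
    """True if mu lies in the asymptotic 95% EL confidence interval for the mean."""
    return el_log_ratio(xs, mu) <= CHI2_1_95


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy data, not from the thesis
    print(in_el_confidence_interval(xs, 3.0), in_el_confidence_interval(xs, 6.0))  # True False
```

Scanning mu over a grid and keeping the values for which `in_el_confidence_interval` returns True traces out the interval without estimating a variance, which is the practical appeal of a standard chi-squared limit.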

  • 【Online publication submitter】 Central South University
  • 【Online publication year/issue】 2014, Issue 03