面向交通服务的多源移动轨迹数据挖掘与多尺度居民活动的知识发现

Transport Service Oriented Multi-source Mobile Trajectory Data Mining and Multi-level Knowledge Discovery of Human Activities

【Author】 Deng Zhongwei (邓中伟)

【Advisor】 Ji Minhe (季民河)

【Author information】 East China Normal University, Cartography and Geographic Information Systems, 2012, PhD

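【Abstract (Chinese version, translated)】 Today, with personal mobile communication, automatic navigation, and cloud computing in widespread use, transport information services have entered the era of the smart planet and volunteered geographic information (VGI), which has greatly changed how data on residents' activities are acquired and analyzed. This thesis studies how to integrate massive heterogeneous data from different sources effectively and, by combining spatial analysis with intelligent methods, how to extract and represent the human activity information and spatiotemporal behavior patterns contained in residents' travel trajectory data, so as to support activity-based forecasting and management of travel demand and offer solutions to urban traffic congestion. Around this research question, the thesis does the following work: (1) it discusses the acquisition, fusion, organization, and quality assessment of multi-source heterogeneous data reflecting residents' activities, with methodological innovations; (2) guided by theories of the spatiotemporal constraints on human activity, such as time geography and activity theory, and making full use of geographic background information, it proposes a top-down, scene-domain knowledge driven method of trajectory data analysis and extracts residents' activity information from trajectory data; (3) for activity representation and analysis, it proposes an activity representation model based on scene-domain knowledge and uses geovisualization to analyze, at the individual (micro) level, the relationship between the spatiotemporal patterns of residents' activities and residents' characteristics; (4) using large-sample taxi on-board GPS data, it studies the spatiotemporal distribution and structure of urban residents' activities at the macro level, thereby establishing a micro-to-macro analytical framework covering data acquisition, activity information extraction, and activity representation and analysis. The thesis makes the following four innovative contributions to research on urban residents' behavior and on modern travel survey methods in the transport field.
First, the acquisition, fusion, organization, and quality assessment of massive heterogeneous multi-source data on residents' activities. The thesis divides such data into three categories: specialized household travel surveys designed for a specific research topic; volunteered geographic information and web data mining; and the large volume of trajectory data generated by intelligent transportation systems without any specific research purpose. Two important innovations in survey methodology are explored: a household travel survey method that couples GPS with web-based surveys, and a workflow for collecting residents' volunteered geographic information through web data mining. These remove the dependence of traditional activity data collection and research on administrative divisions as the spatial unit, and lay the foundation for multi-scale activity data at the levels of "individual, residential community, subdistrict (township), district or county, city". Specifically: (1) For specialized household travel surveys, the thesis proposes a survey that couples GPS and the web: GPS devices passively collect dynamic information on respondents' travel activities, while a web survey collects static information such as household and socioeconomic attributes. To verify its applicability, four household travel surveys were designed and carried out. The results show that the new survey mode reduces respondent burden and cost and improves data quality and precision, with more marked improvement for respondents who understood travel surveys less well or were less motivated, and that it can become a main technical means for the next generation of household travel surveys. (2) For VGI and web data mining, focusing on residents' activity behavior, web data mining is used to extract, clean, and organize information from the volunteered-information pages of specialized portal websites, building multidimensional attributes at the residential community scale and extending the data collection and processing methods available to human activity analysis. (3) The large volume of trajectory data generated in the intelligent transportation field is used as an important data source for research on residents' activities, especially macro-level analysis of urban activity. On-board GPS data from taxi dispatch systems have wide coverage, high mobility, and a large sample size; the thesis discusses their application and proposes a method for studying the macro-level spatiotemporal distribution of residents' activities based on such data.
Second, extraction of residents' activity information driven by scene-domain knowledge. Traditional trajectory data mining and analysis methods rely mainly on the spatiotemporal characteristics of the trajectory data themselves and ignore the activity regularities of, and domain knowledge about, the subject recorded by the trajectory (the trajectory subject). To address this shortcoming, the thesis proposes a top-down theoretical framework for trajectory data analysis driven by scene-domain knowledge. The framework uses heterogeneous geographic background data from different sources to reconstruct the scene in which an activity takes place, takes the constraint theories of human activity from fields such as time geography and activity theory as rules for data analysis and logical reasoning, and makes full use of the trajectory subject's own activity characteristics and attributes to interpret and mine the activity information behind the observed features of the trajectory data. To support this framework, the thesis proposes an object (activity) oriented trajectory representation that expresses a trajectory as an ordered sequence of "moves" and "stops", corresponding to the moves and stops of residents' travel activities, and uses the data characteristics of moves and stops together with geographic background information to express the characteristics and scenes of residents' activities. Machine learning (the C5.0 decision tree algorithm) is used in experiments to extract and analyze stop points, travel modes, and trip purposes, so that determining travel mode and identifying trip purpose need not depend on the researcher's personal experience.
Third, representation and analysis of residents' activities based on scene-domain knowledge. Building on the object (activity) oriented trajectory representation model proposed in the thesis, the ordered "Movement-Stop" sequence is semantically enriched and extended, yielding a representation model of residents' travel activities under the scene-domain knowledge framework. The spatial and temporal information of the origin and destination of a move expresses movement characteristics such as route, speed, travel duration, and travel distance; the scene window in which a stop lies expresses the urban environment (opportunities) of the activity location; both are linked to the individual's social and economic attributes. The new model supports queries and analyses with multidimensional and compound conditions on "location", "time", "person", and "activity". To analyze changes in activity patterns, the minimum convex polygon method from zoology is introduced into activity analysis to measure the relationship between people's activity range and their place of residence and how it changes over time, and to analyze spatial and temporal changes in individual and household patterns and differences between groups, giving activity analysis a deeper perspective on the spatial complexity of individual and intra-household decision making.
Fourth, analysis of the spatiotemporal distribution structure of urban activity based on large-sample taxi on-board GPS data. Residents' demand for travel activities is closely related to the distribution of the urban environment (opportunities), and both have a relatively stable structure. From a large-sample macro perspective, the thesis treats the travel activity of taxis as a proxy for the activity of taxi passengers. By analyzing one day of operating data from 9,349 taxis in the Shanghai taxi dispatch system, it applies the scene-domain knowledge driven trajectory analysis method to extract and analyze the trip purposes of taxi passengers and obtains macro-level regularities of residents' activities. Although the method still lacks rigorous validation, the preliminary results indicate that it can be used for macro-level analysis of the spatiotemporal distribution structure of urban activity, complementing the individual-based representation and analysis of activities and thereby establishing a micro-to-macro framework for activity analysis.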
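
The ordered move/stop representation of a trajectory described above can be illustrated with a minimal sketch. The abstract does not give a segmentation rule, so the code below substitutes a plain distance-and-duration threshold; it is not the thesis's C5.0-based extraction. The fix format, the thresholds `radius_m` and `min_stop_s`, and all function and class names are assumptions made for illustration.

```python
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt
from typing import List, Tuple

# Hypothetical GPS fix format: (timestamp in seconds, latitude, longitude).
Fix = Tuple[float, float, float]


@dataclass
class Segment:
    kind: str      # "stop" or "move"
    start: float   # timestamp of the first fix in the segment
    end: float     # timestamp of the last fix in the segment
    n_fixes: int   # number of fixes in the segment


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))


def segment_trajectory(fixes: List[Fix],
                       radius_m: float = 100.0,
                       min_stop_s: float = 300.0) -> List[Segment]:
    """Split time-ordered fixes into an ordered list of stop and move segments.

    A stop is a run of consecutive fixes that all lie within radius_m of the
    run's first fix and that spans at least min_stop_s seconds. Fixes that do
    not belong to any stop are grouped into moves. Thresholds are assumed.
    """
    segments: List[Segment] = []
    move_start = None  # index of the first fix of the move being accumulated
    i, n = 0, len(fixes)
    while i < n:
        # Extend j while fixes stay within radius_m of the anchor fix i.
        j = i + 1
        while j < n and haversine_m(fixes[i][1], fixes[i][2],
                                    fixes[j][1], fixes[j][2]) <= radius_m:
            j += 1
        if fixes[j - 1][0] - fixes[i][0] >= min_stop_s:
            # Close the pending move, then record the stop.
            if move_start is not None:
                segments.append(Segment("move", fixes[move_start][0],
                                        fixes[i - 1][0], i - move_start))
                move_start = None
            segments.append(Segment("stop", fixes[i][0], fixes[j - 1][0], j - i))
            i = j
        else:
            # Not long enough to be a stop: the fix belongs to a move.
            if move_start is None:
                move_start = i
            i += 1
    if move_start is not None:
        segments.append(Segment("move", fixes[move_start][0],
                                fixes[n - 1][0], n - move_start))
    return segments
```

Each resulting segment could then be enriched with geographic background data (the "scene") and respondent attributes, which is the part the thesis's framework is actually concerned with and which this sketch leaves out.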

【Abstract】 In the age of the smart world and Volunteered Geographic Information (VGI), how to obtain and handle huge amounts of human activity data generated from diverse sources has become a fundamental challenge for scientists and management practitioners in many fields. One such issue concerns the mass trajectory data generated daily by inner-city personal travel, which can presumably be used for city traffic improvement and transport management. This thesis addresses the research question of how trajectory data, when combined with heterogeneous data from other sources, can be used effectively to detect the social activity patterns of urban residents and model their spatiotemporal behavior in the city environment. To achieve this goal, several technical objectives were accomplished, including the development of a trajectory data mining framework, the Scene-Domain Knowledge Driven Framework (SDKDF), to represent and analyze human activities at both individual and aggregate scales. In addition, data acquisition and processing methods for extracting activity information were discussed. The major contributions of this thesis consist of the following four points. (1) The acquisition, fusion, organization, and quality assessment of heterogeneous human activity data from a variety of sources. The thesis grouped available human activity data into three categories: household travel survey (HTS) data, VGI data, and trajectory data automatically generated by intelligent transportation systems. Two innovative explorations of survey methods were conducted: the design and implementation of a procedure coupling a passive GPS survey with a web-based questionnaire survey, and a web-based data mining procedure with geospatial recalls for self-reports from respondents. These approaches make it possible to construct an activity database with multiple spatial scales to support hierarchical analyses.
Specific work included the following. (a) In the approach coupling GPS and web-based survey technologies, selected GPS data loggers were used to collect respondents’ track data for accurate route and trip detection, while web-based surveys were used to collect personal information from the survey respondents. Four case studies were conducted to test and improve the approach. Results indicated that passive GPS and web-based survey techniques could significantly raise the accuracy and efficiency of travel data collection and drastically reduce respondents’ burden and survey costs. The approach seems promising as a major means for the next generation of HTS. (b) VGI websites are new and rapidly growing data sources that provide great opportunities for human activity studies at the community scale. The thesis designed a standard workflow for data cleaning, fusion, and knowledge mining from several VGI websites. (c) Massive trajectory data collected from taxi companies were processed to provide wide coverage and a dynamic representation of human activities in Shanghai. This type of data was proven suitable for macro-level analysis of human behavior. The thesis synthesized data of the three categories and organized them for analyses at different scales. (2) Proposal of the Scene-Domain Knowledge Driven Framework (SDKDF) for extracting and analyzing activity information. Traditional data mining relies only on the spatiotemporal characteristics of the trajectory data and completely ignores the behavioral patterns of, and domain knowledge about, the respondent.
The SDKDF presented a top-down approach that reconstructs the activity scene from heterogeneous geographic background data and formulates reasoning rules with respect to socioeconomic and personal constraints, based on the theories of time geography and behavioral science. To support SDKDF, an object-oriented activity representation data model was developed, which treated a trajectory as an ordered sequence of "stops" (representing locations of individual activities) and "moves" (representing directional movements between adjacent activity stops). Semantic information such as trips, travel modes, and trip purposes was semi-automatically extracted from trajectory data with machine learning techniques (i.e., the C5.0 decision tree algorithm). (3) SDKDF-based representation and analysis of human activities. Following the object-oriented trajectory representation model proposed above, the move-stop sequence was semantically labeled to form a representation model of residents' travel activity under SDKDF. The semantic labeling was performed by placing the move-stop sequence within its geographic background (named a scene window) and with reference to domain knowledge. Travel speed, duration, distance, and other travel characteristics were derived for both stops and moves from their respective spatial and temporal information. The scene window of crucial points (such as OD points) was used to express the urban environment (opportunities), and the socioeconomic characteristics of individual respondents were linked through unique PIDs. Data represented in the model could be readily queried and analyzed with multiple and complex conditions related to location, time, personal attributes, and activities. In order to analyze changes in activity patterns, a geo-visualization method known as the Minimum Convex Polygon (MCP) was introduced from the field of biology.
The spatiotemporal variation of activity patterns was analyzed using the MCP method to gain a better understanding of the complex internal decision-making process under given personal or family circumstances. (4) Detection and analysis of the spatiotemporal structure of large-scale urban activities using mass GPS-based taxi service datasets. Despite the stochastic nature of taxi movement at the individual level, the general pattern of taxi services over the course of a day should theoretically conform to the changing pattern of demand in relation to urban land-use settings and activity schedules. To test this, a GPS-based taxi service dataset with a total of 9,349 OD records was explored to map the one-day activities of Shanghai residents in 12 different time slots. By assuming that the taxi’s travel destination indicates the passenger’s trip purpose, the SDKDF method was applied again to obtain the overall spatiotemporal structure of residents’ activities. Although further analyses have yet to be conducted to reveal the activity dynamics in more depth, the preliminary results indicated the feasibility of this approach to large-scale urban activity analysis and the potential of integrating it into the analytical hierarchy of human activities at various spatial scales.
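
As a rough illustration of the MCP measure mentioned above, the sketch below computes the convex hull of a set of activity locations and its area, which is one way to quantify an activity range. It assumes the locations have already been projected to planar coordinates in metres; the function names and any sample coordinates are hypothetical and not taken from the thesis.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # assumed planar (x, y) coordinates in metres


def convex_hull(points: List[Point]) -> List[Point]:
    """Return the convex hull vertices in counter-clockwise order.

    Uses Andrew's monotone chain algorithm; collinear boundary points
    are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def mcp_area(points: List[Point]) -> float:
    """Area of the minimum convex polygon enclosing the given locations."""
    return polygon_area(convex_hull(points))
```

Computing `mcp_area` separately for, say, each respondent and each survey day would give a simple series that can be compared across time or between groups, in the spirit of the analysis the abstract describes.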

  • 【Classification number】 P228.4; U491
  • 【Times cited】 7
  • 【Downloads】 1966