Research on Internet of Things Oriented Cleaning and Storage for Unreliable RFID Data Sets

(Original title: 面向物联网的RFID不确定数据清洗与存储技术研究)

【Author】 Fan Hua (樊华)

【Advisor】 Wu Quanyuan (吴泉源)

【Author Information】 National University of Defense Technology, Computer Science and Technology, 2013, Ph.D.

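【Abstract (translated from the Chinese original)】 Internet of Things (IoT) technology aims to integrate human society with the physical world, so that people can perceive, manage, and control the world in a finer-grained and more dynamic way, raising the informatization level of society as a whole. RFID is one of the most important information acquisition technologies in the IoT and is widely used in logistics and warehousing, supply chain management, asset management, personnel monitoring, and indoor positioning and tracking. RFID is a contactless radio frequency identification technology: by scanning electronic tags, a reader obtains each tag's location and the corresponding time in real time, which makes it possible to track and locate tags and the items they are attached to. The resulting data are usually represented as triples of the form (tag_ID, loc, time). However, missed reads are common (about 30%), so the raw data collected by readers are seriously uncertain or incomplete. How to clean these uncertain data and store them efficiently is key to applying RFID in the IoT, and it is the focus of this thesis.

Building on an in-depth analysis of the uncertainty, high redundancy, and massive volume of RFID data, this thesis aims to improve the accuracy and efficiency of information queries in RFID-based IoT applications and to reduce the storage overhead of RFID data. It proposes models and solutions for physical-layer data cleaning, logical-layer data filling, and massive data storage and query optimization. The main results are as follows:

1. A cleaning method for uncertain RFID data based on a probabilistic motion state model. For physical-layer data cleaning, to address the loss of physical-layer RFID data caused by missed reads, the RFID data stream is first modeled with a Bernoulli (binomial) distribution, and a probabilistic motion state model for RFID tags is introduced. This establishes a conversion between the raw RFID data and the tag's motion state information (speed, direction, displacement, and so on), which is then used to fill in the missed data. Finally, a reverse filtering mechanism over the data sequence further ensures that the tag's motion state information is captured. Experimental results show that this cleaning method is more accurate than the classic sliding-window smoothing technique.

2. A cleaning method for RFID movement trajectories based on the Hidden Markov Model. For logical-layer data filling, to address incomplete trajectory information in RFID-based indoor tracking and positioning systems, the readers' reading sequence is mapped to the observable state sequence of a Hidden Markov Model, and the tag's position sequence is mapped to the hidden state sequence. Trajectory cleaning thus becomes a classic Hidden Markov Model decoding problem. Building on the classic decoding algorithm, the Viterbi algorithm, an efficient algorithm for decoding tag paths is designed. Experimental results show that the proposed algorithm fills in tag trajectories efficiently and accurately, which supports precise information queries; the improved cleaning algorithm performs markedly better and is more accurate than traditional methods.

3. A cleaning model for uncertain RFID data based on Bayesian inference. Missed reads can delay the decisions of companies along a supply chain and cause losses. To obtain accurate, real-time circulation status (such as receiving and shipping) for tracked items, the thesis first designs a path matching algorithm based on path encoding, which efficiently computes the distribution of tags that share the same historical path as the current tag. It then proposes a differentiated decision model based on path information which, with Bayesian inference as its theoretical basis, gives different decisions for missed tags with different historical paths and so makes the cleaning results more accurate. Finally, a sliding time window model is introduced to reduce the model's computational overhead, and a maximum entropy model adjusts the window size dynamically, so that the model balances real-time computation against cleaning accuracy. Experimental results show that the proposed scheme improves data cleaning accuracy in supply chain and similar applications and also scales well.

4. For massive RFID data storage and query optimization, an RFID data storage model based on a split-path framework. The large scale and high redundancy of RFID data make storage costly and queries inefficient. Building on path-framework based storage schemes, the thesis first designs a tree-structure based path splitting algorithm that lets the system plan a path splitting scheme according to user requirements. The split-path storage model then organizes, clusters, and stores RFID data as path segments, which increases data aggregation, reduces redundancy, and saves storage space. Finally, a query framework is built on this storage model to support efficient queries. Experimental results show that, compared with a storage scheme based directly on the path framework, the proposed model significantly improves path-oriented query performance, and its storage overhead is only about 12% of the raw data size.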
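The classic sliding-window smoothing technique that the thesis uses as its comparison baseline can be sketched roughly as follows. This is a minimal Python illustration of that general baseline idea, not the thesis's own cleaning method: the function name `smooth_readings`, the use of integer epochs for `time`, and the fixed `window` size are all assumptions made for the example.

```python
from typing import List, Set, Tuple

# One raw reading, in the (tag_ID, loc, time) form used in the abstract.
# Integer epochs for `time` are an assumption made for this sketch.
Reading = Tuple[str, str, int]


def smooth_readings(readings: List[Reading], window: int) -> List[Reading]:
    """Fixed-size sliding-window smoothing (hypothetical helper).

    A tag is reported at (loc, t) if it was actually read at loc at any
    epoch t' with t - window < t' <= t. Short runs of missed reads are
    therefore filled in, at the cost of reporting a tag for up to
    window - 1 epochs after its last real reading.
    """
    seen: Set[Reading] = set(readings)
    filled: Set[Reading] = set(seen)
    for tag, loc, t in seen:
        # Extend every real reading forward across the rest of its window.
        for dt in range(1, window):
            filled.add((tag, loc, t + dt))
    # Sort by tag, then time, then location for a stable output order.
    return sorted(filled, key=lambda r: (r[0], r[2], r[1]))


raw = [("A", "dock", 1), ("A", "dock", 4)]  # epochs 2 and 3 were missed
cleaned = smooth_readings(raw, window=3)
```

With a fixed window, a larger `window` fills more missed reads but also keeps reporting a tag after it has left the reader's range. That trade-off is one reason a fixed window is a weak baseline.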

【Abstract】 The Internet of Things (IoT) aims to integrate human society with the physical world, so that people can perceive the world in a more refined and dynamic way and manage and control it, raising the overall level of informatization. RFID is one of the most important information acquisition technologies in the Internet of Things and is widely used in logistics and warehousing, supply chain management, asset management, personnel monitoring, indoor positioning and tracking, and other fields. RFID is a non-contact radio frequency identification technology. By scanning RFID tags, readers obtain the location and time information of the tags in real time, which allows the tags and the corresponding items to be tracked and located; the resulting data are usually expressed as triples of the form (tag_ID, loc, time). However, because missed readings are frequent (approximately 30% of RFID data are missed), the raw data collected by RFID readers are seriously unreliable and incomplete. How to clean these unreliable data and store them efficiently is key to applying RFID in the Internet of Things, and is the focus of this thesis.

Based on an in-depth analysis of the unreliability, high redundancy, and massive volume of RFID data, this thesis focuses on improving the precision and efficiency of information queries in RFID-based Internet of Things applications and on reducing RFID data storage overhead. It presents models and solutions for data cleaning at the physical layer, data filling at the logical layer, and massive data storage and query optimization. The main contributions are as follows:

1. We propose an unreliable RFID data cleaning method based on a probabilistic model of tag motion.
At the physical layer, to address the loss of RFID data caused by missed readings, we model the RFID data stream with a Bernoulli (binomial) distribution and introduce a probabilistic model of the RFID tag motion state. We then establish a conversion between the raw RFID data and the motion state information of tags (speed, direction, and displacement), so that missed data can be filled in according to that motion state information. Finally, a reverse filtering mechanism over the data sequence is proposed to further ensure that the motion state information of tags is captured. Experimental results show that the cleaning method is more accurate than the classic sliding-window smoothing technique.

2. We propose a Hidden Markov Model based method for cleaning RFID trajectory data. At the logical layer, to address incomplete trajectory information in RFID-based indoor tracking and positioning systems, we first map the reading sequence of the readers to the observable state sequence of a Hidden Markov Model and map the position sequence of the tag to the hidden state sequence, so that trajectory cleaning becomes a classic Hidden Markov Model decoding problem. Building on the classic decoding algorithm, the Viterbi algorithm, we present an efficient algorithm for path decoding. Experimental results show that the proposed algorithm fills in missed trajectories efficiently and accurately, which supports accurate information queries, and that it improves on traditional methods in both accuracy and processing performance.

3. We propose a Bayesian inference based approach for unreliable RFID data cleaning. Missed readings can cause supply chain companies to respond wrongly to market demand and incur large economic losses.
To obtain accurate, real-time receiving and shipping information for the items being tracked, this thesis first presents a path matching algorithm based on a path coding scheme, which efficiently obtains the distribution of tags that share the same historical path as the current tag. A differentiated decision model based on path information is then proposed, which gives different decisions for missed tags with different historical paths and makes the cleaning results more accurate. Finally, a sliding time window model is introduced to reduce the computational overhead of the model, and a maximum entropy model is used to adjust the size of the time window dynamically, so that the model strikes a better balance between efficiency and accuracy. Experimental results show that the proposed cleaning method not only improves data cleaning accuracy in supply chain applications but also scales well.

4. For massive RFID data storage and query optimization, we propose a split-path schema based RFID data storage model. Storing and processing such large volumes of RFID data requires ever more space and time, and it is increasingly recognized that existing approaches cannot meet the requirements of RFID data management. First, building on path-schema based storage solutions, a tree-structure based path splitting approach is proposed that splits the movement paths of products intelligently and automatically according to user requirements. Next, we present a split-path schema based RFID data storage model. With a data separation mechanism, the massive RFID data produced in supply chain management systems can be clustered, stored, and processed more efficiently. Finally, based on the proposed storage model, we design a relational schema to store the path and time information of tags, and define some typical query templates and SQL statements.
Experimental results show that, compared with the path encoding schema-based storage model, the proposed storage model significantly improves path-oriented query performance. Moreover, the storage overhead of our model is only about 12% of that of the raw RFID data.
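The mapping described in contribution 2 (reader observations as the observable sequence, tag locations as the hidden sequence) can be illustrated with the classic Viterbi algorithm. The sketch below is the textbook algorithm, not the improved path decoding algorithm developed in the thesis; the state names, the `"none"` symbol for a missed read, and all probabilities are hypothetical values chosen only for the example.

```python
from typing import Dict, List


def viterbi(
    obs: List[str],
    states: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the most likely hidden state sequence for `obs` (classic Viterbi)."""
    # best[t][s]: probability of the most likely path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    # back[t][s]: predecessor of s on that most likely path.
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, p = max(
                ((r, best[t - 1][r] * trans_p[r][s]) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = p * emit_p[s][obs[t]]
            back[t][s] = prev
    # Trace back from the most likely final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical two-room layout: reader r1 covers room1, reader r2 covers room2.
# "none" stands for an epoch in which no reader saw the tag (a missed read).
states = ["room1", "room2"]
start_p = {"room1": 0.5, "room2": 0.5}
trans_p = {
    "room1": {"room1": 0.9, "room2": 0.1},
    "room2": {"room1": 0.1, "room2": 0.9},
}
emit_p = {
    "room1": {"r1": 0.7, "r2": 0.0, "none": 0.3},
    "room2": {"r1": 0.0, "r2": 0.5, "none": 0.5},
}
trajectory = viterbi(["r1", "none", "r2"], states, start_p, trans_p, emit_p)
```

With these assumed numbers the missed epoch in the middle is assigned to `room2`, because reader r2 is modeled as missing tags more often than r1. In a real deployment the transition and emission probabilities would have to be estimated from the floor plan and from measured read rates.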
