海量电信数据的挖掘与异常分析

The Mining and Analysis of the Aberration of Mass Telecommunications Data

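【Author】 Liao Fandi (廖凡迪)

【Advisor】 Wu Bin (吴斌)

【Author Information】 Beijing University of Posts and Telecommunications, Computer Science and Technology, 2013, Master's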

【摘要】 随着科学研究、通信技术、IT技术的快速发展,电信业务的数据量急剧增长,而电信行业间日益激烈的竞争也使电信运营商更加需要注重网络和服务的质量来提高行业竞争力。如何从大量数据中获取异常但有用的潜在信息是异常挖掘的主要任务,也是通信网络优化和获得良好的服务质量的关键。本文对相关的数据挖掘和并行计算技术展开了一系列研究,旨在从海量电信数据中挖掘异常信息,指导通信网络优化和服务质量提高。本文首先根据超频用户的特点,提出了结合离群点检测算法和聚类系数的异常分析算法,其中离群点检测算法改进了基于密度的LOF算法,主要体现在采用SimHash算法改进原LOF算法中的性能瓶颈K近邻查找算法。然后结合乒乓切换的特点,提出了利用多标签分类算法来进行乒乓切换解决方案预测,以随机游走图的多标签分类算法为基础,结合全概率公式和随机过程实现多标签分类算法。为了使本文中改进的算法能适用于大数据,所有的算法利用MapReduce的编程框架进行编写,并利用空间换时间的原理降低了算法的时间复杂度,实现了并行计算的目的。通过对多种实验数据的大量实验证明,本文中提出的并行超频分析算法和并行乒乓切换方案预测算法有较高的准确率和较大的性能优势。最后本文给出了异常分析的原型系统设计,结合Hive和MapReduce编程实现了对原始数据的预处理,并依据不同的业务逻辑进行了ETL和统计。通过并行化的不同数据挖掘算法的分析,得到具有业务意义的数据分析结果,并且在前台界面予以展示本文将不同的机器学习的算法引入专题应用,克服了人工进行异常检测的效率低下和正确率容易受主观因素影响等缺点。通过大量的实验说明,本文中提出的异常分析方法和系统相对传统的异常分析系统有很大的优势。

【Abstract】 With the rapid development of scientific research, communication technology, and IT, the volume of telecom service data has grown sharply, and increasingly fierce competition among telecom operators pushes them to pay more attention to network and service quality in order to stay competitive. Extracting anomalous but useful latent information from large volumes of data is the main task of anomaly mining, and it is also key to optimizing communication networks and achieving good service quality. This thesis studies the relevant data mining and parallel computing techniques, with the aim of mining anomalous information from massive telecom data to guide network optimization and service-quality improvement. First, based on the characteristics of over-frequency (abnormal) users, it proposes an anomaly analysis algorithm that combines outlier detection with the clustering coefficient. The outlier detection algorithm improves on the density-based LOF algorithm, mainly by using SimHash to replace the k-nearest-neighbor search that is the performance bottleneck of the original LOF. Second, based on the characteristics of ping-pong handover, it proposes using multi-label classification to predict solutions for ping-pong handover; the classifier builds on a random-walk-graph multi-label classification algorithm and combines the law of total probability with stochastic processes. So that the improved algorithms can handle big data, all of them are written in the MapReduce programming framework, and they trade space for time to reduce time complexity and achieve parallel computation.
Extensive experiments on multiple datasets show that the proposed parallel over-frequency analysis algorithm and the parallel ping-pong handover solution prediction algorithm achieve relatively high accuracy and a considerable performance advantage. Finally, the thesis presents the design of a prototype anomaly analysis system, which combines Hive and MapReduce to preprocess the raw data and performs ETL and statistics according to different business logic. Analysis with the different parallelized data mining algorithms yields results that are meaningful for the business, and these are displayed in a front-end interface. By introducing different machine learning algorithms into these specific applications, the thesis overcomes the shortcomings of manual anomaly detection, namely low efficiency and accuracy that is easily affected by subjective factors. Extensive experiments indicate that the proposed anomaly analysis methods and system have a substantial advantage over traditional anomaly analysis systems.
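
The abstract states that SimHash replaces the k-nearest-neighbor search inside LOF but gives no implementation details. The following is only a minimal single-machine sketch of that general idea, not the thesis's algorithm: each record is reduced to a 64-bit SimHash fingerprint, fingerprints are bucketed by bit bands, and approximate neighbors are ranked by Hamming distance within the shared buckets. The feature names, the use of MD5 as the per-feature hash, the four-band bucketing, and all function names are assumptions made for illustration.

```python
import hashlib

BITS = 64  # fingerprint width (an assumption; the abstract does not specify one)


def simhash(features, bits=BITS):
    """Weighted SimHash fingerprint of a {feature_name: weight} mapping."""
    totals = [0.0] * bits
    for name, weight in features.items():
        digest = hashlib.md5(name.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            # Each feature votes +weight or -weight on every bit position.
            totals[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(bits) if totals[i] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def band_keys(fp, bands=4, bits=BITS):
    """Split a fingerprint into (band_index, band_value) bucket keys."""
    width = bits // bands
    mask = (1 << width) - 1
    return [(b, (fp >> (b * width)) & mask) for b in range(bands)]


def build_index(fingerprints, bands=4):
    """Map every bucket key to the set of record ids that fall into it."""
    index = {}
    for rid, fp in fingerprints.items():
        for key in band_keys(fp, bands):
            index.setdefault(key, set()).add(rid)
    return index


def approx_knn(rid, fingerprints, index, k, bands=4):
    """Approximate k nearest neighbors: only records sharing a bucket are compared."""
    fp = fingerprints[rid]
    candidates = set()
    for key in band_keys(fp, bands):
        candidates |= index.get(key, set())
    candidates.discard(rid)
    ranked = sorted(candidates, key=lambda o: (hamming(fp, fingerprints[o]), o))
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical per-user features; names and weights are illustrative only.
    users = {
        "a": {"calls_per_hour:high": 3.0, "cell:1021": 1.0},
        "b": {"calls_per_hour:high": 3.0, "cell:1021": 1.0},
        "c": {"calls_per_hour:low": 1.0, "cell:7730": 2.0},
    }
    fps = {uid: simhash(f) for uid, f in users.items()}
    idx = build_index(fps)
    print(approx_knn("a", fps, idx, k=1))
```

The neighbor lists produced this way could then feed the usual LOF reachability-distance and local-density computations in place of an exact neighbor search. In a MapReduce setting, the bucket key would be a natural shuffle key, so that each reducer compares only the records in one bucket; this is a design suggestion, not something the abstract confirms.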
