A Study on Churn Prediction for Fixed-Line Subscribers
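【Author】 Zhang Yingying (张莹莹)
【Advisor】 Shu Huaying (舒华英)
【Author information】 Beijing University of Posts and Telecommunications, Information Management and Information Systems, 2008, Master's thesis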
【Abstract】 As substitution by mobile services grows and competition in the fixed-line market intensifies, domestic fixed-line operators face great challenges, and customer churn is becoming increasingly serious. The losses caused by departing customers and the difficulty of acquiring new ones have made fixed-line operators recognize the importance of churn prediction and customer retention. In response to this pressing demand, and to the scarcity of domestic research and practice in the fixed-line market, this thesis studies churn prediction models for fixed-line operators.

Applying the CRISP-DM data mining process methodology and combining it with the characteristics of the fixed-line business, the thesis elaborates each step of building a churn prediction model for fixed-line subscribers: business understanding, data understanding, data preparation, modeling, and evaluation. After summarizing the characteristics of the data available for fixed-line churn prediction, it also identifies the key issues of the task.

The construction and selection of feature variables strongly affect the learning efficiency of a churn prediction model and the accuracy and stability of the final model. After analyzing and comparing a range of theories for analyzing relationships between variables, the thesis introduces the receiver operating characteristic (ROC) curve and the concept of mutual information from information theory to establish a feature selection mechanism and a concrete method. Features with no predictive power for classification (those whose ROC curve has an AUC less than or equal to 0.5) are deleted first. Among highly correlated features, those with higher predictive power are then retained and the redundant, less predictive ones are deleted.

The modeling method is key to whether the prediction results are effective. Building on the innovative TreeLogit model, the thesis proposes the mSTree-Logistic model, which obtains the final prediction function by applying logistic regression to the prediction functions of multiple decision trees, each trained on a different sample set.

The methods above are applied empirically to customer data from a municipal branch of a fixed-line operator, and the results demonstrate their feasibility and effectiveness.
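The two-stage feature selection described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. Several things here are assumptions: the function names, the toy feature names, the `mi_threshold` cut-off for "highly correlated", the greedy order in which redundant features are dropped, and the convention that larger feature values point toward churn. Features are assumed to be discrete, so continuous variables would need binning before mutual information is computed.

```python
import math
from collections import Counter


def auc(scores, labels):
    """AUC of a single feature used directly as a score (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mutual_information(xs, ys):
    """Mutual information in bits between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def select_features(features, labels, mi_threshold):
    """features maps a name to a list of discrete values; returns kept names."""
    # Stage 1: drop features with no predictive power (AUC <= 0.5).
    scores = {name: auc(vals, labels) for name, vals in features.items()}
    candidates = sorted((n for n in scores if scores[n] > 0.5),
                        key=lambda n: -scores[n])
    # Stage 2 (assumed greedy order): visit features from most to least
    # predictive and skip any that shares too much information with one
    # that has already been kept.
    kept = []
    for name in candidates:
        if all(mutual_information(features[name], features[k]) < mi_threshold
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: 1 = churned, 0 = stayed.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
features = {
    "complaints":      [3, 2, 3, 0, 1, 0, 2, 1],
    "complaints_copy": [3, 2, 3, 0, 1, 0, 1, 1],  # near-duplicate, weaker
    "short_tenure":    [1, 0, 1, 0, 0, 1, 1, 0],
    "noise":           [0, 1, 0, 1, 0, 1, 0, 1],  # AUC below 0.5
}
kept = select_features(features, labels, mi_threshold=1.0)
print(kept)
```

On this toy data, `noise` fails the AUC filter and `complaints_copy` is dropped as redundant with the more predictive `complaints`.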
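The idea of combining several decision trees through logistic regression can also be sketched. This is only one possible reading of the abstract, not the mSTree-Logistic model itself. The abstract does not say how the sample sets are drawn, how deep the trees are, or how the regression is fitted, so everything below is an assumption: sample set `k` leaves out every row whose index equals `k` modulo `n_trees`, the trees are tiny Gini-based trees, and the logistic regression is fitted by plain gradient descent on the tree outputs. All names and the toy data are hypothetical.

```python
import math


def fit_tree(rows, labels, depth=2):
    """Tiny binary tree; a leaf is the churn rate of the samples reaching it."""
    rate = sum(labels) / len(labels)
    if depth == 0 or rate in (0.0, 1.0):
        return rate
    best = None  # (weighted Gini impurity, feature index, threshold)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows})[:-1]:
            sides = ([y for r, y in zip(rows, labels) if r[j] <= t],
                     [y for r, y in zip(rows, labels) if r[j] > t])
            gini = sum(2 * sum(s) * (1 - sum(s) / len(s)) for s in sides)
            if best is None or gini < best[0]:
                best = (gini, j, t)
    if best is None:  # every row is identical, so no split is possible
        return rate
    _, j, t = best
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return (j, t,
            fit_tree([r for r, _ in left], [y for _, y in left], depth - 1),
            fit_tree([r for r, _ in right], [y for _, y in right], depth - 1))


def tree_predict(tree, row):
    """Walk from the root to a leaf and return the stored churn rate."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Logistic regression by batch gradient descent; the last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, target in zip(X, y):
            z = w[-1] + sum(wi * xi for wi, xi in zip(w, x))
            err = 1 / (1 + math.exp(-z)) - target
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def fit_model(rows, labels, n_trees=3):
    """Train one tree per sample set, then regress the labels on tree outputs."""
    trees = []
    for k in range(n_trees):
        # Assumed sampling scheme: drop the rows whose index is k modulo n_trees.
        idx = [i for i in range(len(rows)) if i % n_trees != k]
        trees.append(fit_tree([rows[i] for i in idx], [labels[i] for i in idx]))
    tree_outputs = [[tree_predict(t, r) for t in trees] for r in rows]
    return trees, fit_logistic(tree_outputs, labels)


def predict_proba(model, row):
    """Churn probability from the logistic combination of the tree outputs."""
    trees, w = model
    z = w[-1] + sum(wi * tree_predict(t, row) for wi, t in zip(w, trees))
    return 1 / (1 + math.exp(-z))


# Hypothetical toy data: (monthly calls, complaints); 1 = churned, 0 = stayed.
rows = [(5, 3), (8, 2), (4, 4), (10, 3), (40, 0),
        (55, 1), (60, 0), (35, 1), (7, 2), (50, 0)]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
model = fit_model(rows, labels)
print(predict_proba(model, (6, 3)), predict_proba(model, (58, 0)))
```

Because each tree sees a different subset, individual trees can place their split thresholds differently. The regression step then weights their outputs rather than simply averaging them.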
【Key words】 Churn prediction; ROC curve; Mutual information; Decision tree; Logistic regression;
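- 【Online publication contributor】 Beijing University of Posts and Telecommunications 【Online publication year and issue】 2008, Issue 10
- 【Classification number】 TN915
- 【Citation count】 1
- 【Download count】 172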