Research on Data Mining of Municipal Wastewater Treatment Plants and Related Technologies

The Research on Data Mining of WWTP and Related Technologies

【Author】 Li Xiaodong

【Supervisors】 Zeng Guangming; Huang Guohe

【Author information】 Hunan University, Environmental Engineering, 2007, Ph.D.

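【Abstract (Chinese version)】 Municipal wastewater treatment plants exhibit the phenomenon of being "data rich but information poor". Moreover, as a complex industrial process, wastewater treatment produces data whose characteristics differ from those of commerce, finance, biology and similar fields: 1. the data are massive, high-dimensional and strongly coupled; 2. they contain industrial noise and process uncertainty; 3. they are dynamic and of diverse types; 4. they span multiple time scales and are incomplete; 5. they are multimodal. Given these facts, this dissertation studies the theory and methods of data mining for municipal wastewater treatment plants from several aspects: data preprocessing techniques; nonlinear dynamic analysis and forecasting of municipal wastewater flow; anomaly detection and anomaly-sign pattern mining in wastewater treatment plants; and the construction and extension of an activated sludge process simulation platform. The main work and contributions are summarized as follows.

1. Data preprocessing. A series of preprocessing techniques, namely data integration, data cleaning, data transformation and data reduction, were studied, and a corresponding set of algorithms was given. On this basis, a data preprocessing technique oriented at two levels, subject and application, was proposed and used to design preprocessing procedures for three different subject applications.

2. Nonlinear dynamic analysis and forecasting of municipal wastewater flow. First, phase space reconstruction techniques, including the Grassberger & Procaccia (GP) algorithm, the false nearest neighbors method, Cao's method, the autocorrelation function method, the mutual information method and the C-C algorithm, were studied and used to reconstruct the phase space. On this basis, the largest Lyapunov exponent, close returns plot (CRP) analysis and the surrogate data method were used to establish that the municipal wastewater flow time series is chaotic. Building on this conclusion, the series was forecast: a neural network was trained on the input and output vectors of the reconstructed phase space to obtain a model fitting the input-output mapping, and this model was then used for short-term forecasting of wastewater flow, with fairly satisfactory results.

3. Anomaly detection and anomaly-sign pattern mining. Because wastewater treatment plant data have highly imbalanced class distributions and are cost-sensitive, the support vector machine (SVM) was improved with RWLOO(α) for anomaly detection in the treatment process. A genetic algorithm (GA) was used to solve the global optimization problem of the improved SVM, and a simplified algorithm was designed for it to increase computation speed. Results on real data show that, compared with a standard SVM and a neural network, the improved SVM detects anomalies in municipal wastewater treatment plants more effectively. Anomalies are preceded by certain signs. To recognize these signs as early as possible, so that countermeasures can be taken in time and the anomaly avoided, this dissertation applies anomaly-sign pattern mining: a sequential pattern dissimilarity is first defined to distinguish normal patterns from anomaly-sign patterns, and on this basis a sliding-window-based anomaly-sign pattern mining algorithm is proposed. Practical application shows that the algorithm can identify anomaly-sign patterns in advance.

4. Construction and extension of an activated sludge process simulation platform. First, the wastewater simulation benchmarks proposed by the International Water Association (IWA) and the European Co-operation in the field of Scientific and Technical Research (COST 624) were studied. On this basis, the COST simulation benchmark was extended so that the extended platform can simulate process anomalies. The focus of the extension is an activated sludge bulking model: the bioreactor part incorporates kinetic selection theory, microbial decay theory, nutrient diffusion theory and substrate storage theory, while the secondary clarifier part modifies the double-exponential settling model according to the filamentous backbone theory. The full model can simulate sludge bulking triggered by four variables: wastewater flow, aeration rate, substrate concentration and nitrogen concentration. The simulation results are consistent with theoretical analysis. The model was further extended to cover sensor faults and actuator faults, so that the extended platform can simulate most anomalies in the wastewater treatment process. The extended platform has broad application prospects in process modeling, control strategy design and evaluation, staff training, trend forecasting and environmental risk assessment.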
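The abstract does not define the sequential pattern dissimilarity used for anomaly-sign mining, so the following is only a minimal sketch under stated assumptions. The dissimilarity of a window is taken (hypothetically) as the RMS distance to its closest window in a record of normal operation, and a window is flagged when that distance exceeds `k` times the largest nearest-neighbour distance found within the normal record itself. The names `sliding_windows`, `rms_dissimilarity` and `mine_fault_signs`, and the parameters `w` and `k`, are illustrative and not taken from the dissertation.

```python
import numpy as np


def sliding_windows(x, w):
    """All overlapping windows of length w from a 1-D series, shape (n - w + 1, w)."""
    x = np.asarray(x, dtype=float)
    return np.lib.stride_tricks.sliding_window_view(x, w)


def rms_dissimilarity(pattern, reference):
    """Hypothetical dissimilarity: RMS distance from one window to the
    closest window in `reference` (each row is a normal pattern)."""
    return np.sqrt(((reference - pattern) ** 2).mean(axis=1)).min()


def mine_fault_signs(normal, monitored, w=5, k=3.0):
    """Return start indices of windows in `monitored` that are unlike every
    normal pattern. The threshold is k times the largest nearest-neighbour
    dissimilarity observed among the normal windows themselves."""
    ref = sliding_windows(normal, w)
    # Pairwise RMS distances between normal windows; self-matches are ignored.
    diff = ref[:, None, :] - ref[None, :, :]
    dist = np.sqrt((diff ** 2).mean(axis=2))
    np.fill_diagonal(dist, np.inf)
    threshold = k * dist.min(axis=1).max()
    flagged = []
    for i, win in enumerate(sliding_windows(monitored, w)):
        if rms_dissimilarity(win, ref) > threshold:
            flagged.append(i)
    return flagged
```

Any window that overlaps the start of an abnormal excursion is flagged, so the earliest flagged start index can precede the excursion by up to `w - 1` samples. A real deployment would need a dissimilarity measure and threshold tuned on plant data.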

【Abstract】 Urban wastewater treatment plants (WWTPs) suffer seriously from the "data rich, information poor" phenomenon. As a complex industrial process, wastewater treatment differs from business, finance and biology in the characteristics of its data: 1. large volume and high dimensionality, with strong coupling; 2. process noise and uncertainty; 3. dynamic behavior and varied data types; 4. multiple time scales and incompleteness; 5. multimodality. Against this background, this dissertation investigates techniques in the following areas: data preprocessing, nonlinear dynamic analysis and forecasting of the influent time series, fault diagnosis and fault-sign pattern mining, and the construction and extension of the IWA COST simulation benchmark. The main contributions are as follows.

1. Data preprocessing. A series of studies on data integration, cleaning, transformation and reduction were carried out, and corresponding algorithms were proposed. Building on these, a subject- and application-oriented data preprocessing technique is presented and used to design three different data preprocessing procedures.

2. Nonlinear dynamic analysis and forecasting of the influent time series. First, the phase space reconstruction techniques of the Grassberger-Procaccia (GP) algorithm, the False Nearest Neighbor (FNN) method, Cao's method, the autocorrelation function method, the mutual information method and the C-C algorithm are studied and used to reconstruct the phase space. On this basis, the largest Lyapunov exponent, the close returns plot (CRP) and surrogate data analysis show that the WWTP influent time series is chaotic. Based on this conclusion, the time series is forecast: a neural network (NN) is trained on the results of the phase space reconstruction, yielding an NN model that fits the input/output mapping well.
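Phase space reconstruction, as used in point 2, rests on time-delay embedding: each state is a vector of `m` samples spaced `tau` apart, and a forecaster such as an NN learns to map each vector to a later value. The sketch below assumes NumPy, selects `tau` with an autocorrelation rule (first lag at which the ACF drops below 1/e, one of several conventions in use), and leaves the embedding dimension `m` to be chosen separately, for example by FNN or Cao's method, which are not implemented here. The function names are illustrative.

```python
import numpy as np


def acf_delay(x, threshold=1 / np.e):
    """Pick the delay tau as the first lag at which the sample autocorrelation
    drops below `threshold` (1/e here; the first zero crossing is another rule)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.dot(x, x)
    for lag in range(1, len(x)):
        r = np.dot(x[:-lag], x[lag:]) / var
        if r < threshold:
            return lag
    raise ValueError("autocorrelation never fell below the threshold")


def delay_embed(x, m, tau):
    """Time-delay embedding: row i is [x[i], x[i+tau], ..., x[i+(m-1)*tau]]."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this (m, tau)")
    return np.column_stack([x[j * tau: j * tau + n] for j in range(m)])


def training_pairs(x, m, tau, horizon=1):
    """Input vectors and targets for a one-step-ahead forecaster (e.g. an NN):
    the target is the value `horizon` steps after each row's last coordinate."""
    X = delay_embed(x, m, tau)
    last = np.arange(len(X)) + (m - 1) * tau  # index of each row's last coordinate
    keep = last + horizon < len(x)
    return X[keep], np.asarray(x, dtype=float)[last[keep] + horizon]
```

The returned pairs `(X, y)` are what a one-step-ahead NN would be trained on. With `x = np.arange(10.0)`, `m = 3` and `tau = 2`, the first input vector is `[0, 2, 4]` and its target is `5`.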
The trained NN model is then used to forecast the WWTP influent time series, and the results show that this method achieves reasonable forecasts.

3. Fault diagnosis and fault-sign pattern mining. Because the fault classes are unevenly distributed in data quantity and importance, a risk functional RWLOO with weight coefficients based on leave-one-out errors is presented, and a Genetic Algorithm (GA) is used to optimize it globally. Because the data set is large, a simplified RWLOO algorithm is presented to reduce the computational cost. The improved Support Vector Machine (SVM) is used to classify the WWTP data set, and the results show that it achieves higher classification accuracy than the standard SVM and a neural network (NN). Faults are preceded by certain signs. To distinguish these signs from normal behavior quickly, so that necessary measures can be applied and the fault avoided, fault-sign pattern mining is proposed. First, a sequential pattern dissimilarity measure is used to distinguish fault signs from normal patterns; then, based on this measure, a sliding-window fault-sign pattern mining algorithm is proposed. Practical application of the algorithm shows that fault-sign patterns can be identified in advance.

4. Construction and extension of the simulation benchmark. First, the simulation benchmark proposed by the International Water Association (IWA) and the European Co-operation in the field of Scientific and Technical Research (COST) is studied. The IWA COST simulation benchmark is then extended to simulate faults of the activated sludge treatment process. The highlight of the extension is an activated sludge bulking model: kinetic selection theory, bacterial decay theory, nutrient diffusion theory and storage theory are incorporated into the reactor part, and the double-exponential settling model is improved according to the filamentous backbone theory.
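In the IWA/COST benchmark the double-exponential settling model is the Takács velocity function: for solids concentration X, the settling velocity is v_s = max(0, min(v0', v0 (exp(-r_h X*) - exp(-r_p X*)))), with X* = X - f_ns X_in, where X_in is the settler feed concentration. The sketch below implements only this unmodified form, with parameter values commonly quoted for the benchmark settler (BSM1); treat them as illustrative defaults. The dissertation's filamentous-backbone modification is not described in the abstract and is not attempted here.

```python
import numpy as np

# Values commonly quoted for the IWA/COST benchmark (BSM1) settler; illustrative
# defaults only, not the calibration used in the dissertation.
V0_MAX = 250.0   # maximum practical settling velocity v0' (m/d)
V0 = 474.0       # maximum Vesilind settling velocity v0 (m/d)
R_H = 0.000576   # hindered-zone settling parameter r_h (m3/g SS)
R_P = 0.00286    # flocculant-zone settling parameter r_p (m3/g SS)
F_NS = 0.00228   # non-settleable fraction f_ns (-)


def takacs_velocity(x, x_in):
    """Double-exponential (Takacs) settling velocity in m/d for solids
    concentration x (g/m3), given the settler feed concentration x_in (g/m3)."""
    # Concentration above the non-settleable minimum f_ns * x_in.
    x_star = np.asarray(x, dtype=float) - F_NS * x_in
    v = V0 * (np.exp(-R_H * x_star) - np.exp(-R_P * x_star))
    # Bound the velocity: 0 <= v_s <= v0'.
    return np.clip(v, 0.0, V0_MAX)
```

With these parameters the unclipped curve peaks near X* ≈ 700 g/m³, where it slightly exceeds v0', so the upper clip is active there; at higher concentrations the hindered-settling term dominates and the velocity falls.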
The extended model can simulate bulking caused by the influent flow, aeration rate, organic substrate concentration and NH4-N concentration. The simulation results agree with the theoretical analysis. Further extensions cover faults caused by sensors and actuators. The extended model can simulate the major faults of the activated sludge treatment process, which indicates good application prospects in process modeling, construction and evaluation of control strategies, staff training, trend prediction and environmental risk assessment.
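The abstract states that the benchmark was extended with sensor and actuator faults but does not list the fault types. As a hedged illustration, the sketch below assumes three common sensor fault patterns (constant bias, linear drift and frozen output) and applies them to a simulated measurement from a chosen sample onward. `inject_sensor_fault` and its parameters are hypothetical.

```python
import numpy as np


def inject_sensor_fault(signal, kind, start, magnitude=0.0):
    """Return a copy of `signal` with a hypothetical sensor fault from index `start` on.

    kind: "bias"   - constant offset of `magnitude`
          "drift"  - offset growing by `magnitude` per sample
          "freeze" - output stuck at the last healthy reading
    """
    y = np.array(signal, dtype=float)  # copy, so the true signal is left untouched
    n = len(y) - start
    if n <= 0:
        return y
    if kind == "bias":
        y[start:] += magnitude
    elif kind == "drift":
        y[start:] += magnitude * np.arange(1, n + 1)
    elif kind == "freeze":
        y[start:] = y[start - 1] if start > 0 else y[0]
    else:
        raise ValueError(f"unknown fault kind: {kind}")
    return y
```

Because the function returns a copy, the fault-free signal remains available as ground truth when testing a detector such as the improved SVM described above.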

  • 【Online publication submitter】 Hunan University
  • 【Online publication year/issue】 2007, issue 06