基于集成神经网络的便携式空气质量监测电子鼻系统性能的提升

Performance Improvement of a Portable Electronic Nose for Indoor Air Quality Monitoring Based on Neural Networks Ensemble

【Author】 Chaibou Kadri

【Supervisor】 Tian Fengchun (田逢春)

【Author information】 Chongqing University, Circuits and Systems, 2013, PhD

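【Abstract (translated from the Chinese)】 If any modernization process has succeeded in changing human living conditions since its inception, it is industrialization. Like any process, however, industrialization has brought many negative effects. Environmental pollution, in particular indoor and outdoor air pollution, is one of them. Air quality, especially that of the indoor environments in which we spend long periods, can cause serious health problems. Interest has therefore grown in smart sensor systems for real-time air quality monitoring. Electronic nose (E-nose) systems are regarded as a good alternative to existing techniques such as human expert panels and analytical methods based on gas chromatography and mass spectrometry. Moreover, E-nose systems can be built from low-cost, off-the-shelf metal oxide semiconductor sensors. Such sensors, however, are prone to drift and to interference from non-target gases. Without drift compensation and interference removal or suppression in the pattern recognition algorithm, these characteristics can degrade E-nose performance. Robust signal processing algorithms that account for these factors are therefore the most important part of an E-nose. Designing such algorithms is the main objective of this thesis.

Although the E-nose concept has existed for decades, many potential users still know little about it. The thesis therefore first introduces the history of E-nose systems, their key concepts and architecture (sampling and delivery system, sensor array, signal preprocessing, feature extraction, and pattern recognition), and their applications. Some commercially available E-nose systems are listed. Finally, current development directions and open problems are pointed out.

Calibrating an E-nose requires initial data obtained through experiments in a controlled gas environment. The thesis describes the self-made E-nose system and the experimental setup and procedure used to produce the raw database. These data form the basis for data preprocessing and pattern recognition.

Orthogonal signal correction (OSC) is a signal preprocessing method that has been applied successfully in E-nose systems. To study its effect, the thesis examines OSC with two different multivariate regression models, the multilayer perceptron (MLP) and partial least squares (PLS). Experimental results on six indoor air pollutants show that combining OSC with MLP is effective only under very strong background noise, whereas combining OSC with PLS works very well at any level of background noise. Where nonlinear recognition is needed, however, MLP performs better than PLS.

Artificial neural networks (ANNs) and support vector machines (SVMs) are pattern recognition algorithms widely used in E-nose systems. ANNs are based on empirical risk minimization, whereas SVMs are based on structural risk minimization within the framework of statistical learning theory. The thesis analyzes in detail the basic principles of MLP neural networks and SVMs. Because genetic algorithms have global search capability, the thesis uses one to optimize the initial weights of the MLP and the hyperparameters of the SVM, respectively. Experimental results on a database of five indoor air pollutants show that, although both MLP and SVM models give satisfactory results, SVM generalizes better, which agrees with the theoretical analysis. From the standpoint of embedded applications, however, the MLP model has lower computational complexity than the SVM model.

Many methods can improve the generalization performance of MLP neural networks, including regularization, cross-validation, adding noise to the training samples, and neural network ensembles. Ensembles are the focus of the thesis. The success of ensemble methods can be explained by the bias-variance decomposition: an ensemble can reduce both variance and bias. Ensemble learning refers to techniques that generate multiple base models with traditional machine learning methods and then combine them into one ensemble model. The goal of the generation stage is to create base models that are accurate and diverse in their predictions. This can be done with three categories of methods: methods that modify the learning set (e.g. bagging, boosting), methods that modify the training algorithm (e.g. negative correlation learning), and methods based on selection (e.g. ambiguity based method, GASEN). In the combination stage, linear methods (e.g. simple averaging, weighted sum) and nonlinear methods (median rule, "stacked generalization") are widely used. Another important alternative for creating a set of base models is the mixture of experts, which is outside the scope of this thesis.

Owing to their good empirical results and theoretical support, bagging and boosting are widely used ensemble learning algorithms. In addition, bagging has been found to be more effective on unstable estimators such as support vector machines and artificial neural networks. The thesis proposes a new selection-based ensemble method. It combines the variance inflation factor (VIF) as a diversity measure, a performance measure (the mean squared error or the mean absolute relative error of prediction), and a genetic algorithm to select the optimal number of base models from a pool of bagged neural networks. Experimental results show that the proposed method performs worse than similar methods in only very few cases, and that it outperforms both the best base network and the standard bagging method. Further study of the rules regarding VIF is expected to improve the method significantly.

Long-term and short-term drift is one of the most serious problems facing gas sensors. Without counteraction, it severely degrades E-nose performance. The thesis proposes an ensemble method that can handle drift for online gas concentration estimation. Experimental results show that the method is not only effective but also attractive in comparison with other ensemble methods.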

【Abstract】 If there is a modernization process that has successfully changed human living conditions since its inception, it is undoubtedly the industrialization process. However, like any process, industrialization has many negative aspects. Environmental pollution, more specifically outdoor as well as indoor air pollution, is one of these negative aspects. Consequently, there has been a resurgence of interest in the development of smart sensing systems for real-time air quality monitoring, especially in indoor environments where we usually spend most of our time and from which serious health-related problems may emerge. Electronic nose (E-nose) systems are regarded as a good alternative to existing techniques such as human panels and analytical methods based on gas chromatography or mass spectrometry, to name a few. Moreover, these systems can be constructed using cost-effective off-the-shelf metal oxide semiconductor gas sensors. However, these sensors are prone to drift and to interference from non-target gas analytes, which may jeopardize the performance of E-nose systems if drift counteraction and interference removal are not accounted for by the pattern recognition algorithm. Therefore, robust signal processing algorithms that consider these factors are of paramount importance in E-nose systems. Designing such signal processing algorithms is the main objective of this thesis work.

Although E-nose systems have been around for decades, many potential users still do not know about them. Therefore, the history, key concepts and architecture (sampling and delivery systems, sensor array, signal preprocessing, feature extraction, and pattern recognition) of E-nose systems are discussed first, and then some of their applications are described. Furthermore, some commercially available E-nose systems are enumerated. Finally, current developments and problems associated with E-nose systems are pointed out.

Calibration of E-nose systems requires some initial data sets that are mostly generated through several experiments under controlled atmospheric conditions. A self-made E-nose system is first introduced, and then the experimental setup and procedure used to generate the initial data sets are described. These data sets constitute a basis for data preprocessing and pattern recognition.

Orthogonal signal correction (OSC) is a preprocessing technique that has been successfully applied in electronic nose systems. To investigate the effectiveness of OSC, an empirical study using two different multivariate regression methods, multilayer perceptron (MLP) and partial least squares (PLS), was carried out. Experimental results using data sets of six indoor air pollutants show that the combination of OSC and MLP is effective only in the presence of very strong background noise, whereas the combination of OSC and PLS is very effective regardless of the level of background noise. However, the performance of MLP was better than that of PLS, which implies the need for nonlinear pattern recognition methods.

Artificial neural networks (ANNs) and support vector machines (SVMs) are pattern recognition methods that are widely used in E-nose systems. ANNs are based on empirical risk minimization, while SVMs are grounded in the framework of statistical learning theory, which is based on structural risk minimization. The basic principles of ANNs (with emphasis on MLP) and SVMs are thoroughly discussed. Owing to its global search capability, a genetic algorithm was used to optimize the initial weights of the MLP and the hyper-parameters of the SVM, respectively. Experimental results using data sets of five indoor air pollutants show that, although both MLP and SVM models provide satisfactory results, the latter have better generalization performance, which is in line with the theoretical assumption. However, for embedded applications, MLP models involve less computational complexity than SVM models. This is the rationale for focusing on MLP models, at the cost of having to improve their generalization performance.

There are many methods to improve the generalization performance of MLP neural networks. These include regularization, cross-validation, training with jitter (noise), and the ensemble method. The latter is the focus of this thesis work. The success of the ensemble method can be explained by the bias-variance decomposition of error, which shows that an ensemble can reduce variance and also bias. Ensemble learning refers to techniques which generate multiple base models using traditional machine learning algorithms and combine them into an ensemble model. In the generation stage, the objective is to create base models that are sufficiently accurate and diverse in their predictions. This can be done through three categories of methods: methods based on the modification of the learning set (e.g. bagging, boosting), methods based on the modification of the training algorithm (e.g. Negative Correlation Learning), and methods based on selection (e.g. ambiguity based method, GASEN). In the combination stage, linear methods (e.g. simple averaging, weighted sum) and nonlinear methods (median rule, "stacked generalization") are commonly used. Another important alternative for creating an ensemble of base models is the mixture of experts, which is beyond the scope of this thesis.

Owing to their good empirical results and theoretical support, bagging and boosting are the most widely used ensemble learning algorithms. Moreover, bagging has been found to be more effective on unstable estimators (predictors) such as support vector machines and artificial neural networks, to name a few. A new selection-based ensemble method is proposed and discussed. The method combines the variance inflation factor (VIF) as a diversity metric, a performance measure (either the mean squared error or the mean absolute relative error of prediction) and a genetic algorithm to select the optimum number of base networks (models) from a pool of bagged neural networks. Results from two empirical studies show that the proposed method compares unfavorably with other similar methods in only a few cases. Moreover, this method performs better than the best base network and the standard bagging method. More research on the rules regarding VIF is expected to significantly improve the performance of this method.

Long or short term drift is one of the most serious problems associated with gas sensors. It can drastically affect the performance of electronic nose systems if no counteraction is performed. In the last empirical study, an ensemble method that can cope with the drift problem is proposed for online gas concentration estimation. Experimental results show that the method is not only effective but also attractive when compared with other ensemble methods.
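
The generation and combination stages described in the abstract (bootstrap-resampled base models, then simple averaging) can be illustrated with a minimal sketch. It is not the thesis's implementation. To stay dependency-free it swaps the MLP base networks for a 1-nearest-neighbour regressor, which is also an unstable learner. The data are synthetic, and every name here (`one_nn_predict`, `bagged_predictions`, `mse`) is hypothetical. The VIF-based selection and the genetic algorithm are not reproduced.

```python
import math
import random


def one_nn_predict(train, x):
    """Predict y at x with a 1-nearest-neighbour regressor (stand-in for an MLP)."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def bagged_predictions(train, xs, n_models, rng):
    """Generation stage: one prediction list per bootstrap-trained base model."""
    all_preds = []
    for _ in range(n_models):
        # Bootstrap: resample the training set with replacement.
        sample = [rng.choice(train) for _ in train]
        all_preds.append([one_nn_predict(sample, x) for x in xs])
    return all_preds


def mse(preds, targets):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


rng = random.Random(0)
# Synthetic stand-in for calibration data: noisy samples of a smooth curve.
train_x = [rng.uniform(0.0, 6.0) for _ in range(80)]
train = [(x, math.sin(x) + rng.gauss(0.0, 0.3)) for x in train_x]
test_x = [i * 0.1 for i in range(61)]
test_y = [math.sin(x) for x in test_x]

base_preds = bagged_predictions(train, test_x, n_models=25, rng=rng)
# Combination stage: simple averaging of the base predictions.
ensemble = [sum(col) / len(col) for col in zip(*base_preds)]

avg_base_mse = sum(mse(p, test_y) for p in base_preds) / len(base_preds)
ensemble_mse = mse(ensemble, test_y)
# Ambiguity: mean squared spread of the base models around the ensemble output.
ambiguity = sum(mse(p, ensemble) for p in base_preds) / len(base_preds)

print(f"mean base-model MSE: {avg_base_mse:.4f}")
print(f"ensemble MSE:        {ensemble_mse:.4f}")
print(f"ambiguity:           {ambiguity:.4f}")
```

For simple averaging with squared error, the ensemble error equals the mean base-model error minus the ambiguity term. The averaged prediction is therefore never worse than the average base model, and it gains more when the base models disagree. This is one way to see why diversity among base models matters in the generation stage.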

  • 【Online publication contributor】 Chongqing University
  • 【Online publication issue】 2014, Issue 02