
苹果近红外光谱数据库系统关键算法研究及原型系统开发

Key Algorithms for Apple Near-infrared Spectral Database and Development of the Prototype System

【Author】 Zhou Wanhuai (周万怀)

【Advisors】 Ying Yibin (应义斌); Xie Lijuan (谢丽娟)

【Author information】 Zhejiang University, Biosystems Engineering, 2014, PhD

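【Abstract, Chinese version】 Near-infrared (NIR) spectra are rich in information, highly stable, and easy to acquire, and NIR diffuse reflectance analysis requires no chemical treatment of the sample, so NIR spectroscopic analysis is rapid, non-destructive, and green. The wide use of computer technology and the continued development of chemometrics have made NIR spectroscopic analysis popular in many fields. However, building a high-performance NIR model requires all of the following at once: a standardized and reasonable spectral acquisition protocol, a spectrometer with stable performance and adequate precision, abundant sample resources, techniques for accurately measuring sample component concentrations, and experienced modelers. Ordinary organizations can rarely meet all of these conditions. Conversely, the NIR results obtained by organizations that do meet them are hard to apply widely because of the limitations of existing spectral data management methods. Therefore, to broaden the application of NIR spectroscopic analysis and to share its results, it is necessary to build a spectral database system (SDBS). This work takes apples as the test object and explores how to build a database system for managing apple NIR spectra and their analysis results. First, spectral matching algorithms (SMA) for apple NIR spectra were studied. A full-spectrum matching algorithm was constructed from the Jaccard similarity coefficient (JSC), called the spectral matching algorithm based on JSC (SMA-JSC), which introduces curve shape into spectral matching as a matching index. Second, curve smoothing and curve peak detection methods were used to study the identification of valid characteristic peaks in apple NIR spectra. The peak detection parameters were optimized, and automatic identification of valid characteristic peaks was achieved. On this basis, the spectral matching algorithm with peak information (SMA-P) was studied. Finally, based on the conclusions of the above studies, a prototype apple NIR SDBS was developed. The main contents and results are as follows:

(1) Automatic identification of characteristic peaks in apple NIR spectra was analyzed and studied, and the peak identification parameters were optimized. Commonly used curve smoothing algorithms tend to deform the characteristic peak bands considerably, which biases the peak parameters and fails to meet the requirements of automatic peak identification. This work proposes a curve smoothing algorithm based on data-point weighting: within a sliding window of fixed width, the center data point is weighted according to the fluctuation frequency of the curve, and data points with different weights are smoothed with different smoothing algorithms. When the weight threshold exceeded 0.5, the root mean square (RMS) value of the smoothed curve did not change noticeably, and with a window size of 21 the characteristic peak bands were best protected. Peak width and peak shape index were chosen as filtering indices for pseudo peaks (PP). Twenty levels of the peak width threshold (Tpw: 3 to 41) were tested for their effect on PP; once Tpw reached 29, no further PP could be filtered out by increasing Tpw. The peak shape index threshold (Tps) was then used to filter the remaining PP, and a threshold of 0.005 gave the best result. Peak identification was compared at resolutions of 8 to 128 cm⁻¹; identification was best at 32 or 64 cm⁻¹. As the resolution is lowered further, the number of spectral data points shrinks until it can no longer satisfy Tpw, and peak identification deteriorates. The results show that with a spectral resolution of 32 cm⁻¹, a weighting window of 21, a weight threshold of 0.7, a smoothing window size of 21, Tpw of 29, and Tps of 0.005, the correct identification rate was 100% for characteristic peak position 1 and 99.50% for characteristic peak position 2, so the characteristic peaks of apple NIR spectra and their related parameters can be computed automatically.

(2) SMA-P for apple NIR spectra was studied. To verify the ability of SMA-P to distinguish the spectra of different samples, four classes were used: Aksu Red Fuji, Shandong Hongjiangjun (红将军), Shaanxi Red Fuji, and Shaanxi Huangjinshuai (黄金帅), with 100 samples per class and 400 samples in total. Twenty spectra were drawn at random from the 400 test spectra and compared with all sample spectra. With the number of peaks, peak position, peak area, and peak width used in turn as the matching index, the proportion of the 20 drawn spectra that matched more than one spectrum in the full set exactly was 100%, 25.00%, 10.00%, and 0, respectively. Peak width and peak area therefore distinguish the spectra of different samples comparatively well. Classification tests were then run with peak width and peak area as matching indices; the average classification accuracies were 47.25% and 55.00%, respectively. This shows that SMA-P classifies different classes of apple samples poorly and cannot handle the preliminary class screening of unknown samples in the apple NIR SDBS.

(3) The spectral matching algorithm with full spectral information (SMA-FS) for apple NIR spectra was studied. The ability of SMA-FS methods to distinguish the spectra of different samples was verified for the absolute distance (AD), sum of squared differences (SSD), Euclidean distance (ED), correlation coefficient (CC), and spectral angle (SA) methods. The test samples and test procedure described in (2) were used again, and the results show that all five SMA-FS methods can correctly distinguish the spectra of different samples. Classification tests with these five methods gave average classification accuracies of 65.50%, 66.00%, 73.00%, 64.75%, and 62.75%, respectively, clearly better than SMA-P but still in need of improvement. The full-spectrum matching algorithm SMA-JSC was then constructed on the JSC principle. Tested with the samples and procedure described in (2), SMA-JSC correctly distinguished the spectra of different samples. In classification tests, its average classification accuracy was 94.50% (calibration) and 95.00% (internal validation). The test was then extended to three further classes, Gansu Red Fuji, Shandong Red Fuji, and Shaanxi Red Fuji, with 100 samples per class and 300 samples in total. The results confirm that SMA-JSC correctly distinguishes the spectra of different samples, with average classification accuracies of 93.67% and 93.33%. To check the effect of enlarging the test set, the two sample sets were merged and tested again. SMA-JSC still distinguished all spectra of different samples correctly, with average classification accuracies of 94.14% and 94.29%, so its performance did not drop as the test set grew. Discriminant analysis (DA) was also used to classify the above spectra. The average pairwise classification accuracies were 98.60% (raw spectra), 95.90% (first derivative), and 96.30% (second derivative). As the number of classes increased, accuracy fell steadily, and when all 7 classes were classified together the accuracies dropped to 88.00%, 56.40%, and 58.40%. These results show that SMA-JSC classifies multi-class apple samples far more accurately than SMA-P and the common SMA-FS methods, is little affected by the number of classes, and is accurate and stable. By comparison, traditional classification algorithms degrade as the number of classes grows. Among the algorithms considered, SMA-JSC is therefore the most suitable for the classification screening task of the apple NIR SDBS and supports a higher query and analysis accuracy.

(4) A specification for spectra admitted into the apple NIR SDBS was drawn up. It addresses the factors that affect spectral quality, namely standard sample selection, sample pretreatment, the spectrometer, instrument parameter settings, and the experimental environment, and it draws on previous research, domain knowledge, and the conclusions of this work. The specification provides a theoretical basis and guidance for the standardization and consistency of data in the apple NIR SDBS.

(5) Based on the admission specification, the characteristic peak identification method, and the SMA findings, a prototype apple NIR SDBS was developed, and a prototype of a practical apple NIR SDBS platform was planned and designed. This provides a test platform for the theoretical and methodological research and lays the groundwork for follow-up work.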
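The five conventional full-spectrum measures named in item (3) can be sketched as follows. The abstract gives no formulas, so this sketch assumes the usual textbook definitions of each measure; the function names and the `best_match` helper are hypothetical illustrations, not the thesis implementation.

```python
import math


def absolute_distance(a, b):
    """AD: sum of absolute point-wise differences (assumed definition)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def sum_squared_difference(a, b):
    """SSD: sum of squared point-wise differences (assumed definition)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean_distance(a, b):
    """ED: square root of the SSD."""
    return math.sqrt(sum_squared_difference(a, b))


def correlation_coefficient(a, b):
    """CC: Pearson correlation between two spectra of equal length."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spectral_angle(a, b):
    """SA: angle in radians between the spectra viewed as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


def best_match(query, library, distance=euclidean_distance):
    """Hypothetical helper: label of the library spectrum nearest to query.

    Only valid for measures where smaller means more similar (AD, SSD, ED, SA).
    """
    return min(library, key=lambda label: distance(query, library[label]))


if __name__ == "__main__":
    reference = [1.0, 2.0, 3.0]
    shifted = [x + 10.0 for x in reference]  # same shape, constant offset
    print(absolute_distance(reference, shifted))
    print(correlation_coefficient(reference, shifted))
```

In this toy case a constant baseline offset changes AD, SSD, and ED although the curve shape is unchanged, while CC stays at 1. That illustrates how strongly the distance-type measures depend on absolute intensity.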

【Abstract】 Near-infrared (NIR) spectra carry abundant sample information. They are highly stable and easy to collect; in particular, NIR diffuse reflectance analysis requires no chemical pretreatment. NIR spectroscopic analysis is therefore regarded as a rapid, non-destructive, and non-polluting method. Owing to the wide application of computer technology and the continuous development of chemometrics, NIR analysis is becoming increasingly popular in many fields. However, an excellent NIR model requires a reasonable acquisition standard, a stable and accurate spectrometer, accurate techniques for measuring sample component concentrations, and experienced scientists. Ordinary organizations find it difficult to meet all of these requirements. Therefore, standard NIR spectral information centers are necessary to promote the application of NIR analytical results.

In this work, apples were selected as the target, and spectral matching algorithms (SMA) for apple NIR spectra were studied. A new spectral matching algorithm with full spectral information (SMA-FS), based on the Jaccard similarity coefficient (JSC), was established. This is the first attempt to explore a spectral matching method based on the shape of the curve. In addition, an automatic peak detection algorithm was studied. A prototype apple NIR spectral database system (SDBS) was developed to test the newly explored algorithms.

The main contents and results are summarized as follows:

(1) An automatic peak detection algorithm was proposed and its parameters were optimized. Commonly used spectral smoothing algorithms usually cause large deformation in peak bands and large bias in peak parameters, and cannot meet the demands of automatic peak recognition. Thus, a new spectral smoothing algorithm based on weighted spectral data points was proposed.
In this algorithm, the center data point of a sliding window of fixed width was weighted according to the fluctuation frequency within the window. All weights were normalized to the range 0 to 1; large weights indicate low noise levels and small weights indicate high noise levels. The optimal weight threshold (WT) and window width (WW) were explored. The results showed that the noise content of the smoothed spectra did not change significantly when WT was larger than 0.5, and that the peak bands were best protected when WW was 21. A peak width threshold (Tpw) and a peak shape threshold (Tps) were adopted to filter pseudo peaks (PP), which are flat or narrow. In total, 20 levels of Tpw from 3 to 41 were tested, and the results indicated that all narrow PP were eliminated when Tpw reached 29. Following this, a Tps of 0.005 was used to filter flat PP. The effect of resolution on peak detection was also studied. Seven resolutions from 2 to 128 cm⁻¹ were tested, and the results showed that a resolution in the range of 32 to 64 cm⁻¹ was ideal for peak detection. It was concluded that apple NIR spectral peaks could be detected automatically with a resolution of 32 cm⁻¹ or 64 cm⁻¹, WW of 21, WT of 0.7, Tpw of 29, and Tps of 0.005. The recognition rate was 100% for the peak at 5150 cm⁻¹ and 99.50% for the peak at 6900 cm⁻¹.

(2) The spectral matching algorithm with peak information (SMA-P) for apple NIR spectra was studied. The number of peaks, peak positions, peak areas, and peak widths were used as the spectral matching indices. The ability of SMA-P to distinguish between different spectra using peak information was validated. In total, 400 samples from 4 classes (100 in each) were selected as the reference group, and 5 samples were randomly selected from each class as the target group. The results indicated that different spectra could be distinguished from each other by their peak width and peak area. Based on this conclusion, classification tests were carried out with peak width and peak area.
The classification accuracies were 47.25% and 55.00%, respectively. It was therefore concluded that SMA-P could not be applied to sample classification in the apple NIR SDBS.

(3) SMA-FS for apple NIR spectra was studied. Conventional SMA-FS methods, including absolute distance (AD), sum of squared differences (SSD), Euclidean distance (ED), correlation coefficient (CC), and spectral angle (SA), were used to distinguish different samples. The test data introduced in (2) were used again, and the results indicated that all five methods could distinguish different spectra accurately. The classification accuracies were 65.50%, 66.00%, 73.00%, 64.75%, and 62.75%, respectively, much higher than those of SMA-P. However, these accuracies were still not high enough to meet the requirements of the SDBS. This might be because the conventional SMA-FS methods rely too much on absolute spectral intensity. Hence, a new SMA-FS based on JSC (SMA-JSC) was proposed to match spectra by the shape of the spectral curves. In this method, the monotonicity of the spectral curves in the same bands was compared to assess the similarity between spectra. The experimental results showed that the newly proposed algorithm could distinguish different spectra accurately, with a calibration accuracy of 94.50% and a validation accuracy of 95.00%. Further verification was carried out with another data set containing 300 samples from 3 classes (100 in each); the calibration accuracy was 93.67% and the validation accuracy was 93.33%. The two batches were also merged and tested, giving a calibration accuracy of 94.14% and a validation accuracy of 94.29%. SMA-JSC was then compared with discriminant analysis (DA). The average DA classification accuracies between two varieties reached 98.60% (raw spectra), 95.90% (first derivative), and 96.30% (second derivative).
However, as the number of classes increased, the accuracies declined rapidly, and the classification accuracy for all seven classes of samples dropped to 88.00%, 56.40%, and 58.40%, respectively. It was concluded that SMA-JSC was accurate and stable, and far superior to the conventional SMA-FS methods and to DA. Among the methods tested, SMA-JSC is the best suited to the classification task in the apple NIR SDBS.

(4) Standards for spectra to be uploaded into the apple NIR SDBS were drawn up. These standards were based on previous research results, professional knowledge, and our own conclusions, and mainly involve four factors: sample pretreatment, spectrometer, spectrometer configuration, and experimental environment. This work provides a foundation for the reliability and reference value of the information stored in the apple NIR SDBS. Finally, a prototype apple NIR SDBS was developed based on the above research.
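The abstract describes SMA-JSC only as comparing the monotonicity of spectral curves in the same bands using the Jaccard similarity coefficient. A minimal sketch of one way to read that description is given below. The encoding of each spectrum as a set of (index, direction) pairs, the tolerance parameter `tol`, and all function names are assumptions made for illustration and are not taken from the thesis.

```python
def monotonicity_set(spectrum, tol=0.0):
    """Encode a spectrum as a set of (index, direction) pairs.

    direction is +1 (rising), -1 (falling), or 0 (flat within tol) between
    consecutive points. This encoding is an assumption, not the thesis method.
    """
    trend = set()
    for i in range(len(spectrum) - 1):
        diff = spectrum[i + 1] - spectrum[i]
        if diff > tol:
            direction = 1
        elif diff < -tol:
            direction = -1
        else:
            direction = 0
        trend.add((i, direction))
    return trend


def jaccard_similarity(set_a, set_b):
    """Jaccard similarity coefficient: |A intersect B| / |A union B|."""
    union = set_a | set_b
    if not union:
        return 1.0  # two empty sets are treated as identical
    return len(set_a & set_b) / len(union)


def shape_similarity(spectrum_a, spectrum_b, tol=0.0):
    """Hypothetical shape-based score in [0, 1] for two aligned spectra."""
    return jaccard_similarity(
        monotonicity_set(spectrum_a, tol), monotonicity_set(spectrum_b, tol)
    )


if __name__ == "__main__":
    reference = [1.0, 2.0, 3.0, 2.0]
    shifted = [x + 10.0 for x in reference]   # same shape, different intensity
    different = [1.0, 2.0, 1.0, 2.0]          # different rise/fall pattern
    print(shape_similarity(reference, shifted))
    print(shape_similarity(reference, different))
```

Under this reading, a constant intensity offset leaves the score at 1 because only the rise and fall pattern is compared, which is consistent with the stated motivation of reducing reliance on absolute spectral intensity.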

  • 【Online publication contributor】 Zhejiang University
  • 【Online publication year and issue】 2014, Issue 07