
The Application of Multi-Instance Learning Methods to Mass Retrieval in Digitized Mammograms

(Original title: 多示例学习方法在乳腺钼靶病灶图像检索中的应用研究)

【Author】 Lu Pengfei (卢鹏飞)

【Advisors】 Li Lihua (厉力华); Liu Wei (刘伟)

【Author information】 Hangzhou Dianzi University, Pattern Recognition and Intelligent Systems, 2012, Master's degree

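【Abstract (from the Chinese)】 Breast cancer is a malignant tumor that seriously threatens the lives and health of middle-aged women, and its incidence in China has been rising in recent years. Early detection, early diagnosis, and early treatment can effectively improve the cure rate of breast cancer and the survival rate of patients. Mammography (molybdenum-target X-ray imaging) has become the most commonly used clinical method for breast cancer detection. Studies show that computer-aided diagnosis (CAD) can effectively help physicians improve diagnostic efficiency, but mass detection in CAD still faces many difficulties. In recent years, many mammography CAD systems have introduced content-based image retrieval (CBIR), and related studies show that CBIR can help physicians improve mass detection accuracy. In clinical diagnosis, a mass lesion often carries multiple semantics in an image: a single lesion region often contains both diseased tissue and normal breast tissue. The basic technical framework of CBIR is query-by-example (QBE). A QBE framework based solely on feature matching cannot adequately address the "semantic gap" problem in image retrieval and usually has to be combined with (supervised) machine learning to improve retrieval precision. Because the suspicious lesion images submitted by physicians are ambiguous, traditional supervised learning is not the best choice for mass lesion retrieval. Multi-instance learning (MIL) is a new machine learning framework for handling this kind of ambiguity. Unlike supervised learning, the training set in MIL consists of bags that carry concept labels, while the instances inside the bags are unlabeled. A bag is labeled positive if at least one of its instances is positive; otherwise it is labeled negative. The learning algorithm learns a concept from the training set of labeled bags in order to predict the labels of new bags. When MIL is applied to CBIR, each image is treated as a bag and each segmented region as an instance in that bag. The learning algorithm then learns the concept the user is interested in from the training set and retrieves relevant images that contain similar concepts. The aim of this thesis is to apply MIL to mass lesion retrieval in mammograms. In a mammogram retrieval system, the query lesion is usually ambiguous and hard to describe, because it contains both diseased tissue and normal breast tissue. If the query lesion is treated as an image bag, MIL can be used to resolve this ambiguity. This thesis proposes three bag generators, applies MIL algorithms for concept learning, and uses the learned concepts for retrieval. Extensive experiments compare the retrieval performance of each kind of bag under different MIL algorithms. The work has three parts. The first part proposes three bag generators for mammographic mass lesion retrieval under the MIL framework: J-Bag, based on JSEG image segmentation; A-Bag, based on a computational model of visual attention; and K-Bag, based on image segmentation with a modified k-means clustering. Each lesion image is ultimately converted into a bag of 4 instances, each of which is a 4-dimensional feature vector. The second part builds the databases required for the experiments: one is the DDSM database, and the other consists of lesion images collected at Zhejiang Cancer Hospital. In the third part, a number of positive and negative bags are randomly selected from the training data to form the training set; the lesion images are processed with a given bag generator to compute the bags, and learning is then carried out with each of the MIL algorithms DD, EM-DD, and BP-MIP. The learned concepts are used to retrieve images from the test set. The experiments compare the performance of the different bag generators and learning algorithms within the MIL framework, and also compare the three proposed bag generators with the SBN algorithm. The results show that MIL can be used for mammographic mass lesion image retrieval, that the proposed A-Bag and K-Bag bags retrieve better than the classic SBN bag, and that EM-DD gives the best retrieval performance among the MIL algorithms used. Finally, the thesis summarizes the work and outlines several aspects to be improved in future research.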
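The bag and instance setup described in the abstract can be illustrated with a minimal Python sketch. It assumes only what the abstract states: a bag holds 4 instances, each a 4-dimensional feature vector, and a bag is positive if at least one instance is positive. The feature values, the per-instance labels, and the helper names `bag_label` and `bag_shape` are hypothetical and are not taken from the thesis.

```python
# Hypothetical toy bag: one lesion image represented as 4 instances (regions),
# each described by a 4-dimensional feature vector. Values are made up.
bag = [
    [0.12, 0.80, 0.33, 0.05],
    [0.90, 0.10, 0.45, 0.62],
    [0.40, 0.41, 0.39, 0.38],
    [0.07, 0.95, 0.20, 0.11],
]


def bag_shape(bag):
    """Return (number of instances, feature dimension) of a bag."""
    return (len(bag), len(bag[0]))


def bag_label(instance_labels):
    """Standard MIL rule: a bag is positive (1) if at least one of its
    instances is positive, otherwise negative (0). In real MIL training the
    instance labels are hidden; they appear here only to show the rule."""
    return 1 if any(instance_labels) else 0


print(bag_shape(bag))            # instances x feature dimension
print(bag_label([0, 0, 1, 0]))   # one positive region makes the bag positive
print(bag_label([0, 0, 0, 0]))   # no positive region makes the bag negative
```

The asymmetry is the point of the framework: a negative bag guarantees that every region is negative, while a positive bag only guarantees that some unknown region is positive.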

【Abstract】 Breast cancer is one of the leading causes of death among middle-aged women. In China, the incidence of breast cancer continues to rise. Early diagnosis and treatment can effectively increase the chances of survival for breast cancer patients. Mammography has become one of the most widely used approaches for early detection of breast cancer in current clinical practice. Studies show that computer-aided diagnosis (CAD) techniques can assist radiologists in detecting masses and micro-calcifications in mammograms, but the accuracy of mass detection with current CAD systems is still poor. Recently, content-based image retrieval (CBIR) techniques have been widely used in various CAD schemes, and relevant studies show that CBIR can help clinicians improve the precision of mass detection.
In clinical diagnosis, the benign or malignant lesion and the normal tissue are physically adjacent within a region of interest (ROI). The classical framework for CBIR is query by example (QBE). However, a QBE framework based only on feature matching cannot adequately solve the "semantic gap" problem in image retrieval, and often needs to be combined with (supervised) machine learning approaches to improve retrieval precision. The query mass given by clinicians is often ambiguous and difficult to describe, so supervised learning is not the best choice for the mass retrieval problem. Multi-instance learning (MIL) is a new machine learning framework for learning from this kind of ambiguity. Unlike supervised learning, the training set in MIL consists of labeled bags; labels are attached only to bags, not to the instances inside them. A bag is labeled positive if at least one instance in that bag is positive; otherwise the bag is labeled negative. The goal of MIL is to predict the labels of new bags from the labeled bags in the training set. When MIL is applied to CBIR systems, each image is treated as a labeled bag, and the segmented regions of the image correspond to the instances in that bag. The MIL algorithms are then used to learn the concept of interest to the user and to retrieve relevant images that contain a similar concept.
The objective of this thesis is to study the application of MIL techniques to the mass retrieval task. In a mammogram retrieval system, the query mass is ambiguous and difficult to describe because the lesion and the normal tissue are physically adjacent. If the query mass is processed as an image bag, the ambiguity can be tackled with MIL techniques. In this thesis, we propose three image bag generators and use MIL algorithms to learn the target concepts, which are then used for retrieval. An experimental study compares the retrieval performance of the three bag generators under different MIL algorithms, and also compares them with an existing bag generator called SBN. The thesis consists of three parts.
In the first part, three image bag generators, named J-Bag, A-Bag, and K-Bag, are proposed. J-Bag is based on the JSEG image segmentation algorithm, A-Bag is based on a saliency-based bottom-up computational model of visual attention, and K-Bag is based on a modified k-means clustering image segmentation algorithm. Each mass image is converted into a corresponding image bag consisting of four 4-dimensional feature vectors. In the second part, two mass databases are built. One is drawn from the DDSM database; the other consists of images collected at the Zhejiang Cancer Hospital in China. In the last part, in the training phase, several positive query examples and several negative examples are randomly selected for each mass type. A bag generator is then chosen to transform the mass images into image bags, and the target concept is learned by Diverse Density (DD), EM-DD, and BP-MIP, respectively. After the target concept has been learned, the remaining mass images in the test set are ranked by their distance to the learned concept. The experimental results show that MIL techniques can be applied to mammogram retrieval systems, that the proposed bag generators A-Bag and K-Bag achieve better retrieval results than the existing bag generator SBN, and that the EM-DD algorithm gives the best retrieval performance.
Finally, the thesis summarizes the work and points out several areas that need improvement in future work.
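The learning and ranking steps can be sketched in Python as follows. This is a minimal illustration, not the thesis implementation: it assumes the noisy-or form of the Diverse Density objective with a Gaussian-like instance model, and it assumes a bag's distance to the concept is the distance of its closest instance, since the abstract does not specify either. The names `sq_dist`, `instance_prob`, `neg_log_dd`, `bag_distance`, and `rank_bags`, the concept `t`, and the feature scaling `s` are all hypothetical.

```python
import math


def sq_dist(x, t, s):
    """Scaled squared distance between instance x and concept point t."""
    return sum((sk * (xk - tk)) ** 2 for xk, tk, sk in zip(x, t, s))


def instance_prob(x, t, s):
    """Assumed instance model: probability that x matches concept t."""
    return math.exp(-sq_dist(x, t, s))


def neg_log_dd(t, s, pos_bags, neg_bags, eps=1e-12):
    """Negative log Diverse Density under the noisy-or model (assumption).
    A positive bag should have at least one instance near t; a negative bag
    should have none. Lower values mean a better candidate concept."""
    nll = 0.0
    for bag in pos_bags:
        p = 1.0 - math.prod(1.0 - instance_prob(x, t, s) for x in bag)
        nll -= math.log(max(p, eps))
    for bag in neg_bags:
        p = math.prod(1.0 - instance_prob(x, t, s) for x in bag)
        nll -= math.log(max(p, eps))
    return nll


def bag_distance(bag, t, s):
    """Distance from a bag to the concept: its closest instance (assumption)."""
    return math.sqrt(min(sq_dist(x, t, s) for x in bag))


def rank_bags(bags, t, s):
    """Indices of test bags ordered from most to least similar to the concept."""
    return sorted(range(len(bags)), key=lambda i: bag_distance(bags[i], t, s))
```

In a full system, `neg_log_dd` would be minimized over `t` and `s`, for example by gradient descent started from instances of the positive bags, and `rank_bags` would then order the test images for retrieval.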
