
Video Retrieval and Semantic Information Extraction (视频搜索及语义提取)

【Author】 Nan Xiaoming (南晓明)

【Advisor】 Cai Anni (蔡安妮)

【Author information】 Beijing University of Posts and Telecommunications, Communication and Information Systems, 2010, Master's thesis

【Abstract】 With the development of network and multimedia technologies, the volume of video data is growing rapidly. Retrieving and querying video effectively from massive collections has become an urgent problem, and Content-Based Video Retrieval (CBVR) has therefore received wide attention. This thesis studies CBVR at three levels: low-level visual feature extraction, high-level semantic feature extraction, and semantic video search. It proposes several new algorithms and frameworks, summarized below.

For low-level visual feature selection and extraction, the thesis analyzes and compares four categories of visual features (key-point, texture, edge, and color) in terms of their concept detection performance. First, a key-point projection algorithm based on Bag-of-Visual-Words is adopted to quantize the high-dimensional key-point features. Second, the feature-level fusion of SIFT and SURF features computed with different detectors is improved. Finally, the detection performance of the different visual features is tested on TRECVID datasets. The experimental results show that the fused SIFT and SURF features perform significantly better than the original features before fusion.

For high-level semantic feature extraction, a framework for video semantic concept detection is proposed. Four visual features (color, Gabor wavelet, edge histogram, and SIFT) are used as descriptors, and a support vector machine is trained for each feature as a classifier. The concept detection results are obtained after decision-level fusion of the classifiers. Several decision-level fusion algorithms are then proposed and evaluated in in-house experiments. The results show that the mixed fusion algorithm, which selects the best-performing fusion algorithm for each concept, yields the largest performance gain. In the TRECVID 2008 high-level feature extraction (HLF) evaluation, the overall detection performance of the system was above the average of all participating teams.

For video search, a semantic-based video search framework is proposed. The thesis analyzes the search approach based on visual examples and the search approach based on semantic concepts. A semantic-similarity-based method and an example-correlation-based method are used, respectively, to establish the mapping between concepts and semantic queries, so that semantic information is extracted automatically and the user's query is answered. The framework ranked first among all participants in the TRECVID 2009 automatic video search evaluation, which supports the effectiveness of the proposed algorithms.
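The mixed decision-level fusion idea described in the abstract can be illustrated with a minimal sketch. The record does not list the fusion algorithms the thesis actually uses, so the three rules below (average, max, product), the use of average precision as the selection criterion, and all function names are assumptions made for illustration only. Each concept is assumed to come with a score matrix holding one row per per-feature classifier and one column per video shot.

```python
import numpy as np

def average_precision(labels, scores):
    """Non-interpolated average precision for binary labels (1 = relevant)."""
    order = np.argsort(-scores, kind="stable")       # rank shots by descending score
    ranked = labels[order]
    hits = np.cumsum(ranked)                         # relevant shots seen so far
    precisions = hits / np.arange(1, len(ranked) + 1)
    n_relevant = ranked.sum()
    if n_relevant == 0:
        return 0.0
    return float((precisions * ranked).sum() / n_relevant)

# Candidate fusion rules (assumed for illustration, not taken from the thesis).
# Each maps an (n_classifiers, n_shots) score matrix to one fused score per shot.
FUSION_RULES = {
    "average": lambda s: s.mean(axis=0),
    "max": lambda s: s.max(axis=0),
    "product": lambda s: s.prod(axis=0),
}

def pick_rule_per_concept(val_scores, val_labels):
    """For each concept, pick the rule with the highest validation AP."""
    best = {}
    for concept, scores in val_scores.items():
        labels = val_labels[concept]
        best[concept] = max(
            FUSION_RULES,
            key=lambda name: average_precision(labels, FUSION_RULES[name](scores)),
        )
    return best

def mixed_fusion(scores_by_concept, best_rule):
    """Fuse each concept's classifier scores with the rule chosen for it."""
    return {c: FUSION_RULES[best_rule[c]](s) for c, s in scores_by_concept.items()}

# Toy validation data: two classifiers, hypothetical concepts "sky" and "car".
val_scores = {
    "sky": np.array([[0.9, 0.1, 0.6, 0.2],
                     [0.1, 0.9, 0.6, 0.2]]),
    "car": np.array([[0.6, 0.9],
                     [0.6, 0.1]]),
}
val_labels = {
    "sky": np.array([1, 1, 0, 0]),
    "car": np.array([1, 0]),
}

best_rule = pick_rule_per_concept(val_scores, val_labels)
fused = mixed_fusion(val_scores, best_rule)
```

In practice the rule would be selected on a held-out validation set and then applied to the test shots, and the scores would come from the per-feature SVM classifiers rather than from hand-written arrays.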
