A Multi-Teacher Knowledge Distillation Model Compression Algorithm for Deep Neural Networks
【Abstract】 To minimize accuracy loss when large deep learning models are compressed for deployment on devices with limited computing power and storage capacity, knowledge distillation model compression is investigated, and an improved multi-teacher knowledge distillation compression algorithm with teacher screening is proposed. Exploiting the ensemble advantage of multiple teacher models, the algorithm uses each teacher's prediction cross-entropy as the quantitative screening criterion to select the better-performing teachers to guide the student, lets the student model extract information starting from the teachers' feature layers, and gives the better-performing teachers a greater say in the guidance. Experiments with classification models such as VGG13 on the CIFAR100 dataset show that, for final student models of the same size, the proposed method achieves better accuracy than other compression algorithms.
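The abstract's screening-and-weighting scheme can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the paper does not specify how teacher weights are derived from cross-entropy, so the `exp(-CE)` weighting, the top-`k` cutoff, and the temperature `T` used here are all assumptions, and the feature-layer distillation step is omitted.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax over the class axis."""
    z = z / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def select_and_weight_teachers(teacher_logits, labels, k=2, T=4.0):
    """Screen teachers by prediction cross-entropy and weight the survivors.

    teacher_logits: list of (batch, classes) logit arrays, one per teacher
    labels:         (batch,) integer array of ground-truth classes
    Returns a (batch, classes) weighted soft-target distribution for the student.
    """
    # Cross-entropy of each teacher's predictions against the labels:
    # lower CE means the teacher performs better on this batch.
    ces = []
    for logits in teacher_logits:
        p = softmax(logits)
        ce = -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
        ces.append(ce)
    ces = np.array(ces)

    # Screening: keep only the k teachers with the lowest cross-entropy.
    keep = np.argsort(ces)[:k]

    # Better (lower-CE) teachers get more say; exp(-CE) weighting is an
    # assumption for illustration, normalized to sum to 1.
    w = np.exp(-ces[keep])
    w = w / w.sum()

    # Weighted average of the kept teachers' temperature-softened outputs.
    return sum(wi * softmax(teacher_logits[i], T=T) for wi, i in zip(w, keep))
```

The student would then be trained against this soft target (e.g. with a KL-divergence loss) in addition to the usual hard-label loss and the feature-layer matching terms the abstract mentions.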
【Key words】 model compression; knowledge distillation; multi-teacher model; cross-entropy; feature layer
- 【Source】 Application of Electronic Technique (电子技术应用), No. 08, 2023
- 【CLC Number】 TP183; TP391.41
- 【Downloads】 23