卷积神经网络及其研究进展

doi:10.3969/j.issn.1000-1565.2017.06.012

摘要/Abstract

摘要： 深度学习是目前机器学习领域最热门的研究方向,轰动全球的AlphaGo就是用深度学习算法训练的.卷积神经网络是用深度学习算法训练的一种模型,它在计算机视觉领域应用广泛,而且获得了巨大的成功.本文的主要目的有2个:一是帮助读者深入理解卷积神经网络,包括网络结构、核心概念、操作和训练;二是对卷积神经网络的近期研究进展进行综述,重点综述了激活函数、池化、训练及应用4个方面的研究进展.另外,还对其面临的挑战和热点研究方向进行了讨论.本文将为从事相关研究的人员提供很好的帮助.

关键词: 机器学习, 深度学习, 卷积神经网络, 计算机视觉, 训练算法

Abstract: Deep learning is the most popular research topic in the field of machine learning,AlphaGo which overwhelmingly impacts the world is trained with deep learning algorithms.Convolution neural network(CNN)is a model trained with deep learning algorithm,CNN is widely and successfully applied in computer version.The main purpose of this paper includes two aspects:one is to provide readers with some insights into CNN including its architecture,related concepts,operations and its training; the other is to present a comprehensive survey on research advances of CNN,mainly focusing on 4 aspects: activation functions,pooling,training and applications of CNN.Furthermore,the emerging challenges and hot research topics of CNN are also discussed.This paper can be very helpful to researchers in related field.

Key words: machine learning, deep learning, convolutional neural network, computer version, training algorithms

中图分类号:

TP181

翟俊海,张素芳,郝璞. 卷积神经网络及其研究进展[J]. 河北大学学报(自然科学版), 2017, 37(6): 640-651.

ZHAI Junhai,ZHANG Sufang,HAO Pu. Convolutional neural network and its research advances[J]. Journal of Hebei University (Natural Science Edition), 2017, 37(6): 640-651.

参考文献

[1] MCCULLOCH W S,PITTS W.A logical calculus of the ideas immanent in nervous activity[J].Bulletin of Mathematical Biology,1943,52(4):99-115.DOI:10.1007/BF02478259.DOI:10.1007/BF02478259.
[2] ROSENBLATT F.The perception: a probabilistic model for information storage and organization in the brain[J].Psychological Review,1958,65(6):386-408.DOI:10.1037/h0042519.
[3] MINSKY M,PAPERT S.Perceptrons[M].Oxford: MIT Press,1969.
[4] WERBOS P.Beyond regression: New tools for prediction and analysis in the behavioral sciences[D].Boston:PhD Thesis,Harvard University,1974.
[5] RUMELHART D E,HINTON G E,WILLIAMS R J.Learning representations by back-propagating errors[J].Nature,1986,323(6088):533-536.DOI:10.1038/323533a0.
[6] CORTES C,VAPNIK V.Support-vector networks[J].Machine Learning,1995,20(3):273-297.DOI: 10.1007/BF00994018.
[7] HINTON G E,SALAKHUTDINOV R R.Reducing the dimensionality of data with neural networks[J].Science,2006,313(5786):504-507.DOI: 10.1126/science.1127647.
[8] LECUN Y,BENGIO Y,HINTON G E.Deep learning[J].Nature,2015,521:436-444.DOI:10.1038/nature14539.
[9] 余凯,贾磊,陈雨强,等.深度学习的昨天、今天和明天[J].计算机研究与发展,2013,50(9):1799-1804.DOI:10.7544/issn1000-1239.2013.20131180. YU K,JIA L,CHEN Y Q,et al.Deep learning: Yesterday,Today and Tomorrow [J].Journal of Computer Research and Development,2013,50(9):1799-1804.DOI:10.7544/issn1000-1239.2013.20131180.)
[10] LECUN Y,BOTTOU L,BENGIO Y,et al.Gradient-based learning applied to document recognition [J].Proceedings of the IEEE,1998,86(11):2278-2324.DOI: 10.1109/5.726791.
[11] KRIZHEVSKY A,SUTSKEVER I,HINTON G E.ImageNet classification with deep convolutional neural networks [Z].International Conference on Neural Information Processing Systems,Lake Tahoe,Nevada,USA,2012.DOI::10.1145/3065386.
[12] MAAS A L,HANNUN A Y,NG A Y.Rectifier nonlinearities improve neural network acoustic models [Z] The 30 th International Conference on Machine Learning,Atlanta,Georgia,USA,2013.
[13] HE K M,ZHANG X Y,REN S Q,et al.Delving deep into rectifiers: surpassing human-level performance on imagenet classification [Z] IEEE International Conference on Computer Vision(ICCV),Santiago,Chile,2015.DOI:10.1109/ICCV.2015.123.
[14] LI J C,NG W W Y,YEUNG D S,et al.Bi-firing deep neural networks[J].International Journal of Machine Learning & Cybernetics,2014,5(1):73-83.DOI:10.1007/s13042-013-0198-9.
[15] GOODFELLOW I,BENGIO Y,COURVILLE A.Deep Learning[M].Massachusetts:MIT Press,2016.
[16] YU D,WANG H,CHEN P,et al.Mixed pooling for convolutional neural networks [Z].The 9th international conference on rough sets and knowledge technology,Shanghai,China,2014.
[17] GULCEHRE C,CHO K,PASCANU R,et al.Learned-norm pooling for deep feedforward and recurrent neural networks[Z].European Conference on Machine Learning and Knowledge Discovery in Databases,Nancy,France,2014.DOI:10.1007/978-3-662-44848-9_34.
[18] ESTRACH J B,SZLAM A,LECUN Y.Signal recovery from Pooling Representations [Z].The 31st International Conference on Machine Learning,Beijing,China,2014.
[19] HE K M,ZHANG X Y,REN S Q,et al.Spatial pyramid pooling in deep convolutional networks for visual recognition[J].IEEE Transactions on Pattern Analysis & Machine Intelligence,2015,37(9):1904-16.DOI: 10.1109/TPAMI.2015.2389824.
[20] XIE L,TIAN Q,WANG M,et al.Spatial pooling of heterogeneous features for image classification[J].IEEE Transactions on Image Processing,2014,23(5):1994-2008. DOI: 10.1109/TIP.2014.2310117.
[21] LEE H,KIM G,KIM H G,et al.Deep CNNs along the time axis with intermap pooling for robustness to spectral variations[J].IEEE Signal Processing Letters,2016,23(10):1310-1314.DOI: 10.1109/LSP.2016.2589962.
[22] PERLAZA S M,FAWAZ N,LASAULCE S,et al.From spectrum pooling to space pooling: opportunistic interference alignment in MIMO cognitive networks[J].IEEE Transactions on Signal Processing,2010,58(7):3728-3741.DOI: 10.1109/TSP.2010.2046084.
[23] WU HBB,GU X D.Towards dropout training for convolutional neural networks[J].Neural Networks,2015,71:1-10.DOI:10.1016/j.neunet.2015.07.007.
[24] WANG J Z,WANG W M,WANG R G,et al.CSPS: An adaptive pooling method for image classification[J].IEEE Transactions on Multimedia,2016,18(6):1000-1010.DOI: 10.1109/TMM.2016.2544099.
[25] SUN M L,SONG Z J,JIANG X H,et al.Learning pooling for convolutional neural network[J].Neurocomputing,2017,224:96-104.DOI:10.1016/j.neucom.2016.10.049.
[26] 黄文坚,唐源.TensorFlow实战[M].北京: 电子工业出版社,2017.
[27] 乐毅,王斌.深度学习-Caffe之经典模型详解与实战[M].北京: 电子工业出版社,2017.
[28] LIN M,CHEN Q,YAN.Network in network [J/OL].[2014-03-04].https://arxiv.org/abs/1312.4400v3.
[29] SIMONYAN K,ZISSERMAN A.Very deep convolutional networks for large-scale Image recognition [J/OL].[2014-09-15].http://arxiv.org/abs/1409.1556v2.
[30] SZEGEDY C,LIU W,JIA Y,et al.Going deeper with convolutions[Z].IEEE Conference on Computer Vision and Pattern Recognition(CVPR2015),Boston,MA,USA,2015.DOI:10.1109/CVPR.2015.7298594.
[31] IOFFE S,SZEGEDY C.Batch normalization: accelerating deep network training by reducing internal covariate shift[J/OL].[2015-03-02].https://arxiv.org/abs/1502.03167v3.
[32] SZEGEDY C,VANHOUCKE V,IOFFE S,et al.Rethinking the inception architecture for computer vision[Z].IEEE Conference on Computer Vision and Pattern Recognition(CVPR2016),Las Vegas,NV,United States,2016.DOI:10.1109/CVPR.2016.308.
[33] SZEGEDY C,IOFFE S,VANHOUCKE V,et al.Inception-v4,inception-ResNet and the impact of residual connections on learning [J/OL].[2016-08-23].https://arxiv.org/abs/1602.07261.
[34] HE K M,ZHANG X Y,REN S Q,et al.Deep residual learning for image recognition[Z].IEEE Conference on Computer Vision and Pattern Recognition(CVPR2016),Las Vegas,NV,United States,2016.DOI: 10.1109/CVPR.2016.90.
[35] ZEILER M D,FERGUS R.Visualizing and Understanding Convolutional Networks[C] //Computer Vision-ECCV 2014,Lecture Notes in Computer Science,2014,8689:818-833.DOI:10.1007/978-3-319-10590-1_53.
[36] IANDOLA F,MOSKEWICZ M,KARAYEV S,et al.DenseNet: implementing efficient ConvNet descriptor pyramids [J/OL].[2014-04-07]. https://arxiv.org/abs/1404.1869.
[37] IANDOLA F N,HAN S,MOSKEWICZ M W,et al.SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size[J/OL].[2016-09-04].https://arxiv.org/abs/1602.07360v4.
[38] CHOPRA S,HADSELL R,LECUN Y.Learning a similarity metric discriminatively,with application to face verification[Z].IEEE Conference on Computer Vision and Pattern Recognition(CVPR 2005),San Diego,California,2005.DOI:10.1109/CVPR.2005.202.
[39] HADSELL R,CHOPRA S,LECUN Y.Dimensionality reduction by learning an invariant mapping[Z].IEEE Computer Society Conference on Computer Vision and Pattern Recognition(CVPR 2006),New York,2006.DOI: 10.1109/CVPR.2016.435.
[40] KINGMA D P,WELLING M.Auto-encoding variational Bayes [J/OL].[2014-05-01].https://arxiv.org/abs/1312.6114.
[41] HAYKIN S.神经网络与机器学习(影印版)[M].北京:机械工业出版社,2009.
[42] BOTTOU L.Large-scale machine learning with stochastic gradient descent [Z].19th International Conference on Computational Statistics,Paris France,2010.DOI:10.1007/978-3-7908-2604-3_16.DOI:10.1016/j.neucom.2016.11.046.
[43] LI S J,DOU Y,NIU X,et al.A fast and memory saved GPU acceleration algorithm of convolutional neural networks for target detection[J].Neurocomputing,2017,230:48-59.
[44] MATHIEU M,HENAFF M,LECUN Y.Fast training of convolutional networks through PPTs [J/OL].[2014-03-06].https://arxiv.org/abs/arXiv:1312.5851v5.
[45] RAJESWAR M S,SANKAR A R,BALASUBRAMANIAM V N,et al.Scaling up the training of deep CNNs for human action recognition [Z].IEEE International Parallel and Distributed Processing Symposium Workshop(IPDPSW2015),Hyderabad,INDIA,2015.DOI: 10.1109/IPDPSW.2015.93.
[46] ZLATESKI A,LEE K,SEUNG H S.ZNN-A fast and scalable algorithm for training 3D convolutional networks on multi-core and many-core shared memory machines [Z].IEEE International Parallel and Distributed Processing Symposium(IPDPS2016),Chicago,IL,USA,2016.DOI:10.1109/IPDPS.2016.119.
[47] MORCEL R,EZZEDDINE M,AKKARY H.FPGA-based accelerator for deep convolutional neural networks for the SPARK environment [Z].IEEE International Conference on Smart Cloud(SMART-CLOUD2016),New York,USA,2016.DOI:10.1109/SmartCloud.2016.31.
[48] GIRSHICK R.Fast R-CNN [Z].2015 IEEE International Conference on Computer Vision(ICCV),Santiago, Chile,2015.
[49] REN S Q,HE K M,GIRSHICK R,et al.Faster R-CNN: towards real-time object detection with region proposal networks [J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2016,online first,DOI: 10.1109/TPAMI.2016.2577031.
[50] GUSMÃO P P B.Fast training of convolutional neural networks via kernel recalling [J/OL].[2016-10-12].https://arxiv.org/abs/1610.03623.
[51] CONG J S,XIAO B J.Minimizing computation in convolutional neural networks [Z].The 24th International Conference on Artificial Neural Networks,Hamburg,Germany,2014.DOI:10.1007/978-3-319-11179-7_36.
[52] LAVIN A,GRAY S.Fast algorithms for convolutional neural networks [Z].2016 IEEE Conference on Computer Vision and Pattern Recognition,Las Vegas,Nevada,USA,2016.DOI: 10.1109/CVPR.2016.435.
[53] ZHANG X Y,ZOU J H,HE K M,et al.Accelerating very deep convolutional networks for classification and detection[J].IEEE Transactions on Pattern Analysis & Machine Intelligence,2016,38(10):1943-1955.DOI:10.1109/TPAMI.2015.2502579.
[54] KORYTKOWSKI M,STASZEWSKI P,WOLDAN P.Fast computing framework for convolutional neural networks [Z]. 2016 IEEE International Conferences on Big Data and Cloud Computing(BDCloud),Social Computing and Networking(SocialCom),Sustainable Computing and Communications(SustainCom),Atlanta,GA,USA,2016.DOI:10.1109/BDCloud-SocialCom-SustainCom.2016.28.
[55] ZHENG S,VISHNU A,DING C.Accelerating Deep Learning with Shrinkage and Recall [J/OL].[2016-09-19].http://arxiv.org/abs/1605.01369v2.
[56] KIM J,KIM J,JANG G J,et al.Fast learning method for convolutional neural networks using extreme learning machine and its application to lane detection[J].Neural Networks,2017,87:109-121.DOI: 10.1016/j.neunet.2016.12.002.
[57] GRINSVEN M J J P V,GINNEKEN B V,HOYNG C B,et al.Fast convolutional neural network training using selective data sampling: Application to hemorrhage detection in color fundus images[J].IEEE Transactions on Medical Imaging,2016,35(5):1273-1284.DOI: 10.1109/TMI.2016.2526689.
[58] DENG J,DONG W,SOCHER R,et al.ImageNet: A large-scale hierarchical image database [Z].IEEE Conference on Computer Vision and Pattern Recognition,2009(CVPR 2009),Miami,FL,USA,2009.DOI: 10.1109/CVPR.2009.5206848.
[59] HAN S,MAO H,DALLY W J.Deep compression: compressing deep neural networks with pruning,trained quantization and huffman coding[Z].International Conference on Learning Representations 2016(ICLR 2016),San Juan,Puerto Ri 2016.DOI: 10.1109/TIP.2015.2510583.
[60] LI H,LI Y,PORIKLI F.DeepTrack: learning discriminative feature representations online for robust visual tracking[J].IEEE Transactions on Image Processing,2016,25(4):1834-1848.DOI: 10.1109/TIP.2015.2510583.
[61] FAN J L,XU W,WU Y,et al.Human tracking using convolutional neural networks[J].IEEE Transactions on Neural Networks,2010,21(10):1610-1623.DOI: 10.1109/TNN.2010.2066286.
[62] MA C,XU Y,NI B B,et al.When correlation filters meet convolutional neural networks for visual tracking[J].IEEE Signal Processing Letters,2016,23(10):1454-1458.DOI: 10.1109/LSP.2016.2601691.
[63] GIRSHICK R,DONAHUE J,DARRELL T,et al.Region-based convolutional networks for accurate object detection and segmentation[J].IEEE Transactions on Pattern Analysis & Machine Intelligence,2016,38(1):142-158.DOI: 10.1109/TPAMI.2015.2439281.
[64] GIRSHICK R.Fast R-CNN [Z].2015 IEEE International Conference on Computer Vision(ICCV2015),Santiago,Chile,2015.DOI: 10.1109/ICCV.2015.169.
[65] ZHANG X Y,ZOU J H,HE K M,et al.Accelerating very deep convolutional networks for classification and detection[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2016,38(10):1943-1955.DOI: 10.1109/TPAMI.2015.2439281.
[66] TOME D,MONTI F,BAROFFIO L,et al.Deep convolutional neural networks for pedestrian detection[J].Signal Processing: Image Communication,2016,47:482-489.DOI:10.1016/j.image.2016.05.007.
[67] REDMON J,DIVVALA S,GIRSHICK R,et al.You only look once: unified,real-time object detection[Z].2016 IEEE Conference on Computer Vision and Pattern Recognition(CVPR),Las Vegas,NV,United States,2016.DOI:10.1109/CVPR.2016.91.
[68] LIU W,ANGUELOV D,ERHAN D,et al.SSD: single shot multiBox detector[Z].European Conference on Computer Vision(ECCV 2016),Amsterdam,The Netherlands,2016.DOI: 10.1007/978-3-319-46448-0_2.
[69] SAINATH T N,KINGSBURY B,SAON G,et al.Deep convolutional neural networks for large-scale speech tasks[J].Neural Networks,2015,64:39-48.DOI:10.1016/j.neunet.2014.08.005.
[70] QIAN Y M,BI M X,TAN T,et al.Very deep convolutional neural networks for noise robust speech recognition[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2016,24(2):2263-2276.DOI: 10.1109/TASLP.2016.2602884.
[71] DONG C,LOY C C,HE K M,et al.Image super-resolution using deep convolutional networks[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2016,38(2):295-307.DOI: 10.1109/TPAMI.2015.2439281.