Journal of Hebei University (Natural Science Edition) ›› 2019, Vol. 39 ›› Issue (1): 99-105. DOI: 10.3969/j.issn.1000-1565.2019.01.017


Convolutional neural network acceleration system based on FPGA

LI Xiaoyan1, ZHANG Xin1, YAN Xiaobing1, REN Deliang1, LI Yanqing2, FU Changjuan2

  • Received: 2018-09-02  Online: 2019-01-25  Published: 2019-01-25
  • Corresponding author: ZHANG Xin (b. 1966), male, from Chengde, Hebei; professor at Hebei University, working mainly on machine vision and image processing. E-mail: zhangxin@hbu.edu.cn
  • About the first author: LI Xiaoyan (b. 1994), female, from Shiyan, Hubei; M.S. candidate at Hebei University, working mainly on the integration of memristors and other novel electronic devices, and on embedded logic-control circuit design for such integration. E-mail: 18612969742@163.com
  • Supported by: National Natural Science Foundation of China (61674050)


  1. College of Telecommunications and Information Engineering, Hebei University, Baoding 071002, China; 2. Baoding Yonghong Foundry Machinery Factory, Baoding 072150, China

Abstract: Targeting the deployment of convolutional neural networks on a field-programmable gate array (FPGA), this paper proposes a scheme for parallel hardware acceleration of convolutional neural networks. By analyzing the structural characteristics of the network, the storage, reading, and movement of data are organized in a streaming, pipelined fashion, and the convolution units within each layer are unrolled to speed up the multiply-accumulate operations. The FPGA's inherently parallel structure and pipelined processing substantially improve computational efficiency: on object classification over the CIFAR-10 dataset, with no loss of accuracy and the clock running at 800 MHz, the design achieves roughly a 4x speedup over a mid-range Intel processor. Through loop unrolling, parallel processing, and a multi-stage pipeline, the forward propagation of the convolutional neural network is accelerated, meeting the needs of practical engineering tasks.

Key words: field-programmable gate array (FPGA), convolutional neural network, parallelization, pipelining, classification, acceleration

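The loop-unrolling idea described in the abstract can be sketched in C: fully unrolling the per-window multiply-accumulate loop lets a high-level-synthesis tool map the nine products of a 3x3 kernel onto parallel hardware multipliers. This is a minimal illustrative sketch, not the paper's implementation; the function name, kernel size, and integer data type are assumptions.

```c
#include <assert.h>

#define K 3  /* kernel size (illustrative) */

/* One output pixel of a KxK convolution: K*K multiplies and adds.
   In an HLS flow both loops would carry an UNROLL directive, so the
   nine multiply-accumulate operations become parallel hardware;
   plain C here only models the arithmetic. */
int conv3x3(const int win[K][K], const int w[K][K])
{
    int acc = 0;
    for (int i = 0; i < K; i++)      /* would be unrolled in HLS */
        for (int j = 0; j < K; j++)  /* would be unrolled in HLS */
            acc += win[i][j] * w[i][j];
    return acc;
}
```

Because the unrolled multiplies have no dependence on one another, only the final reduction into `acc` imposes ordering, which is what makes the parallel mapping possible.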

CLC number:
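The stream-style data movement mentioned in the abstract is commonly realized with line buffers: each pixel is read from memory once, while a full KxK window remains available every cycle for the convolution units. A minimal C sketch of this mechanism, with all names and sizes assumed for illustration:

```c
#include <assert.h>

#define W 8  /* image row width (illustrative) */
#define K 3  /* window size (illustrative) */

/* Line buffers hold the last K-1 image rows; each incoming pixel shifts
   the KxK window left by one column, so no pixel is fetched twice.
   In HLS the surrounding loop would be pipelined (one pixel per cycle). */
typedef struct {
    int rows[K - 1][W]; /* the two most recently completed rows */
    int win[K][K];      /* current KxK window */
} LineBuf;

void linebuf_push(LineBuf *lb, int col, int pixel)
{
    /* shift the window one column to the left */
    for (int i = 0; i < K; i++)
        for (int j = 0; j < K - 1; j++)
            lb->win[i][j] = lb->win[i][j + 1];

    /* new rightmost column: two buffered rows plus the fresh pixel */
    lb->win[0][K - 1] = lb->rows[0][col];
    lb->win[1][K - 1] = lb->rows[1][col];
    lb->win[2][K - 1] = pixel;

    /* age the line buffers at this column */
    lb->rows[0][col] = lb->rows[1][col];
    lb->rows[1][col] = pixel;
}
```

Feeding the image row by row, the window always covers the K most recent rows at the K most recent columns, which is exactly the operand the unrolled multiply-accumulate stage consumes each cycle.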