Toward Multi-FPGA Acceleration of the Neural Networks

Saman Biookaghazadeh; Pravin Kumar Ravi; Ming Zhao

doi:10.1145/3432816

Publisher: Association for Computing Machinery
Copyright: Copyright © 2021 ACM
ISSN: 1550-4832
eISSN: 1550-4840
DOI: 10.1145/3432816

Abstract

High-throughput, low-latency Convolutional Neural Network (CNN) inference is increasingly important for many cloud- and edge-computing applications. FPGA-based acceleration of CNN inference has demonstrated several benefits over other high-performance devices such as GPGPUs. Current FPGA CNN-acceleration solutions are based on single-FPGA designs, which are limited by the resources available on one FPGA. In addition, they can accelerate only conventional 2D neural networks. To address these limitations, we present a generic multi-FPGA solution, written in OpenCL, which can accelerate more complex CNNs (e.g., the C3D CNN) and achieve near-linear speedup with respect to available single-FPGA solutions. The design is built upon the Intel Deep Learning Accelerator architecture, with three extensions. First, it includes updates for better area efficiency (up to 25%) and higher performance (up to 24%). Second, it supports 3D convolutions for more challenging applications such as video learning. Third, it supports multi-FPGA communication for higher inference throughput. The results show that using multiple FPGAs can linearly increase the overall bandwidth while maintaining the same end-to-end latency. In addition, the design can outperform other FPGA-based 2D accelerators by up to 8.4× and 3D accelerators by up to 1.7×.
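A 3D convolution, as used in C3D, slides its kernel across time (consecutive video frames) as well as across height and width. The following is a minimal, unoptimized Python reference for that operation, assuming stride 1, no padding, and a channel-first nested-list layout. It is not the authors' OpenCL kernel; the function name `conv3d` and the data layout are illustrative assumptions. An FPGA design would tile and unroll these loops in hardware rather than execute them sequentially.

```python
from typing import List

# Nested-list layouts (assumed for illustration):
#   volume[c][d][h][w]           -> input channels, frames, rows, columns
#   kernel[o][c][kd][kh][kw]     -> output channels, input channels, 3D window
Volume = List[List[List[List[float]]]]
Kernel = List[List[List[List[List[float]]]]]


def conv3d(x: Volume, w: Kernel, bias: List[float]) -> Volume:
    """Naive 3D convolution (cross-correlation, as in CNN frameworks).

    Stride 1 and no padding, so the output shape is
    (C_out, D - kD + 1, H - kH + 1, W - kW + 1).
    """
    c_in, depth, height, width = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    c_out = len(w)
    kd, kh, kw = len(w[0][0]), len(w[0][0][0]), len(w[0][0][0][0])
    if len(w[0]) != c_in:
        raise ValueError("kernel input channels must match input channels")
    od, oh, ow = depth - kd + 1, height - kh + 1, width - kw + 1

    out = [[[[0.0] * ow for _ in range(oh)] for _ in range(od)] for _ in range(c_out)]
    for o in range(c_out):
        for t in range(od):          # output frame index
            for i in range(oh):      # output row
                for j in range(ow):  # output column
                    acc = bias[o]
                    # Accumulate over input channels and the 3D window; the
                    # temporal loop (dt) is the dimension 2D convolution lacks.
                    for c in range(c_in):
                        for dt in range(kd):
                            for di in range(kh):
                                for dj in range(kw):
                                    acc += w[o][c][dt][di][dj] * x[c][t + dt][i + di][j + dj]
                    out[o][t][i][j] = acc
    return out


if __name__ == "__main__":
    # One channel, one frame, 3x3 input; a 1x2x2 kernel (kD = 1).
    frame = [[[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]]
    diag = [[[[[1, 0], [0, 1]]]]]
    print(conv3d(frame, diag, [0.0]))  # -> [[[[6.0, 8.0], [12.0, 14.0]]]]
```

The usage example sets kD = 1 on a single frame, which illustrates that ordinary 2D convolution is the special case of this loop nest with a temporal kernel depth of one.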

Journal

ACM Journal on Emerging Technologies in Computing Systems (JETC) – Association for Computing Machinery

Published: Apr 29, 2021

Keywords: FPGA