High-throughput and low-latency Convolutional Neural Network (CNN) inference is increasingly important for many cloud- and edge-computing applications. FPGA-based acceleration of CNN inference has demonstrated various benefits compared to other high-performance devices such as GPGPUs. Current FPGA CNN-acceleration solutions are based on single-FPGA designs, which are limited by the resources available on one FPGA. In addition, they can accelerate only conventional 2D neural networks. To address these limitations, we present a generic multi-FPGA solution, written in OpenCL, which can accelerate more complex CNNs (e.g., the C3D CNN) and achieve a near-linear speedup with respect to the available single-FPGA solutions. The design is built upon the Intel Deep Learning Accelerator architecture, with three extensions. First, it includes updates for better area efficiency (up to 25%) and higher performance (up to 24%). Second, it supports 3D convolutions for more challenging applications such as video learning. Third, it supports multi-FPGA communication for higher inference throughput. The results show that using multiple FPGAs can increase the overall bandwidth linearly while maintaining the same end-to-end latency. In addition, the design can outperform other FPGA-based 2D accelerators by up to 8.4 times and 3D accelerators by up to 1.7 times.
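To make the two central ideas concrete, here is a minimal NumPy sketch. It is not the authors' OpenCL design: `conv3d_direct` and `pipeline_model` are hypothetical names, the convolution is a plain loop nest (cross-correlation, as CNN frameworks compute it) with unit stride and no padding, and the multi-FPGA model assumes the network is split into balanced sequential stages, one per FPGA, with inter-FPGA transfer time ignored. A 3D convolution slides each filter over frames as well as rows and columns; with a temporal kernel depth of 1 and a single frame it reduces to the familiar 2D case. The pipeline model illustrates how adding FPGAs can raise throughput roughly linearly while end-to-end latency stays the same.

```python
import numpy as np


def conv3d_direct(x, w):
    """Direct 3D convolution (cross-correlation, as in CNN frameworks),
    stride 1, no padding.

    x: input clip, shape (C, D, H, W) = (channels, frames, height, width)
    w: filter bank, shape (K, C, Kd, Kh, Kw)
    returns: output, shape (K, D - Kd + 1, H - Kh + 1, W - Kw + 1)
    """
    C, D, H, W = x.shape
    K, Cw, Kd, Kh, Kw = w.shape
    if C != Cw:
        raise ValueError("input and filter channel counts differ")
    Do, Ho, Wo = D - Kd + 1, H - Kh + 1, W - Kw + 1
    out = np.zeros((K, Do, Ho, Wo), dtype=x.dtype)
    for k in range(K):                # output channel
        for d in range(Do):           # temporal position: the extra loop vs. 2D
            for i in range(Ho):       # output row
                for j in range(Wo):   # output column
                    window = x[:, d:d + Kd, i:i + Kh, j:j + Kw]
                    out[k, d, i, j] = np.sum(window * w[k])
    return out


def pipeline_model(stage_times_ms):
    """Hypothetical idealized multi-FPGA pipeline: the network is split into
    sequential stages, one per FPGA, and inter-FPGA transfer time is ignored.

    Returns (end-to-end latency in ms, steady-state throughput in frames/s).
    """
    latency_ms = sum(stage_times_ms)               # each frame still visits every stage
    throughput_fps = 1000.0 / max(stage_times_ms)  # the slowest stage sets the rate
    return latency_ms, throughput_fps


if __name__ == "__main__":
    clip = np.ones((3, 16, 8, 8), dtype=np.float32)       # 3 channels, 16 frames, 8x8 pixels
    filters = np.ones((2, 3, 3, 3, 3), dtype=np.float32)  # 2 filters of size 3x3x3
    y = conv3d_direct(clip, filters)
    print(y.shape)                       # (2, 14, 6, 6)

    print(pipeline_model([40.0]))        # one FPGA: (40.0, 25.0)
    print(pipeline_model([10.0] * 4))    # four balanced stages: (40.0, 100.0)
```

Under these assumptions, splitting a 40 ms network into four 10 ms stages keeps latency at 40 ms but raises throughput from 25 to 100 frames/s. Unbalanced stages and inter-FPGA transfers eat into that gain in practice, which is one reason real speedups tend to be near-linear rather than exactly linear.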
ACM Journal on Emerging Technologies in Computing Systems (JETC) – Association for Computing Machinery
Published: Apr 29, 2021
Keywords: FPGA